High-dimensional cluster analysis with the Masked EM Algorithm
Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.
2014-01-01
Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694
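As a rough illustration of the masking idea, the sketch below runs EM for a toy spherical Gaussian mixture in which each point carries a binary feature mask and the E-step sums log-likelihoods over that point's unmasked features only. The published Masked EM models masked features with a virtual noise distribution rather than dropping them, so this is a simplified stand-in, not the authors' algorithm.

```python
import numpy as np

def masked_em(X, M, k, n_iter=50, seed=0):
    """Toy EM for a spherical Gaussian mixture where each point trusts only
    the features flagged in its binary mask M (shape n x d, entries 0/1).
    Sketch of the masking idea only; the published Masked EM replaces
    masked features with a virtual noise distribution instead."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=k, replace=False)]        # k x d cluster means
    var = np.ones(k)                                    # spherical variances
    pi = np.full(k, 1.0 / k)                            # mixing weights
    for _ in range(n_iter):
        # E-step: per-feature Gaussian log-likelihoods, summed over each
        # point's unmasked features only.
        diff2 = (X[:, None, :] - mu[None, :, :]) ** 2   # n x k x d
        per_feat = -0.5 * (diff2 / var[None, :, None]
                           + np.log(2 * np.pi * var)[None, :, None])
        log_r = np.log(pi)[None, :] + np.einsum('nkd,nd->nk', per_feat, M)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)               # responsibilities
        # M-step: mask-weighted updates of means, variances and weights.
        W = r[:, :, None] * M[:, None, :]               # n x k x d weights
        mu = np.einsum('nkd,nd->kd', W, X) / (W.sum(axis=0) + 1e-12)
        diff2 = (X[:, None, :] - mu[None, :, :]) ** 2
        var = np.maximum(np.einsum('nkd,nkd->k', W, diff2)
                         / (W.sum(axis=(0, 2)) + 1e-12), 1e-6)
        pi = r.mean(axis=0)
    return r.argmax(axis=1)
```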
High- and low-level hierarchical classification algorithm based on source separation process
NASA Astrophysics Data System (ADS)
Loghmari, Mohamed Anis; Karray, Emna; Naceur, Mohamed Saber
2016-10-01
High-dimensional data applications have attracted great attention in recent years. We focus on remote sensing data analysis in high-dimensional spaces, such as hyperspectral data. From a methodological viewpoint, remote sensing data analysis is not a trivial task. Its complexity is caused by many factors, such as large spectral or spatial variability as well as the curse of dimensionality. The latter describes the problem of data sparseness. In this particular ill-posed problem, a reliable classification approach requires appropriate modeling of the classification process. The proposed approach is based on a hierarchical clustering algorithm in order to deal with remote sensing data in high-dimensional space. Indeed, one obvious method to perform dimensionality reduction is to use the independent component analysis process as a preprocessing step. The first particularity of our method is the special structure of its cluster tree. Most hierarchical algorithms associate leaves with individual clusters and start from a large number of individual classes equal to the number of pixels; in our approach, however, leaves are associated with the most relevant sources, which are expressed along mutually independent axes and specifically represent land covers associated with a limited number of clusters. These sources contribute to the refinement of the clustering by providing complementary rather than redundant information. The second particularity of our approach is that at each level of the cluster tree, we combine both a high-level divisive clustering and a low-level agglomerative clustering. This approach reduces the computational cost, since the high-level divisive clustering is controlled by a simple Boolean operator, and optimizes the clustering results, since the low-level agglomerative clustering is guided by the most relevant independent sources. At each new step we then obtain a finer partition that participates in the clustering process to enhance semantic capabilities and give good identification rates.
Hyper-spectral image segmentation using spectral clustering with covariance descriptors
NASA Astrophysics Data System (ADS)
Kursun, Olcay; Karabiber, Fethullah; Koc, Cemalettin; Bal, Abdullah
2009-02-01
Image segmentation is an important and difficult computer vision problem. Hyper-spectral images pose even more difficulty due to their high dimensionality. Spectral clustering (SC) is a recently popular clustering/segmentation algorithm. In general, SC lifts the data to a high-dimensional space, also known as the kernel trick, then derives eigenvectors in this new space, and finally partitions the data into clusters using these new dimensions. We demonstrate that SC works efficiently when combined with covariance descriptors, which can be used to assess pixelwise similarities rather than operating in the high-dimensional Euclidean space. We present the formulations and some preliminary results of the proposed hybrid image segmentation method for hyper-spectral images.
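A hedged sketch of how such a hybrid might be wired together, assuming per-pixel SPD covariance descriptors are already computed: a log-Euclidean distance between descriptors (a common choice for SPD matrices, though not necessarily the authors') feeds a precomputed affinity into standard spectral clustering.

```python
import numpy as np
from scipy.linalg import logm
from sklearn.cluster import SpectralClustering

def cluster_with_cov_descriptors(covs, n_clusters=4, sigma=1.0):
    """covs: list of SPD covariance descriptors, one per pixel/region.
    The affinity uses the log-Euclidean distance between SPD matrices,
    an assumed (illustrative) choice of metric."""
    logs = np.array([logm(C).real.ravel() for C in covs])   # matrix logs
    d2 = ((logs[:, None, :] - logs[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))                      # Gaussian affinity
    return SpectralClustering(n_clusters=n_clusters,
                              affinity='precomputed').fit_predict(A)
```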
NASA Astrophysics Data System (ADS)
Wang, Wei; Yang, Jiong
With the rapid growth of computational biology and e-commerce applications, high-dimensional data have become very common. Thus, mining high-dimensional data is an urgent problem of great practical importance. However, there are some unique challenges in mining data of high dimensions, including (1) the curse of dimensionality and, more crucially, (2) the meaningfulness of the similarity measure in high-dimensional space. In this chapter, we present several state-of-the-art techniques for analyzing high-dimensional data, e.g., frequent pattern mining, clustering, and classification. We discuss how these methods deal with the challenges of high dimensionality.
Understanding 3D human torso shape via manifold clustering
NASA Astrophysics Data System (ADS)
Li, Sheng; Li, Peng; Fu, Yun
2013-05-01
Discovering the variations in human torso shape plays a key role in many design-oriented applications, such as suit designing. With recent advances in 3D surface imaging technologies, people can obtain 3D human torso data that provide more information than traditional measurements. However, how to find different human shapes from 3D torso data is still an open problem. In this paper, we propose to use a spectral clustering approach on the torso manifold to address this problem. We first represent high-dimensional torso data in a low-dimensional space using a manifold learning algorithm. Then spectral clustering is performed to obtain several disjoint clusters. Experimental results show that the clusters discovered by our approach can describe the discrepancies in both genders and human shapes, and our approach achieves better performance than the compared clustering method.
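A minimal sketch of the two-stage pipeline described above, with illustrative stand-ins: the paper does not name the specific manifold learner here, so Isomap is used purely as an example, and the input matrix X is a hypothetical flattened torso descriptor.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.cluster import SpectralClustering

# X: (n_subjects, n_features) flattened 3D torso descriptors -- hypothetical input.
X = np.random.rand(200, 500)
low = Isomap(n_components=5, n_neighbors=10).fit_transform(X)    # manifold step
labels = SpectralClustering(n_clusters=4, assign_labels='kmeans',
                            random_state=0).fit_predict(low)     # disjoint clusters
```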
High-resolution Self-Organizing Maps for advanced visualization and dimension reduction.
Saraswati, Ayu; Nguyen, Van Tuc; Hagenbuchner, Markus; Tsoi, Ah Chung
2018-05-04
Kohonen's Self-Organizing feature Map (SOM) provides an effective way to project high-dimensional input features onto a low-dimensional display space while preserving the topological relationships among the input features. Recent advances in algorithms that take advantage of modern computing hardware introduced the concept of high-resolution SOMs (HRSOMs). This paper investigates the capabilities and applicability of the HRSOM as a visualization tool for cluster analysis and its suitability to serve as a pre-processor in ensemble learning models. The evaluation is conducted on a number of established benchmarks and real-world learning problems, namely, the policeman benchmark, two web spam detection problems, a network intrusion detection problem, and a malware detection problem. It is found that the visualization resulting from an HRSOM provides new insights into these learning problems. It is furthermore shown empirically that broad benefits from the use of HRSOMs can be expected in both clustering and classification problems.
Modified Cheeger and Ratio Cut Methods Using the Ginzburg-Landau Functional for Classification of High-Dimensional Data
Merkurjev, Ekaterina; Bertozzi, Andrea; …
2016-02-01
Recent advances in clustering have included continuous relaxations of the Cheeger cut … fully nonlinear Cheeger cut problem, as well as the ratio cut optimization task. Both problems are connected to total variation minimization, and the …
Automated modal parameter estimation using correlation analysis and bootstrap sampling
NASA Astrophysics Data System (ADS)
Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.
2018-02-01
The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences with the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement, dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset, and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models that appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis. The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
Model-based Clustering of High-Dimensional Data in Astrophysics
NASA Astrophysics Data System (ADS)
Bouveyron, C.
2016-05-01
The nature of data in Astrophysics has changed, as in other scientific fields, in the past decades due to the increase of measurement capabilities. As a consequence, data are nowadays frequently of high dimensionality and available in mass or as streams. Model-based techniques for clustering are popular tools which are renowned for their probabilistic foundations and their flexibility. However, classical model-based techniques show a disappointing behavior in high-dimensional spaces, mainly due to their dramatic over-parametrization. Recent developments in model-based classification overcome these drawbacks and allow high-dimensional data to be classified efficiently, even in the "small n / large p" situation. This work presents a comprehensive review of these recent approaches, including regularization-based techniques, parsimonious modeling, subspace classification methods and classification methods based on variable selection. The use of these model-based methods is also illustrated on real-world classification problems in Astrophysics using R packages.
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation.
Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi
2015-01-01
Most popular clustering methods make strong assumptions about the dataset. For example, k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we propose a new clustering algorithm named localized ambient solidity separation (LASS), using a new isolation criterion called centroid distance. Compared with other density-based isolation criteria, our proposed centroid distance isolation criterion addresses the problems caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method in separating naturally isolated clusters but also identifies clusters that are adjacent, overlapping, or embedded in background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records containing demographic and behavioral information. The results show that the LASS algorithm works extremely well on this computer user dataset and extracts more knowledge from it. PMID:26221133
Kiranyaz, Serkan; Ince, Turker; Pulkkinen, Jenni; Gabbouj, Moncef
2010-01-01
In this paper, we address dynamic clustering in high-dimensional data or feature spaces as an optimization problem where multi-dimensional particle swarm optimization (MD PSO) is used to determine the true number of clusters, while fractional global best formation (FGBF) is applied to avoid local optima. Based on these techniques we then present a novel and personalized long-term ECG classification system, which addresses the problem of labeling the beats within a long-term ECG signal, known as a Holter register, recorded from an individual patient. Due to the massive number of ECG beats in a Holter register, visual inspection is quite difficult and cumbersome, if not impossible. Therefore, the proposed system helps professionals to quickly and accurately diagnose any latent heart disease by examining only the representative beats (the so-called master key-beats), each of which represents a cluster of homogeneous (similar) beats. We tested the system on a benchmark database in which the beats of each Holter register had been manually labeled by cardiologists. The selection of the right master key-beats is the key factor for achieving a highly accurate classification, and the proposed systematic approach produced results that were consistent with the manual labels with 99.5% average accuracy, demonstrating the efficiency of the system.
The GALAH survey: chemical tagging of star clusters and new members in the Pleiades
NASA Astrophysics Data System (ADS)
Kos, Janez; Bland-Hawthorn, Joss; Freeman, Ken; Buder, Sven; Traven, Gregor; De Silva, Gayandhi M.; Sharma, Sanjib; Asplund, Martin; Duong, Ly; Lin, Jane; Lind, Karin; Martell, Sarah; Simpson, Jeffrey D.; Stello, Dennis; Zucker, Daniel B.; Zwitter, Tomaž; Anguiano, Borja; Da Costa, Gary; D'Orazi, Valentina; Horner, Jonathan; Kafle, Prajwal R.; Lewis, Geraint; Munari, Ulisse; Nataf, David M.; Ness, Melissa; Reid, Warren; Schlesinger, Katie; Ting, Yuan-Sen; Wyse, Rosemary
2018-02-01
The technique of chemical tagging uses the elemental abundances of stellar atmospheres to 'reconstruct' chemically homogeneous star clusters that have long since dispersed. The GALAH spectroscopic survey, which aims to observe one million stars using the Anglo-Australian Telescope, allows us to measure up to 30 elements or dimensions in the stellar chemical abundance space, many of which are not independent. How to find clustering reliably in a noisy high-dimensional space is a difficult problem that remains largely unsolved. Here, we explore t-distributed stochastic neighbour embedding (t-SNE), which identifies an optimal mapping of a high-dimensional space into fewer dimensions whilst conserving the original clustering information. Typically, the projection is made to a 2D space to aid recognition of clusters by eye. We show that this method is a reliable tool for chemical tagging because it can: (i) resolve clustering in chemical space alone, (ii) recover known open and globular clusters with high efficiency and low contamination, and (iii) relate field stars to known clusters. t-SNE also provides a useful visualization of a high-dimensional space. We demonstrate the method on a data set of 13 abundances measured in the spectra of 187 000 stars by the GALAH survey. We recover seven of the nine observed clusters (six globular and three open clusters) in chemical space with minimal contamination from field stars and low numbers of outliers. With chemical tagging, we also identify two Pleiades supercluster members (which we confirm kinematically), one as far as 6° (one tidal radius) from the cluster centre.
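The projection-then-cluster workflow can be sketched in a few lines. The abundance matrix below is a random stand-in for the GALAH data, and DBSCAN on the 2D embedding is only an illustrative way to pick out overdensities; the paper does not prescribe this particular cluster extractor.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

X = np.random.rand(1000, 13)                      # stand-in for (n_stars, n_abundances)
emb = TSNE(n_components=2, perplexity=30,
           random_state=0).fit_transform(X)       # 2D projection for visual inspection
labels = DBSCAN(eps=2.0, min_samples=10).fit_predict(emb)
# labels == -1 marks unclustered points (field stars, in this analogy)
```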
Filippov, Alexander E; Gorb, Stanislav N
2015-02-06
One of the important problems appearing in experimental realizations of artificial adhesives inspired by gecko foot hair is so-called clusterization. If an artificially produced structure is flexible enough to allow efficient contact with natural rough surfaces, after a few attachment-detachment cycles the fibres of the structure tend to adhere to one another and form clusters. Normally, such clusters are much larger than the original fibres and, because they are less flexible, form much worse adhesive contacts, especially with rough surfaces. The main problem here is that the forces responsible for the clusterization are the same intermolecular forces that attract fibres to the fractal surface of the substrate. However, arrays of real gecko setae are much less susceptible to this problem. One of the possible reasons for this is that the ends of the setae have a more sophisticated, non-uniformly distributed three-dimensional structure than that of existing artificial systems. In this paper, we simulated the three-dimensional spatial geometry of non-uniformly distributed branches of nanofibres of the setal tip numerically, studied its attachment-detachment dynamics and discussed its advantages over a uniformly distributed geometry.
Banerjee, Arindam; Ghosh, Joydeep
2004-05-01
Competitive learning mechanisms for clustering, in general, suffer from poor performance for very high-dimensional (>1000) data because of "curse of dimensionality" effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact, it can be considered as a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produce high-quality, well-balanced clusters for high-dimensional data. Like kmeans, each iteration is linear in the number of data points and in the number of clusters for all three algorithms. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques.
Index Terms: Balanced clustering, expectation maximization (EM), frequency-sensitive competitive learning (FSCL), high-dimensional clustering, kmeans, normalized data, scalable clustering, streaming data, text clustering.
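A plain spherical k-means sketch, without the frequency-sensitive balancing that is the paper's actual contribution: inputs and centroids are kept on the unit sphere and assignment is by cosine similarity. The frequency-sensitive variants modify the assignment step so that populous clusters become progressively less attractive.

```python
import numpy as np

def spkmeans(X, k, n_iter=100, seed=0):
    """Spherical k-means: unit-normalize inputs and centroids and assign
    points by cosine similarity. A bare sketch without the paper's
    frequency-sensitive balancing."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = (X @ C.T).argmax(axis=1)        # nearest centroid by cosine
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.sum(axis=0)
                C[j] = c / np.linalg.norm(c)     # re-normalize the mean direction
    return labels, C
```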
Mo, Yun; Zhang, Zhongzhao; Meng, Weixiao; Ma, Lin; Wang, Yao
2014-01-01
Indoor positioning systems based on the fingerprint method are widely used due to the large number of existing devices with a wide range of coverage. However, extensive positioning regions with a massive fingerprint database may cause high computational complexity and error margins; therefore, clustering methods are widely applied as a solution. However, traditional clustering methods in positioning systems can only measure the similarity of the Received Signal Strength without being concerned with the continuity of physical coordinates. Moreover, outages of access points could result in asymmetric matching problems which severely affect the fine positioning procedure. To solve these issues, in this paper we propose a positioning system based on the Spatial Division Clustering (SDC) method for clustering the fingerprint dataset subject to physical distance constraints. With the Genetic Algorithm and Support Vector Machine techniques, SDC can achieve higher coarse positioning accuracy than traditional clustering algorithms. In terms of fine localization, the proposed positioning system, based on the Kernel Principal Component Analysis method, outperforms its counterparts based on other feature extraction methods at low dimensionality. Apart from balancing the online matching computational burden, the new positioning system exhibits advantageous performance on radio map clustering, and also shows better robustness and adaptability with respect to the asymmetric matching problem. PMID:24451470
Data-driven cluster reinforcement and visualization in sparsely-matched self-organizing maps.
Manukyan, Narine; Eppstein, Margaret J; Rizzo, Donna M
2012-05-01
A self-organizing map (SOM) is a self-organized projection of high-dimensional data onto a typically 2-dimensional (2-D) feature map, wherein vector similarity is implicitly translated into topological closeness in the 2-D projection. However, when there are more neurons than input patterns, it can be challenging to interpret the results, due to diffuse cluster boundaries and limitations of current methods for displaying interneuron distances. In this brief, we introduce a new cluster reinforcement (CR) phase for sparsely-matched SOMs. The CR phase amplifies within-cluster similarity in an unsupervised, data-driven manner. Discontinuities in the resulting map correspond to between-cluster distances and are stored in a boundary (B) matrix. We describe a new hierarchical visualization of cluster boundaries displayed directly on feature maps, which requires no further clustering beyond what was implicitly accomplished during self-organization in SOM training. We use a synthetic benchmark problem and previously published microbial community profile data to demonstrate the benefits of the proposed methods.
MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu
2009-01-01
Background Uncovering subtypes of disease from microarray samples has important clinical implications, such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of high-dimensional microarray clusters, which limits their performance. Results We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we devised a new entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods, including a recently developed ensemble clustering algorithm, in tests with five simulated and eight real gene-expression data sets. Conclusion The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. PMID:19698124
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pal, Ranjan; Chelmis, Charalampos; Aman, Saima
The advent of smart meters and advanced communication infrastructures catalyzes numerous smart grid applications such as dynamic demand response, and paves the way to solve challenging research problems in sustainable energy consumption. The space of solution possibilities is restricted primarily by the huge amount of generated data requiring considerable computational resources and efficient algorithms. To overcome this Big Data challenge, data clustering techniques have been proposed. Current approaches, however, do not scale in the face of the “increasing dimensionality” problem, where a cluster point is represented by the entire customer consumption time series. To overcome this, we first rethink the way cluster points are created and designed, and then devise an efficient online clustering technique for demand response (DR) in order to analyze high-volume, high-dimensional energy consumption time series data at scale, and on the fly. Our online algorithm is randomized in nature, and provides optimal performance guarantees in a computationally efficient manner. Unlike prior work, we (i) study the consumption properties of the whole population simultaneously rather than developing individual models for each customer separately, claiming it to be a ‘killer’ approach that breaks the “curse of dimensionality” in online time series clustering, and (ii) provide tight performance guarantees in theory to validate our approach. Our insights are driven by the field of sociology, where collective behavior often emerges as the result of individual patterns and lifestyles.
Chen, Chien-Chang; Juan, Hung-Hui; Tsai, Meng-Yuan; Lu, Henry Horng-Shing
2018-01-11
By introducing the methods of machine learning into density functional theory, we made a detour for the construction of the most probable density function, which can be estimated by learning relevant features from the system of interest. Using the properties of the universal functional, the vital core of density functional theory, the most probable cluster numbers and the corresponding cluster boundaries in a system under study can be simultaneously and automatically determined, with plausibility grounded in the Hohenberg-Kohn theorems. For method validation and pragmatic applications, interdisciplinary problems from physical to biological systems were examined. The amalgamation of uncharged atomic clusters validated the unsupervised search process for the cluster numbers, and the corresponding cluster boundaries were exhibited likewise. Highly accurate clustering results on the Fisher's iris dataset showed the feasibility and flexibility of the proposed scheme. Brain tumor detection from low-dimensional magnetic resonance imaging datasets and segmentation of high-dimensional neural network imagery in the Brainbow system were also used to inspect the practicality of the method. The experimental results exhibit the successful connection between the physical theory and the machine learning methods and will benefit clinical diagnoses.
Cluster ensemble based on Random Forests for genetic data.
Alhusain, Luluah; Hafez, Alaaeldin M
2017-01-01
Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the acquisition of exceptionally large genetic datasets. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means of handling such data desirable. Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, an RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on a high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, a comparison of RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. The paper demonstrates the effectiveness of RFcluE for population structure analysis and illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.
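One way to realize RF-based proximity clustering with stock tooling, as a hedged sketch: scikit-learn's RandomTreesEmbedding stands in for the unsupervised RF (two samples are proximate when they land in the same leaf across many trees), and a single run feeds hierarchical clustering. RFcluE's actual ensemble combines many such clusterings; this shows one member.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomTreesEmbedding

def rf_proximity_clusters(X, n_clusters=3, n_estimators=200, seed=0):
    """Leaf co-occurrence proximity from one forest of random trees,
    turned into a distance for hierarchical clustering."""
    Z = RandomTreesEmbedding(n_estimators=n_estimators,
                             random_state=seed).fit_transform(X)
    prox = np.asarray((Z @ Z.T).todense()) / n_estimators   # same-leaf rate
    dist = 1.0 - prox
    np.fill_diagonal(dist, 0.0)
    link = linkage(squareform(dist, checks=False), method='average')
    return fcluster(link, t=n_clusters, criterion='maxclust')
```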
A Probabilistic Embedding Clustering Method for Urban Structure Detection
NASA Astrophysics Data System (ADS)
Lin, X.; Li, H.; Zhang, Y.; Gao, L.; Zhao, L.; Deng, M.
2017-09-01
Urban structure detection is a basic task in urban geography. Clustering is a core technology for detecting the patterns of urban spatial structure, urban functional regions, and so on. In the big data era, diverse urban sensing datasets, which record information such as human behaviour and social activity, suffer from high dimensionality and high noise. Unfortunately, state-of-the-art clustering methods do not handle high dimensionality and high noise concurrently. In this paper, a probabilistic embedding clustering method is proposed. Firstly, we propose a Probabilistic Embedding Model (PEM) to find latent features in high-dimensional urban sensing data by "learning" via a probabilistic model. The latent features capture essential patterns hidden in the high-dimensional data, and the probabilistic model also reduces the uncertainty caused by high noise. Secondly, by tuning the parameters, our model can discover two kinds of urban structure, homophily and structural equivalence, i.e., communities with intensive interaction or locations playing the same roles in the urban structure. We evaluated the performance of our model through experiments on real-world data from Shanghai (China), which confirmed that our method can discover both kinds of urban structure.
Distributed Computation of the knn Graph for Large High-Dimensional Point Sets
Plaku, Erion; Kavraki, Lydia E.
2009-01-01
High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to a single machine. In this work we efficiently distribute the computation of knn graphs for clusters of processors with message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over one hundred processors and indicate that similar speedup can be obtained with several hundred processors. PMID:19847318
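The partitioning idea can be miniaturized on one machine: each worker computes the exact k nearest neighbours of its chunk of query points against the full point set. This sketch uses Python multiprocessing as a stand-in for the paper's message-passing setup and brute-force Euclidean distances in place of arbitrary metrics.

```python
import numpy as np
from multiprocessing import Pool

def knn_of_chunk(args):
    """Exact brute-force knn of one chunk of query points vs the full set."""
    chunk, X, k = args
    d2 = ((chunk[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, 1:k + 1]       # column 0 is the point itself

def knn_graph(X, k=5, n_workers=4):
    chunks = np.array_split(X, n_workers)           # one chunk per worker
    with Pool(n_workers) as pool:
        parts = pool.map(knn_of_chunk, [(c, X, k) for c in chunks])
    return np.vstack(parts)                         # row i: indices of i's k neighbours

if __name__ == '__main__':
    G = knn_graph(np.random.rand(2000, 10))
```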
Statistical Significance for Hierarchical Clustering
Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.
2017-01-01
Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high-dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
Sparsity enabled cluster reduced-order models for control
NASA Astrophysics Data System (ADS)
Kaiser, Eurika; Morzyński, Marek; Daviller, Guillaume; Kutz, J. Nathan; Brunton, Bingni W.; Brunton, Steven L.
2018-01-01
Characterizing and controlling nonlinear, multi-scale phenomena are central goals in science and engineering. Cluster-based reduced-order modeling (CROM) was introduced to exploit the underlying low-dimensional dynamics of complex systems. CROM builds a data-driven discretization of the Perron-Frobenius operator, resulting in a probabilistic model for ensembles of trajectories. A key advantage of CROM is that it embeds nonlinear dynamics in a linear framework, which enables the application of standard linear techniques to the nonlinear system. CROM is typically computed on high-dimensional data; however, access to and computations on this full-state data limit the online implementation of CROM for prediction and control. Here, we address this key challenge by identifying a small subset of critical measurements to learn an efficient CROM, referred to as sparsity-enabled CROM. In particular, we leverage compressive measurements to faithfully embed the cluster geometry and preserve the probabilistic dynamics. Further, we show how to identify fewer optimized sensor locations tailored to a specific problem that outperform random measurements. Both of these sparsity-enabled sensing strategies significantly reduce the burden of data acquisition and processing for low-latency in-time estimation and control. We illustrate this unsupervised learning approach on three different high-dimensional nonlinear dynamical systems from fluids with increasing complexity, with one application in flow control. Sparsity-enabled CROM is a critical facilitator for real-time implementation on high-dimensional systems where full-state information may be inaccessible.
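The core CROM construction is short to sketch: cluster time-ordered snapshots, then count transitions between consecutive cluster labels to obtain a coarse, data-driven discretization of the Perron-Frobenius operator. The sensor-selection machinery that makes CROM "sparsity-enabled" is omitted here.

```python
import numpy as np
from sklearn.cluster import KMeans

def crom_transition_matrix(snapshots, n_clusters=10, seed=0):
    """snapshots: (n_time, n_state), time-ordered. Returns the cluster
    transition probability matrix (a coarse discretization of the
    Perron-Frobenius operator) plus the cluster labels."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(snapshots)
    P = np.zeros((n_clusters, n_clusters))
    for a, b in zip(labels[:-1], labels[1:]):       # count observed transitions
        P[a, b] += 1.0
    P /= np.maximum(P.sum(axis=1, keepdims=True), 1.0)  # row-normalize
    return P, labels
```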
Clustering by reordering of similarity and Laplacian matrices: Application to galaxy clusters
NASA Astrophysics Data System (ADS)
Mahmoud, E.; Shoukry, A.; Takey, A.
2018-04-01
Similarity metrics, kernels and similarity-based algorithms have gained much attention due to their increasing applications in information retrieval, data mining, pattern recognition and machine learning. Similarity graphs are often adopted as the underlying representation of similarity matrices and are at the origin of known clustering algorithms such as spectral clustering. Similarity matrices offer the advantage of working in object-object (two-dimensional) space, where visualization of cluster similarities is available, instead of object-feature (multi-dimensional) space. In this paper, sparse ɛ-similarity graphs are constructed and decomposed into strong components using appropriate methods such as the Dulmage-Mendelsohn permutation (DMperm) and/or the Reverse Cuthill-McKee (RCM) algorithm. The obtained strong components correspond to groups (clusters) in the input (feature) space. The parameter ɛi is estimated locally at each data point i from a corresponding narrow range of numbers of nearest neighbors. Although more advanced clustering techniques are available, our method has the advantages of simplicity, better complexity and direct visualization of cluster similarities in a two-dimensional space. Also, no prior information about the number of clusters is needed. We conducted our experiments on two- and three-dimensional, small and large synthetic datasets as well as on a real astronomical dataset. The results are verified graphically and analyzed using gap statistics over a range of neighbors to verify the robustness of the algorithm and the stability of the results. Combining the proposed algorithm with gap statistics provides a promising tool for solving clustering problems. An astronomical application is conducted to confirm the existence of 45 galaxy clusters around the X-ray positions of galaxy clusters in the redshift range [0.1..0.8]. We re-estimate the photometric redshifts of the identified galaxy clusters and obtain acceptable values compared to published spectroscopic redshifts, with a 0.029 standard deviation of their differences.
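A hedged sketch of the pipeline with a single global ɛ (the paper estimates ɛi locally per point): threshold pairwise distances into a sparse similarity graph, read clusters off its connected components, and apply RCM to expose the block (cluster) structure in the reordered matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, reverse_cuthill_mckee

def eps_graph_clusters(X, eps):
    """Clusters as connected components of an eps-similarity graph,
    plus an RCM reordering of the matrix for visualization."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = csr_matrix((d2 <= eps ** 2).astype(float))         # sparse eps-graph
    n_clusters, labels = connected_components(A, directed=False)
    perm = reverse_cuthill_mckee(A, symmetric_mode=True)   # band-reducing order
    return labels, n_clusters, A[perm][:, perm]
```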
Estimation of Complex Generalized Linear Mixed Models for Measurement and Growth
ERIC Educational Resources Information Center
Jeon, Minjeong
2012-01-01
Maximum likelihood (ML) estimation of generalized linear mixed models (GLMMs) is technically challenging because of the intractable likelihoods that involve high dimensional integrations over random effects. The problem is magnified when the random effects have a crossed design and thus the data cannot be reduced to small independent clusters. A…
Aerodynamics of Engine-Airframe Interaction
NASA Technical Reports Server (NTRS)
Caughey, D. A.
1986-01-01
The report describes progress in research directed towards the efficient solution of the inviscid Euler and Reynolds-averaged Navier-Stokes equations for transonic flows through engine inlets, and past complete aircraft configurations, with emphasis on the flowfields in the vicinity of engine inlets. The research focuses upon the development of solution-adaptive grid procedures for these problems, and the development of multigrid algorithms in conjunction with both implicit and explicit time-stepping schemes for the solution of three-dimensional problems. The work includes further development of mesh systems suitable for inlet and wing-fuselage-inlet geometries using a variational approach. Work during this reporting period concentrated upon two-dimensional problems, and has been in two general areas: (1) the development of solution-adaptive procedures to cluster the grid cells in regions of high (truncation) error; and (2) the development of a multigrid scheme for solution of the two-dimensional Euler equations using a diagonalized alternating direction implicit (ADI) smoothing algorithm.
Harnessing Sparse and Low-Dimensional Structures for Robust Clustering of Imagery Data
ERIC Educational Resources Information Center
Rao, Shankar Ramamohan
2009-01-01
We propose a robust framework for clustering data. In practice, data obtained from real measurement devices can be incomplete, corrupted by gross errors, or inconsistent with any assumed model. We show that, by properly harnessing the intrinsic low-dimensional structure of the data, these kinds of practical problems can be dealt with in a uniform…
A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.
Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain
2015-10-01
Clustering is a set of statistical learning techniques aimed at finding, within heterogeneous data, structured partitions that group homogeneous data into so-called clusters. There are several fields in which clustering has been successfully applied, such as medicine, biology, finance, and economics. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the dimensionality of the data set, we base our approach on Principal Component Analysis (PCA), which is the statistical tool most commonly used in factorial analysis. However, problems in nature, especially in medicine, are often based on heterogeneous, qualitative-quantitative measurements, whereas PCA only processes quantitative ones. Besides, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies allowing quantitative and qualitative data to be handled simultaneously. The principle of this approach is to perform a projection of the qualitative variables on the subspaces spanned by quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.
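One way to realize the "projection of qualitative variables on subspaces spanned by quantitative ones" is a least-squares regression of the binary-coded variables on the leading PC scores. The block sizes below are hypothetical and the exact projection used in the paper may differ.

```python
import numpy as np

# Hypothetical blocks: quantitative Xq (813 x 6), binary-coded qualitative Xb (813 x 4).
rng = np.random.default_rng(0)
Xq = rng.normal(size=(813, 6))
Xb = (rng.random((813, 4)) > 0.5).astype(float)

Xc = Xq - Xq.mean(axis=0)                      # center the quantitative block
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:3].T                         # leading 3 PC scores
# Regress the (centered) binary-coded variables onto the PC scores:
coef, *_ = np.linalg.lstsq(scores, Xb - Xb.mean(axis=0), rcond=None)
Xb_proj = scores @ coef                        # qualitative variables in PCA space
```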
Clustering high dimensional data using RIA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aziz, Nazrina
2015-05-15
Clustering may simply represent a convenient method for organizing a large data set so that it can easily be understood and information can efficiently be retrieved. However, identifying clusters in high-dimensional data sets is a difficult task because of the curse of dimensionality. Another challenge in clustering is that some traditional functions cannot capture the pattern dissimilarity among objects. In this article, we use an alternative dissimilarity measurement called the Robust Influence Angle (RIA) in the partitioning method. RIA is developed using the eigenstructure of the covariance matrix and robust principal component scores. We note that it can obtain clusters easily and hence avoid the curse of dimensionality. It also manages to cluster large data sets with mixed numeric and categorical values.
Membership determination of open clusters based on a spectral clustering method
NASA Astrophysics Data System (ADS)
Gao, Xin-Hua
2018-06-01
We present a spectral clustering (SC) method aimed at segregating reliable members of open clusters in multi-dimensional space. The SC method is a non-parametric clustering technique that performs cluster division using eigenvectors of the similarity matrix; no prior knowledge of the clusters is required. This method is more flexible in dealing with multi-dimensional data compared to other methods of membership determination. We use this method to segregate the cluster members of five open clusters (Hyades, Coma Ber, Pleiades, Praesepe, and NGC 188) in five-dimensional space; fairly clean cluster members are obtained. We find that the SC method can capture a small number of cluster members (weak signal) from a large number of field stars (heavy noise). Based on these cluster members, we compute the mean proper motions and distances for the Hyades, Coma Ber, Pleiades, and Praesepe clusters, and our results are in general quite consistent with the results derived by other authors. The test results indicate that the SC method is highly suitable for segregating cluster members of open clusters based on high-precision multi-dimensional astrometric data such as Gaia data.
Machine-learned cluster identification in high-dimensional data.
Ultsch, Alfred; Lötsch, Jörn
2017-02-01
High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the cluster algorithm used works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogeneously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM, the distance structure in the high-dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real-world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. However, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high-dimensional biomedical data. The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method for identifying true clusters in the high-dimensional space of complex data.
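Computing a U-matrix from a trained (E)SOM is straightforward. The sketch below assumes a rectangular grid of codebook vectors and uses the mean distance to the 4-connected grid neighbours, one of several common U-matrix variants.

```python
import numpy as np

def u_matrix(weights):
    """weights: (rows, cols, dim) grid of trained SOM codebook vectors.
    Each U-matrix cell holds the mean distance from a neuron to its grid
    neighbours; ridges of high values mark cluster boundaries."""
    rows, cols, _ = weights.shape
    U = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            U[i, j] = np.mean(dists)
    return U
```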
DOT National Transportation Integrated Search
2016-09-01
We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. To this end, one first identifies sets of points close to the same subspace and uses the sets ...
Effective traffic features selection algorithm for cyber-attacks samples
NASA Astrophysics Data System (ADS)
Li, Yihong; Liu, Fangzheng; Du, Zhenyu
2018-05-01
By studying defense schemes against network attacks, this paper proposes an effective traffic feature selection algorithm based on k-means++ clustering to deal with the high dimensionality of traffic features extracted from cyber-attack samples. Firstly, the algorithm divides the original feature set into an attack traffic feature set and a background traffic feature set by clustering. Then, it calculates the variation in clustering performance after removing a given feature. Finally, it evaluates the degree of distinctiveness of each feature according to the result; the effective features are those whose degree of distinctiveness exceeds a set threshold. The purpose of this paper is to select the effective features from the extracted original feature set. In this way, the dimensionality of the features can be reduced so as to lower the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and has some advantages over other selection algorithms.
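A hedged reconstruction of the scoring loop: cluster with k-means++ initialization, measure how much a clustering-quality score drops when each feature is removed, and keep features whose drop exceeds a threshold. Silhouette stands in here for the paper's performance measure, which may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def effective_features(X, k=2, threshold=0.01, seed=0):
    """Keep features whose removal degrades k-means++ clustering quality
    by more than a threshold (silhouette as an assumed quality measure)."""
    def quality(data):
        labels = KMeans(n_clusters=k, init='k-means++', n_init=10,
                        random_state=seed).fit_predict(data)
        return silhouette_score(data, labels)
    base = quality(X)
    return [f for f in range(X.shape[1])
            if base - quality(np.delete(X, f, axis=1)) > threshold]
```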
Entropy-based consensus clustering for patient stratification.
Liu, Hongfu; Zhao, Rui; Fang, Hongsheng; Cheng, Feixiong; Fu, Yun; Liu, Yang-Yu
2017-09-01
Patient stratification or disease subtyping is crucial for precision medicine and personalized treatment of complex diseases. The increasing availability of high-throughput molecular data provides a great opportunity for patient stratification. Many clustering methods have been employed to tackle this problem in a purely data-driven manner. Yet, existing methods leveraging high-throughput molecular data often suffer from various limitations, e.g. noise, data heterogeneity, high dimensionality or poor interpretability. Here we introduce an Entropy-based Consensus Clustering (ECC) method that overcomes those limitations all together. Our ECC method employs an entropy-based utility function to fuse many basic partitions into a consensus one that agrees with the basic ones as much as possible. Maximizing the utility function in ECC has a much more meaningful interpretation than in any other consensus clustering method. Moreover, we exactly map the complex utility maximization problem to the classic K-means clustering problem, which can then be efficiently solved with linear time and space complexity. Our ECC method can also naturally integrate multiple molecular data types measured on the same set of subjects, and easily handle missing values without any imputation. We applied ECC to 110 synthetic and 48 real datasets, including 35 cancer gene expression benchmark datasets and 13 cancer types with four molecular data types from The Cancer Genome Atlas. We found that ECC shows superior performance against existing clustering methods. Our results clearly demonstrate the power of ECC in clinically relevant patient stratification. The Matlab package is available at http://scholar.harvard.edu/yyl/ecc .
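The mapping to K-means can be illustrated generically: one-hot encode each basic partition, concatenate the binary blocks, and run a single K-means on the result. ECC's entropy utility induces its own (weighted) distance on this representation, which this sketch does not reproduce.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder

def consensus_kmeans(X, k=3, n_partitions=50, seed=0):
    """Fuse many basic k-means partitions by one-hot encoding their labels
    and running one K-means on the concatenated binary matrix."""
    rng = np.random.default_rng(seed)
    parts = []
    for _ in range(n_partitions):                 # diverse basic partitions
        kk = int(rng.integers(2, 2 * k + 1))
        parts.append(KMeans(n_clusters=kk, n_init=5,
                            random_state=int(rng.integers(1 << 30))
                            ).fit_predict(X))
    # sparse_output requires scikit-learn >= 1.2 (older versions: sparse=False)
    B = OneHotEncoder(sparse_output=False).fit_transform(np.array(parts).T)
    return KMeans(n_clusters=k, n_init=20, random_state=seed).fit_predict(B)
```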
On the Partitioning of Squared Euclidean Distance and Its Applications in Cluster Analysis.
ERIC Educational Resources Information Center
Carter, Randy L.; And Others
1989-01-01
The partitioning of squared Euclidean (E²) distance between two vectors in M-dimensional space into the sum of squared lengths of vectors in mutually orthogonal subspaces is discussed. Applications to specific cluster analysis problems are provided (i.e., to design Monte Carlo studies for performance comparisons of several clustering methods…
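The identity is easy to verify numerically: split an orthonormal basis of R^M into two mutually orthogonal subspaces and check that the squared projection lengths of the difference vector sum to the full squared distance.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # orthonormal basis of R^5
P1 = Q[:, :2] @ Q[:, :2].T                     # projector onto a 2-D subspace
P2 = Q[:, 2:] @ Q[:, 2:].T                     # projector onto its complement
d = x - y
# Squared projection lengths sum to the full squared Euclidean distance:
assert np.isclose(d @ d, d @ P1 @ d + d @ P2 @ d)
```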
Efficient implementation of parallel three-dimensional FFT on clusters of PCs
NASA Astrophysics Data System (ADS)
Takahashi, Daisuke
2003-05-01
In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of PCs. The three-dimensional FFT algorithm can be altered into a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by utilizing the cache memory effectively. We use the block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT algorithm. We succeeded in obtaining performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
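The decomposition into blocked 1-D FFT sweeps can be sketched directly; the result is numerically identical to a library fftn call, and the block parameter only changes the traversal order that the paper tunes for cache behaviour (real performance tuning is far more involved than this).

```python
import numpy as np

def fft3d_blocked(a, block=16):
    """3-D FFT as 1-D FFTs along each axis, sweeping the other axes in
    blocks. Numerically identical to np.fft.fftn; the blocking only
    illustrates a cache-friendly traversal."""
    a = a.astype(complex)
    n0, n1, n2 = a.shape
    for j in range(0, n1, block):                  # FFTs along axis 0
        a[:, j:j + block, :] = np.fft.fft(a[:, j:j + block, :], axis=0)
    for i in range(0, n0, block):                  # FFTs along axis 1
        a[i:i + block, :, :] = np.fft.fft(a[i:i + block, :, :], axis=1)
    for i in range(0, n0, block):                  # FFTs along axis 2
        a[i:i + block, :, :] = np.fft.fft(a[i:i + block, :, :], axis=2)
    return a

x = np.random.rand(32, 32, 32)
assert np.allclose(fft3d_blocked(x.copy()), np.fft.fftn(x))
```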
Linear solver performance in elastoplastic problem solution on GPU cluster
NASA Astrophysics Data System (ADS)
Khalevitsky, Yu. V.; Konovalov, A. V.; Burmasheva, N. V.; Partin, A. S.
2017-12-01
Applying the finite element method to severe plastic deformation problems involves solving linear equation systems. While the solution procedure is relatively hard to parallelize and computationally intensive by itself, a long series of large-scale systems needs to be solved for each problem. When dealing with fine computational meshes, such as in simulations of three-dimensional metal matrix composite microvolume deformation, tens or hundreds of hours may be needed to complete the whole solution procedure, even on modern supercomputers. In general, one of the preconditioned Krylov subspace methods is used in a linear solver for such problems. The convergence of these methods depends strongly on the spectrum of the problem's stiffness matrix. A series of computational experiments is used to choose the appropriate method, and different methods may be preferable on different computational systems for the same problem. In this paper we present experimental data obtained by solving linear equation systems from an elastoplastic problem on a GPU cluster. The data can be used to substantiate the choice of the appropriate method for a linear solver to use in severe plastic deformation simulations.
Robles, Guillermo; Fresno, José Manuel; Martínez-Tarifa, Juan Manuel; Ardila-Rey, Jorge Alfredo; Parrado-Hernández, Emilio
2018-03-01
The measurement of partial discharge (PD) signals in the radio frequency (RF) range has gained popularity among utilities and specialized monitoring companies in recent years. Unfortunately, on most occasions the data are hidden by noise and coupled interferences that hinder their interpretation and render them useless, especially in acquisition systems in the ultra high frequency (UHF) band, where the signals of interest are weak. This paper focuses on a method that uses a selective spectral signal characterization to describe each signal, whether a type of partial discharge or interference/noise, by the power contained in the most representative frequency bands. The technique can be considered a dimensionality reduction problem in which all the energy information contained in the frequency components is condensed into a reduced number of UHF or high frequency (HF) and very high frequency (VHF) bands. In general, dimensionality reduction methods make the interpretation of results a difficult task because the inherent physical nature of the signal is lost in the process. The proposed selective spectral characterization is a preprocessing tool that facilitates further main processing. The starting point is a clustering of signals that could form the core of a PD monitoring system. Therefore, the dimensionality reduction technique should discover the best frequency bands to enhance the affinity between signals in the same cluster and the differences between signals in different clusters. This is done by maximizing the minimum Mahalanobis distance between clusters using particle swarm optimization (PSO). The tool is tested with three sets of experimental signals to demonstrate its capabilities in separating noise and PDs with low signal-to-noise ratio and in separating different types of partial discharges measured in the UHF and HF/VHF bands.
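The objective driving the band search can be written down compactly. The sketch below computes the minimum pairwise Mahalanobis distance between cluster means for a candidate feature set; the PSO loop that maximizes this quantity over band subsets is omitted, and the pooled-covariance choice is our assumption.

```python
import numpy as np

def min_mahalanobis_separation(F, labels):
    """F: (n_signals, n_bands) band-power features; labels: cluster ids,
    each cluster assumed to have at least two members. Returns the minimum
    pairwise Mahalanobis distance between cluster means under a pooled
    covariance (an assumed choice for the metric)."""
    classes = np.unique(labels)
    pooled = sum(np.cov(F[labels == c].T) * (np.sum(labels == c) - 1)
                 for c in classes) / (len(F) - len(classes))
    inv = np.linalg.pinv(np.atleast_2d(pooled))
    means = [F[labels == c].mean(axis=0) for c in classes]
    return min(np.sqrt((m1 - m2) @ inv @ (m1 - m2))
               for i, m1 in enumerate(means) for m2 in means[i + 1:])
```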
Fast Multipole Methods for Three-Dimensional N-body Problems
NASA Technical Reports Server (NTRS)
Koumoutsakos, P.
1995-01-01
We are developing computational tools for the simulation of three-dimensional flows past bodies undergoing arbitrary motions. High-resolution viscous vortex methods have been developed that allow for extended simulations of two-dimensional configurations such as vortex generators. Our objective is to extend this methodology to three dimensions and develop a robust computational scheme for the simulation of such flows. A fundamental issue in the use of vortex methods is the ability to employ large numbers of computational elements efficiently, in order to resolve the large range of scales that exist in complex flows. The traditional cost of the method scales as O(N^2), since the N computational elements/particles induce velocities on each other, making the method unacceptable for simulations involving more than a few tens of thousands of particles. In the last decade, fast methods have been developed with operation counts of O(N log N) or O(N) (referred to as BH and GR, respectively), depending on the details of the algorithm. These methods are based on the observation that the effect of a cluster of particles at a certain distance may be approximated by a finite series expansion. In order to exploit this observation, we need to decompose the element population spatially into clusters of particles and build a hierarchy of clusters (a tree data structure): smaller neighboring clusters combine to form a cluster of the next size up in the hierarchy, and so on. This hierarchy of clusters allows one to determine efficiently when the approximation is valid. This algorithm is an N-body solver that appears in many fields of engineering and science. Some examples of its diverse use are in astrophysics, molecular dynamics, micro-magnetics, boundary element simulations of electromagnetic problems, and computer animation. More recently, these N-body solvers have been implemented and applied in simulations involving vortex methods. Koumoutsakos and Leonard (1995) implemented the GR scheme in two dimensions for vector computer architectures, allowing for simulations of bluff body flows using millions of particles. Winckelmans presented three-dimensional, viscous simulations of interacting vortex rings, using vortons and an implementation of a BH scheme for parallel computer architectures. Bhatt presented a vortex filament method for inviscid vortex ring interactions, with an alternative implementation of a BH scheme for a Connection Machine parallel computer architecture.
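As a rough illustration of the BH idea (a quadtree of particle clusters, with a far cluster replaced by its aggregate), here is a minimal two-dimensional O(N log N) treecode. All names and the opening parameter theta are our own; real vortex codes add series expansions and viscous terms, and the sketch assumes distinct particle positions.

```python
import numpy as np

class Cell:
    """Square quadtree cell accumulating total mass and mass-weighted position."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.m = 0.0; self.sx = 0.0; self.sy = 0.0
        self.kids = None     # four sub-cells once split
        self.pt = None       # single particle while a leaf

    def _route(self, x, y):
        return self.kids[2 * (x > self.cx) + (y > self.cy)]

    def insert(self, x, y, m):
        self.m += m; self.sx += m * x; self.sy += m * y
        if self.kids is None and self.pt is None:
            self.pt = (x, y, m)                     # empty leaf: store and stop
            return
        if self.kids is None:                       # occupied leaf: split it
            h = self.half / 2
            self.kids = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                         for dx in (-1, 1) for dy in (-1, 1)]
            px, py, pm = self.pt
            self.pt = None
            self._route(px, py).insert(px, py, pm)  # re-file the old particle
        self._route(x, y).insert(x, y, m)

def accel(cell, x, y, theta=0.5, eps=1e-3):
    """Softened acceleration at (x, y); far cells are treated as single bodies."""
    if cell.m == 0.0:
        return 0.0, 0.0
    dx = cell.sx / cell.m - x
    dy = cell.sy / cell.m - y
    r = (dx * dx + dy * dy + eps * eps) ** 0.5
    if cell.kids is None or 2 * cell.half / r < theta:   # leaf, or far enough
        if cell.pt is not None and cell.pt[:2] == (x, y):
            return 0.0, 0.0                              # skip self-interaction
        f = cell.m / r ** 3
        return f * dx, f * dy
    ax = ay = 0.0
    for k in cell.kids:
        kx, ky = accel(k, x, y, theta, eps)
        ax += kx; ay += ky
    return ax, ay

rng = np.random.default_rng(0)
pts = rng.random((2000, 2))
root = Cell(0.5, 0.5, 0.5)
for x, y in pts:
    root.insert(x, y, 1.0 / len(pts))
ax, ay = accel(root, *pts[0])        # O(log N) per evaluation on average
```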
NASA Astrophysics Data System (ADS)
Karimi, Hamed; Rosenberg, Gili; Katzgraber, Helmut G.
2017-10-01
We present and apply a general-purpose, multistart algorithm for improving the performance of low-energy samplers used for solving optimization problems. The algorithm iteratively fixes the value of a large portion of the variables to values that have a high probability of being optimal. The resulting problems are smaller and less connected, and samplers tend to give better low-energy samples for these problems. The algorithm is trivially parallelizable since each start in the multistart algorithm is independent, and could be applied to any heuristic solver that can be run multiple times to give a sample. We present results for several classes of hard problems solved using simulated annealing, path-integral quantum Monte Carlo, parallel tempering with isoenergetic cluster moves, and a quantum annealer, and show that the success metrics and the scaling are improved substantially. When combined with this algorithm, the quantum annealer's scaling was substantially improved for native Chimera graph problems. In addition, with this algorithm the scaling of the time to solution of the quantum annealer is comparable to the Hamze-de Freitas-Selby algorithm on the weak-strong cluster problems introduced by Boixo et al. Parallel tempering with isoenergetic cluster moves was able to consistently solve three-dimensional spin glass problems with 8000 variables when combined with our method, whereas without our method it could not solve any.
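A toy sketch of this fix-and-solve-again idea on a random Ising problem, under our own simplified reading: variables that agree across the best low-energy samples are frozen, and only the remaining spins are re-annealed. All sizes and schedules below are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
J = np.triu(rng.normal(size=(n, n)), 1)
J = J + J.T                                   # toy spin-glass couplings

def energy(s):
    return 0.5 * s @ J @ s

def anneal(s, free=None, sweeps=200, T0=2.0):
    s = s.copy()
    idx = np.arange(n) if free is None else np.flatnonzero(free)
    for t in range(sweeps):
        T = T0 * (1 - t / sweeps) + 1e-3      # linear cooling schedule
        for i in rng.permutation(idx):
            dE = -2.0 * s[i] * (J[i] @ s)     # energy change of flipping spin i
            if dE < 0 or rng.random() < np.exp(-dE / T):
                s[i] = -s[i]
    return s

samples = [anneal(rng.choice([-1, 1], n)) for _ in range(20)]   # multistart
best = sorted(samples, key=energy)[:5]
agree = np.all(np.array(best) == best[0], axis=0)   # unanimous spins get fixed
refined = anneal(best[0], free=~agree)              # re-solve the smaller problem
print(f"fixed {agree.sum()}/{n} spins;",
      f"energy {energy(best[0]):.2f} -> {energy(refined):.2f}")
```

Since each start is independent, the sampling loop is the part that parallelizes trivially.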
Clustering cancer gene expression data by projective clustering ensemble
Yu, Xianxue; Yu, Guoxian
2017-01-01
Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis, and other domains. Clustering is an important and promising tool for analyzing gene expression data. Gene expression data are often characterized by a large number of genes but limited samples, so various projective clustering techniques and ensemble techniques have been suggested to combat these challenges. However, it is rather challenging to synergize these two kinds of techniques to avoid the curse of dimensionality and to boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show that PCE improves the quality of clustering gene expression data by at least 4.5% (on average) compared with other related techniques, including dimensionality reduction based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to combine projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Hyun Jung; McDonnell, Kevin T.; Zelenyuk, Alla
2014-03-01
Although the Euclidean distance does well in measuring data distances within high-dimensional clusters, it does poorly when it comes to gauging inter-cluster distances. This significantly impacts the quality of global, low-dimensional space embedding procedures such as the popular multi-dimensional scaling (MDS), where one can often observe non-intuitive layouts. We were inspired by the perceptual processes evoked in the method of parallel coordinates, which enables users to visually aggregate the data by the patterns the polylines exhibit across the dimension axes. We call the path of such a polyline its structure and suggest a metric that captures this structure directly in high-dimensional space. This allows us to better gauge the distances of spatially distant data constellations and so achieve data aggregations in MDS plots that are more cognizant of existing high-dimensional structure similarities. Our MDS plots also exhibit similar visual relationships as the method of parallel coordinates, which is often used alongside to visualize the high-dimensional data in raw form. We then cast our metric into a bi-scale framework which distinguishes far-distances from near-distances. The coarser scale uses the structural similarity metric to separate data aggregates obtained by prior classification or clustering, while the finer scale employs the appropriate Euclidean distance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hajian, Amir; Alvarez, Marcelo A.; Bond, J. Richard, E-mail: ahajian@cita.utoronto.ca, E-mail: malvarez@cita.utoronto.ca, E-mail: bond@cita.utoronto.ca
Making mock simulated catalogs is an important component of astrophysical data analysis. Selection criteria for observed astronomical objects are often too complicated to be derived from first principles. However, the existence of an observed group of objects is a well-suited problem for machine learning classification. In this paper we use one-class classifiers to learn the properties of an observed catalog of clusters of galaxies from ROSAT and to pick clusters from mock simulations that resemble the observed ROSAT catalog. We show how this method can be used to study the cross-correlations of thermal Sunyaev-Zel'dovich signals with number density maps of X-ray-selected cluster catalogs. The method reduces the bias due to hand-tuning the selection function and is readily scalable to large catalogs with a high-dimensional space of astrophysical features.
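In scikit-learn terms the core of this approach is only a few lines. The feature arrays below are invented placeholders for catalog properties, not ROSAT or simulation data, and the one-class SVM stands in for whichever one-class classifier is used:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Rows are clusters, columns are properties such as mass, redshift, X-ray flux.
rng = np.random.default_rng(0)
observed = rng.lognormal(mean=[1.0, -1.0, 0.5], sigma=0.3, size=(500, 3))
mock     = rng.lognormal(mean=[0.8, -0.9, 0.3], sigma=0.6, size=(50_000, 3))

# Learn the "selection function" from the observed catalog alone, then keep
# only the mock clusters the classifier deems catalog-like.
clf = make_pipeline(StandardScaler(), OneClassSVM(nu=0.05, gamma="scale"))
clf.fit(np.log(observed))
selected = mock[clf.predict(np.log(mock)) == 1]
print(f"{len(selected)} of {len(mock)} mock clusters pass the learned cut")
```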
Deterministic annealing for density estimation by multivariate normal mixtures
NASA Astrophysics Data System (ADS)
Kloppenburg, Martin; Tavan, Paul
1997-03-01
An approach to maximum-likelihood density estimation by mixtures of multivariate normal distributions for large high-dimensional data sets is presented. Conventionally that problem is tackled by notoriously unstable expectation-maximization (EM) algorithms. We remove these instabilities by the introduction of soft constraints, enabling deterministic annealing. Our developments are motivated by the proof that algorithmically stable fuzzy clustering methods that are derived from statistical physics analogs are special cases of EM procedures.
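A minimal sketch of deterministic annealing applied to mixture EM, using a generic tempered E-step with beta = 1/T annealed toward 1 (the paper's actual scheme uses soft constraints derived from statistical physics, which this does not reproduce):

```python
import numpy as np
from scipy.stats import multivariate_normal

def daem_gmm(X, k, betas=np.linspace(0.1, 1.0, 10), iters=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * k)
    w = np.full(k, 1.0 / k)
    for beta in betas:                        # anneal: high T -> T = 1
        for _ in range(iters):
            logp = np.stack([np.log(w[j]) +
                             multivariate_normal.logpdf(X, mu[j], cov[j])
                             for j in range(k)], axis=1)
            logp *= beta                      # tempered E-step flattens r at high T
            logp -= logp.max(axis=1, keepdims=True)
            r = np.exp(logp); r /= r.sum(axis=1, keepdims=True)
            for j in range(k):                # standard M-step
                rj = r[:, j]; s = rj.sum()
                w[j] = s / n
                mu[j] = rj @ X / s
                D = X - mu[j]
                cov[j] = (rj[:, None] * D).T @ D / s + 1e-6 * np.eye(d)
    return w, mu, cov

X = np.vstack([np.random.randn(200, 2) + [3, 0], np.random.randn(200, 2)])
w, mu, cov = daem_gmm(X, k=2)
print(np.round(w, 2), np.round(mu, 1))
```

At high temperature the responsibilities are nearly uniform, so early iterations cannot commit to poor local optima; the plain EM updates are recovered as beta reaches 1.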
Calculation of flow about posts and powerhead model. [space shuttle main engine
NASA Technical Reports Server (NTRS)
Anderson, P. G.; Farmer, R. C.
1985-01-01
A three-dimensional analysis of the non-uniform flow around the liquid oxygen (LOX) posts in the Space Shuttle Main Engine (SSME) powerhead was performed to determine possible factors contributing to the failure of the posts. A three-dimensional numerical fluid flow analysis of the high pressure fuel turbopump (HPFTP) exhaust system, consisting of the turnaround duct (TAD), the two-duct hot gas manifold (HGM), and the Version B transfer ducts, was also performed. The analysis was conducted in the following manner: (1) modeling the flow around a single post and around small clusters (2 to 10) of posts; (2) modeling the velocity field in the cross plane; and (3) modeling the entire flow region with a three-dimensional network-type model. Shear stress functions that permit viscous analysis without requiring excessive numbers of computational grid points were developed. These wall functions, laminar and turbulent, have been compared to standard Blasius solutions and are directly applicable to the cylinder-in-crossflow class of problems to which the LOX post problem belongs.
NASA Technical Reports Server (NTRS)
Balakumar, P.; Jeyasingham, Samarasingham
1999-01-01
A program is developed to investigate the linear stability of three-dimensional compressible boundary-layer flows over bodies of revolution. The problem is formulated as a two-dimensional (2D) eigenvalue problem incorporating the meanflow variations in the normal and azimuthal directions. Normal-mode solutions are sought in the whole plane rather than along a line normal to the wall, as is done in classical one-dimensional (1D) stability theory. The stability characteristics of a supersonic boundary layer over a sharp cone with a 5-degree half-angle at 2 degrees angle of attack are investigated. The 1D eigenvalue computations showed that the most amplified disturbances occur around the azimuthal location x2 = 90 degrees and that the azimuthal mode numbers of the most amplified disturbances range between m = -30 and -40. The frequencies of the most amplified waves are smaller in the middle region, where crossflow dominates the instability, than near the windward and leeward planes. The 2D eigenvalue computations showed that, due to the variations in the azimuthal direction, the eigenmodes are clustered into isolated confined regions; for some eigenvalues, the eigenfunctions are clustered in two regions. Due to the nonparallel effect in the azimuthal direction, the most amplified disturbances are shifted to 120 degrees, compared with 90 degrees for the parallel theory. It is also observed that the nonparallel amplification rates are smaller than those obtained from the parallel theory.
Study on Data Clustering and Intelligent Decision Algorithm of Indoor Localization
NASA Astrophysics Data System (ADS)
Liu, Zexi
2018-01-01
Indoor positioning technology gives people positional awareness within architectural spaces, but it suffers from incomplete coverage by any single network and from redundancy in the location data. This article therefore studies data clustering and intelligent decision algorithms for indoor localization: it designs the basic framework of multi-source indoor positioning technology and analyzes fingerprint localization based on distance measurement, together with position and orientation integration from inertial devices. By optimizing the clustering of massive indoor location data, it realizes data normalization preprocessing, multi-dimensional controllable cluster centers, and multi-factor clustering, thereby reducing the redundancy of the positioning data. In addition, a path-planning approach based on neural network inference and decision-making is proposed, with a sparse data input layer, a dynamic feedback hidden layer, and an output layer; the resulting low-dimensional outputs improve intelligent navigation path planning.
Value-based customer grouping from large retail data sets
NASA Astrophysics Data System (ADS)
Strehl, Alexander; Ghosh, Joydeep
2000-04-01
In this paper, we propose OPOSSUM, a novel similarity-based clustering algorithm using constrained, weighted graph-partitioning. Instead of binary presence or absence of products in a market-basket, we use an extended 'revenue per product' measure to better account for management objectives. Typically the number of clusters desired in a database marketing application is only in the teens or less. OPOSSUM proceeds top-down, which is more efficient and takes a small number of steps to attain the desired number of clusters as compared to bottom-up agglomerative clustering approaches. OPOSSUM delivers clusters that are balanced in terms of either customers (samples) or revenue (value). To facilitate data exploration and validation of results we introduce CLUSION, a visualization toolkit for high-dimensional clustering problems. To enable closed loop deployment of the algorithm, OPOSSUM has no user-specified parameters. Thresholding heuristics are avoided and the optimal number of clusters is automatically determined by a search for maximum performance. Results are presented on a real retail industry data-set of several thousand customers and products, to demonstrate the power of the proposed technique.
Random Walk Method for Potential Problems
NASA Technical Reports Server (NTRS)
Krishnamurthy, T.; Raju, I. S.
2002-01-01
A local Random Walk Method (RWM) for potential problems governed by Laplace's and Poisson's equations is developed for two- and three-dimensional problems. The RWM is implemented and demonstrated in a multiprocessor parallel environment on a Beowulf cluster of computers. A speed gain of 16 is achieved as the number of processors is increased from 1 to 23.
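The flavor of such Monte Carlo potential solvers is easy to convey with the classic walk-on-spheres estimator for Laplace's equation on the unit disk (a generic sketch, not the paper's local RWM). Each walk is independent, which is exactly what makes the method attractive on a cluster:

```python
import numpy as np

def g(p):                                    # Dirichlet boundary data
    return p[..., 0] ** 2 - p[..., 1] ** 2   # harmonic, so u(x, y) = x^2 - y^2

def walk_on_spheres(x0, n_walks=20_000, eps=1e-3, seed=0):
    # Each walk jumps to a uniform point on the largest circle around the
    # current position that stays inside the domain, stopping within eps
    # of the boundary; u(x0) is the average of g over the exit points.
    rng = np.random.default_rng(seed)
    p = np.tile(np.asarray(x0, float), (n_walks, 1))
    done = np.zeros(n_walks, bool)
    while not done.all():
        d = 1.0 - np.linalg.norm(p, axis=1)          # distance to the circle
        step = ~done & (d > eps)
        done |= d <= eps
        th = rng.uniform(0.0, 2 * np.pi, step.sum())
        p[step] += d[step, None] * np.c_[np.cos(th), np.sin(th)]
    p /= np.linalg.norm(p, axis=1, keepdims=True)    # snap to the boundary
    return g(p).mean()

print(walk_on_spheres([0.3, 0.4]))   # exact value: 0.09 - 0.16 = -0.07
```

Distributing the walks over processors and averaging the partial sums is the whole parallelization story, which is why near-linear speedups are expected.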
NASA Astrophysics Data System (ADS)
Hernawati, Kuswari; Insani, Nur; Bambang S. H., M.; Nur Hadi, W.; Sahid
2017-08-01
This research aims to map the 33 (thirty-three) provinces of Indonesia into a clustered model, based on data on air, water, and soil pollution together with socio-demographic and geographic data. The method used was an unsupervised approach built on Kohonen self-organizing feature maps (SOFM). Design parameters for the model were derived from data directly or indirectly related to pollution: demographic and social data, air, water, and soil pollution levels, and the geographical situation of each province. The parameters comprise 19 features/characteristics, including the human development index, the number of vehicles, the availability of water-absorbing vegetation and flood prevention, and the geographic and demographic situation. The data were secondary data from the Central Statistics Agency (BPS), Indonesia. The SOFM maps the data from a high-dimensional vector space onto a two-dimensional space according to closeness of location in terms of Euclidean distance, and the resulting outputs are represented as cluster groupings. The thirty-three provinces fall into five clusters, each with different features/characteristics and levels of pollution. The results can be used to help direct efforts to prevent and resolve pollution problems in each cluster effectively and efficiently.
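A compact NumPy sketch of the Kohonen SOFM training loop used in this kind of study; the grid size, learning-rate schedule, and random placeholder data are our assumptions (the actual study used 19 standardized features for 33 provinces):

```python
import numpy as np

def train_som(X, grid=(6, 6), iters=3000, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    h, w = grid
    W = rng.normal(size=(h, w, X.shape[1]))           # weight vector per unit
    gy, gx = np.mgrid[0:h, 0:w]
    for t in range(iters):
        x = X[rng.integers(len(X))]
        d2 = ((W - x) ** 2).sum(axis=2)
        by, bx = np.unravel_index(d2.argmin(), d2.shape)   # best-matching unit
        frac = t / iters
        lr = lr0 * (1 - frac)                         # decaying learning rate
        sig = sigma0 * (1 - frac) + 0.5               # shrinking neighborhood
        nb = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2 * sig ** 2))
        W += lr * nb[..., None] * (x - W)             # pull the neighborhood
    return W

X = np.random.rand(33, 19)       # placeholder for the standardized BPS data
W = train_som(X)
bmu = [np.unravel_index(((W - x) ** 2).sum(2).argmin(), W.shape[:2]) for x in X]
```

Each data vector is then assigned to its best-matching unit, and adjacent units with similar weights form the reported clusters.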
NASA Technical Reports Server (NTRS)
Socolovsky, Eduardo A.; Bushnell, Dennis M. (Technical Monitor)
2002-01-01
The cosine or correlation measures of similarity used to cluster high-dimensional data are interpreted as projections, and the orthogonal components are used to define a complementary dissimilarity measure, forming a similarity-dissimilarity measure pair. Using a geometrical approach, a number of properties of this pair are established. The approach is also extended to general inner-product spaces of any dimension. The properties include the triangle inequality for the defined dissimilarity measure, error estimates for the triangle inequality, and bounds on both measures that can be obtained with a few floating-point operations from previously computed values of the measures. The bounds and error estimates for the similarity and dissimilarity measures can be used to reduce the computational complexity of clustering algorithms and enhance their scalability, and the triangle inequality allows the design of clustering algorithms for high-dimensional distributed data.
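Read geometrically, the pair is simply the projection length and the length of the orthogonal remainder. A small sketch (our own illustration of the construction, not the paper's code) with a random spot-check of the triangle inequality:

```python
import numpy as np

def sim_dissim(u, v):
    # Similarity = length of the projection of unit-u on unit-v (the cosine);
    # dissimilarity = length of the orthogonal component (the sine).
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    s = float(u @ v)                        # cos(theta)
    d = float(np.linalg.norm(u - s * v))    # |sin(theta)|
    return s, d

# Spot-check d(u, w) <= d(u, v) + d(v, w) on random triples.
rng = np.random.default_rng(0)
for _ in range(1000):
    u, v, w = rng.normal(size=(3, 8))
    duv = sim_dissim(u, v)[1]
    dvw = sim_dissim(v, w)[1]
    duw = sim_dissim(u, w)[1]
    assert duw <= duv + dvw + 1e-12
```

Because s and d satisfy s^2 + d^2 = 1 for unit vectors, either member of the pair can be recovered cheaply from the other, which is the source of the low-cost bounds mentioned above.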
Dashti, Ali; Komarov, Ivan; D'Souza, Roshan M
2013-01-01
This paper presents an implementation of brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large high-dimensional data clouds. The proposed method uses Graphics Processing Units (GPUs) and is scalable with multiple levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing as compared with an optimized method running on a cluster of CPUs, and bring a hitherto impossible k-NNG generation for a dataset of twenty million images with 15 k dimensionality into the realm of practical possibility.
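The serial core of such a brute-force builder fits in a dozen lines. The chunked distance computation below (plain NumPy, with invented sizes) is the same tiling that the paper distributes across GPUs and cluster nodes, so the full N x N distance matrix is never materialized:

```python
import numpy as np

def knn_graph(X, k, chunk=1024):
    n = len(X)
    sq = (X ** 2).sum(axis=1)
    nbrs = np.empty((n, k), dtype=np.int64)
    for s in range(0, n, chunk):
        block = X[s:s + chunk]
        # Squared Euclidean distances from this chunk to all points.
        d2 = sq[s:s + chunk, None] - 2.0 * block @ X.T + sq[None, :]
        d2[np.arange(len(block)), np.arange(s, s + len(block))] = np.inf  # no self
        nbrs[s:s + chunk] = np.argpartition(d2, k, axis=1)[:, :k]
    return nbrs

X = np.random.rand(5000, 128).astype(np.float32)
print(knn_graph(X, k=10).shape)      # (5000, 10)
```

On a GPU the chunk-times-all matrix product is exactly the operation that saturates the hardware, and chunks can be farmed out to different devices independently.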
From globally coupled maps to complex-systems biology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kaneko, Kunihiko, E-mail: kaneko@complex.c.u-tokyo.ac.jp
Studies of globally coupled maps, introduced as a network of chaotic dynamics, are briefly reviewed with an emphasis on novel concepts therein, which are universal in high-dimensional dynamical systems. They include clustering of synchronized oscillations, hierarchical clustering, chimera of synchronization and desynchronization, partition complexity, prevalence of Milnor attractors, chaotic itinerancy, and collective chaos. The degrees of freedom necessary for high dimensionality are proposed to equal the number at which combinatorial growth exceeds exponential growth. Future analysis of high-dimensional dynamical systems with regard to complex-systems biology is briefly discussed.
NASA Astrophysics Data System (ADS)
Einkemmer, Lukas
2016-05-01
The recently developed semi-Lagrangian discontinuous Galerkin approach is used to discretize hyperbolic partial differential equations (usually first order equations). Since these methods are conservative, local in space, and able to limit numerical diffusion, they are considered a promising alternative to more traditional semi-Lagrangian schemes (which are usually based on polynomial or spline interpolation). In this paper, we consider a parallel implementation of a semi-Lagrangian discontinuous Galerkin method for distributed memory systems (so-called clusters). Both strong and weak scaling studies are performed on the Vienna Scientific Cluster 2 (VSC-2). In the case of weak scaling we observe a parallel efficiency above 0.8 for both two and four dimensional problems and up to 8192 cores. Strong scaling results show good scalability to at least 512 cores (we consider problems that can be run on a single processor in reasonable time). In addition, we study the scaling of a two dimensional Vlasov-Poisson solver that is implemented using the framework provided. All of the simulations are conducted in the context of worst case communication overhead; i.e., in a setting where the CFL (Courant-Friedrichs-Lewy) number increases linearly with the problem size. The framework introduced in this paper facilitates a dimension independent implementation of scientific codes (based on C++ templates) using both an MPI and a hybrid approach to parallelization. We describe the essential ingredients of our implementation.
Visualization of unsteady computational fluid dynamics
NASA Astrophysics Data System (ADS)
Haimes, Robert
1994-11-01
A brief summary of the computer environment used for calculating three-dimensional unsteady Computational Fluid Dynamics (CFD) results is presented. This environment requires a supercomputer; massively parallel processors (MPPs) and clusters of workstations acting as a single MPP (by working concurrently on the same task) provide the required computational bandwidth for CFD calculations of transient problems. The cluster of reduced instruction set computers (RISC) is a recent development enabled by the low cost and high performance that workstation vendors provide. With the proper software, the cluster can act as a multiple instruction/multiple data (MIMD) machine. A new set of software tools is being designed specifically to address the visualization of 3D unsteady CFD results in these environments. Three user's manuals for the parallel version of Visual3, pV3, revision 1.00, make up the bulk of this report.
Self-organizing neural networks--an alternative way of cluster analysis in clinical chemistry.
Reibnegger, G; Wachter, H
1996-04-15
Supervised learning schemes have been employed by several workers for training neural networks designed to solve clinical problems. We demonstrate that unsupervised techniques can also produce interesting and meaningful results. Using a data set on the chemical composition of milk from 22 different mammals, we demonstrate that self-organizing feature maps (Kohonen networks), as well as a modified version of the error backpropagation technique, yield results mimicking conventional cluster analysis. Both techniques are able to project a potentially multi-dimensional input vector onto a two-dimensional space while conserving neighborhood relationships. Thus, these techniques can be used to reduce the dimensionality of complicated data sets and to enhance the comprehensibility of features hidden in the data matrix.
Clustervision: Visual Supervision of Unsupervised Clustering.
Kwon, Bum Chul; Eysenbach, Ben; Verma, Janu; Ng, Kenney; De Filippi, Christopher; Stewart, Walter F; Perer, Adam
2018-01-01
Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exists a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant to their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large number of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results using five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high-quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.
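The run-and-rank workflow that Clustervision wraps in a visual interface can be sketched on the command line with scikit-learn (the tool itself uses five metrics and coordinated views; the data, algorithms, and three metrics below are our own choices):

```python
from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
results = []
for k in range(2, 7):
    for name, algo in [
        ("kmeans", KMeans(n_clusters=k, n_init=10, random_state=0)),
        ("agglo", AgglomerativeClustering(n_clusters=k)),
        ("spectral", SpectralClustering(n_clusters=k, random_state=0)),
    ]:
        labels = algo.fit_predict(X)
        results.append((silhouette_score(X, labels),
                        calinski_harabasz_score(X, labels),
                        -davies_bouldin_score(X, labels),   # negate: higher = better
                        name, k))

for sil, ch, db, name, k in sorted(results, reverse=True)[:5]:
    print(f"{name:9s} k={k}  silhouette={sil:.3f}")
```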
Städler, Nicolas; Dondelinger, Frank; Hill, Steven M; Akbani, Rehan; Lu, Yiling; Mills, Gordon B; Mukherjee, Sach
2017-09-15
Molecular pathways and networks play a key role in basic and disease biology. An emerging notion is that networks encoding patterns of molecular interplay may themselves differ between contexts, such as cell type, tissue or disease (sub)type. However, while statistical testing of differences in mean expression levels has been extensively studied, testing of network differences remains challenging. Furthermore, since network differences could provide important and biologically interpretable information to identify molecular subgroups, there is a need to consider the unsupervised task of learning subgroups and the networks that define them. This is a nontrivial clustering problem, with neither subgroups nor subgroup-specific networks known at the outset. We leverage recent ideas from high-dimensional statistics for testing and clustering in the network biology setting. The methods we describe can be applied directly to most continuous molecular measurements, and networks do not need to be specified beforehand. We illustrate the ideas and methods in a case study using protein data from The Cancer Genome Atlas (TCGA). This provides evidence that patterns of interplay between signalling proteins differ significantly between cancer types. Furthermore, we show how the proposed approaches can be used to learn subtypes and the molecular networks that define them. The methods are available as the Bioconductor package nethet. Contact: staedler.n@gmail.com or sach.mukherjee@dzne.de. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Exploratory Item Classification Via Spectral Graph Clustering
Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang
2017-01-01
Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
Penalized gaussian process regression and classification for high-dimensional nonlinear data.
Yi, G; Shi, J Q; Choi, T
2011-12-01
The model based on a Gaussian process (GP) prior and a kernel covariance function can be used to fit nonlinear data with multidimensional covariates. It has been used as a flexible nonparametric approach for curve fitting, classification, clustering, and other statistical problems, and has been widely applied to complex nonlinear systems in many different areas, particularly in machine learning. However, it is challenging to use the model for large-scale and high-dimensional data sets, for example, the meat data discussed in this article, which have 100 highly correlated covariates. For such data, the model suffers from large variance in parameter estimation and high predictive error, and numerically it suffers from unstable computation. In this article, a penalized likelihood framework is applied to the GP-based model. Different penalties are investigated, and their suitability to the characteristics of GP models is discussed. The asymptotic properties are also discussed, with the relevant proofs. Several applications to real biomechanical and bioinformatics data sets are reported. © 2011, The International Biometric Society. No claim to original US government works.
Characteristics of voxel prediction power in full-brain Granger causality analysis of fMRI data
NASA Astrophysics Data System (ADS)
Garg, Rahul; Cecchi, Guillermo A.; Rao, A. Ravishankar
2011-03-01
Functional neuroimaging research is moving from the study of "activations" to the study of "interactions" among brain regions. Granger causality analysis provides a powerful technique to model spatio-temporal interactions among brain regions. We apply this technique to full-brain fMRI data without aggregating any voxel data into regions of interest (ROIs). We circumvent the problem of dimensionality using sparse regression from machine learning. On a simple finger-tapping experiment we found that (1) a small number of voxels in the brain have very high prediction power, explaining the future time course of other voxels in the brain; (2) these voxels occur in small sized clusters (of size 1-4 voxels) distributed throughout the brain; (3) albeit small, these clusters overlap with most of the clusters identified with the non-temporal General Linear Model (GLM); and (4) the method identifies clusters which, while not determined by the task and not detectable by GLM, still influence brain activity.
Cluster-based control of a separating flow over a smoothly contoured ramp
NASA Astrophysics Data System (ADS)
Kaiser, Eurika; Noack, Bernd R.; Spohn, Andreas; Cattafesta, Louis N.; Morzyński, Marek
2017-12-01
The ability to manipulate and control fluid flows is of great importance in many scientific and engineering applications. The proposed closed-loop control framework addresses a key issue of model-based control: The actuation effect often results from slow dynamics of strongly nonlinear interactions which the flow reveals at timescales much longer than the prediction horizon of any model. Hence, we employ a probabilistic approach based on a cluster-based discretization of the Liouville equation for the evolution of the probability distribution. The proposed methodology frames high-dimensional, nonlinear dynamics into low-dimensional, probabilistic, linear dynamics which considerably simplifies the optimal control problem while preserving nonlinear actuation mechanisms. The data-driven approach builds upon a state space discretization using a clustering algorithm which groups kinematically similar flow states into a low number of clusters. The temporal evolution of the probability distribution on this set of clusters is then described by a control-dependent Markov model. This Markov model can be used as predictor for the ergodic probability distribution for a particular control law. This probability distribution approximates the long-term behavior of the original system on which basis the optimal control law is determined. We examine how the approach can be used to improve the open-loop actuation in a separating flow dominated by Kelvin-Helmholtz shedding. For this purpose, the feature space, in which the model is learned, and the admissible control inputs are tailored to strongly oscillatory flows.
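The modeling backbone of this framework can be sketched quickly: cluster the snapshots, count transitions between clusters to obtain a Markov model, and read off the ergodic distribution. The toy data and sizes below are our own; the actual study builds a control-dependent model from ramp-flow measurements.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy oscillatory "snapshots" stand in for the flow measurements.
rng = np.random.default_rng(0)
t = np.linspace(0, 40 * np.pi, 4000)
snapshots = np.c_[np.sin(t), np.cos(t)] + 0.1 * rng.normal(size=(4000, 2))

k = 10
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(snapshots)

P = np.zeros((k, k))                      # transition counts between clusters
for a, b in zip(labels[:-1], labels[1:]):
    P[a, b] += 1
P = (P + 1e-9) / (P + 1e-9).sum(axis=1, keepdims=True)   # row-stochastic

# Ergodic distribution = left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()
print("ergodic cluster probabilities:", np.round(pi, 3))
```

With one such transition matrix per candidate control law, comparing the resulting ergodic distributions is what reduces the optimal control problem to low-dimensional linear algebra.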
Universal dynamical properties preclude standard clustering in a large class of biochemical data.
Gomez, Florian; Stoop, Ralph L; Stoop, Ruedi
2014-09-01
Clustering of chemical and biochemical data based on observed features is a central cognitive step in the analysis of chemical substances, in particular in combinatorial chemistry, or of complex biochemical reaction networks. Often, for reasons unknown to the researcher, this step produces disappointing results. Once the sources of the problem are known, improved clustering methods might revitalize the statistical approach of compound and reaction search and analysis. Here, we present a generic mechanism that may be at the origin of many clustering difficulties. The variety of dynamical behaviors that complex biochemical reactions can exhibit on variation of the system parameters is a fundamental system fingerprint. In parameter space, shrimp-like or swallow-tail structures separate parameter sets that lead to stable periodic dynamical behavior from those leading to irregular behavior. We work out the genericity of this phenomenon and demonstrate novel examples of its occurrence in realistic models of biophysics. Although we elucidate the phenomenon by considering the emergence of periodicity in dependence on system parameters in a low-dimensional parameter space, the conclusions from our simple setting are shown to continue to be valid for features in a higher-dimensional feature space, as long as the feature-generating mechanism is not too extreme and the dimension of this space is not too high compared with the amount of available data. For online versions of super-paramagnetic clustering see http://stoop.ini.uzh.ch/research/clustering. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Bruck, Andrea M; Yin, Jiefu; Tong, Xiao; Takeuchi, Esther S; Takeuchi, Kenneth J; Szczepura, Lisa F; Marschilok, Amy C
2018-05-07
The cluster-based material Re6Se8Cl2 is a two-dimensional ternary material with cluster-cluster bonding across the a and b axes, capable of multiple electron transfer accompanied by ion insertion across the c axis. The Li/Re6Se8Cl2 system showed reversible electron transfer from 1 to 3 electron equivalents (ee) at high current densities (88 mA/g). Upon cycling to 4 ee, there was evidence of capacity degradation over 50 cycles, associated with the formation of an organic solid-electrolyte interface (between 1.45 and 1 V vs Li/Li+). This investigation highlights the ability of cluster-based materials with two-dimensional cluster bonding to be used in applications such as energy storage, showing structural stability and high rate capability.
State estimation and prediction using clustered particle filters
Lee, Yoonsang; Majda, Andrew J.
2016-01-01
Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems, which uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filtering are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighbor state variables through clustering and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides the large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors. PMID:27930332
2012-01-01
Background: Dimensionality reduction (DR) enables the construction of a lower dimensional space (embedding) from a higher dimensional feature space while preserving object-class discriminability. However, several popular DR approaches suffer from sensitivity to the choice of parameters and/or the presence of noise in the data. In this paper, we present a novel DR technique known as consensus embedding that aims to overcome these problems by generating and combining multiple low-dimensional embeddings, hence exploiting the variance among them in a manner similar to ensemble classifier schemes such as Bagging. We demonstrate theoretical properties of consensus embedding which show that it will result in a single stable embedding solution that preserves information more accurately as compared to any individual embedding (generated via DR schemes such as Principal Component Analysis, Graph Embedding, or Locally Linear Embedding). Intelligent sub-sampling (via mean-shift) and code parallelization are utilized to provide for an efficient implementation of the scheme.
Results: Applications of consensus embedding are shown in the context of classification and clustering as applied to: (1) image partitioning of white matter and gray matter on 10 different synthetic brain MRI images corrupted with 18 different combinations of noise and bias field inhomogeneity; (2) classification of 4 high-dimensional gene-expression datasets; (3) cancer detection (at a pixel level) on 16 image slices obtained from 2 different high-resolution prostate MRI datasets. In over 200 different experiments concerning classification and segmentation of biomedical data, consensus embedding was found to consistently outperform both linear and non-linear DR methods within all applications considered.
Conclusions: We have presented a novel framework termed consensus embedding which leverages ensemble classification theory within dimensionality reduction, allowing for application to a wide range of high-dimensional biomedical data classification and segmentation problems. Our generalizable framework allows for improved representation and classification in the context of both imaging and non-imaging data. The algorithm offers a promising solution to problems that currently plague DR methods, and may allow for extension to other areas of biomedical data analysis. PMID:22316103
Nuclear Potential Clustering As a New Tool to Detect Patterns in High Dimensional Datasets
NASA Astrophysics Data System (ADS)
Tonkova, V.; Paulus, D.; Neeb, H.
2013-02-01
We present a new approach for the clustering of high-dimensional data without prior assumptions about the structure of the underlying distribution. The proposed algorithm is based on a concept adapted from nuclear physics. To partition the data, we model the dynamic behaviour of nucleons interacting in an N-dimensional space. An adaptive nuclear potential, comprising a short-range attractive term (strong interaction) and a long-range repulsive term (Coulomb force), is assigned to each data point. By modelling the dynamics, nucleons that are densely distributed in space fuse to build nuclei (clusters), whereas single-point clusters repel each other. The formation of clusters is complete when the system reaches the state of minimal potential energy. The data are then grouped according to the particles' final effective potential energy levels. The performance of the algorithm is tested with several synthetic datasets, showing that the proposed method can robustly identify clusters even when complex configurations are present. Furthermore, quantitative MRI data from 43 multiple sclerosis patients were analyzed, showing a reasonable splitting into subgroups according to the individual patients' disease grade. The good performance of the algorithm on such highly correlated non-spherical datasets, which are typical of MRI-derived image features, shows that Nuclear Potential Clustering is a valuable tool for automated data analysis, not only in the MRI domain.
NASA Astrophysics Data System (ADS)
Liu, Tingguang; Xia, Shuang; Bai, Qin; Zhou, Bangxin; Zhang, Lefu; Lu, Yonghao; Shoji, Tetsuo
2018-01-01
The intergranular cracks and grain boundary (GB) network of a GB-engineered 316 stainless steel after a stress corrosion cracking (SCC) test in the high-temperature, high-pressure water of a reactor environment were investigated by two-dimensional and three-dimensional (3D) characterization, in order to expose the mechanism by which GB engineering mitigates intergranular SCC. The 3D microstructure showed that the essential characteristic of the GB-engineered microstructure is the formation of many large twin boundaries as a result of multiple twinning, which leads to the formation of large grain-clusters. These large grain-clusters played a key role in the improvement of intergranular SCC resistance by GB engineering. The main intergranular cracks propagated in a zigzag along the outer boundaries of the large grain-clusters, because all inner boundaries of the grain-clusters were twin boundaries (∑3) or twin-related boundaries (∑3n), which have much lower susceptibility to SCC than random boundaries. The large grain-clusters had a tree-ring-shaped topology and very complex morphology; they were tangled and therefore difficult to separate during SCC, so some large crack-bridges remained in the crack surface.
Local matrix learning in clustering and applications for manifold visualization.
Arnonkijpanich, Banchar; Hasenfuss, Alexander; Hammer, Barbara
2010-05-01
Electronic data sets are increasing rapidly with respect to both the size of the data sets and the data resolution, i.e. dimensionality, such that adequate data inspection and data visualization have become central issues of data mining. In this article, we present an extension of classical clustering schemes by local matrix adaptation, which allows a better representation of data by means of clusters of arbitrary ellipsoidal shape. Unlike previous proposals, the method is derived from a global cost function. The focus of this article is to demonstrate the applicability of this matrix clustering scheme to low-dimensional data embedding for data inspection. The proposed method is based on matrix learning for neural gas and manifold charting. This provides an explicit mapping of a given high-dimensional data space to low dimensionality. We demonstrate the usefulness of this method for data inspection and manifold visualization. © 2009 Elsevier Ltd. All rights reserved.
Wu, Dingming; Wang, Dongfang; Zhang, Michael Q; Gu, Jin
2015-12-01
One major goal of large-scale cancer omics studies is to identify molecular subtypes for more accurate cancer diagnoses and treatments. To deal with high-dimensional cancer multi-omics data, a promising strategy is to find an effective low-dimensional subspace of the original data and then cluster cancer samples in the reduced subspace. However, due to data-type diversity and big data volume, few methods can integratively and efficiently find the principal low-dimensional manifold of high-dimensional cancer multi-omics data. In this study, we propose a novel integrative probabilistic model based on low-rank approximation to quickly find the shared principal subspace across multiple data types: the convexity of the low-rank regularized likelihood function of the probabilistic model ensures efficient and stable model fitting. Candidate molecular subtypes can be identified by unsupervised clustering of hundreds of cancer samples in the reduced low-dimensional subspace. On testing datasets, our method, LRAcluster (low-rank approximation based multi-omics data clustering), runs much faster and with better clustering performance than the existing methods. We then applied LRAcluster to large-scale cancer multi-omics data from TCGA. The pan-cancer analysis shows that cancers of different tissue origins are generally grouped as independent clusters, except for squamous-like carcinomas, while the single-cancer-type analyses suggest that the omics data have different subtyping abilities for different cancer types. LRAcluster is a very useful method for fast dimension reduction and unsupervised clustering of large-scale multi-omics data. LRAcluster is implemented in R and freely available via http://bioinfo.au.tsinghua.edu.cn/software/lracluster/ .
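The two-step pattern, a shared low-dimensional subspace followed by clustering of the samples there, can be sketched with a plain truncated SVD standing in for LRAcluster's probabilistic low-rank model; all sizes and the random matrices below are placeholders, not TCGA data:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
expression = rng.normal(size=(200, 5000))     # samples x genes (placeholder)
methylation = rng.normal(size=(200, 3000))    # samples x probes (placeholder)
stacked = np.hstack([expression, methylation])   # naive multi-omics integration

Z = TruncatedSVD(n_components=10, random_state=0).fit_transform(stacked)
subtypes = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
print(np.bincount(subtypes))       # candidate molecular subtype sizes
```

The probabilistic model's advantage over this naive stacking is that each data type (continuous, count, binary) gets its own likelihood while sharing one low-rank parameter matrix.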
Interlaced coarse-graining for the dynamical cluster approximation
NASA Astrophysics Data System (ADS)
Haehner, Urs; Staar, Peter; Jiang, Mi; Maier, Thomas; Schulthess, Thomas
The negative sign problem remains a challenging limiting factor in quantum Monte Carlo simulations of strongly correlated fermionic many-body systems. The dynamical cluster approximation (DCA) makes this problem less severe by coarse-graining the momentum space to map the bulk lattice to a cluster embedded in a dynamical mean-field host. Here, we introduce a new form of an interlaced coarse-graining and compare it with the traditional coarse-graining. We show that it leads to more controlled results with weaker cluster shape and smoother cluster size dependence, which with increasing cluster size converge to the results obtained using the standard coarse-graining. In addition, the new coarse-graining reduces the severity of the fermionic sign problem. Therefore, it enables calculations on much larger clusters and can allow the evaluation of the exact infinite cluster size result via finite size scaling. To demonstrate this, we study the hole-doped two-dimensional Hubbard model and show that the interlaced coarse-graining in combination with the DCA+ algorithm permits the determination of the superconducting Tc on cluster sizes, for which the results can be fitted with the Kosterlitz-Thouless scaling law. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF) awarded by the INCITE program, and of the Swiss National Supercomputing Center. OLCF is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji
2017-01-01
We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth
2015-01-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
Computational Performance of a Parallelized Three-Dimensional High-Order Spectral Element Toolbox
NASA Astrophysics Data System (ADS)
Bosshard, Christoph; Bouffanais, Roland; Clémençon, Christian; Deville, Michel O.; Fiétier, Nicolas; Gruber, Ralf; Kehtari, Sohrab; Keller, Vincent; Latt, Jonas
In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is put on the performance evaluation of several aspects, with a particular emphasis on parallel efficiency. The performance evaluation is analyzed with the help of a time prediction model based on a parameterization of the application and the hardware resources. A tailor-made CFD computation benchmark case is introduced and used to carry out this review, stressing the particular interest of clusters with up to 8192 cores. Some problems in the parallel implementation have been detected and corrected. The theoretical complexities with respect to the number of elements, the polynomial degree, and the communication needs are correctly reproduced. It is concluded that this type of code has a nearly perfect speedup on machines with thousands of cores, and is ready to make the step to next-generation petaflop machines.
Partially supervised speaker clustering.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
2012-05-01
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
NASA Technical Reports Server (NTRS)
Srivastava, Ashok, N.; Akella, Ram; Diev, Vesselin; Kumaresan, Sakthi Preethi; McIntosh, Dawn M.; Pontikakis, Emmanuel D.; Xu, Zuobing; Zhang, Yi
2006-01-01
This paper describes the results of a significant research and development effort conducted at NASA Ames Research Center to develop new text mining techniques to discover anomalies in free-text reports regarding system health and safety of two aerospace systems. We discuss two problems of significant importance in the aviation industry. The first problem is that of automatic anomaly discovery about an aerospace system through the analysis of tens of thousands of free-text problem reports that are written about the system. The second problem that we address is that of automatic discovery of recurring anomalies, i.e., anomalies that may be described in different ways by different authors, at varying times and under varying conditions, but that are truly about the same part of the system. The intent of recurring anomaly identification is to determine project or system weaknesses or high-risk issues. The discovery of recurring anomalies is a key goal in building safe, reliable, and cost-effective aerospace systems. We address the anomaly discovery problem on thousands of free-text reports using two strategies: (1) as an unsupervised learning problem where an algorithm takes free-text reports as input and automatically groups them into different bins, where each bin corresponds to a different unknown anomaly category; and (2) as a supervised learning problem where the algorithm classifies the free-text reports into one of a number of known anomaly categories. We then discuss the application of these methods to the problem of discovering recurring anomalies. In fact the special nature of recurring anomalies (very small cluster sizes) requires incorporating new methods and measures to enhance the original approach for anomaly detection.
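Strategy (1), unsupervised binning of free-text reports, can be sketched generically with TF-IDF features and k-means. This is an illustrative toy using scikit-learn, not the NASA system, and the report strings are invented.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    reports = [
        "hydraulic pump pressure dropped during ascent",
        "pressure loss in hydraulic line noted by crew",
        "radio transmission garbled on approach frequency",
        "intermittent static on approach radio channel",
    ]
    X = TfidfVectorizer(stop_words="english").fit_transform(reports)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)  # each bin corresponds to one unknown anomaly category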
Exponents of non-linear clustering in scale-free one-dimensional cosmological simulations
NASA Astrophysics Data System (ADS)
Benhaiem, David; Joyce, Michael; Sicard, François
2013-03-01
One-dimensional versions of dissipationless cosmological N-body simulations have been shown to share many qualitative behaviours of the three-dimensional problem. Their interest lies in the fact that they can resolve a much greater range of time and length scales, and admit exact numerical integration. We use such models here to study how non-linear clustering depends on initial conditions and cosmology. More specifically, we consider a family of models which, like the three-dimensional Einstein-de Sitter (EdS) model, lead for power-law initial conditions to self-similar clustering characterized in the strongly non-linear regime by power-law behaviour of the two-point correlation function. We study how the corresponding exponent γ depends on the initial conditions, characterized by the exponent n of the power spectrum of initial fluctuations, and on a single parameter κ controlling the rate of expansion. The space of initial conditions/cosmology divides very clearly into two parts: (1) a region in which γ depends strongly on both n and κ and where it agrees very well with a simple generalization of the so-called stable clustering hypothesis in three dimensions; and (2) a region in which γ is more or less independent of both the spectrum and the expansion of the universe. The boundary in (n, κ) space dividing the `stable clustering' region from the `universal' region is very well approximated by a `critical' value of the predicted stable clustering exponent itself. We explain how this division of the (n, κ) space can be understood as a simple physical criterion which might indeed be expected to control the validity of the stable clustering hypothesis. We compare and contrast our findings to results in three dimensions, and discuss in particular the light they may throw on the question of `universality' of non-linear clustering in this context.
Cooling and clusters: when is heating needed?
Bryan, Greg; Voit, Mark
2005-03-15
There are (at least) two unsolved problems concerning the current state of the thermal gas in clusters of galaxies. The first is to identify the source of the heating which offsets cooling in the centres of clusters with short cooling times (the 'cooling-flow' problem). The second is to understand the mechanism which boosts the entropy in cluster and group gas. Since both of these problems involve an unknown source of heating it is tempting to identify them with the same process, particularly since active galactic nuclei heating is observed to be operating at some level in a sample of well-observed 'cooling-flow' clusters. Here we show, using numerical simulations of cluster formation, that much of the gas ending up in clusters cools at high redshift, and so the heating is also needed at high redshift, well before the cluster forms. This indicates that the same process operating to solve the cooling-flow problem may not also resolve the cluster-entropy problem.
Statistical Exploration of Electronic Structure of Molecules from Quantum Monte-Carlo Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prabhat, Mr; Zubarev, Dmitry; Lester, Jr., William A.
In this report, we present results from analysis of Quantum Monte Carlo (QMC) simulation data with the goal of determining the internal structure of the 3N-dimensional phase space of an N-electron molecule. We are interested in mining the simulation data for patterns that might be indicative of bond rearrangement as molecules change electronic states. We examined simulation output that tracks the positions of two coupled electrons in the singlet and triplet states of an H2 molecule. The electrons trace out a trajectory, which was analyzed with a number of statistical techniques. This project was intended to address the following scientific questions: (1) Do high-dimensional phase spaces characterizing the electronic structure of molecules tend to cluster in any natural way? Do we see a change in clustering patterns as we explore different electronic states of the same molecule? (2) Since it is hard to understand the high-dimensional space of trajectories, can we project these trajectories to a lower dimensional subspace to gain a better understanding of patterns? (3) Do trajectories inherently lie in a lower-dimensional manifold? Can we recover that manifold? After extensive statistical analysis, we are now in a better position to respond to these questions. (1) We definitely see clustering patterns, and differences between the H2 and H2tri datasets. These are revealed by the pamk method in a fairly reliable manner and can potentially be used to distinguish bonded and non-bonded systems and gain insight into the nature of bonding. (2) Projecting to a lower dimensional subspace (~4-5 dimensions) using PCA or Kernel PCA reveals interesting patterns in the distribution of scalar values, which can be related to existing descriptors of the electronic structure of molecules. These results can also be used immediately to develop robust tools for analysis of noisy data obtained during QMC simulations. (3) All dimensionality reduction and estimation techniques that we tried indicate that one needs 4 or 5 components to account for most of the variance in the data; hence this 5D dataset does not necessarily lie on a well-defined, low dimensional manifold. In terms of specific clustering techniques, K-means was generally useful in exploring the dataset. The partition around medoids (pam) technique produced the most definitive results for our data, showing distinctive patterns for both a sample of the complete data and time series. The gap statistic with the Tibshirani criterion did not provide any distinction across the two datasets. The gap statistic with the DandF criterion, model-based clustering and hierarchical modeling simply failed to run on our datasets. Thankfully, the vanilla PCA technique was successful in handling our entire dataset. PCA revealed some interesting patterns in the scalar value distribution. Kernel PCA techniques (vanilladot, RBF, polynomial) and MDS failed to run on the entire dataset, or even a significant fraction of it, so we resorted to creating an explicit feature map followed by conventional PCA. Clustering using K-means and PAM in the new basis set seems to produce promising results. Understanding the new basis set in the scientific context of the problem is challenging, and we are currently working to further examine and interpret the results.
Coronal Mass Ejection Data Clustering and Visualization of Decision Trees
NASA Astrophysics Data System (ADS)
Ma, Ruizhe; Angryk, Rafal A.; Riley, Pete; Filali Boubrahimi, Soukaina
2018-05-01
Coronal mass ejections (CMEs) can be categorized as either “magnetic clouds” (MCs) or non-MCs. Features such as a large magnetic field, low plasma-beta, and low proton temperature suggest that a CME event is also an MC event; however, so far there is neither a definitive method nor an automatic process to distinguish the two. Human labeling is time-consuming, and results can fluctuate owing to the imprecise definition of such events. In this study, we approach the problem of MC and non-MC distinction from a time series data analysis perspective and show how clustering can shed some light on this problem. Although many algorithms exist for traditional data clustering in Euclidean space, they are not well suited for time series data. Problems such as inadequate distance measures, inaccurate cluster center description, and lack of intuitive cluster representations need to be addressed for effective time series clustering. Our data analysis in this work is twofold: clustering and visualization. For clustering, we compare the results from the popular hierarchical agglomerative clustering technique to a distance density clustering heuristic we developed previously for time series data clustering. In both cases, dynamic time warping is used as the similarity measure. For classification as well as visualization, we use decision trees to aggregate single-dimensional clustering results to form a multidimensional time series decision tree, with averaged time series to present each decision. In this study, we achieved modest accuracy and, more importantly, an intuitive interpretation of how different parameters contribute to an MC event.
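The DTW-plus-hierarchical-clustering combination described here can be sketched as follows. This is a hedged illustration on synthetic series, not CME data: a plain O(nm) DTW distance feeds an average-linkage clustering.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def dtw(a, b):
        """Classic dynamic-time-warping distance between two 1-D series."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    rng = np.random.default_rng(1)
    t = np.linspace(0, 6, 50)
    series = [np.sin(t) + 0.1 * rng.normal(size=50) for _ in range(5)] \
           + [np.cos(t) + 0.1 * rng.normal(size=50) for _ in range(5)]
    n = len(series)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dtw(series[i], series[j])
    labels = fcluster(linkage(squareform(D), method="average"), t=2, criterion="maxclust")
    print(labels)  # the sine-like and cosine-like series separate into two clusters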
Conjugate-gradient optimization method for orbital-free density functional calculations.
Jiang, Hong; Yang, Weitao
2004-08-01
Orbital-free density functional theory as an extension of traditional Thomas-Fermi theory has attracted a lot of interest in the past decade because of developments in both more accurate kinetic energy functionals and highly efficient numerical methodology. In this paper, we developed a conjugate-gradient method for the numerical solution of spin-dependent extended Thomas-Fermi equation by incorporating techniques previously used in Kohn-Sham calculations. The key ingredient of the method is an approximate line-search scheme and a collective treatment of two spin densities in the case of spin-dependent extended Thomas-Fermi problem. Test calculations for a quartic two-dimensional quantum dot system and a three-dimensional sodium cluster Na216 with a local pseudopotential demonstrate that the method is accurate and efficient. (c) 2004 American Institute of Physics.
Restricted random search method based on taboo search in the multiple minima problem
NASA Astrophysics Data System (ADS)
Hong, Seung Do; Jhon, Mu Shik
1997-03-01
The restricted random search method is proposed as a simple Monte Carlo sampling method to search minima fast in the multiple minima problem. This method is based on taboo search applied recently to continuous test functions. The concept of the taboo region instead of the taboo list is used and therefore the sampling of a region near an old configuration is restricted in this method. This method is applied to 2-dimensional test functions and the argon clusters. This method is found to be a practical and efficient method to search near-global configurations of test functions and the argon clusters.
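A minimal sketch of the taboo-region idea on a standard 2-D test function follows; the radius, bounds and iteration count are illustrative placeholders, not the paper's settings.

    import numpy as np

    def f(x):  # 2-D Rastrigin test function
        return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

    rng = np.random.default_rng(2)
    taboo, radius = [], 0.5  # visited configurations and taboo radius
    best_x, best_f = None, np.inf
    for _ in range(5000):
        x = rng.uniform(-5.12, 5.12, size=2)
        # Restriction: reject samples falling inside any taboo region.
        if any(np.linalg.norm(x - t) < radius for t in taboo):
            continue
        fx = f(x)
        taboo.append(x)
        if fx < best_f:
            best_x, best_f = x, fx
    print(best_x, best_f)

Refusing to resample near previously visited configurations forces the search into unexplored regions, which is the mechanism the authors exploit to reach near-global configurations quickly.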
An ensemble framework for clustering protein-protein interaction networks.
Asur, Sitaram; Ucar, Duygu; Parthasarathy, Srinivasan
2007-07-01
Protein-Protein Interaction (PPI) networks are believed to be important sources of information related to biological processes and complex metabolic functions of the cell. The presence of biologically relevant functional modules in these networks has been theorized by many researchers. However, the application of traditional clustering algorithms for extracting these modules has not been successful, largely due to the presence of noisy false positive interactions as well as specific topological challenges in the network. In this article, we propose an ensemble clustering framework to address this problem. For base clustering, we introduce two topology-based distance metrics to counteract the effects of noise. We develop a PCA-based consensus clustering technique, designed to reduce the dimensionality of the consensus problem and yield informative clusters. We also develop a soft consensus clustering variant to assign multifaceted proteins to multiple functional groups. We conduct an empirical evaluation of different consensus techniques using topology-based, information theoretic and domain-specific validation metrics and show that our approaches can provide significant benefits over other state-of-the-art approaches. Our analysis of the consensus clusters obtained demonstrates that ensemble clustering can (a) produce improved biologically significant functional groupings; and (b) facilitate soft clustering by discovering multiple functional associations for proteins. Supplementary data are available at Bioinformatics online.
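The PCA-based consensus step can be sketched generically: accumulate a co-association matrix over many base clusterings, reduce its dimensionality, and re-cluster. This toy uses plain k-means on blob data rather than the authors' topology-based distance metrics.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.decomposition import PCA

    X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
    n = len(X)
    co = np.zeros((n, n))  # co-association (consensus) matrix
    for seed in range(20):  # 20 base clusterings with different initialisations
        labels = KMeans(3, n_init=1, random_state=seed).fit_predict(X)
        co += (labels[:, None] == labels[None, :])
    co /= 20
    # Reduce the n-dimensional consensus problem with PCA, then re-cluster.
    emb = PCA(n_components=3).fit_transform(co)
    consensus = KMeans(3, n_init=10, random_state=0).fit_predict(emb)
    print(consensus[:20])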
Computational Science in Armenia (Invited Talk)
NASA Astrophysics Data System (ADS)
Marandjian, H.; Shoukourian, Yu.
This survey is devoted to the development of informatics and computer science in Armenia. The results in theoretical computer science (algebraic models, solutions to systems of general form recursive equations, the methods of coding theory, pattern recognition and image processing), constitute the theoretical basis for developing problem-solving-oriented environments. As examples can be mentioned: a synthesizer of optimized distributed recursive programs, software tools for cluster-oriented implementations of two-dimensional cellular automata, a grid-aware web interface with advanced service trading for linear algebra calculations. In the direction of solving scientific problems that require high-performance computing resources, examples of completed projects include the field of physics (parallel computing of complex quantum systems), astrophysics (Armenian virtual laboratory), biology (molecular dynamics study of human red blood cell membrane), meteorology (implementing and evaluating the Weather Research and Forecast Model for the territory of Armenia). The overview also notes that the Institute for Informatics and Automation Problems of the National Academy of Sciences of Armenia has established a scientific and educational infrastructure, uniting computing clusters of scientific and educational institutions of the country and provides the scientific community with access to local and international computational resources, that is a strong support for computational science in Armenia.
Assessment of Schrodinger Eigenmaps for target detection
NASA Astrophysics Data System (ADS)
Dorado Munoz, Leidy P.; Messinger, David W.; Czaja, Wojtek
2014-06-01
Non-linear dimensionality reduction methods have been widely applied to hyperspectral imagery because the data can be represented in a lower dimension without loss of information, and because non-linear methods preserve the local geometry of the data while the dimension is reduced. One of these methods is Laplacian Eigenmaps (LE), which assumes that the data lie on a low dimensional manifold embedded in a high dimensional space. LE builds a nearest neighbor graph, computes its Laplacian and performs the eigendecomposition of the Laplacian. These eigenfunctions constitute a basis for the lower dimensional space in which the geometry of the manifold is preserved. In addition to the reduction problem, LE has been widely used in tasks such as segmentation, clustering, and classification. In this regard, a new Schrodinger Eigenmaps (SE) method was developed and presented as a semi-supervised classification scheme in order to improve the classification performance and take advantage of labeled data. SE is an algorithm built upon LE, where the former Laplacian operator is replaced by the Schrodinger operator. The Schrodinger operator includes a potential term V that, taking advantage of additional information such as labeled data, allows clustering of similar points. In this paper, we explore the idea of using SE in target detection. In this way, we present a framework where the potential term V is defined as a barrier potential: a diagonal matrix encoding the spatial position of the target. The detection performance is evaluated by using different targets and different hyperspectral scenes.
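For reference, the LE construction that SE builds on fits in a few lines. This is a sketch with random vectors standing in for pixel spectra; in SE, the assumed potential term would enter the eigenproblem as L + alpha*V before the decomposition.

    import numpy as np
    from sklearn.neighbors import kneighbors_graph
    from scipy.sparse.csgraph import laplacian

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 50))              # stand-in for pixel spectra
    W = kneighbors_graph(X, n_neighbors=10, mode="connectivity")
    W = 0.5 * (W + W.T)                         # symmetrise the kNN graph
    L = laplacian(W, normed=True)               # normalised graph Laplacian
    vals, vecs = np.linalg.eigh(L.toarray())    # dense eigendecomposition
    embedding = vecs[:, 1:4]                    # drop the trivial eigenvector
    print(embedding.shape)                      # (200, 3)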
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Hallac, David; Vare, Sagar; Boyd, Stephen; Leskovec, Jure
2018-01-01
Subsequence clustering of multivariate time series is a useful tool for discovering repeated patterns in temporal data. Once these patterns have been discovered, seemingly complicated datasets can be interpreted as a temporal sequence of only a small number of states, or clusters. For example, raw sensor data from a fitness-tracking application can be expressed as a timeline of a select few actions (i.e., walking, sitting, running). However, discovering these patterns is challenging because it requires simultaneous segmentation and clustering of the time series. Furthermore, interpreting the resulting clusters is difficult, especially when the data is high-dimensional. Here we propose a new method of model-based clustering, which we call Toeplitz Inverse Covariance-based Clustering (TICC). Each cluster in the TICC method is defined by a correlation network, or Markov random field (MRF), characterizing the interdependencies between different observations in a typical subsequence of that cluster. Based on this graphical representation, TICC simultaneously segments and clusters the time series data. We solve the TICC problem through alternating minimization, using a variation of the expectation maximization (EM) algorithm. We derive closed-form solutions to efficiently solve the two resulting subproblems in a scalable way, through dynamic programming and the alternating direction method of multipliers (ADMM), respectively. We validate our approach by comparing TICC to several state-of-the-art baselines in a series of synthetic experiments, and we then demonstrate on an automobile sensor dataset how TICC can be used to learn interpretable clusters in real-world scenarios. PMID:29770257
Wang, Yang; Wu, Lin
2018-07-01
Low-Rank Representation (LRR) is arguably one of the most powerful paradigms for Multi-view spectral clustering, which elegantly encodes the multi-view local graph/manifold structures into an intrinsic low-rank self-expressive data similarity embedded in high-dimensional space, to yield a better graph partition than their single-view counterparts. In this paper we revisit it with a fundamentally different perspective by discovering LRR as essentially a latent clustered orthogonal projection based representation winged with an optimized local graph structure for spectral clustering; each column of the representation is fundamentally a cluster basis orthogonal to others to indicate its members, which intuitively projects the view-specific feature representation to be the one spanned by all orthogonal basis to characterize the cluster structures. Upon this finding, we propose our technique with the following: (1) We decompose LRR into latent clustered orthogonal representation via low-rank matrix factorization, to encode the more flexible cluster structures than LRR over primal data objects; (2) We convert the problem of LRR into that of simultaneously learning orthogonal clustered representation and optimized local graph structure for each view; (3) The learned orthogonal clustered representations and local graph structures enjoy the same magnitude for multi-view, so that the ideal multi-view consensus can be readily achieved. The experiments over multi-view datasets validate its superiority, especially over recent state-of-the-art LRR models. Copyright © 2018 Elsevier Ltd. All rights reserved.
Image Recommendation Algorithm Using Feature-Based Collaborative Filtering
NASA Astrophysics Data System (ADS)
Kim, Deok-Hwan
As the multimedia contents market continues its rapid expansion, the amount of image contents used in mobile phone services, digital libraries, and catalog service is increasing remarkably. In spite of this rapid growth, users experience high levels of frustration when searching for the desired image. Even though new images are profitable to the service providers, traditional collaborative filtering methods cannot recommend them. To solve this problem, in this paper, we propose feature-based collaborative filtering (FBCF) method to reflect the user's most recent preference by representing his purchase sequence in the visual feature space. The proposed approach represents the images that have been purchased in the past as the feature clusters in the multi-dimensional feature space and then selects neighbors by using an inter-cluster distance function between their feature clusters. Various experiments using real image data demonstrate that the proposed approach provides a higher quality recommendation and better performance than do typical collaborative filtering and content-based filtering techniques.
Multi-target detection and positioning in crowds using multiple camera surveillance
NASA Astrophysics Data System (ADS)
Huang, Jiahu; Zhu, Qiuyu; Xing, Yufeng
2018-04-01
In this study, we propose a pixel correspondence algorithm for positioning in crowds based on constraints on the distance between lines of sight, grayscale differences, and height in a world coordinates system. First, a Gaussian mixture model is used to obtain the background and foreground from multi-camera videos. Second, the hair and skin regions are extracted as regions of interest. Finally, the correspondences between each pixel in the region of interest are found under multiple constraints and the targets are positioned by pixel clustering. The algorithm can provide appropriate redundancy information for each target, which decreases the risk of losing targets due to a large viewing angle and wide baseline. To address the correspondence problem for multiple pixels, we construct a pixel-based correspondence model based on a similar permutation matrix, which converts the correspondence problem into a linear programming problem where a similar permutation matrix is found by minimizing an objective function. The correct pixel correspondences can be obtained by determining the optimal solution of this linear programming problem and the three-dimensional position of the targets can also be obtained by pixel clustering. Finally, we verified the algorithm with multiple cameras in experiments, which showed that the algorithm has high accuracy and robustness.
CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets
Nowicka, Malgorzata; Krieg, Carsten; Weber, Lukas M.; Hartmann, Felix J.; Guglietta, Silvia; Becher, Burkhard; Levesque, Mitchell P.; Robinson, Mark D.
2017-01-01
High-dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high-throughput interrogation and characterization of cell populations. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g. plots of aggregated signals). PMID:28663787
Semi-Supervised Clustering for High-Dimensional and Sparse Features
ERIC Educational Resources Information Center
Yan, Su
2010-01-01
Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised where class labels are unknown a priori. In real application domains, however, some "weak" form of side…
Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.
Dazard, Jean-Eudes; Rao, J Sunil
2012-07-01
The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derived regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.
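The benefit of pooling information across variables can be seen with a deliberately simplified, fixed-intensity shrinkage. The MVR package estimates the pooling adaptively via clustering; this sketch does not attempt that, and the shrinkage weight below is an arbitrary assumption.

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(5000, 6))   # 5000 variables, only 6 samples (p >> n)
    means = X.mean(axis=1)
    s2 = X.var(axis=1, ddof=1)       # per-variable variances: very noisy at n = 6
    s2_pooled = s2.mean()            # information pooled across all variables
    lam = 0.5                        # fixed shrinkage intensity (MVR adapts this)
    s2_shrunk = lam * s2_pooled + (1 - lam) * s2
    t_reg = means / np.sqrt(s2_shrunk / X.shape[1])
    print(s2.std(), s2_shrunk.std()) # shrunken variance estimates are far more stable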
Zhang, Yong-Tao; Shi, Jing; Shu, Chi-Wang; Zhou, Ye
2003-10-01
A quantitative study is carried out in this paper to investigate the size of numerical viscosities and the resolution power of high-order weighted essentially nonoscillatory (WENO) schemes for solving one- and two-dimensional Navier-Stokes equations for compressible gas dynamics with high Reynolds numbers. A one-dimensional shock tube problem, a one-dimensional example with parameters motivated by supernova and laser experiments, and a two-dimensional Rayleigh-Taylor instability problem are used as numerical test problems. For the two-dimensional Rayleigh-Taylor instability problem, or similar problems with small-scale structures, the details of the small structures are determined by the physical viscosity (therefore, the Reynolds number) in the Navier-Stokes equations. Thus, to obtain faithful resolution to these small-scale structures, the numerical viscosity inherent in the scheme must be small enough so that the physical viscosity dominates. A careful mesh refinement study is performed to capture the threshold mesh for full resolution, for specific Reynolds numbers, when WENO schemes of different orders of accuracy are used. It is demonstrated that high-order WENO schemes are more CPU time efficient to reach the same resolution, both for the one-dimensional and two-dimensional test problems.
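For context, the core of any such scheme is the nonlinear-weight reconstruction. Below is a textbook WENO5-JS interface reconstruction in a sketch form, not the specific solvers or test setups of the paper: on smooth data the weights approach the ideal values, while near a discontinuity they collapse onto the smooth stencils.

    import numpy as np

    def weno5(fm2, fm1, f0, fp1, fp2, eps=1e-6):
        """WENO5-JS reconstruction of f at i+1/2 from five cell averages."""
        # Candidate third-order stencil reconstructions.
        p0 = (2*fm2 - 7*fm1 + 11*f0) / 6.0
        p1 = (-fm1 + 5*f0 + 2*fp1) / 6.0
        p2 = (2*f0 + 5*fp1 - fp2) / 6.0
        # Smoothness indicators of the three stencils.
        b0 = 13/12*(fm2 - 2*fm1 + f0)**2 + 0.25*(fm2 - 4*fm1 + 3*f0)**2
        b1 = 13/12*(fm1 - 2*f0 + fp1)**2 + 0.25*(fm1 - fp1)**2
        b2 = 13/12*(f0 - 2*fp1 + fp2)**2 + 0.25*(3*f0 - 4*fp1 + fp2)**2
        # Nonlinear weights built from the ideal weights (1/10, 6/10, 3/10).
        a = np.array([0.1/(eps+b0)**2, 0.6/(eps+b1)**2, 0.3/(eps+b2)**2])
        w = a / a.sum()
        return w[0]*p0 + w[1]*p1 + w[2]*p2

    x = np.linspace(0, 1, 6)
    print(weno5(*np.sin(x[:5])))  # smooth data: close to the 5th-order value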
Chen, Yingyi; Yu, Huihui; Cheng, Yanjun; Cheng, Qianqian; Li, Daoliang
2018-01-01
A precise predictive model is important for obtaining a clear understanding of the changes in dissolved oxygen content in crab ponds. Highly accurate interval forecasting of dissolved oxygen content is fundamental to reduce risk, and three-dimensional prediction can provide more accurate results and overall guidance. In this study, a hybrid three-dimensional (3D) dissolved oxygen content prediction model based on a radial basis function (RBF) neural network, K-means and subtractive clustering was developed and named the subtractive clustering (SC)-K-means-RBF model. In this modeling process, K-means and subtractive clustering methods were employed to enhance the hyperparameters required in the RBF neural network model. The comparison of the predicted results of different traditional models validated the effectiveness and accuracy of the proposed hybrid SC-K-means-RBF model for three-dimensional prediction of dissolved oxygen content. Consequently, the proposed model can effectively display the three-dimensional distribution of dissolved oxygen content and serve as a guide for feeding and future studies.
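The model's structure, clustering to place RBF centres followed by a linear solve for output weights, can be sketched as follows. The data, kernel width and centre count here are placeholders, not the paper's tuned hyperparameters, and only the K-means seeding (not subtractive clustering) is shown.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(5)
    X = rng.uniform(-3, 3, size=(300, 2))
    y = np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1])  # toy target, not dissolved-oxygen data

    k = 25
    centers = KMeans(k, n_init=10, random_state=0).fit(X).cluster_centers_
    width = 1.0                                   # assumed kernel width
    Phi = np.exp(-np.linalg.norm(X[:, None] - centers[None], axis=2)**2 / (2*width**2))
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # linear output weights
    print(np.abs(Phi @ w - y).mean())             # mean training error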
Application of diffusion maps to identify human factors of self-reported anomalies in aviation.
Andrzejczak, Chris; Karwowski, Waldemar; Mikusinski, Piotr
2012-01-01
A study was conducted to investigate the factors that lead pilots to submit voluntary anomaly reports regarding their flight performance. Diffusion Maps (DM) were selected as the method of choice for performing dimensionality reduction on text records for this study. Diffusion Maps have seen successful use in other domains such as image classification and pattern recognition. High-dimensional data in the form of narrative text reports from the NASA Aviation Safety Reporting System (ASRS) were clustered and categorized by way of dimensionality reduction. Supervised analyses were performed to create a baseline document clustering system. Dimensionality reduction techniques identified concepts or keywords within records, and allowed the creation of a framework for an unsupervised document classification system. Results from the unsupervised clustering algorithm were similar to those of the supervised methods outlined in the study. The dimensionality reduction was performed on 100 of the most commonly occurring words within 126,000 text records describing commercial aviation incidents. This study demonstrates that unsupervised machine clustering and organization of incident reports is possible based on unbiased inputs. Findings from this study reinforced traditional views on what factors contribute to civil aviation anomalies; however, new associations between previously unrelated factors and conditions were also found.
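A bare-bones diffusion-map embedding looks like the following sketch, with random vectors standing in for the ASRS term counts and a crudely chosen kernel bandwidth; the real analysis involves far more careful preprocessing.

    import numpy as np

    rng = np.random.default_rng(6)
    X = rng.normal(size=(100, 100))          # stand-in for word-count vectors
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    K = np.exp(-d2 / d2.mean())              # Gaussian kernel (crude bandwidth)
    P = K / K.sum(axis=1, keepdims=True)     # row-normalised Markov matrix
    vals, vecs = np.linalg.eig(P)            # eigenvalues are real up to round-off
    order = np.argsort(-vals.real)
    t = 2                                    # diffusion time
    coords = (vals.real[order[1:4]] ** t) * vecs.real[:, order[1:4]]
    print(coords.shape)                      # (100, 3) diffusion coordinates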
Interval data clustering using self-organizing maps based on adaptive Mahalanobis distances.
Hajjar, Chantal; Hamdan, Hani
2013-10-01
The self-organizing map is a kind of artificial neural network used to map high dimensional data into a low dimensional space. This paper presents a self-organizing map for interval-valued data based on adaptive Mahalanobis distances in order to do clustering of interval data with topology preservation. Two methods based on the batch training algorithm for the self-organizing maps are proposed. The first method uses a common Mahalanobis distance for all clusters. In the second method, the algorithm starts with a common Mahalanobis distance per cluster and then switches to use a different distance per cluster. This process allows a more adapted clustering for the given data set. The performances of the proposed methods are compared and discussed using artificial and real interval data sets. Copyright © 2013 Elsevier Ltd. All rights reserved.
Jung, Inuk; Jo, Kyuri; Kang, Hyejin; Ahn, Hongryul; Yu, Youngjae; Kim, Sun
2017-12-01
Identifying biologically meaningful gene expression patterns from time series gene expression data is important to understand the underlying biological mechanisms. To identify significantly perturbed gene sets between different phenotypes, analysis of time series transcriptome data requires consideration of time and sample dimensions. Thus, the analysis of such time series data seeks to search gene sets that exhibit similar or different expression patterns between two or more sample conditions, constituting the three-dimensional data, i.e. gene-time-condition. Computational complexity for analyzing such data is very high, compared to the already difficult NP-hard two dimensional biclustering algorithms. Because of this challenge, traditional time series clustering algorithms are designed to capture co-expressed genes with similar expression pattern in two sample conditions. We present a triclustering algorithm, TimesVector, specifically designed for clustering three-dimensional time series data to capture distinctively similar or different gene expression patterns between two or more sample conditions. TimesVector identifies clusters with distinctive expression patterns in three steps: (i) dimension reduction and clustering of time-condition concatenated vectors, (ii) post-processing clusters for detecting similar and distinct expression patterns and (iii) rescuing genes from unclassified clusters. Using four sets of time series gene expression data, generated by both microarray and high throughput sequencing platforms, we demonstrated that TimesVector successfully detected biologically meaningful clusters of high quality. TimesVector improved the clustering quality compared to existing triclustering tools and only TimesVector detected clusters with differential expression patterns across conditions successfully. The TimesVector software is available at http://biohealth.snu.ac.kr/software/TimesVector/. sunkim.bioinfo@snu.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Clustering methods for the optimization of atomic cluster structure
NASA Astrophysics Data System (ADS)
Bagattini, Francesco; Schoen, Fabio; Tigli, Luca
2018-04-01
In this paper, we propose a revised global optimization method and apply it to large scale cluster conformation problems. In the 1990s, the so-called clustering methods were considered among the most efficient general purpose global optimization techniques; however, their usage has quickly declined in recent years, mainly due to the inherent difficulties of clustering approaches in large dimensional spaces. Inspired from the machine learning literature, we redesigned clustering methods in order to deal with molecular structures in a reduced feature space. Our aim is to show that by suitably choosing a good set of geometrical features coupled with a very efficient descent method, an effective optimization tool is obtained which is capable of finding, with a very high success rate, all known putative optima for medium size clusters without any prior information, both for Lennard-Jones and Morse potentials. The main result is that, beyond being a reliable approach, the proposed method, based on the idea of starting a computationally expensive deep local search only when it seems worth doing so, is capable of saving a huge amount of searches with respect to an analogous algorithm which does not employ a clustering phase. In this paper, we are not claiming the superiority of the proposed method compared to specific, refined, state-of-the-art procedures, but rather indicating a quite straightforward way to save local searches by means of a clustering scheme working in a reduced variable space, which might prove useful when included in many modern methods.
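The "start a deep search only when it seems worth it" idea can be caricatured in a few lines. Here a cheap truncated descent supplies the feature used to decide whether a full minimization is launched, on the LJ7 toy problem; the threshold is arbitrary, and the authors' actual geometric feature space is far richer than a single energy value.

    import numpy as np
    from scipy.optimize import minimize

    def lj_energy(flat):
        """Total Lennard-Jones energy of an N-atom configuration."""
        pos = flat.reshape(-1, 3)
        iu = np.triu_indices(len(pos), k=1)
        r = np.linalg.norm(pos[:, None] - pos[None], axis=2)[iu]
        return float(np.sum(4.0 * (r**-12 - r**-6)))

    rng = np.random.default_rng(7)
    N, found = 7, []                   # LJ7 global minimum is about -16.505
    for _ in range(30):
        x0 = rng.uniform(-1.2, 1.2, size=3 * N)
        # Cheap truncated descent supplies the "feature" for the decision.
        probe = minimize(lj_energy, x0, method="L-BFGS-B", options={"maxiter": 5})
        if any(abs(probe.fun - e) < 0.3 for e in found):
            continue                   # likely an already-explored basin: skip
        res = minimize(lj_energy, probe.x, method="L-BFGS-B")
        found.append(res.fun)
    print(sorted(found)[:3])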
Big Data Analytics for Demand Response: Clustering Over Space and Time
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chelmis, Charalampos; Kolte, Jahanvi; Prasanna, Viktor K.
The pervasive deployment of advanced sensing infrastructure in Cyber-Physical systems, such as the Smart Grid, has resulted in an unprecedented data explosion. Such data exhibit both large volumes and high velocity characteristics, two of the three pillars of Big Data, and have a time-series notion as datasets in this context typically consist of successive measurements made over a time interval. Time-series data can be valuable for data mining and analytics tasks such as identifying the “right” customers among a diverse population, to target for Demand Response programs. However, time series are challenging to mine due to their high dimensionality. Inmore » this paper, we motivate this problem using a real application from the smart grid domain. We explore novel representations of time-series data for BigData analytics, and propose a clustering technique for determining natural segmentation of customers and identification of temporal consumption patterns. Our method is generizable to large-scale, real-world scenarios, without making any assumptions about the data. We evaluate our technique using real datasets from smart meters, totaling ~ 18,200,000 data points, and show the efficacy of our technique in efficiency detecting the number of optimal number of clusters.« less
How to cluster in parallel with neural networks
NASA Technical Reports Server (NTRS)
Kamgar-Parsi, Behzad; Gualtieri, J. A.; Devaney, Judy E.; Kamgar-Parsi, Behrooz
1988-01-01
Partitioning a set of N patterns in a d-dimensional metric space into K clusters - in a way that those in a given cluster are more similar to each other than the rest - is a problem of interest in astrophysics, image analysis and other fields. As there are approximately K^N/K! possible ways of partitioning the patterns among K clusters, finding the best solution is beyond exhaustive search when N is large. Researchers show that this problem can be formulated as an optimization problem for which very good, but not necessarily optimal, solutions can be found by using a neural network. To do this the network must start from many randomly selected initial states. The network is simulated on the MPP (a 128 x 128 SIMD array machine), where researchers use the massive parallelism not only in solving the differential equations that govern the evolution of the network, but also by starting the network from many initial states at once, thus obtaining many solutions in one run. Researchers obtain speedups of two to three orders of magnitude over serial implementations and the promise, through analog VLSI implementations, of speedups commensurate with human perceptual abilities.
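The quoted count is the Stirling number of the second kind, S(N, K) ≈ K^N/K!. A quick illustration of the combinatorial explosion (not from the paper):

    from math import comb, factorial

    def stirling2(n, k):
        # Inclusion-exclusion: S(n, k) = (1/k!) * sum_j (-1)^j * C(k, j) * (k - j)^n
        return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1)) // factorial(k)

    print(stirling2(20, 4))          # exact count: ~4.5e10 for just 20 patterns
    print(4 ** 20 // factorial(4))   # the K^N / K! approximation quoted above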
A novel framework to alleviate the sparsity problem in context-aware recommender systems
NASA Astrophysics Data System (ADS)
Yu, Penghua; Lin, Lanfen; Wang, Jing
2017-04-01
Recommender systems have become indispensable for services in the era of big data. To improve accuracy and satisfaction, context-aware recommender systems (CARSs) attempt to incorporate contextual information into recommendations. Typically, valid and influential contexts are determined in advance by domain experts or feature selection approaches. Most studies have focused on utilizing the unitary context due to the differences between various contexts. Meanwhile, multi-dimensional contexts will aggravate the sparsity problem, which means that the user preference matrix would become extremely sparse. Consequently, there are not enough or even no preferences in most multi-dimensional conditions. In this paper, we propose a novel framework to alleviate the sparsity issue for CARSs, especially when multi-dimensional contextual variables are adopted. Motivated by the intuition that the overall preferences tend to show similarities among specific groups of users and conditions, we first explore to construct one contextual profile for each contextual condition. In order to further identify those user and context subgroups automatically and simultaneously, we apply a co-clustering algorithm. Furthermore, we expand user preferences in a given contextual condition with the identified user and context clusters. Finally, we perform recommendations based on expanded preferences. Extensive experiments demonstrate the effectiveness of the proposed framework.
The Use of Signal Dimensionality for Automatic QC of Seismic Array Data
NASA Astrophysics Data System (ADS)
Rowe, C. A.; Stead, R. J.; Begnaud, M. L.; Draganov, D.; Maceira, M.; Gomez, M.
2014-12-01
A significant problem in seismic array analysis is the inclusion of bad sensor channels in the beam-forming process. We are testing an approach to automated, on-the-fly quality control (QC) to aid in the identification of poorly performing sensor channels prior to beam-forming in routine event detection or location processing. The idea stems from methods used for large computer servers, when monitoring traffic at enormous numbers of nodes is impractical on a node-by-node basis, so the dimensionality of the node traffic is instead monitored for anomalies that could represent malware, cyber-attacks or other problems. The technique relies upon the use of subspace dimensionality or principal components of the overall system traffic. The subspace technique is not new to seismology, but its most common application has been limited to comparing waveforms to an a priori collection of templates for detecting highly similar events in a swarm or seismic cluster. We examine the signal dimension in similar way to the method addressing node traffic anomalies in large computer systems. We explore the effects of malfunctioning channels on the dimension of the data and its derivatives, and how to leverage this effect for identifying bad array elements. We show preliminary results applied to arrays in Kazakhstan (Makanchi) and Argentina (Malargue).
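The underlying signal-dimension idea can be illustrated with a rank-1 subspace fit: coherent channels live in a low-dimensional subspace, and a malfunctioning channel shows up as a large residual. The data below are synthetic; the QC under development uses richer subspace statistics than this single residual.

    import numpy as np

    rng = np.random.default_rng(8)
    # 30 array channels recording nearly the same wavefield plus noise.
    common = np.sin(np.linspace(0, 20, 2000))
    data = common + 0.05 * rng.normal(size=(30, 2000))
    data[7] = rng.normal(size=2000)          # one bad channel: pure noise

    # Rank-1 subspace captures the coherent signal; the residual flags outliers.
    U, s, Vt = np.linalg.svd(data, full_matrices=False)
    recon = s[0] * np.outer(U[:, 0], Vt[0])
    resid = np.linalg.norm(data - recon, axis=1)
    print(np.argmax(resid))                  # -> 7, the malfunctioning channel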
Constrained-transport Magnetohydrodynamics with Adaptive Mesh Refinement in CHARM
NASA Astrophysics Data System (ADS)
Miniati, Francesco; Martin, Daniel F.
2011-07-01
We present the implementation of a three-dimensional, second-order accurate Godunov-type algorithm for magnetohydrodynamics (MHD) in the adaptive-mesh-refinement (AMR) cosmological code CHARM. The algorithm is based on the full 12-solve spatially unsplit corner-transport-upwind (CTU) scheme. The fluid quantities are cell-centered and are updated using the piecewise-parabolic method (PPM), while the magnetic field variables are face-centered and are evolved through application of the Stokes theorem on cell edges via a constrained-transport (CT) method. The so-called multidimensional MHD source terms required in the predictor step for high-order accuracy are applied in a simplified form which reduces their complexity in three dimensions without loss of accuracy or robustness. The algorithm is implemented on an AMR framework which requires specific synchronization steps across refinement levels. These include face-centered restriction and prolongation operations and a reflux-curl operation, which maintains a solenoidal magnetic field across refinement boundaries. The code is tested against a large suite of test problems, including convergence tests in smooth flows, shock-tube tests, classical two- and three-dimensional MHD tests, a three-dimensional shock-cloud interaction problem, and the formation of a cluster of galaxies in a fully cosmological context. The magnetic field divergence is shown to remain negligible throughout.
First assembly times and equilibration in stochastic coagulation-fragmentation
DOE Office of Scientific and Technical Information (OSTI.GOV)
D’Orsogna, Maria R.; Department of Mathematics, CSUN, Los Angeles, California 91330-8313; Lei, Qi
2015-07-07
We develop a fully stochastic theory for coagulation and fragmentation (CF) in a finite system with a maximum cluster size constraint. The process is modeled using a high-dimensional master equation for the probabilities of cluster configurations. For certain realizations of total mass and maximum cluster sizes, we find exact analytical results for the expected equilibrium cluster distributions. If coagulation is fast relative to fragmentation and if the total system mass is indivisible by the mass of the largest allowed cluster, we find a mean cluster-size distribution that is strikingly broader than that predicted by the corresponding mass-action equations. Combinations of total mass and maximum cluster size under which equilibration is accelerated, eluding late-stage coarsening, are also delineated. Finally, we compute the mean time it takes particles to first assemble into a maximum-sized cluster. Through careful state-space enumeration, the scaling of mean assembly times is derived for all combinations of total mass and maximum cluster size. We find that CF accelerates assembly relative to monomer kinetics only in special cases. All of our results hold in the infinite system limit and can only be derived from a high-dimensional discrete stochastic model, highlighting how classical mass-action models of self-assembly can fail.
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving spacecraft electrical characteristics problems, which involve processing large amounts of unlabeled test data, high-dimensional features, long computing times and slow identification rates. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithm is used to reduce the feature dimension and further improve the rate of recognition for electrical characteristics of spacecraft. Finally, the data capture contribution method by using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the proposed method can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively. PMID:26544549
Three-dimensional reconstruction of clustered microcalcifications from two digitized mammograms
NASA Astrophysics Data System (ADS)
Stotzka, Rainer; Mueller, Tim O.; Epper, Wolfgang; Gemmeke, Hartmut
1998-06-01
X-ray mammography is one of the most significant diagnosis methods in early detection of breast cancer. Usually two X-ray images from different angles are taken of each mamma to make even overlapping structures visible. X-ray mammography has a very high spatial resolution and can show microcalcifications of 50 - 200 micron in size. Clusters of microcalcifications are one of the most important and often the only indicator for malignant tumors. These calcifications are in some cases extremely difficult to detect. Computer assisted diagnosis of digitized mammograms may improve detection and interpretation of microcalcifications and lead to more reliable diagnostic findings. We build a low-cost mammography workstation to detect and classify clusters of microcalcifications and tissue densities automatically. New in this approach is the estimation of the three-dimensional formation of segmented microcalcifications and its visualization, which will put additional diagnostic information at the radiologist's disposal. The real problem in using only two or three projections for reconstruction is the large loss of volume information. Therefore the arrangement of a cluster is estimated using only the positions of segmented microcalcifications. The arrangement of microcalcifications is visualized to the physician by rotation.
Synthesis of borophenes: Anisotropic, two-dimensional boron polymorphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mannix, A. J.; Zhou, X. -F.; Kiraly, B.
At the atomic-cluster scale, pure boron is markedly similar to carbon, forming simple planar molecules and cage-like fullerenes. Theoretical studies predict that two-dimensional (2D) boron sheets will adopt an atomic configuration similar to that of boron atomic clusters. We synthesized atomically thin, crystalline 2D boron sheets (i.e., borophene) on silver surfaces under ultrahigh-vacuum conditions. Atomic-scale characterization, supported by theoretical calculations, revealed structures reminiscent of fused boron clusters with multiple scales of anisotropic, out-of-plane buckling. Unlike bulk boron allotropes, borophene shows metallic characteristics that are consistent with predictions of a highly anisotropic, 2D metal.
Kim, Hyunsoo; Park, Haesun
2007-06-15
Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix in an NMF needs to be controlled in approximating high-dimensional data in a lower dimensional space. In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms. The software is available as supplementary material.
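A rough sketch of sparse NMF by alternating non-negativity-constrained least squares follows, with the L1-type penalty on the coefficient matrix imposed through a padded row. This is a simplification of the published formulation: the additional Frobenius penalty on W, the stopping criteria and the data are all placeholders here.

    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(9)
    A = np.abs(rng.normal(size=(40, 30)))   # non-negative data (e.g. expression matrix)
    k, beta = 4, 0.1                        # rank and sparsity penalty on H

    W = np.abs(rng.normal(size=(40, k)))
    H = np.zeros((k, 30))
    for it in range(50):
        # Sparsity on H: append a row of sqrt(beta) ones to W and zeros to A,
        # so each NNLS solve also penalises the (L1) column sums of H.
        Wa = np.vstack([W, np.sqrt(beta) * np.ones((1, k))])
        Aa = np.vstack([A, np.zeros((1, 30))])
        for j in range(30):
            H[:, j], _ = nnls(Wa, Aa[:, j])
        for i in range(40):
            W[i], _ = nnls(H.T, A[i])
        W = np.maximum(W, 1e-12)            # guard against all-zero basis columns
    print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))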
NASA Technical Reports Server (NTRS)
Liou, Meng-Sing
1992-01-01
A unique formulation for describing fluid motion is presented. The method, referred to as the 'extended Lagrangian method', is interesting from both theoretical and numerical points of view. The formulation offers accuracy in numerical solution by avoiding the numerical diffusion resulting from the mixing of fluxes in the Eulerian description. Meanwhile, it also avoids the inaccuracy incurred due to the geometry and variable interpolations used by previous Lagrangian methods. Unlike the Lagrangian methods proposed previously, which are valid only for supersonic flows, the present method is general and capable of treating subsonic flows as well as supersonic flows. The method proposed in this paper is robust and stable. It automatically adapts to flow features without resorting to clustering, thereby maintaining rather uniform grid spacing throughout and a large time step. Moreover, the method is shown to resolve multi-dimensional discontinuities with a high level of accuracy, similar to that found in one-dimensional problems.
Progeny Clustering: A Method to Identify Biological Phenotypes
Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.
2015-01-01
Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient in computing, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown to be successful and robust when applied to two synthetic datasets (a two-dimensional dataset and a ten-dimensional dataset containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476
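For contrast with Progeny Clustering, the generic stability heuristic it improves upon can be sketched in a few lines: cluster two subsamples, compare labels on their overlap, and favour the k with the most reproducible assignments. This toy uses blob data and is not the Progeny Sampling algorithm itself.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import adjusted_rand_score

    X, _ = make_blobs(n_samples=200, centers=4, random_state=1)
    rng = np.random.default_rng(10)

    def stability(k, runs=10, frac=0.8):
        scores = []
        for r in range(runs):
            i1 = rng.choice(len(X), int(frac * len(X)), replace=False)
            i2 = rng.choice(len(X), int(frac * len(X)), replace=False)
            common = np.intersect1d(i1, i2)
            a = KMeans(k, n_init=5, random_state=r).fit(X[i1]).predict(X[common])
            b = KMeans(k, n_init=5, random_state=r + 99).fit(X[i2]).predict(X[common])
            scores.append(adjusted_rand_score(a, b))  # agreement on the overlap
        return float(np.mean(scores))

    print({k: round(stability(k), 3) for k in range(2, 7)})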
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wei, Tzu-Chieh; C. N. Yang Institute for Theoretical Physics, State University of New York at Stony Brook, Stony Brook, New York 11794-3840; Raussendorf, Robert
2011-10-15
Universal quantum computation can be achieved by simply performing single-qubit measurements on a highly entangled resource state, such as cluster states. Cai, Miyake, Duer, and Briegel recently constructed a ground state of a two-dimensional quantum magnet by combining multiple Affleck-Kennedy-Lieb-Tasaki quasichains of mixed spin-3/2 and spin-1/2 entities and by mapping pairs of neighboring spin-1/2 particles to individual spin-3/2 particles [Phys. Rev. A 82, 052309 (2010)]. They showed that this state enables universal quantum computation by single-spin measurements. Here, we give an alternative understanding of how this state gives rise to universal measurement-based quantum computation: by local operations, each quasichain can be converted to a one-dimensional cluster state, and entangling gates between two neighboring logical qubits can be implemented by single-spin measurements. We further argue that a two-dimensional cluster state can be distilled from the Cai-Miyake-Duer-Briegel state.
NASA Astrophysics Data System (ADS)
Kawahara, Hajime; Reese, Erik D.; Kitayama, Tetsu; Sasaki, Shin; Suto, Yasushi
2008-11-01
Our previous analysis indicates that small-scale fluctuations in the intracluster medium (ICM) from cosmological hydrodynamic simulations follow the lognormal probability density function. In order to test the lognormal nature of the ICM directly against X-ray observations of galaxy clusters, we develop a method of extracting statistical information about the three-dimensional properties of the fluctuations from the two-dimensional X-ray surface brightness. We first create a set of synthetic clusters with lognormal fluctuations around their mean profile given by spherical isothermal β-models, later considering polytropic temperature profiles as well. Performing mock observations of these synthetic clusters, we find that the resulting X-ray surface brightness fluctuations also follow the lognormal distribution fairly well. Systematic analysis of the synthetic clusters provides an empirical relation between the three-dimensional density fluctuations and the two-dimensional X-ray surface brightness. We analyze Chandra observations of the galaxy cluster Abell 3667, and find that its X-ray surface brightness fluctuations follow the lognormal distribution. While the lognormal model was originally motivated by cosmological hydrodynamic simulations, this is the first observational confirmation of the lognormal signature in a real cluster. Finally we check the synthetic cluster results against clusters from cosmological hydrodynamic simulations. As a result of the complex structure exhibited by simulated clusters, the empirical relation between the two- and three-dimensional fluctuation properties calibrated with synthetic clusters when applied to simulated clusters shows large scatter. Nevertheless we are able to reproduce the true value of the fluctuation amplitude of simulated clusters within a factor of 2 from their two-dimensional X-ray surface brightness alone. Our current methodology combined with existing observational data is useful in describing and inferring the statistical properties of the three-dimensional inhomogeneity in galaxy clusters.
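For intuition, here is a minimal synthetic-cluster generator in the spirit described, assuming a spherical isothermal β-model with arbitrary demo parameters and uncorrelated lognormal fluctuations (the paper's simulated fields are spatially correlated), with the mock X-ray surface brightness taken as the line-of-sight integral of the squared density.

```python
# Sketch: beta-model density with multiplicative lognormal fluctuations.
import numpy as np

n_grid, r_c, beta, sigma_ln = 128, 10.0, 2.0 / 3.0, 0.3   # demo values
ax = np.arange(n_grid) - n_grid / 2
x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
r = np.sqrt(x**2 + y**2 + z**2)

mean_profile = (1.0 + (r / r_c) ** 2) ** (-1.5 * beta)    # beta-model
# Lognormal fluctuations with unit mean (mu = -sigma^2/2); uncorrelated
# here, unlike the spatially correlated fields in the simulations.
delta = np.random.lognormal(mean=-sigma_ln**2 / 2, sigma=sigma_ln,
                            size=r.shape)
density = mean_profile * delta

surface_brightness = (density ** 2).sum(axis=2)   # project along z
log_sb = np.log(surface_brightness)               # test for lognormality
```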
Discovering biclusters in gene expression data based on high-dimensional linear geometries
Gan, Xiangchao; Liew, Alan Wee-Chung; Yan, Hong
2008-01-01
Background In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes only exhibits consistent pattern over a subset of conditions. Conventional clustering algorithms that deal with the entire row or column in an expression matrix would therefore fail to detect these useful patterns in the data. Recently, biclustering has been proposed to detect a subset of genes exhibiting consistent pattern over a subset of conditions. However, most existing biclustering algorithms are based on searching for sub-matrices within a data matrix by optimizing certain heuristically defined merit functions. Moreover, most of these algorithms can only detect a restricted set of bicluster patterns. Results In this paper, we present a novel geometric perspective for the biclustering problem. The biclustering process is interpreted as the detection of linear geometries in a high dimensional data space. Such a new perspective views biclusters with different patterns as hyperplanes in a high dimensional space, and allows us to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. This geometric viewpoint also inspires us to propose a generic bicluster pattern, i.e. the linear coherent model that unifies the seemingly incompatible additive and multiplicative bicluster models. As a particular realization of our framework, we have implemented a Hough transform-based hyperplane detection algorithm. The experimental results on human lymphoma gene expression dataset show that our algorithm can find biologically significant subsets of genes. Conclusion We have proposed a novel geometric interpretation of the biclustering problem. We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high dimensional data space. An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis. PMID:18433477
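The geometric claim is easy to verify numerically: the rows of an additive bicluster, viewed as points whose coordinates are the bicluster's columns, all satisfy a single hyperplane equation. A tiny self-contained check on synthetic data (this only illustrates the viewpoint, not the paper's Hough-transform detector):

```python
# Rows of an additive bicluster a_ij = mu + alpha_i + beta_j lie on the
# hyperplane x_j - x_k = beta_j - beta_k for any pair of columns (j, k).
import numpy as np

rng = np.random.default_rng(0)
alpha = rng.normal(size=50)                 # row effects
beta = np.array([1.0, -2.0, 0.5])           # column effects
A = 3.0 + alpha[:, None] + beta[None, :]    # additive bicluster, 50 x 3

# Every point satisfies x_0 - x_1 = beta_0 - beta_1:
print(np.allclose(A[:, 0] - A[:, 1], beta[0] - beta[1]))   # True
```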
A Study on Regional Frequency Analysis using Artificial Neural Network - the Sumjin River Basin
NASA Astrophysics Data System (ADS)
Jeong, C.; Ahn, J.; Ahn, H.; Heo, J. H.
2017-12-01
Regional frequency analysis compensates for a shortcoming of at-site frequency analysis, namely the lack of sample size, through the regional concept. Regional rainfall quantiles depend on the identification of hydrologically homogeneous regions, hence regional classification based on the assumption of hydrological homogeneity is very important. For regional clustering of rainfall, multidimensional variables and factors related to geographical features and meteorological conditions are considered, such as mean annual precipitation, the number of days with precipitation in a year and the average maximum daily precipitation in a month. The Self-Organizing Feature Map (SOM) method, an artificial neural network algorithm among the unsupervised learning techniques, solves N-dimensional, nonlinear problems and presents the results simply, acting as a data visualization technique. In this study, cluster analysis was performed for the Sumjin river basin in South Korea based on the SOM method, using high-dimensional geographical features and meteorological factors as input data. Then, in order to evaluate the homogeneity of the resulting regions, the L-moment based discordancy and heterogeneity measures were used. Rainfall quantiles were estimated with the index flood method, a standard method of regional rainfall frequency analysis. The clustering analysis using the SOM method and the consequent variation in rainfall quantiles were analyzed. This research was supported by a grant (2017-MPSS31-001) from the Supporting Technology Development Program for Disaster Management funded by the Ministry of Public Safety and Security (MPSS) of the Korean government.
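As a sketch of the clustering step, here is a toy SOM trainer in plain numpy; it is not the study's code, and the grid size, learning-rate schedule and neighborhood schedule are arbitrary choices for the demo.

```python
# Minimal self-organizing map: sites (rows of X) are assigned to their
# best-matching unit, and units define candidate homogeneous regions.
import numpy as np

def train_som(X, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    W = rng.normal(size=(rows * cols, X.shape[1]))   # unit weight vectors
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
        frac = t / n_iter
        lr = lr0 * (1 - frac)                        # decaying learning rate
        sigma = sigma0 * (1 - frac) + 1e-3           # shrinking neighborhood
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)   # distances on the map
        h = np.exp(-d2 / (2 * sigma ** 2))           # neighborhood function
        W += lr * h[:, None] * (x - W)
    return W

# labels = np.argmin(((X[:, None, :] - W[None]) ** 2).sum(-1), axis=1)
# The resulting regions can then be screened with L-moment discordancy
# and heterogeneity measures, as in the study.
```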
Dazard, Jean-Eudes; Rao, J. Sunil
2010-01-01
The search for structures in real datasets e.g. in the form of bumps, components, classes or clusters is important as these often reveal underlying phenomena leading to scientific discoveries. One of these tasks, known as bump hunting, is to locate domains of a multidimensional input space where the target function assumes local maxima without pre-specifying their total number. A number of related methods already exist, yet are challenged in the context of high dimensional data. We introduce a novel supervised and multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables. This addresses the issues of correlation, interpretability, and high-dimensionality (p ≫ n case), while making minimal assumptions. The method is based upon a divide and conquer strategy, combining a tree-based method, a dimension reduction technique, and the Patient Rule Induction Method (PRIM). Important to this task, we show how to estimate the PRIM meta-parameters. Using accuracy evaluation procedures such as cross-validation and ROC analysis, we show empirically how the method outperforms a naive PRIM as well as competitive non-parametric supervised and unsupervised methods in the problem of class discovery. The method has practical application especially in the case of noisy high-throughput data. It is applied to a class discovery problem in a colon cancer micro-array dataset aimed at identifying tumor subtypes in the metastatic stage. Supplemental Materials are available online. PMID:22399839
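Since PRIM is central here, a schematic version of its top-down "peeling" loop may help. This bare-bones sketch, with the usual meta-parameters (peeling fraction alpha and minimum box support), omits the paper's tree-based partitioning and dimension-reduction machinery.

```python
# Schematic PRIM peeling: repeatedly trim a small fraction of the data
# along the one dimension that most increases the target mean in the box.
import numpy as np

def prim_peel(X, y, alpha=0.05, min_support=0.1):
    box = np.column_stack([X.min(0), X.max(0)])   # [low, high] per dimension
    inside = np.ones(len(X), bool)
    while inside.mean() > min_support:
        best_gain, best = -np.inf, None
        for j in range(X.shape[1]):
            for side, q in ((0, alpha), (1, 1 - alpha)):
                cut = np.quantile(X[inside, j], q)
                keep = inside & (X[:, j] >= cut if side == 0
                                 else X[:, j] <= cut)
                if keep.sum() == 0:
                    continue
                gain = y[keep].mean()
                if gain > best_gain:
                    best_gain, best = gain, (j, side, cut, keep)
        if best is None or best_gain <= y[inside].mean():
            break                                  # no peel improves the box
        j, side, cut, inside = best
        box[j, side] = cut                         # shrink the box bound
    return box, inside
```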
2DRMP: A suite of two-dimensional R-matrix propagation codes
NASA Astrophysics Data System (ADS)
Scott, N. S.; Scott, M. P.; Burke, P. G.; Stitt, T.; Faro-Maza, V.; Denis, C.; Maniopoulou, A.
2009-12-01
The R-matrix method has proved to be a remarkably stable, robust and efficient technique for solving the close-coupling equations that arise in electron and photon collisions with atoms, ions and molecules. During the last thirty-four years a series of related R-matrix program packages have been published periodically in CPC. These packages are primarily concerned with low-energy scattering where the incident energy is insufficient to ionise the target. In this paper we describe 2DRMP, a suite of two-dimensional R-matrix propagation programs aimed at creating virtual experiments on high performance and grid architectures to enable the study of electron scattering from H-like atoms and ions at intermediate energies.
Program summary:
Program title: 2DRMP
Catalogue identifier: AEEA_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEA_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 196,717
No. of bytes in distributed program, including test data, etc.: 3,819,727
Distribution format: tar.gz
Programming language: Fortran 95, MPI
Computer: Tested on CRAY XT4 [1]; IBM eServer 575 [2]; Itanium II cluster [3]
Operating system: Tested on UNICOS/lc [1]; IBM AIX [2]; Red Hat Linux Enterprise AS [3]
Has the code been vectorised or parallelised?: Yes. 16 cores were used for the small test run.
Classification: 2.4
External routines: BLAS, LAPACK, PBLAS, ScaLAPACK
Subprograms used: ADAZ_v1_1
Nature of problem: 2DRMP is a suite of programs aimed at creating virtual experiments on high performance architectures to enable the study of electron scattering from H-like atoms and ions at intermediate energies.
Solution method: Two-dimensional R-matrix propagation theory. The (r1, r2) space of the internal region is subdivided into a number of subregions. Local R-matrices are constructed within each subregion and used to propagate a global R-matrix, ℜ, across the internal region. On the boundary of the internal region ℜ is transformed onto the IERM target state basis. Thus, the two-dimensional R-matrix propagation technique transforms an intractable problem into a series of tractable problems, enabling the internal region to be extended far beyond what is possible with the standard one-sector codes. A distinctive feature of the method is that both electrons are treated identically and the R-matrix basis states are constructed to allow both electrons to be in the continuum. The subregion size is flexible and can be adjusted to accommodate the number of cores available.
Restrictions: The implementation is currently restricted to electron scattering from H-like atoms and ions.
Additional comments: The programs have been designed to operate on serial computers and to exploit the distributed-memory parallelism found on tightly coupled high performance clusters and supercomputers. 2DRMP has been systematically and comprehensively documented using ROBODoc [4], an API documentation tool that works by extracting specially formatted headers from the program source code and writing them to documentation files.
Running time: The wall-clock running time for the small test run using 16 cores, performed on [3], is as follows: bp (7 s); rint2 (34 s); newrd (32 s); diag (21 s); amps (11 s); prop (24 s).
References:
[1] HECToR, CRAY XT4 running UNICOS/lc, http://www.hector.ac.uk/, accessed 22 July, 2009.
[2] HPCx, IBM eServer 575 running IBM AIX, http://www.hpcx.ac.uk/, accessed 22 July, 2009.
[3] HP Cluster, Itanium II cluster running Red Hat Linux Enterprise AS, Queen's University Belfast, http://www.qub.ac.uk/directorates/InformationServices/Research/HighPerformanceComputing/Services/Hardware/HPResearch/, accessed 22 July, 2009.
[4] Automating Software Documentation with ROBODoc, http://www.xs4all.nl/~rfsber/Robo/, accessed 22 July, 2009.
Tello, Javier; Cubero, Sergio; Blasco, José; Tardaguila, Javier; Aleixos, Nuria; Ibáñez, Javier
2016-10-01
Grapevine cluster morphology influences the quality and commercial value of wine and table grapes. It is routinely evaluated by subjective and inaccurate methods that do not meet the requirements set by the food industry. Novel two-dimensional (2D) and three-dimensional (3D) machine vision technologies emerge as promising tools for its automatic and fast evaluation. The automatic evaluation of cluster length, width and elongation was successfully achieved by the analysis of 2D images, significant and strong correlations with the manual methods being found (r = 0.959, 0.861 and 0.852, respectively). The classification of clusters according to their shape can be achieved by evaluating their conicity in different sections of the cluster. The geometric reconstruction of the morphological volume of the cluster from 2D features worked better than the direct 3D laser scanning system, showing a high correlation (r = 0.956) with the manual approach (water displacement method). In addition, we constructed and validated a simple linear regression model for cluster compactness estimation. It showed a high predictive capacity for both the training and validation subsets of clusters (R^2 = 84.5% and 71.1%, respectively). The methodologies proposed in this work provide continuous and accurate data for the fast and objective characterisation of cluster morphology. © 2016 Society of Chemical Industry.
NASA Technical Reports Server (NTRS)
Moitra, Anutosh
1989-01-01
A fast and versatile procedure for algebraically generating boundary-conforming computational grids for use with finite-volume Euler flow solvers is presented. A semi-analytic homotopic procedure is used to generate the grids. Grids generated in two-dimensional planes are stacked to produce quasi-three-dimensional grid systems. The body surface and outer boundary are described in terms of surface parameters. An interpolation scheme is used to blend between the body surface and the outer boundary in order to determine the field points. The method, albeit developed for analytically generated body geometries, is equally applicable to other classes of geometries. The method can be used for both internal and external flow configurations, the only constraint being that the body geometries be specified as two-dimensional cross-sections stationed along the longitudinal axis of the configuration. Techniques for controlling various grid parameters, e.g., clustering and orthogonality, are described. Techniques for treating problems arising in algebraic grid generation for geometries with sharp corners are addressed. A set of representative grid systems generated by this method is included. Results of flow computations using these grids are presented to validate the effectiveness of the method.
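A minimal illustration of the algebraic blending idea follows, assuming an elliptic body section and a circular outer boundary, with one-sided hyperbolic-tangent stretching to cluster points near the body; the paper's homotopic blending is more elaborate than this sketch.

```python
# Algebraic grid by blending a body curve and an outer boundary.
import numpy as np

n_xi, n_eta, beta = 65, 33, 1.2
xi = np.linspace(0.0, 2 * np.pi, n_xi)
body = np.stack([np.cos(xi), 0.5 * np.sin(xi)], axis=-1)        # ellipse
outer = np.stack([10 * np.cos(xi), 10 * np.sin(xi)], axis=-1)   # far field

eta = np.linspace(0.0, 1.0, n_eta)
# tanh stretching: s rises slowly near eta = 0, clustering points at the body
s = 1.0 + np.tanh(beta * (eta - 1.0)) / np.tanh(beta)
grid = (1.0 - s)[None, :, None] * body[:, None, :] \
     + s[None, :, None] * outer[:, None, :]      # shape (n_xi, n_eta, 2)
```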
NASA Astrophysics Data System (ADS)
Niwase, Hiroaki; Takada, Naoki; Araki, Hiromitsu; Maeda, Yuki; Fujiwara, Masato; Nakayama, Hirotaka; Kakue, Takashi; Shimobaba, Tomoyoshi; Ito, Tomoyoshi
2016-09-01
Parallel calculations of large-pixel-count computer-generated holograms (CGHs) are suitable for multiple-graphics processing unit (multi-GPU) cluster systems. However, it is not easy for a multi-GPU cluster system to accomplish fast CGH calculations when CGH transfers between PCs are required. In these cases, the CGH transfer between the PCs becomes a bottleneck. Usually, this problem occurs only in multi-GPU cluster systems with a single spatial light modulator. To overcome this problem, we propose a simple method using the InfiniBand network. The computational speed of the proposed method using 13 GPUs (NVIDIA GeForce GTX TITAN X) was more than 3000 times faster than that of a CPU (Intel Core i7 4770) when the number of three-dimensional (3-D) object points exceeded 20,480. In practice, we achieved ~40 tera floating-point operations per second (TFLOPS) when the number of 3-D object points exceeded 40,960. Our proposed method was able to reconstruct a real-time movie of a 3-D object comprising 95,949 points.
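The kernel being parallelized is, in essence, a point-source Fresnel summation over object points for every hologram pixel. Below is a numpy stand-in for intuition only; the wavelength, pixel pitch and binary thresholding are assumed demo values, and the cited system distributes this loop across GPUs.

```python
# Fresnel point-source CGH summation (serial numpy sketch).
import numpy as np

wavelength, dx = 532e-9, 8e-6            # assumed laser and pixel pitch
nx = ny = 512
xs = (np.arange(nx) - nx // 2) * dx
ya, xa = np.meshgrid(xs, xs, indexing="ij")

# object points: columns are x, y, z (meters) and amplitude
pts = np.array([[0.0,   0.0,   0.10, 1.0],
                [1e-4, -2e-4,  0.12, 0.8]])

field = np.zeros((ny, nx))
for x, y, z, a in pts:   # GPU versions parallelize over pixels and points
    field += a * np.cos(np.pi / (wavelength * z)
                        * ((xa - x) ** 2 + (ya - y) ** 2))
cgh = (field > 0).astype(np.uint8)       # binarized hologram for display
```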
AMOEBA clustering revisited. [cluster analysis, classification, and image display program
NASA Technical Reports Server (NTRS)
Bryant, Jack
1990-01-01
A description of the clustering, classification, and image display program AMOEBA is presented. Using a difficult high resolution aircraft-acquired MSS image, the steps the program takes in forming clusters are traced. A number of new features are described here for the first time. Usage of the program is discussed. The theoretical foundation (the underlying mathematical model) is briefly presented. The program can handle images of any size and dimensionality.
NASA Astrophysics Data System (ADS)
Sun, Alexander Y.; Morris, Alan P.; Mohanty, Sitakanta
2009-07-01
Estimated parameter distributions in groundwater models may contain significant uncertainties because of data insufficiency. Therefore, adaptive uncertainty reduction strategies are needed to continuously improve model accuracy by fusing new observations. In recent years, various ensemble Kalman filters have been introduced as viable tools for updating high-dimensional model parameters. However, their usefulness is largely limited by the inherent assumption of Gaussian error statistics. Hydraulic conductivity distributions in alluvial aquifers, for example, are usually non-Gaussian as a result of complex depositional and diagenetic processes. In this study, we combine an ensemble Kalman filter with grid-based localization and Gaussian mixture model (GMM) clustering techniques for updating high-dimensional, multimodal parameter distributions via dynamic data assimilation. We introduce innovative strategies (e.g., block updating and dimension reduction) to effectively reduce the computational costs associated with these modified ensemble Kalman filter schemes. The developed data assimilation schemes are demonstrated numerically for identifying the multimodal heterogeneous hydraulic conductivity distributions in a binary-facies alluvial aquifer. Our results show that localization and GMM clustering are very promising techniques for assimilating high-dimensional, multimodal parameter distributions, and they outperform the corresponding global ensemble Kalman filter analysis scheme in all scenarios considered.
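For reference, here is the textbook stochastic ensemble Kalman filter analysis step on which the paper's localization and GMM extensions are built; this sketch omits both extensions.

```python
# Stochastic EnKF analysis step (perturbed observations).
import numpy as np

def enkf_update(E, H, d, R, seed=None):
    """E: n_state x n_ens ensemble; H: n_obs x n_state operator;
    d: observations (n_obs,); R: n_obs x n_obs observation covariance."""
    rng = np.random.default_rng(seed)
    n_ens = E.shape[1]
    A = E - E.mean(axis=1, keepdims=True)        # ensemble anomalies
    HE = H @ E
    HA = HE - HE.mean(axis=1, keepdims=True)
    P_dd = HA @ HA.T / (n_ens - 1) + R           # innovation covariance
    P_xd = A @ HA.T / (n_ens - 1)                # state-obs cross-covariance
    K = np.linalg.solve(P_dd, P_xd.T).T          # Kalman gain
    # perturbed observations, one draw per ensemble member
    D = d[:, None] + rng.multivariate_normal(np.zeros(len(d)), R, n_ens).T
    return E + K @ (D - HE)                      # updated ensemble
```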
High-Fidelity Real-Time Simulation on Deployed Platforms
2010-08-26
We demonstrate our approach with three examples: a two-dimensional Helmholtz acoustics "horn" problem; a three-dimensional transient linear heat conduction problem in a "Swiss Cheese" configuration Ω, used to illustrate the treatment of many …; and a three-dimensional unsteady incompressible Navier-Stokes low-Reynolds-number problem.
Automated flow cytometric analysis across large numbers of samples and cell types.
Chen, Xiaoyi; Hasan, Milena; Libri, Valentina; Urrutia, Alejandra; Beitz, Benoît; Rouilly, Vincent; Duffy, Darragh; Patin, Étienne; Chalmond, Bernard; Rogge, Lars; Quintana-Murci, Lluis; Albert, Matthew L; Schwikowski, Benno
2015-04-01
Multi-parametric flow cytometry is a key technology for characterization of immune cell phenotypes. However, robust high-dimensional post-analytic strategies for automated data analysis in large numbers of donors are still lacking. Here, we report a computational pipeline, called FlowGM, which minimizes operator input, is insensitive to compensation settings, and can be adapted to different analytic panels. A Gaussian Mixture Model (GMM)-based approach was utilized for initial clustering, with the number of clusters determined using Bayesian Information Criterion. Meta-clustering in a reference donor permitted automated identification of 24 cell types across four panels. Cluster labels were integrated into FCS files, thus permitting comparisons to manual gating. Cell numbers and coefficient of variation (CV) were similar between FlowGM and conventional gating for lymphocyte populations, but notably FlowGM provided improved discrimination of "hard-to-gate" monocyte and dendritic cell (DC) subsets. FlowGM thus provides rapid high-dimensional analysis of cell phenotypes and is amenable to cohort studies. Copyright © 2015. Published by Elsevier Inc.
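The first stage described, Gaussian mixture clustering with the number of components chosen by the Bayesian Information Criterion, maps directly onto scikit-learn. A minimal sketch (the candidate range of component counts is an assumption):

```python
# GMM clustering with BIC-based model selection, FlowGM-style first stage.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_bic(X, k_range=range(2, 21)):
    models = [GaussianMixture(n_components=k, covariance_type="full",
                              n_init=3, random_state=0).fit(X)
              for k in k_range]
    best = models[int(np.argmin([m.bic(X) for m in models]))]
    return best, best.predict(X)   # fitted mixture and cluster labels

# mixture, labels = fit_gmm_bic(fcs_matrix)  # rows = cells, cols = markers
```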
Visualizing nD Point Clouds as Topological Landscape Profiles to Guide Local Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oesterling, Patrick; Heine, Christian; Weber, Gunther H.
2012-05-04
Analyzing high-dimensional point clouds is a classical challenge in visual analytics. Traditional techniques, such as projections or axis-based techniques, suffer from projection artifacts, occlusion, and visual complexity. We propose to split data analysis into two parts to address these shortcomings. First, a structural overview phase abstracts data by its density distribution. This phase performs topological analysis to support accurate and non-overlapping presentation of the high-dimensional cluster structure as a topological landscape profile. Utilizing a landscape metaphor, it presents clusters and their nesting as hills whose height, width, and shape reflect cluster coherence, size, and stability, respectively. A second local analysis phase utilizes this global structural knowledge to select individual clusters or point sets for further, localized data analysis. Focusing on structural entities significantly reduces visual clutter in established geometric visualizations and permits a clearer, more thorough data analysis. In conclusion, this analysis complements the global topological perspective and enables the user to study subspaces or geometric properties, such as shape.
NASA Astrophysics Data System (ADS)
Li, Weixuan; Lin, Guang; Li, Bing
2016-09-01
Many uncertainty quantification (UQ) approaches suffer from the curse of dimensionality, that is, their computational costs become intractable for problems involving a large number of uncertainty parameters. In these situations, the classic Monte Carlo method often remains the preferred choice because its convergence rate O(n^{-1/2}), where n is the required number of model simulations, does not depend on the dimension of the problem. However, many high-dimensional UQ problems are intrinsically low-dimensional, because the variation of the quantity of interest (QoI) is often caused by only a few latent parameters varying within a low-dimensional subspace, known as the sufficient dimension reduction (SDR) subspace in the statistics literature. Motivated by this observation, we propose two inverse regression-based UQ algorithms (IRUQ) for high-dimensional problems. Both algorithms use inverse regression to convert the original high-dimensional problem to a low-dimensional one, which is then efficiently solved by building a response surface for the reduced model, for example via the polynomial chaos expansion. The first algorithm, for situations where an exact SDR subspace exists, is proved to converge at rate O(n^{-1}), hence much faster than MC. The second algorithm, which does not require an exact SDR, employs the reduced model as a control variate to reduce the error of the MC estimate. The accuracy gain can still be significant, depending on how well the reduced model approximates the original high-dimensional one. IRUQ also provides several additional practical advantages: it is non-intrusive; it does not require computing the high-dimensional gradient of the QoI; and it reports an error bar so the user knows how reliable the result is.
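Inverse regression methods such as sliced inverse regression (SIR) estimate the SDR subspace from input-output pairs. A compact SIR sketch under standard assumptions follows; the paper's algorithms add the response-surface and control-variate machinery on top of such an estimate.

```python
# Sliced inverse regression: estimate SDR directions from (X, y) pairs.
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=2):
    n, p = X.shape
    mu, C = X.mean(0), np.cov(X, rowvar=False)
    L = np.linalg.cholesky(C)
    Z = (X - mu) @ np.linalg.inv(L).T          # whiten the inputs
    order = np.argsort(y)
    M = np.zeros((p, p))
    for s in np.array_split(order, n_slices):  # slice on the sorted QoI
        m = Z[s].mean(0)
        M += len(s) / n * np.outer(m, m)       # weighted slice means
    vals, vecs = np.linalg.eigh(M)
    dirs = vecs[:, ::-1][:, :n_dirs]           # top eigenvectors
    return np.linalg.solve(L.T, dirs)          # back to original coordinates

# B = sir_directions(inputs, qoi); reduced = inputs @ B
```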
Detection and tracking of gas plumes in LWIR hyperspectral video sequence data
NASA Astrophysics Data System (ADS)
Gerhart, Torin; Sunu, Justin; Lieu, Lauren; Merkurjev, Ekaterina; Chang, Jen-Mei; Gilles, Jérôme; Bertozzi, Andrea L.
2013-05-01
Automated detection of chemical plumes presents a segmentation challenge. The segmentation problem for gas plumes is difficult due to the diffusive nature of the cloud. The advantage of considering hyperspectral images in the gas plume detection problem over conventional RGB imagery is the presence of non-visual data, allowing for a richer representation of information. In this paper we present an effective method of visualizing hyperspectral video sequences containing chemical plumes and investigate the effectiveness of segmentation techniques on these post-processed videos. Our approach uses a combination of dimension reduction and histogram equalization to prepare the hyperspectral videos for segmentation. First, Principal Component Analysis (PCA) is used to reduce the dimension of the entire video sequence. This is done by projecting each pixel onto the first few principal components, resulting in a type of spectral filter. Next, a Midway method for histogram equalization is used. These methods redistribute the intensity values in order to reduce flicker between frames. This properly prepares these high-dimensional video sequences for more traditional segmentation techniques. We compare the ability of various clustering techniques to properly segment the chemical plume. These include K-means, spectral clustering, and the Ginzburg-Landau functional.
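The PCA "spectral filter" step is straightforward to express. A minimal sketch with a placeholder cube standing in for the LWIR hyperspectral frames:

```python
# Project each pixel's spectrum onto the first few principal components.
import numpy as np
from sklearn.decomposition import PCA

# placeholder video: (n_frames, height, width, n_bands)
cube = np.random.rand(10, 64, 64, 129)
pixels = cube.reshape(-1, cube.shape[-1])         # one spectrum per pixel
scores = PCA(n_components=3).fit_transform(pixels)
reduced = scores.reshape(cube.shape[:-1] + (3,))  # 3 channels per pixel
# `reduced` is then histogram-equalized and passed to the segmenters.
```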
Synthesis of borophenes: Anisotropic, two-dimensional boron polymorphs.
Mannix, Andrew J; Zhou, Xiang-Feng; Kiraly, Brian; Wood, Joshua D; Alducin, Diego; Myers, Benjamin D; Liu, Xiaolong; Fisher, Brandon L; Santiago, Ulises; Guest, Jeffrey R; Yacaman, Miguel Jose; Ponce, Arturo; Oganov, Artem R; Hersam, Mark C; Guisinger, Nathan P
2015-12-18
At the atomic-cluster scale, pure boron is markedly similar to carbon, forming simple planar molecules and cage-like fullerenes. Theoretical studies predict that two-dimensional (2D) boron sheets will adopt an atomic configuration similar to that of boron atomic clusters. We synthesized atomically thin, crystalline 2D boron sheets (i.e., borophene) on silver surfaces under ultrahigh-vacuum conditions. Atomic-scale characterization, supported by theoretical calculations, revealed structures reminiscent of fused boron clusters with multiple scales of anisotropic, out-of-plane buckling. Unlike bulk boron allotropes, borophene shows metallic characteristics that are consistent with predictions of a highly anisotropic, 2D metal. Copyright © 2015, American Association for the Advancement of Science.
Motion estimation in the frequency domain using fuzzy c-planes clustering.
Erdem, C E; Karabulut, G Z; Yanmaz, E; Anarim, E
2001-01-01
A recent work explicitly models the discontinuous motion estimation problem in the frequency domain, where the motion parameters are estimated using a harmonic retrieval approach. The vertical and horizontal components of the motion are independently estimated from the locations of the peaks of the respective periodogram analyses, and they are paired to obtain the motion vectors using a previously proposed pairing procedure. In this paper, we present a more efficient method that replaces the motion component pairing task and hence eliminates the problems of that pairing method. The method described in this paper uses the fuzzy c-planes (FCP) clustering approach to fit planes to three-dimensional (3-D) frequency domain data obtained from the peaks of the periodograms. Experimental results are provided to demonstrate the effectiveness of the proposed method.
SAIL: Summation-bAsed Incremental Learning for Information-Theoretic Text Clustering.
Cao, Jie; Wu, Zhiang; Wu, Junjie; Xiong, Hui
2013-04-01
Information-theoretic clustering aims to exploit information-theoretic measures as the clustering criteria. A common practice on this topic is the so-called Info-Kmeans, which performs K-means clustering with KL-divergence as the proximity function. While expert efforts on Info-Kmeans have shown promising results, a remaining challenge is to deal with high-dimensional sparse data such as text corpora. Indeed, it is possible that the centroids contain many zero-value features for high-dimensional text vectors, which leads to infinite KL-divergence values and creates a dilemma in assigning objects to centroids during the iteration process of Info-Kmeans. To meet this challenge, in this paper, we propose a Summation-bAsed Incremental Learning (SAIL) algorithm for Info-Kmeans clustering. Specifically, by using an equivalent objective function, SAIL replaces the computation of KL-divergence by the incremental computation of Shannon entropy. This can avoid the zero-feature dilemma caused by the use of KL-divergence. To improve the clustering quality, we further introduce the variable neighborhood search scheme and propose the V-SAIL algorithm, which is then accelerated by a multithreaded scheme in PV-SAIL. Our experimental results on various real-world text collections have shown that, with SAIL as a booster, the clustering performance of Info-Kmeans can be significantly improved. Also, V-SAIL and PV-SAIL indeed help improve the clustering quality at a lower cost of computation.
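The zero-feature dilemma the paper targets is easy to reproduce: the KL divergence from a document to a centroid blows up wherever the centroid has a zero-valued feature on which the document has mass. A tiny demonstration (SAIL's remedy, recasting the objective via incremental Shannon entropy, is not shown here):

```python
# KL divergence becomes infinite when q has a zero where p does not.
import numpy as np

def kl(p, q):
    mask = p > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

doc = np.array([0.5, 0.5, 0.0])
centroid = np.array([0.7, 0.0, 0.3])   # zero-valued feature where doc > 0
print(kl(doc, centroid))               # inf: assignment step breaks down
```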
Ji, Shuiwang
2013-07-11
The structured organization of cells in the brain plays a key role in its functional efficiency. This delicate organization is the consequence of unique molecular identity of each cell gradually established by precise spatiotemporal gene expression control during development. Currently, studies on the molecular-structural association are beginning to reveal how the spatiotemporal gene expression patterns are related to cellular differentiation and structural development. In this article, we aim at a global, data-driven study of the relationship between gene expressions and neuroanatomy in the developing mouse brain. To enable visual explorations of the high-dimensional data, we map the in situ hybridization gene expression data to a two-dimensional space by preserving both the global and the local structures. Our results show that the developing brain anatomy is largely preserved in the reduced gene expression space. To provide a quantitative analysis, we cluster the reduced data into groups and measure the consistency with neuroanatomy at multiple levels. Our results show that the clusters in the low-dimensional space are more consistent with neuroanatomy than those in the original space. Gene expression patterns and developing brain anatomy are closely related. Dimensionality reduction and visual exploration facilitate the study of this relationship.
Self-assembled three-dimensional chiral colloidal architecture
NASA Astrophysics Data System (ADS)
Ben Zion, Matan Yah; He, Xiaojin; Maass, Corinna C.; Sha, Ruojie; Seeman, Nadrian C.; Chaikin, Paul M.
2017-11-01
Although stereochemistry has been a central focus of the molecular sciences since Pasteur, its province has previously been restricted to the nanometric scale. We have programmed the self-assembly of micron-sized colloidal clusters with structural information stemming from a nanometric arrangement. This was done by combining DNA nanotechnology with colloidal science. Using the functional flexibility of DNA origami in conjunction with the structural rigidity of colloidal particles, we demonstrate the parallel self-assembly of three-dimensional microconstructs, evincing highly specific geometry that includes control over position, dihedral angles, and cluster chirality.
Lamela, Diogo; Jongenelen, Inês; Morais, Ana; Figueiredo, Bárbara
2017-09-01
Both depressive and somatic symptoms are significant predictors of parenting and coparenting problems. However, despite clear evidence of their co-occurrence, no study to date has examined the association between depressive-somatic symptom clusters and parenting and coparenting. The current research sought to identify and cross-validate clusters of cognitive-affective depressive symptoms and nonspecific somatic symptoms, and to test whether the clusters differ on parenting and coparenting problems, across three independent samples of mothers. Participants in Studies 1 and 3 consisted of 409 and 652 community mothers, respectively. Participants in Study 2 consisted of 162 mothers exposed to intimate partner violence. All participants prospectively completed self-report measures of depressive and nonspecific somatic symptoms and of parenting (Studies 1 and 2) or coparenting (Study 3). Across studies, three depression-somatic symptom clusters were identified: no symptoms; high depression with low nonspecific somatic symptoms; and high depression with high nonspecific somatic symptoms. The high depression-somatic symptoms cluster was associated with the highest levels of child physical maltreatment risk (Study 1) and overt-conflict coparenting (Study 3). No differences in perceived maternal competence (Study 2) or in cooperative and undermining coparenting (Study 3) were found between the high depression with low somatic symptoms cluster and the high depression-somatic symptoms cluster. The results provide novel evidence for strong associations between clusters of depression and nonspecific somatic symptoms and specific parenting and coparenting problems. Cluster stability across the three independent samples suggests that the clusters may be generalizable. The results inform preventive approaches and evidence-based psychotherapeutic treatments. Copyright © 2017 Elsevier B.V. All rights reserved.
A phase cell cluster expansion for Euclidean field theories
NASA Astrophysics Data System (ADS)
Battle, Guy A., III; Federbush, Paul
1982-08-01
We adapt the cluster expansion first used to treat infrared problems for lattice models (a mass-zero cluster expansion) to the usual field theory situation. The field is expanded in terms of special block spin functions and the cluster expansion is given in terms of the expansion coefficients (phase cell variables); the cluster expansion expresses correlation functions in terms of contributions from finite coupled subsets of these variables. Most of the present work is carried through in d space-time dimensions (for φ^4_2 the details of the cluster expansion are pursued and convergence is proven). Thus most of the results in the present work will apply to a treatment of φ^4_3, to which we hope to return in a succeeding paper. Of particular interest in this paper is a substitute for the stability-of-the-vacuum bound appropriate to this cluster expansion (for d = 2 and d = 3), and a new method for performing estimates with tree graphs. The phase cell cluster expansions have the renormalization group incorporated intimately into their structure. We hope they will be useful ultimately in treating four-dimensional field theories.
Anomaly Detection in Large Sets of High-Dimensional Symbol Sequences
NASA Technical Reports Server (NTRS)
Budalakoti, Suratna; Srivastava, Ashok N.; Akella, Ram; Turkov, Eugene
2006-01-01
This paper addresses the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. The approach taken uses unsupervised clustering of sequences using the normalized longest common subsequence (LCS) as a similarity measure, followed by detailed analysis of outliers to detect anomalies. As the LCS measure is expensive to compute, the first part of the paper discusses existing algorithms, such as the Hunt-Szymanski algorithm, that have low time-complexity. We then discuss why these algorithms often do not work well in practice and present a new hybrid algorithm for computing the LCS that, in our tests, outperforms the Hunt-Szymanski algorithm by a factor of five. The second part of the paper presents new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence was deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence, compared to more normal sequences. The algorithms we present are general and domain-independent, so we discuss applications in related areas such as anomaly detection.
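For concreteness, here is the reference quadratic dynamic-programming LCS with one common normalization; the paper's contribution is a faster hybrid algorithm for the same measure, and normalizing by the longer sequence's length is an assumption for this sketch.

```python
# Classic LCS dynamic program, normalized to [0, 1] as a similarity score.
import numpy as np

def normalized_lcs(a, b):
    m, n = len(a), len(b)
    dp = np.zeros((m + 1, n + 1), dtype=int)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i, j] = (dp[i - 1, j - 1] + 1 if a[i - 1] == b[j - 1]
                        else max(dp[i - 1, j], dp[i, j - 1]))
    return dp[m, n] / max(m, n)   # 1.0 for identical sequences

print(normalized_lcs("ABCBDAB", "BDCABA"))   # LCS length 4 -> 4/7
```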
SLLE for predicting membrane protein types.
Wang, Meng; Yang, Jie; Xu, Zhi-Jie; Chou, Kuo-Chen
2005-01-07
Introduction of the concept of pseudo amino acid composition (PROTEINS: Structure, Function, and Genetics 43 (2001) 246; Erratum: ibid. 44 (2001) 60) has made it possible to incorporate a considerable amount of sequence-order effects by representing a protein sample in terms of a set of discrete numbers, and hence can significantly enhance the prediction quality of membrane protein types. As a continuing effort along this line, the Supervised Locally Linear Embedding (SLLE) technique for nonlinear dimensionality reduction is introduced (Science 290 (2000) 2323). The advantage of using SLLE is that it can reduce the operational space by extracting the essential features from the high-dimensional pseudo amino acid composition space, and that the cluster-tolerant capacity can be increased accordingly. As a consequence of combining these two approaches, high success rates have been observed in the tests of self-consistency, jackknife and independent data set, respectively, using the simplest nearest neighbour classifier. The current approach represents a new strategy to deal with the problems of protein attribute prediction, and hence may become a useful vehicle in the areas of bioinformatics and proteomics.
Shah, Sohil Atul
2017-01-01
Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838
Stimuli Reduce the Dimensionality of Cortical Activity
Mazzucato, Luca; Fontanini, Alfredo; La Camera, Giancarlo
2016-01-01
The activity of ensembles of simultaneously recorded neurons can be represented as a set of points in the space of firing rates. Even though the dimension of this space is equal to the ensemble size, neural activity can be effectively localized on smaller subspaces. The dimensionality of the neural space is an important determinant of the computational tasks supported by the neural activity. Here, we investigate the dimensionality of neural ensembles from the sensory cortex of alert rats during periods of ongoing (inter-trial) and stimulus-evoked activity. We find that dimensionality grows linearly with ensemble size, and grows significantly faster during ongoing activity compared to evoked activity. We explain these results using a spiking network model based on a clustered architecture. The model captures the difference in growth rate between ongoing and evoked activity and predicts a characteristic scaling with ensemble size that could be tested in high-density multi-electrode recordings. Moreover, we present a simple theory that predicts the existence of an upper bound on dimensionality. This upper bound is inversely proportional to the amount of pair-wise correlations and, compared to a homogeneous network without clusters, it is larger by a factor equal to the number of clusters. The empirical estimation of such bounds depends on the number and duration of trials and is well predicted by the theory. Together, these results provide a framework to analyze neural dimensionality in alert animals, its behavior under stimulus presentation, and its theoretical dependence on ensemble size, number of clusters, and correlations in spiking network models. PMID:26924968
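The abstract does not spell out the estimator, but a widely used dimensionality measure in this literature is the participation ratio of the eigenvalues of the firing-rate covariance matrix; the following hedged sketch uses that measure.

```python
# Participation-ratio estimate of neural dimensionality.
import numpy as np

def participation_ratio(rates):
    """rates: n_trials x n_neurons matrix of trial-binned firing rates."""
    lam = np.linalg.eigvalsh(np.cov(rates, rowvar=False))
    # (sum of eigenvalues)^2 / sum of squared eigenvalues,
    # ranging from 1 (one dominant mode) to n_neurons (isotropic activity)
    return (lam.sum() ** 2) / (lam ** 2).sum()
```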
Diffusion maps for high-dimensional single-cell analysis of differentiation data.
Haghverdi, Laleh; Buettner, Florian; Theis, Fabian J
2015-09-15
Single-cell technologies have recently gained popularity in cellular differentiation studies regarding their ability to resolve potential heterogeneities in cell populations. Analyzing such high-dimensional single-cell data has its own statistical and computational challenges. Popular multivariate approaches are based on data normalization, followed by dimension reduction and clustering to identify subgroups. However, in the case of cellular differentiation, we would not expect clear clusters to be present but instead expect the cells to follow continuous branching lineages. Here, we propose the use of diffusion maps to deal with the problem of defining differentiation trajectories. We adapt this method to single-cell data by adequate choice of kernel width and inclusion of uncertainties or missing measurement values, which enables the establishment of a pseudotemporal ordering of single cells in a high-dimensional gene expression space. We expect this output to reflect cell differentiation trajectories, where the data originates from intrinsic diffusion-like dynamics. Starting from a pluripotent stage, cells move smoothly within the transcriptional landscape towards more differentiated states with some stochasticity along their path. We demonstrate the robustness of our method with respect to extrinsic noise (e.g. measurement noise) and sampling density heterogeneities on simulated toy data as well as two single-cell quantitative polymerase chain reaction datasets (i.e. mouse haematopoietic stem cells and mouse embryonic stem cells) and an RNA-Seq dataset of human pre-implantation embryos. We show that diffusion maps perform considerably better than Principal Component Analysis and are advantageous over other techniques for non-linear dimension reduction such as t-distributed Stochastic Neighbour Embedding for preserving the global structures and pseudotemporal ordering of cells. Availability and implementation: The Matlab implementation of diffusion maps for single-cell data is available at https://www.helmholtz-muenchen.de/icb/single-cell-diffusion-map. Contact: fbuettner.phys@gmail.com, fabian.theis@helmholtz-muenchen.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
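A bare-bones diffusion map in numpy conveys the core construction; this sketch uses a fixed kernel width, whereas the published method additionally adapts the width and accommodates uncertain or missing values.

```python
# Minimal diffusion map: Gaussian kernel, row-normalized transition matrix,
# top nontrivial eigenvectors as diffusion coordinates.
import numpy as np

def diffusion_map(X, n_components=2, sigma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. dist.
    K = np.exp(-d2 / (2 * sigma ** 2))
    P = K / K.sum(axis=1, keepdims=True)                  # Markov transitions
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    idx = order[1:n_components + 1]    # skip the trivial constant eigenvector
    return vecs[:, idx].real * vals[idx].real             # diffusion coords

# coords = diffusion_map(expression_matrix)  # rows = cells, cols = genes
```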
Scalable Nearest Neighbor Algorithms for High Dimensional Data.
Muja, Marius; Lowe, David G
2014-11-01
For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
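FLANN itself ships with OpenCV and its own bindings; as a library-neutral sketch of the same workflow, here is exact tree-based nearest-neighbor search with scikit-learn (the dataset shapes are placeholders, chosen to resemble SIFT-like descriptors).

```python
# Nearest-neighbor matching of high-dimensional feature vectors.
import numpy as np
from sklearn.neighbors import NearestNeighbors

train = np.random.rand(100_000, 128).astype(np.float32)   # indexed features
query = np.random.rand(10, 128).astype(np.float32)

index = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(train)
dist, idx = index.kneighbors(query)   # 5 nearest training vectors per query
# FLANN's contribution is choosing and tuning the (approximate) index
# automatically for the data set at hand.
```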
McParland, D; Phillips, C M; Brennan, L; Roche, H M; Gormley, I C
2017-12-10
The LIPGENE-SU.VI.MAX study, like many others, recorded high-dimensional continuous phenotypic data and categorical genotypic data. LIPGENE-SU.VI.MAX focuses on the need to account for both phenotypic and genetic factors when studying the metabolic syndrome (MetS), a complex disorder that can lead to higher risk of type 2 diabetes and cardiovascular disease. Interest lies in clustering the LIPGENE-SU.VI.MAX participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model that elegantly accommodates high dimensional, mixed data is developed to cluster LIPGENE-SU.VI.MAX participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm and an approximate BIC-MCMC criterion is developed to select the optimal model. Two clusters or sub-phenotypes ('healthy' and 'at risk') are uncovered. A small subset of variables is deemed discriminatory, which notably includes phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, 7 years after the LIPGENE-SU.VI.MAX data were collected, participants underwent further analysis to diagnose presence or absence of the MetS. The two uncovered sub-phenotypes strongly correspond to the 7-year follow-up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to define the uncertainty in sub-phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition. Copyright © 2017 John Wiley & Sons, Ltd.
Effect of palladium doping on the stability and fragmentation patterns of cationic gold clusters
NASA Astrophysics Data System (ADS)
Ferrari, P.; Hussein, H. A.; Heard, C. J.; Vanbuel, J.; Johnston, R. L.; Lievens, P.; Janssens, E.
2018-05-01
We analyze in detail how the interplay between electronic structure and cluster geometry determines the stability and the fragmentation channels of single Pd-doped cationic gold clusters, PdAu_{N-1}^+ (N = 2-20). For this purpose, a combination of photofragmentation experiments and density functional theory calculations was employed. A remarkable agreement between the experiment and the calculations is obtained. Pd doping is found to modify the structure of the Au clusters, in particular altering the two-dimensional to three-dimensional transition size, with direct consequences on the stability of the clusters. Analysis of the electronic density of states of the clusters shows that, depending on cluster size, Pd delocalizes one 4d electron, giving an enhanced stability to PdAu_6^+, or remains with all 4d^10 electrons localized, closing an electronic shell in PdAu_9^+. Furthermore, it is observed that for most clusters, Au evaporation is the lowest-energy decay channel, although for some sizes Pd evaporation competes. In particular, PdAu_7^+ and PdAu_9^+ decay by Pd evaporation due to the high stability of the Au_7^+ and Au_9^+ fragmentation products.
Morphew, Daniel; Shaw, James; Avins, Christopher; Chakrabarti, Dwaipayan
2018-03-27
Colloidal self-assembly is a promising bottom-up route to a wide variety of three-dimensional structures, from clusters to crystals. Programming hierarchical self-assembly of colloidal building blocks, which can give rise to structures ordered at multiple levels to rival biological complexity, poses a multiscale design problem. Here we explore a generic design principle that exploits a hierarchy of interaction strengths and employ this design principle in computer simulations to demonstrate the hierarchical self-assembly of triblock patchy colloidal particles into two distinct colloidal crystals. We obtain cubic diamond and body-centered cubic crystals via distinct clusters of uniform size and shape, namely, tetrahedra and octahedra, respectively. Such a conceptual design framework has the potential to reliably encode hierarchical self-assembly of colloidal particles into a high level of sophistication. Moreover, the design framework underpins a bottom-up route to cubic diamond colloidal crystals, which have remained elusive despite being much sought after for their attractive photonic applications.
Mwangi, Benson; Soares, Jair C; Hasan, Khader M
2014-10-30
Neuroimaging machine learning studies have largely utilized supervised algorithms - meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. Noticeably, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique; t-distributed stochastic neighbour embedding (t-SNE) in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value=0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model which identified 93.5% of subjects' gender. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders. Copyright © 2014 Elsevier B.V. All rights reserved.
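The pipeline as described maps directly onto scikit-learn. A minimal sketch, with placeholder data standing in for the 92 subjects' integrated scan features:

```python
# t-SNE embedding followed by K-means on the 2-D map, with silhouette check.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(92, 500)                # placeholder subject features
emb = TSNE(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
print(silhouette_score(emb, labels))       # cluster-quality index
```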
Hierarchical trie packet classification algorithm based on expectation-maximization clustering.
Bi, Xia-An; Zhao, Junxia
2017-01-01
With the growth of computer network bandwidth, packet classification algorithms that can deal with large-scale rule sets are urgently needed. Among existing algorithms, research on packet classification based on the hierarchical trie has become an important branch of the field because of its wide practical use. Although the hierarchical trie saves considerable storage space, it has several shortcomings, such as backtracking and empty nodes. This paper proposes a new packet classification algorithm, the Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper uses a formalization method to deal with the packet classification problem by mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses the expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, thereby forming diversified clusters. Thirdly, this paper proposes a hierarchical trie based on the results of the expectation-maximization clustering. Finally, this paper conducts simulation experiments and real-environment experiments to compare the performance of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of inefficient trie updates, which greatly improves the performance of the algorithm.
Adaptive and dynamic meshing methods for numerical simulations
NASA Astrophysics Data System (ADS)
Acikgoz, Nazmiye
For the numerical simulation of many problems of engineering interest, it is desirable to have an automated mesh adaption tool capable of producing high quality meshes with an affordably low number of mesh points. This is important especially for problems, which are characterized by anisotropic features of the solution and require mesh clustering in the direction of high gradients. Another significant issue in meshing emerges in the area of unsteady simulations with moving boundaries or interfaces, where the motion of the boundary has to be accommodated by deforming the computational grid. Similarly, there exist problems where current mesh needs to be adapted to get more accurate solutions because either the high gradient regions are initially predicted inaccurately or they change location throughout the simulation. To solve these problems, we propose three novel procedures. For this purpose, in the first part of this work, we present an optimization procedure for three-dimensional anisotropic tetrahedral grids based on metric-driven h-adaptation. The desired anisotropy in the grid is dictated by a metric that defines the size, shape, and orientation of the grid elements throughout the computational domain. Through the use of topological and geometrical operators, the mesh is iteratively adapted until the final mesh minimizes a given objective function. In this work, the objective function measures the distance between the metric of each simplex and a target metric, which can be either user-defined (a-priori) or the result of a-posteriori error analysis. During the adaptation process, one tries to decrease the metric-based objective function until the final mesh is compliant with the target within a given tolerance. However, in regions such as corners and complex face intersections, the compliance condition was found to be very difficult or sometimes impossible to satisfy. In order to address this issue, we propose an optimization process based on an ad-hoc application of the simulated annealing technique, which improves the likelihood of removing poor elements from the grid. Moreover, a local implementation of the simulated annealing is proposed to reduce the computational cost. Many challenging multi-physics and multi-field problems that are unsteady in nature are characterized by moving boundaries and/or interfaces. When the boundary displacements are large, which typically occurs when implicit time marching procedures are used, degenerate elements are easily formed in the grid such that frequent remeshing is required. To deal with this problem, in the second part of this work, we propose a new r-adaptation methodology. The new technique is valid for both simplicial (e.g., triangular, tet) and non-simplicial (e.g., quadrilateral, hex) deforming grids that undergo large imposed displacements at their boundaries. A two- or three-dimensional grid is deformed using a network of linear springs composed of edge springs and a set of virtual springs. The virtual springs are constructed in such a way as to oppose element collapsing. This is accomplished by confining each vertex to its ball through springs that are attached to the vertex and its projection on the ball entities. The resulting linear problem is solved using a preconditioned conjugate gradient method. The new method is compared with the classical spring analogy technique in two- and three-dimensional examples, highlighting the performance improvements achieved by the new method. Meshes are an important part of numerical simulations. 
Depending on the geometry and flow conditions, the most suitable mesh for each particular problem is different. Meshes are usually generated either by using a suitable software package or by solving a PDE. In both cases, engineering intuition plays a significant role in deciding where clustering should take place. In addition, for unsteady problems, the gradients vary at each time step, which requires frequent remeshing during simulations. Therefore, in order to minimize user intervention and prevent frequent remeshing, we conclude this work by defining a novel mesh adaptation technique that integrates metric-based target mesh definitions with the ball-vertex mesh deformation method. In this new approach, the entire mesh is deformed based on either an a-priori or an a-posteriori error estimator. In other words, nodal points are repositioned by applying a force field in order to comply with the target mesh or to obtain more accurate solutions. The method has been tested on two-dimensional problems with a-priori metric definitions as well as on oblique-shock clustering.
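To make the spring-based deformation concrete, here is a minimal sketch of the classical linear spring analogy that the ball-vertex method builds on: every edge becomes a unit-stiffness spring, boundary nodes are moved, and the interior equilibrium is solved with a conjugate gradient solver. The virtual (ball-vertex) springs that prevent element collapse are omitted, and the structured triangulation, shear motion, and all names are illustrative assumptions, not the thesis implementation.

```python
# Minimal spring-analogy mesh deformation sketch: unit-stiffness edge
# springs only, no virtual/ball-vertex springs.
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix
from scipy.sparse.linalg import cg

n = 11                                    # nodes per side of a unit square
xs, ys = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
xy = np.column_stack([xs.ravel(), ys.ravel()])
idx = lambda i, j: i * n + j

# Edges of a structured triangulation (horizontal, vertical, diagonal).
edges = []
for i in range(n):
    for j in range(n):
        if j + 1 < n: edges.append((idx(i, j), idx(i, j + 1)))
        if i + 1 < n: edges.append((idx(i, j), idx(i + 1, j)))
        if i + 1 < n and j + 1 < n: edges.append((idx(i, j), idx(i + 1, j + 1)))

# Assemble the spring stiffness (graph Laplacian) matrix.
N = n * n
K = lil_matrix((N, N))
for a, b in edges:
    K[a, a] += 1.0; K[b, b] += 1.0
    K[a, b] -= 1.0; K[b, a] -= 1.0
K = csr_matrix(K)

boundary = np.array([idx(i, j) for i in range(n) for j in range(n)
                     if i in (0, n - 1) or j in (0, n - 1)])
interior = np.setdiff1d(np.arange(N), boundary)

# Imposed boundary motion: shear the top edge to the right.
new_xy = xy.copy()
top = [idx(n - 1, j) for j in range(n)]
new_xy[top, 0] += 0.3

# Spring equilibrium: K_ii x_i = -K_ib x_b, solved per coordinate with CG.
Kii = K[interior][:, interior]
Kib = K[interior][:, boundary]
for d in range(2):
    sol, info = cg(Kii, -Kib @ new_xy[boundary, d])
    assert info == 0
    new_xy[interior, d] = sol
print("max interior node displacement:", np.abs(new_xy - xy)[interior].max())
```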
Big Data Clustering via Community Detection and Hyperbolic Network Embedding in IoT Applications.
Karyotis, Vasileios; Tsitseklis, Konstantinos; Sotiropoulos, Konstantinos; Papavassiliou, Symeon
2018-04-15
In this paper, we present a novel data clustering framework for big sensory data produced by IoT applications. Based on a network representation of the relations among multi-dimensional data, data clustering is mapped to node clustering over the produced data graphs. To address the potentially very large scale of such datasets/graphs, which tests the limits of state-of-the-art approaches, we map the problem of data clustering to a community detection one over the corresponding data graphs. Specifically, we propose a novel computational approach for enhancing the traditional Girvan-Newman (GN) community detection algorithm via hyperbolic network embedding. The data dependency graph is embedded in the hyperbolic space via Rigel embedding, allowing more efficient computation of the edge-betweenness centrality needed in the GN algorithm. This allows for more efficient clustering of the nodes of the data graph in terms of modularity, without sacrificing considerable accuracy. In order to study the operation of our approach with respect to enhancing GN community detection, we employ various representative types of artificial complex networks, such as scale-free, small-world, and random geometric topologies, as well as frequently employed benchmark datasets, to demonstrate its efficacy in terms of data clustering via community detection. Furthermore, we provide a proof-of-concept evaluation by applying the proposed framework to multi-dimensional datasets obtained from an operational smart-city/building IoT infrastructure provided by the Federated Interoperable Semantic IoT/cloud Testbeds and Applications (FIESTA-IoT) testbed federation. It is shown that the proposed framework can indeed be used for community detection/data clustering and exploited in various other IoT applications, such as performing more energy-efficient smart-city/building sensing.
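As a rough illustration of the graph-based pipeline described above, the sketch below builds a k-nearest-neighbour data graph and runs the standard Girvan-Newman algorithm from NetworkX, keeping the partition with the highest modularity. The hyperbolic Rigel embedding that accelerates the betweenness computation in the paper is not reproduced; the data, k, and stopping rule are illustrative assumptions.

```python
# Girvan-Newman community detection on a kNN data graph; plain (exact)
# edge betweenness is used here, not the hyperbolic-embedding speed-up.
import numpy as np
import networkx as nx
from networkx.algorithms.community import girvan_newman, modularity
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(40, 5)) for m in (0.0, 2.0, 4.0)])

# Network representation of the data: connect each point to its k nearest
# neighbours, then treat data clustering as community detection.
A = kneighbors_graph(X, n_neighbors=6, mode="connectivity")
G = nx.from_scipy_sparse_array(A)

best, best_q = None, -1.0
for communities in girvan_newman(G):
    q = modularity(G, communities)
    if q > best_q:
        best, best_q = communities, q
    if len(communities) >= 10:            # GN is expensive; stop early
        break
print(len(best), "communities, modularity =", round(best_q, 3))
```

Swapping the exact betweenness step for one computed from hyperbolic coordinates, as the paper proposes, would change only the edge-removal loop inside GN.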
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Weixuan; Lin, Guang; Li, Bing
2016-09-01
A well-known challenge in uncertainty quantification (UQ) is the "curse of dimensionality". However, many high-dimensional UQ problems are essentially low-dimensional, because the randomness of the quantity of interest (QoI) is caused only by uncertain parameters varying within a low-dimensional subspace, known as the sufficient dimension reduction (SDR) subspace. Motivated by this observation, we propose and demonstrate in this paper an inverse regression-based UQ approach (IRUQ) for high-dimensional problems. Specifically, we use an inverse regression procedure to estimate the SDR subspace and then convert the original problem to a low-dimensional one, which can be efficiently solved by building a response surface model such as a polynomial chaos expansion. The novelty and advantages of the proposed approach are seen in its computational efficiency and practicality. Compared with Monte Carlo, the traditionally preferred approach for high-dimensional UQ, IRUQ at a comparable cost generally gives much more accurate solutions even for high-dimensional problems, and even when the dimension reduction is not exactly sufficient. Theoretically, IRUQ is proved to converge twice as fast as the approach it uses to seek the SDR subspace. For example, while a sliced inverse regression method converges to the SDR subspace at the rate of $O(n^{-1/2})$, the corresponding IRUQ converges at $O(n^{-1})$. IRUQ also provides several desired conveniences in practice. It is non-intrusive, requiring only a simulator to generate realizations of the QoI, and there is no need to compute the high-dimensional gradient of the QoI. Finally, error bars can be derived for the estimation results reported by IRUQ.
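As a rough illustration of the inverse-regression step, the following sketch implements plain sliced inverse regression on a synthetic one-dimensional ridge function and recovers the active direction. The response-surface (polynomial chaos) stage and all problem specifics are omitted; every name and parameter here is illustrative rather than taken from the paper.

```python
# Sliced inverse regression (SIR) sketch for estimating an SDR subspace.
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[0] = 1.0        # true 1-D active direction
y = np.sin(X @ beta) + 0.01 * rng.normal(size=n)

# Standardize inputs so their covariance is the identity.
Xc = X - X.mean(0)
W = np.linalg.inv(np.linalg.cholesky(np.cov(Xc.T)))
Z = Xc @ W.T

# Slice on the response and average Z within each slice.
H = 10
slices = np.array_split(np.argsort(y), H)
M = np.zeros((p, p))
for s in slices:
    m = Z[s].mean(0)
    M += (len(s) / n) * np.outer(m, m)

# Leading eigenvectors of M span the (standardized) SDR subspace.
vals, vecs = np.linalg.eigh(M)
direction = W.T @ vecs[:, -1]            # back to original coordinates
direction /= np.linalg.norm(direction)
print("angle to true direction (deg):",
      np.degrees(np.arccos(abs(direction @ beta))))
```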
A classification of substance-dependent men on temperament and severity variables.
Henderson, Melinda J; Galen, Luke W
2003-06-01
This study examined the validity of classifying substance abusers based on temperament and dependence severity, and expanded the scope of typology differences to proximal determinants of use (e.g., expectancies, motives). Patients were interviewed about substance use, depression, and family history of alcohol and drug abuse. Self-report instruments measuring temperament, expectancies, and motives were completed. Participants were 147 male veterans admitted to inpatient substance abuse treatment at a U.S. Department of Veterans Affairs medical center. Cluster analysis identified four types of users: two high substance problem severity groups and two low substance problem severity groups. The two high problem severity, early onset groups differed only on the cluster variable of negative affectivity (NA), but showed differences on antisocial personality characteristics, hypochondriasis, and coping motives for alcohol. The two low problem severity groups were distinguished by age of onset and positive affectivity (PA). The late onset, low PA group had a higher incidence of depression, a greater tendency to use substances in solitary contexts, and lower enhancement motives for alcohol compared to the early onset, high PA cluster. The four-cluster solution yielded more distinctions on external criteria than the two-cluster solution. Such temperament variation within both high and low severity substance abusers may be important for treatment planning.
Topology for Dominance for Network of Multi-Agent System
NASA Astrophysics Data System (ADS)
Szeto, K. Y.
2007-05-01
The resource allocation problem in evolving two-dimensional point patterns is investigated for the existence of good strategies for constructing initial configurations that lead to fast dominance of the pattern by a single species, which can be interpreted as market dominance by a company in the context of multi-agent systems in econophysics. For the hexagonal lattice, certain special topological arrangements of the resource in two dimensions, such as rings, lines, and clusters, have a higher probability of dominance than random patterns. For more complex networks, a systematic way to search for a stable and dominant strategy of resource allocation in the changing environment is found by means of a genetic algorithm. Five typical features can be summarized by means of the distribution function for the local neighborhood of friends and enemies as well as the local clustering coefficients: (1) The winner has more triangles than the loser. (2) The winner likes to form clusters, as the winner tends to connect with other winners rather than with losers, while the loser tends to connect with winners rather than losers. (3) The distribution function of friends as well as enemies for the winner is broader than the corresponding distribution function for the loser. (4) The connectivity at which the peak of the distribution of friends for the winner occurs is larger than that of the loser, while the peak values of friends for winners are lower. (5) The connectivity at which the peak of the distribution of enemies for the winner occurs is smaller than that of the loser, while the peak values of enemies for winners are lower. These five features appear to be general, at least in the context of two-dimensional hexagonal lattices of various sizes, hierarchical lattices, Voronoi diagrams, as well as high-dimensional random networks. These general local topological properties of networks are relevant to strategists aiming at dominance in evolving patterns when the interaction between the agents is local.
Modes of self-organization of diluted bubbly liquids in acoustic fields: One-dimensional theory.
Gumerov, Nail A; Akhatov, Iskander S
2017-02-01
The paper is dedicated to mathematical modeling of the self-organization of bubbly liquids in acoustic fields. A continuum model describing the two-way interaction of diluted polydisperse bubbly liquids and acoustic fields in the weakly nonlinear approximation is studied analytically and numerically in the one-dimensional case. It is shown that the regimes of self-organization of monodisperse bubbly liquids can be controlled by only a few dimensionless parameters. Two basic modes, clustering and propagating shock waves of void fraction (acoustically induced transparency), are identified, and criteria for their realization in the space of parameters are proposed. A numerical method for solving one-dimensional self-organization problems is developed. Computational results for mono- and polydisperse systems are discussed.
Designing a robust activity recognition framework for health and exergaming using wearable sensors.
Alshurafa, Nabil; Xu, Wenyao; Liu, Jason J; Huang, Ming-Chun; Mortazavi, Bobak; Roberts, Christian K; Sarrafzadeh, Majid
2014-09-01
Detecting human activity independent of intensity is essential in many applications, primarily in calculating metabolic equivalent rates and extracting human context awareness. Many classifiers that train on an activity at a subset of intensity levels fail to recognize the same activity at other intensity levels. This demonstrates weakness in the underlying classification method. Training a classifier for an activity at every intensity level is also not practical. In this paper, we tackle a novel intensity-independent activity recognition problem where the class labels exhibit large variability, the data are of high dimensionality, and clustering algorithms are necessary. We propose a new robust stochastic approximation framework for enhanced classification of such data. Experiments are reported using two clustering techniques, K-means and Gaussian mixture models. The stochastic approximation algorithm consistently outperforms other well-known classification schemes, validating the use of our proposed clustered data representation. We verify the motivation of our framework in two applications that benefit from intensity-independent activity recognition. The first application shows how our framework can be used to enhance energy expenditure calculations. The second application is a novel exergaming environment aimed at using games to reward physical activity performed throughout the day, to encourage a healthy lifestyle.
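The paper's stochastic approximation framework is not reproduced here, but the underlying idea of a clustered data representation can be sketched: encode each sample by its distances to K-means centroids (or its GMM posterior probabilities) and train an ordinary classifier on that encoding. Everything below, data included, is an illustrative stand-in.

```python
# Cluster-based feature representation sketch: encode samples by K-means
# distances or GMM posteriors, then classify on the encoded features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           n_classes=3, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

km = KMeans(n_clusters=12, n_init=10, random_state=0).fit(Xtr)
gm = GaussianMixture(n_components=12, random_state=0).fit(Xtr)

for name, encode in [("kmeans-distances", km.transform),
                     ("gmm-posteriors", gm.predict_proba)]:
    clf = LogisticRegression(max_iter=2000).fit(encode(Xtr), ytr)
    print(name, "accuracy:", round(clf.score(encode(Xte), yte), 3))
```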
Ke, Tracy; Fan, Jianqing; Wu, Yichao
2014-01-01
This paper explores the homogeneity of coefficients in high-dimensional regression, which extends the sparsity concept and is more general and suitable for many applications. Homogeneity arises when regression coefficients corresponding to neighboring geographical regions or a similar cluster of covariates are expected to be approximately the same. Sparsity corresponds to a special case of homogeneity with a large cluster of known atom zero. In this article, we propose a new method called clustering algorithm in regression via data-driven segmentation (CARDS) to explore homogeneity. New mathematical results are provided on the gain that can be achieved by exploring homogeneity. Statistical properties of two versions of CARDS are analyzed. In particular, the asymptotic normality of our proposed CARDS estimator is established, which reveals better estimation accuracy for homogeneous parameters than that without homogeneity exploration. When our methods are combined with sparsity exploration, further efficiency can be achieved beyond the exploration of sparsity alone. This provides additional insights into the power of exploring low-dimensional structures in high-dimensional regression: homogeneity and sparsity. Our results also shed light on the properties of the fused Lasso. The newly developed method is further illustrated by simulation studies and applications to real data. Supplementary materials for this article are available online. PMID:26085701
Interactions of small platinum clusters with the TiC(001) surface
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mao, Jianjun; Li, Shasha; Chu, Xingli
2015-11-14
Density functional theory calculations are used to elucidate the interactions of small platinum clusters (Pt{sub n}, n = 1–5) with the TiC(001) surface. The results are analyzed in terms of geometric, energetic, and electronic properties. It is found that a single Pt atom prefers to be adsorbed at the C-top site, while a Pt{sub 2} cluster prefers dimerization and a Pt{sub 3} cluster forms a linear structure on the TiC(001) surface. As for the Pt{sub 4} cluster, the three-dimensional distorted tetrahedral structure and the two-dimensional square structure have almost equal stability. In contrast with the two-dimensional isolated Pt{sub 5} cluster, the adsorbed Pt{sub 5} cluster prefers a three-dimensional structure on TiC(001). Substantial charge transfer takes place from the TiC(001) surface to the adsorbed Pt{sub n} clusters, resulting in negatively charged Pt{sub n} clusters. Finally, the d-band centers of the adsorbed Pt atoms and their implications for catalytic activity are discussed.
Vigelius, Matthias; Meyer, Bernd
2012-01-01
For many biological applications, a macroscopic (deterministic) treatment of reaction-drift-diffusion systems is insufficient. Instead, one has to properly handle the stochastic nature of the problem and generate true sample paths of the underlying probability distribution. Unfortunately, stochastic algorithms are computationally expensive and, in most cases, the large number of participating particles renders the relevant parameter regimes inaccessible. In an attempt to address this problem, we present a genuinely stochastic, multi-dimensional algorithm that solves the inhomogeneous, non-linear, drift-diffusion problem on a mesoscopic level. Our method improves on existing implementations in being multi-dimensional and handling inhomogeneous drift and diffusion. The algorithm is well suited for an implementation on data-parallel hardware architectures such as general-purpose graphics processing units (GPUs). We integrate the method into an operator-splitting approach that decouples chemical reactions from the spatial evolution. We demonstrate the validity and applicability of our algorithm with a comprehensive suite of standard test problems that also serve to quantify the numerical accuracy of the method. We provide a freely available, fully functional GPU implementation. Integration into Inchman, a user-friendly web service that allows researchers to perform parallel simulations of reaction-drift-diffusion systems on GPU clusters, is underway. PMID:22506001
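A full mesoscopic GPU implementation is beyond a short example, but the stochastic character of inhomogeneous drift-diffusion sample paths can be sketched with a particle-level Euler-Maruyama scheme under the Itô interpretation. Reactions, operator splitting, and the GPU machinery from the paper are omitted, and the drift and diffusion functions below are arbitrary illustrative choices.

```python
# Euler-Maruyama sample paths for an inhomogeneous 1-D drift-diffusion
# process (Ito interpretation); a CPU, particle-level sketch only.
import numpy as np

rng = np.random.default_rng(2)
drift = lambda x: -x                    # position-dependent drift
diff = lambda x: 0.1 + 0.05 * x**2      # position-dependent diffusion

n_particles, dt, steps = 10_000, 1e-3, 5000
x = rng.normal(1.0, 0.2, size=n_particles)
for _ in range(steps):
    D = diff(x)
    x += drift(x) * dt + np.sqrt(2.0 * D * dt) * rng.normal(size=n_particles)

print("mean, variance of final positions:", x.mean(), x.var())
```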
B = 5 Skyrmion as a two-cluster system
NASA Astrophysics Data System (ADS)
Gudnason, Sven Bjarke; Halcrow, Chris
2018-06-01
The classical B = 5 Skyrmion can be approximated by a two-cluster system in which a B = 1 Skyrmion is attached to a core B = 4 Skyrmion. We quantize this system, allowing the B = 1 Skyrmion to freely orbit the core. The configuration space is 11-dimensional but simplifies significantly after factoring out the overall spin and isospin degrees of freedom. We exactly solve the free quantum problem and then include an interaction potential between the Skyrmions numerically. The resulting energy spectrum is compared to the corresponding nuclei, the helium-5/lithium-5 isodoublet. We find approximate parity doubling not seen in the experimental data. In addition, we fail to obtain the correct ground-state spin. The framework laid out for this two-cluster system can readily be modified for other clusters and in particular for other B = 4n + 1 nuclei, of which B = 5 is the simplest example.
Kennedy, Angie C; Adams, Adrienne E
2016-04-01
Using a cluster analysis approach with a sample of 205 young mothers recruited from community sites in an urban Midwestern setting, we examined the effects of cumulative violence exposure (community violence exposure, witnessing intimate partner violence, physical abuse by a caregiver, and sexual victimization, all with onset prior to age 13) on school participation, as mediated by attention and behavior problems in school. We identified five clusters of cumulative exposure, and found that the HiAll cluster (high levels of exposure to all four types) consistently fared the worst, with significantly higher attention and behavior problems, and lower school participation, in comparison with the LoAll cluster (low levels of exposure to all types). Behavior problems were a significant mediator of the effects of cumulative violence exposure on school participation, but attention problems were not. © The Author(s) 2014.
Kong, Xiang-Zhen; Liu, Jin-Xing; Zheng, Chun-Hou; Hou, Mi-Xiao; Wang, Juan
2017-07-01
High dimensionality has become a typical feature of biomolecular data. In this paper, a novel dimension reduction method named p-norm singular value decomposition (PSVD) is proposed to seek the low-rank approximation matrix of the biomolecular data. To enhance robustness to outliers, the Lp-norm is taken as the error function and the Schatten p-norm is used as the regularization function in the optimization model. To evaluate the performance of PSVD, the K-means clustering method is then employed for tumor clustering based on the low-rank approximation matrix. Extensive experiments are carried out on five gene expression data sets, including two benchmark data sets and three higher dimensional data sets from The Cancer Genome Atlas. The experimental results demonstrate that the PSVD-based method outperforms many existing methods. In particular, the experiments show that the proposed method is more efficient for processing higher dimensional data, with good robustness, stability, and superior time performance.
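The PSVD optimization itself (an Lp error with Schatten p-norm regularization) is nontrivial; the sketch below substitutes an ordinary truncated SVD for the low-rank step and then applies K-means, mirroring the evaluation pipeline described above. The synthetic data and rank choice are assumptions for illustration only.

```python
# Low-rank denoising + clustering sketch: truncated SVD stands in for the
# paper's p-norm SVD; K-means then clusters the rows of the low-rank matrix.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(3)
labels = np.repeat([0, 1, 2], 50)
centers = rng.normal(size=(3, 2000))           # high-dimensional "profiles"
X = centers[labels] + rng.normal(scale=2.0, size=(150, 2000))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = 3
X_lr = (U[:, :r] * s[:r]) @ Vt[:r]             # rank-r approximation

pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_lr)
print("ARI on low-rank data:", round(adjusted_rand_score(labels, pred), 3))
```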
Quantum Computational Universality of the 2D Cai-Miyake-Dür-Briegel Quantum State
NASA Astrophysics Data System (ADS)
Wei, Tzu-Chieh; Raussendorf, Robert; Kwek, Leong Chuan
2012-02-01
Universal quantum computation can be achieved by simply performing single-qubit measurements on a highly entangled resource state, such as cluster states. Cai, Miyake, Dür, and Briegel recently constructed a ground state of a two-dimensional quantum magnet by combining multiple Affleck-Kennedy-Lieb-Tasaki quasichains of mixed spin-3/2 and spin-1/2 entities and by mapping pairs of neighboring spin-1/2 particles to individual spin-3/2 particles [Phys. Rev. A 82, 052309 (2010)]. They showed that this state enables universal quantum computation by constructing single- and two-qubit universal gates. Here, we give an alternative understanding of how this state gives rise to universal measurement-based quantum computation: by local operations, each quasichain can be converted to a one-dimensional cluster state, and entangling gates between two neighboring logical qubits can be implemented by single-spin measurements. Furthermore, a two-dimensional cluster state can be distilled from the Cai-Miyake-Dür-Briegel state.
Wu, Han; Zhang, Yu-Qi; Hu, Min-Biao; Ren, Li-Jun; Lin, Yue; Wang, Wei
2017-05-30
Clusters are an important class of nanoscale molecules or superatoms that exhibit an amazing diversity in structure, chemical composition, shape, and functionality. Assembling two types of clusters creates emerging cluster-assembled materials (CAMs). In this paper, we report an effective approach to produce quasi-two-dimensional (2D) CAMs from two types of sphere-like clusters, polyhedral oligomeric silsesquioxanes (POSS) and polyoxometalates (POM). To avoid macrophase separation between the two clusters, they are covalently linked to form a POM-POSS cocluster with Janus character and a dumbbell shape. This Janus character enables the cocluster to self-assemble into diverse nanoaggregates, as conventional amphiphilic molecules and macromolecules do, in selective solvents. In our study, we obtained micelles, vesicles, nanosheets, and nanoribbons by tuning the n-hexane content in mixed solvents of acetone and n-hexane. Ordered packing of clusters in the nanosheets and nanoribbons was directly visualized using the high-angle annular dark-field scanning transmission electron microscopy (HAADF-STEM) technique. We infer that the increase in packing order results in the vesicle-to-sheet transition and that the change in packing mode causes the sheet-to-ribbon transition. Our findings verify the effectiveness of creating quasi-2D cluster-assembled materials through cocluster self-assembly as a new approach to produce novel CAMs.
NASA Astrophysics Data System (ADS)
Brandl, Miriam B.; Beck, Dominik; Pham, Tuan D.
2011-06-01
The high dimensionality of image-based datasets can be a drawback for classification accuracy. In this study, we propose the application of fuzzy c-means clustering, cluster validity indices, and the notion of a joint-feature-clustering matrix to find redundancies among image features. The introduced matrix indicates how frequently features are grouped in a mutual cluster. The resulting information can be used to find data-derived feature prototypes with a common biological meaning, reduce data storage as well as computation times, and improve classification accuracy.
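One plausible reading of the joint-feature-clustering matrix is a co-occurrence count over repeated fuzzy c-means runs: how often two features land in the same (hardened) cluster. The sketch below implements the standard fuzzy c-means updates in NumPy and builds such a matrix; the data and the exact matrix definition are assumptions, not the authors' specification.

```python
# Fuzzy c-means over feature vectors plus a joint-feature-clustering
# matrix counting argmax-cluster co-occurrence across random restarts.
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c)); U /= U.sum(1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))     # standard FCM membership update
        U /= U.sum(1, keepdims=True)
    return U

rng = np.random.default_rng(4)
# rows = image-derived features, columns = samples; three redundant groups
F = np.vstack([rng.normal(mu, 0.5, size=(10, 30)) for mu in (0, 1, 2)])

runs, c = 20, 3
J = np.zeros((len(F), len(F)))
for r in range(runs):
    hard = fuzzy_cmeans(F, c, seed=r).argmax(1)
    J += (hard[:, None] == hard[None, :])
J /= runs            # J[i, j]: how frequently features i and j co-cluster
print("mean co-clustering within the first group:", J[:10, :10].mean())
```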
Zhang, Dake; Stecker, Pamela; Huckabee, Sloan; Miller, Rhonda
2016-09-01
Research has suggested that different strategies used when solving fraction problems are highly correlated with students' problem-solving accuracy. This study (a) utilized latent profile modeling to classify students into three different strategic developmental levels in solving fraction comparison problems and (b) accordingly provided differentiated strategic training for students starting from two different strategic developmental levels. In Study 1 we assessed 49 middle school students' performance on fraction comparison problems and categorized students into three strategic developmental clusters: a cross-multiplication cluster with the highest accuracy, a representation strategy cluster with medium accuracy, and a whole-number strategy cluster with the lowest accuracy. Based on the strategic developmental levels identified in Study 1, in Study 2 we selected three students from the whole-number strategy cluster and another three students from the representation strategy cluster and implemented a differentiated strategic training intervention within a multiple-baseline design. Results showed that both groups of students transitioned from less advanced to more advanced strategies and improved their problem-solving accuracy during the posttest, the maintenance test, and the generalization test. © Hammill Institute on Disabilities 2014.
2013-01-01
Background The structured organization of cells in the brain plays a key role in its functional efficiency. This delicate organization is the consequence of unique molecular identity of each cell gradually established by precise spatiotemporal gene expression control during development. Currently, studies on the molecular-structural association are beginning to reveal how the spatiotemporal gene expression patterns are related to cellular differentiation and structural development. Results In this article, we aim at a global, data-driven study of the relationship between gene expressions and neuroanatomy in the developing mouse brain. To enable visual explorations of the high-dimensional data, we map the in situ hybridization gene expression data to a two-dimensional space by preserving both the global and the local structures. Our results show that the developing brain anatomy is largely preserved in the reduced gene expression space. To provide a quantitative analysis, we cluster the reduced data into groups and measure the consistency with neuroanatomy at multiple levels. Our results show that the clusters in the low-dimensional space are more consistent with neuroanatomy than those in the original space. Conclusions Gene expression patterns and developing brain anatomy are closely related. Dimensionality reduction and visual exploration facilitate the study of this relationship. PMID:23845024
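A schematic version of this analysis can be assembled from standard tools: embed high-dimensional expression-like vectors into two dimensions, cluster in both the original and the reduced space, and compare each clustering with known region labels via the adjusted Rand index. Here t-SNE stands in for the paper's structure-preserving mapping, and the synthetic data are purely illustrative.

```python
# Reduce expression-like data to 2-D, cluster in both spaces, and measure
# consistency with known "anatomical" labels; all data are synthetic.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(5)
regions = np.repeat(np.arange(4), 100)               # region labels
X = rng.normal(size=(400, 500)) + 2.5 * rng.normal(size=(4, 500))[regions]

Y = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

for name, Z in [("original space", X), ("2-D embedding", Y)]:
    pred = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
    print(name, "ARI vs labels:",
          round(adjusted_rand_score(regions, pred), 3))
```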
Keshtkaran, Mohammad Reza; Yang, Zhi
2017-06-01
Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low-dimensional and most discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of a larger number of individual neurons, with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain-machine interface studies.
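The core alternation described above, learning a discriminative projection and re-clustering within it, can be sketched with off-the-shelf components: iterate LDA (fit to the current labels) and a Gaussian mixture model. The outlier handling and the statistical test for the number of clusters are omitted, and the synthetic "waveforms" and parameters are assumptions.

```python
# Iterative discriminative-subspace sketch: alternate LDA and GMM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
proto = rng.normal(size=(3, 48))                  # three "unit" templates
labels_true = np.repeat([0, 1, 2], 200)
W = proto[labels_true] + 0.8 * rng.normal(size=(600, 48))  # noisy spikes

k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(W)
for _ in range(10):
    d = min(2, len(np.unique(labels)) - 1)        # LDA rank constraint
    lda = LinearDiscriminantAnalysis(n_components=d).fit(W, labels)
    F = lda.transform(W)                          # discriminative features
    labels = GaussianMixture(n_components=k, random_state=0).fit_predict(F)
print("final cluster sizes:", np.bincount(labels))
```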
NASA Technical Reports Server (NTRS)
Hsu, Andrew T.; Lytle, John K.
1989-01-01
An algebraic adaptive grid scheme based on the concept of arc equidistribution is presented. The scheme locally adjusts the grid density based on gradients of selected flow variables from either finite difference or finite volume calculations. A user-prescribed grid stretching can be specified such that control of the grid spacing can be maintained in areas of known flowfield behavior. For example, the grid can be clustered near a wall for boundary layer resolution and made coarse near the outer boundary of an external flow. A grid smoothing technique is incorporated into the adaptive grid routine, which is found to be more robust and efficient than the weight function filtering technique employed by other researchers. Since the present algebraic scheme requires no iteration or solution of differential equations, the computer time needed for grid adaptation is trivial, making the scheme useful for three-dimensional flow problems. Applications to two- and three-dimensional flow problems show that a considerable improvement in flowfield resolution can be achieved by using the proposed adaptive grid scheme. Although the scheme was developed with steady flow in mind, it is a good candidate for unsteady flow computations because of its efficiency.
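In one dimension the arc-equidistribution idea reduces to a few lines: compute an arc-length weight from the solution gradient, accumulate it, and invert the cumulative map to place grid points where the weight is large. The sketch below assumes a tanh layer as the test solution; function names and parameters are illustrative.

```python
# 1-D arc-length equidistribution sketch: each cell carries an equal share
# of the weight w = sqrt(1 + (du/dx)^2), clustering points at steep gradients.
import numpy as np

def equidistribute(x, u, n_new):
    dudx = np.gradient(u, x)
    w = np.sqrt(1.0 + dudx**2)                 # arc-length weight
    # cumulative "arc length" acts as the new uniform coordinate
    s = np.concatenate([[0.0],
                        np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(x))])
    s_new = np.linspace(0.0, s[-1], n_new)
    return np.interp(s_new, s, x)              # invert s(x)

x = np.linspace(0.0, 1.0, 200)
u = np.tanh(50.0 * (x - 0.5))                  # steep interior layer
x_adapted = equidistribute(x, u, 50)
print("smallest cell:", np.diff(x_adapted).min(),
      "largest cell:", np.diff(x_adapted).max())
```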
Ding, Jiarui; Condon, Anne; Shah, Sohrab P
2018-05-21
Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.
Study of CP(N-1) theta-vacua by cluster simulation of SU(N) quantum spin ladders.
Beard, B B; Pepe, M; Riederer, S; Wiese, U-J
2005-01-14
D-theory provides an alternative lattice regularization of the 2D CP(N-1) quantum field theory in which continuous classical fields emerge from the dimensional reduction of discrete SU(N) quantum spins. Spin ladders consisting of n transversely coupled spin chains lead to a CP(N-1) model with a vacuum angle theta = n pi. In D-theory no sign problem arises, and an efficient cluster algorithm is used to investigate theta-vacuum effects. At theta = pi there is a first-order phase transition with spontaneous breaking of charge conjugation symmetry for CP(N-1) models with N > 2.
A clustering algorithm for determining community structure in complex networks
NASA Astrophysics Data System (ADS)
Jin, Hong; Yu, Wei; Li, ShiJun
2018-02-01
Clustering algorithms are attractive for the task of community detection in complex networks. DENCLUE is a representative density-based clustering algorithm which has a firm mathematical basis and good clustering properties, allowing for arbitrarily shaped clusters in high-dimensional datasets. However, this method cannot be directly applied to community discovery due to its inability to deal with network data. Moreover, it requires careful selection of the density parameter and the noise threshold. To solve these issues, a new community detection method is proposed in this paper. First, we use a spectral analysis technique to map the network data into a low-dimensional Euclidean space which preserves node structural characteristics. Then, DENCLUE is applied to detect the communities in the network. A mathematical method named the Sheather-Jones plug-in is chosen to select the density parameter, which describes the intrinsic clustering structure accurately. Moreover, every node in the network is meaningful, so there are no noise nodes, and the noise threshold can be ignored. We test our algorithm on both benchmark and real-life networks, and the results demonstrate the effectiveness of our algorithm over other popular density-based clustering algorithms adapted to community detection.
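A compact approximation of this pipeline is a spectral embedding of the adjacency matrix followed by a density-based clusterer. Here mean shift (kernel-density hill climbing) stands in for DENCLUE, and scikit-learn's quantile-based bandwidth estimate replaces the Sheather-Jones plug-in; the planted-partition graph and all parameters are illustrative.

```python
# Spectral embedding of a network followed by density-based clustering;
# mean shift acts as a stand-in for DENCLUE.
import numpy as np
import networkx as nx
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import MeanShift, estimate_bandwidth

G = nx.planted_partition_graph(l=3, k=30, p_in=0.3, p_out=0.02, seed=0)
A = nx.to_numpy_array(G)

# Low-dimensional, structure-preserving coordinates for the nodes.
Y = SpectralEmbedding(n_components=2, affinity="precomputed").fit_transform(A)

bw = estimate_bandwidth(Y, quantile=0.2)
labels = MeanShift(bandwidth=bw).fit_predict(Y)
print("communities found:", len(np.unique(labels)))
```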
On line separation of overlapped signals from multi-time photons for the GEM-based detection system
NASA Astrophysics Data System (ADS)
Czarski, T.; Pozniak, K. T.; Chernyshova, M.; Malinowski, K.; Kasprowicz, G.; Kolasinski, P.; Krawczyk, R.; Wojenski, A.; Zabolotny, W.
2015-09-01
The Triple Gas Electron Multiplier (T-GEM) is presented as a soft X-ray (SXR) energy- and position-sensitive detector for high-resolution X-ray diagnostics of magnetic confinement fusion plasmas. A multi-channel measurement system and serial data acquisition for X-ray energy and position recognition are described. Fundamental characteristics are presented for the two-dimensional detector structure. Typical analog-to-digital converter (ADC) signals are considered for charge value and position estimation. Coinciding signals under high-flux radiation pose a problem for cluster charge identification. The amplifier with shaper determines the time characteristics and limits the pulse frequency. Separation of coincident signals was introduced and verified in simulation experiments. On-line separation of overlapped signals was implemented using FPGA technology with a relatively simple firmware procedure. Representative results for the reconstruction of coinciding signals are demonstrated.
Unsupervised machine learning account of magnetic transitions in the Hubbard model
NASA Astrophysics Data System (ADS)
Ch'ng, Kelvin; Vazquez, Nick; Khatami, Ehsan
2018-01-01
We employ several unsupervised machine learning techniques, including autoencoders, random trees embedding, and t-distributed stochastic neighbor embedding (t-SNE), to reduce the dimensionality of, and therefore classify, raw (auxiliary) spin configurations generated, through Monte Carlo simulations of small clusters, for the Ising and Fermi-Hubbard models at finite temperatures. Results from a convolutional autoencoder for the three-dimensional Ising model can be shown to produce the magnetization and the susceptibility as a function of temperature with a high degree of accuracy. Quantum fluctuations distort this picture and prevent us from making such connections between the output of the autoencoder and physical observables for the Hubbard model. However, we are able to define an indicator based on the output of the t-SNE algorithm that shows a near perfect agreement with the antiferromagnetic structure factor of the model in two and three spatial dimensions in the weak-coupling regime. t-SNE also predicts a transition to the canted antiferromagnetic phase for the three-dimensional model when a strong magnetic field is present. We show that these techniques cannot be expected to work away from half filling when the "sign problem" in quantum Monte Carlo simulations is present.
Thematic clustering of text documents using an EM-based approach
2012-01-01
Clustering textual contents is an important step in mining useful information on the web or other text-based resources. The common task in text clustering is to handle text in a multi-dimensional space, and to partition documents into groups, where each group contains documents that are similar to each other. However, this strategy lacks a comprehensive view for humans in general since it cannot explain the main subject of each cluster. Utilizing semantic information can solve this problem, but it needs a well-defined ontology or pre-labeled gold standard set. In this paper, we present a thematic clustering algorithm for text documents. Given text, subject terms are extracted and used for clustering documents in a probabilistic framework. An EM approach is used to ensure documents are assigned to correct subjects, hence it converges to a locally optimal solution. The proposed method is distinctive because its results are sufficiently explanatory for human understanding as well as efficient for clustering performance. The experimental results show that the proposed method provides a competitive performance compared to other state-of-the-art approaches. We also show that the extracted themes from the MEDLINE® dataset represent the subjects of clusters reasonably well. PMID:23046528
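The probabilistic core of such a method can be sketched as EM for a mixture of multinomials over term counts: the E-step assigns documents to subjects in proportion to their likelihood, and the M-step re-estimates subject term distributions. Subject-term extraction and the paper's specific model are not reproduced; all names and data below are illustrative.

```python
# EM for a mixture of multinomials over document term counts.
import numpy as np

def multinomial_em(C, k, iters=50, seed=0):
    """C: (docs x terms) count matrix; returns responsibilities, topics."""
    rng = np.random.default_rng(seed)
    n, v = C.shape
    pi = np.full(k, 1.0 / k)                   # mixture weights
    phi = rng.dirichlet(np.ones(v), size=k)    # per-cluster term probs
    for _ in range(iters):
        # E-step: log p(doc | cluster), up to a document-wise constant
        logp = C @ np.log(phi.T + 1e-12) + np.log(pi)
        logp -= logp.max(1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(1, keepdims=True)
        # M-step with Laplace smoothing
        pi = R.mean(0)
        phi = (R.T @ C) + 1.0
        phi /= phi.sum(1, keepdims=True)
    return R, phi

rng = np.random.default_rng(7)
true_phi = rng.dirichlet(np.full(50, 0.1), size=3)
docs = np.vstack([rng.multinomial(80, true_phi[z])
                  for z in np.repeat([0, 1, 2], 40)])
R, phi = multinomial_em(docs, k=3)
print("hard cluster sizes:", np.bincount(R.argmax(1)))
```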
Clustering PPI data by combining FA and SHC method.
Lei, Xiujuan; Ying, Chao; Wu, Fang-Xiang; Xu, Jin
2015-01-01
Clustering is one of the main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for clustering PPI data. In this paper, we propose a novel method for clustering PPI data by combining the firefly algorithm (FA) and a synchronization-based hierarchical clustering (SHC) algorithm. Firstly, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional matrix. Then the SHC algorithm is used to perform clustering. In SHC, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects, but it is very difficult for the hierarchical search to find the optimal neighborhood radius of synchronization, and its efficiency is low. So we adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall, and f-measure value.
Dimensional assessment of personality pathology in patients with eating disorders.
Goldner, E M; Srikameswaran, S; Schroeder, M L; Livesley, W J; Birmingham, C L
1999-02-22
This study examined personality pathology in patients with eating disorders using a dimensional method. Female subjects who met DSM-IV diagnostic criteria for an eating disorder (n = 136) were evaluated and compared to an age-controlled general population sample (n = 68). We assessed 18 features of personality disorder with the Dimensional Assessment of Personality Pathology - Basic Questionnaire (DAPP-BQ). Factor analysis and cluster analysis were used to derive three clusters of patients. A five-factor solution was obtained with limited intercorrelation between factors. Cluster analysis produced three clusters with the following characteristics: Cluster 1 members (constituting 49.3% of the sample and labelled 'rigid') had higher mean scores on factors denoting compulsivity and interpersonal difficulties; Cluster 2 (18.4% of the sample) showed the highest scores on factors denoting psychopathy, neuroticism, and impulsive features, and appeared to constitute a borderline psychopathology group; Cluster 3 (32.4% of the sample) was characterized by few differences in personality pathology in comparison to the normal population sample. Cluster membership was associated with DSM-IV diagnosis: a large proportion of patients with anorexia nervosa were members of Cluster 1. An empirical classification of eating-disordered patients derived from dimensional assessment of personality pathology identified three groups with clinical relevance.
NASA Astrophysics Data System (ADS)
Cui, Tiangang; Marzouk, Youssef; Willcox, Karen
2016-06-01
Two major bottlenecks to the solution of large-scale Bayesian inverse problems are the scaling of posterior sampling algorithms to high-dimensional parameter spaces and the computational cost of forward model evaluations. Yet incomplete or noisy data, the state variation and parameter dependence of the forward model, and correlations in the prior collectively provide useful structure that can be exploited for dimension reduction in this setting: both in the parameter space of the inverse problem and in the state space of the forward model. To this end, we show how to jointly construct low-dimensional subspaces of the parameter space and the state space in order to accelerate the Bayesian solution of the inverse problem. As a byproduct of state dimension reduction, we also show how to identify low-dimensional subspaces of the data in problems with high-dimensional observations. These subspaces enable approximation of the posterior as a product of two factors: (i) a projection of the posterior onto a low-dimensional parameter subspace, wherein the original likelihood is replaced by an approximation involving a reduced model; and (ii) the marginal prior distribution on the high-dimensional complement of the parameter subspace. We present and compare several strategies for constructing these subspaces using only a limited number of forward and adjoint model simulations. The resulting posterior approximations can rapidly be characterized using standard sampling techniques, e.g., Markov chain Monte Carlo. Two numerical examples demonstrate the accuracy and efficiency of our approach: inversion of an integral equation in atmospheric remote sensing, where the data dimension is very high; and the inference of a heterogeneous transmissivity field in a groundwater system, which involves a partial differential equation forward model with high dimensional state and parameters.
Narayan, Angela J; Allen, Timothy A; Cullen, Kathryn R; Klimes-Dougan, Bonnie
2013-01-01
Objectives This comprehensive review examined the prevalence and progression of disturbances in reality testing (DRT), defined as psychotic symptoms, cognitive disruptions, and thought problems, in offspring of parents with bipolar disorder (O-BD). Our approach was grounded in a developmental psychopathology perspective and considered a broader phenotype of risk within the bipolar–schizophrenia spectrum as measured by categorical and dimensional assessments of DRT in high-risk youth. Methods Relevant studies were identified from numerous sources (e.g., PubMed, reference sections, and colleagues). Inclusion criteria were: (i) family risk studies published between 1975 and 2012 in which O-BD were contrasted with a comparison group (e.g., offspring of parents who had other psychiatric disorders or were healthy) on DRT outcomes and (ii) results reported for categorical or dimensional assessments of DRT (e.g., schizophrenia, psychotic symptoms, cluster A personality traits, or thought problems), yielding a total of 23 studies. Results Three key findings emerged: (i) categorical approaches of DRT in O-BD produced low incidence base rates and almost no evidence of significant differences in DRT between O-BD and comparison groups, whereas (ii) many studies using dimensional assessments of DRT yielded significant group differences in DRT. Furthermore, (iii) preliminary evidence from dimensional measures suggested that the developmental progression of DRT in O-BD might represent a prodrome of severe psychological impairment. Conclusions Preliminary but promising evidence suggests that DRT is a probable marker of risk for future impairment in O-BD. Methodological strengths and weaknesses, the psychometric properties of primary DRT constructs, and future directions for developmental and longitudinal research with O-BD are discussed. PMID:24034419
Hierarchical trie packet classification algorithm based on expectation-maximization clustering
Bi, Xia-an; Zhao, Junxia
2017-01-01
With the growth of computer network bandwidth, packet classification algorithms which are able to deal with large-scale rule sets are in urgent need. Among the existing algorithms, packet classification algorithms based on the hierarchical trie have become an important research branch because of their wide practical use. Although the hierarchical trie saves large amounts of storage space, it has several shortcomings, such as backtracking and empty nodes. This paper proposes a new packet classification algorithm, the Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper uses a formalization method to deal with the packet classification problem by mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses the expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, thereby forming diversified clusters. Thirdly, this paper proposes a hierarchical trie based on the results of the expectation-maximization clustering. Finally, this paper conducts simulation experiments and real-environment experiments to compare the performance of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm. PMID:28704476
Strongest Earthquake-Prone Areas in Kamchatka
NASA Astrophysics Data System (ADS)
Dzeboev, B. A.; Agayan, S. M.; Zharkikh, Yu. I.; Krasnoperov, R. I.; Barykina, Yu. V.
2018-03-01
The paper continues the series of our works on recognizing the areas prone to the strongest, strong, and significant earthquakes with the use of the Formalized Clustering And Zoning (FCAZ) intellectual clustering system. We recognized the zones prone to the probable emergence of epicenters of the strongest (M ≥ 7.75) earthquakes on the Pacific Coast of Kamchatka. The FCAZ zones are compared to the zones that were recognized in 1984 by the classical recognition method for Earthquake-Prone Areas (EPA) by transferring the criteria of high seismicity from the Andes mountain belt to the territory of Kamchatka. The FCAZ recognition was carried out with two-dimensional and three-dimensional objects of recognition.
Decomposition method for zonal resource allocation problems in telecommunication networks
NASA Astrophysics Data System (ADS)
Konnov, I. V.; Kashuba, A. Yu
2016-11-01
We consider problems of optimal resource allocation in telecommunication networks. We first give an optimization formulation for the case where the network manager aims to distribute a homogeneous resource (bandwidth) among users of one region with quadratic charge and fee functions, and we present simple and efficient solution methods. Next, we consider a more general problem for a provider of a wireless communication network divided into zones (clusters) with common capacity constraints. We obtain a convex quadratic optimization problem involving capacity and balance constraints. By using the dual Lagrangian method with respect to the capacity constraint, we reduce the initial problem to a one-dimensional optimization problem, where each evaluation of the cost function requires the independent solution of zonal problems, which coincide with the single-region problem above. Results of computational experiments confirm the applicability of the new methods.
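The dual reduction can be illustrated on a toy version of the single-region problem: with quadratic costs and a shared capacity, bisection on the Lagrange multiplier (the capacity price) solves a one-dimensional problem whose inner evaluations decompose into independent per-user subproblems. The numbers below are arbitrary and only stand in for the paper's charge and fee functions.

```python
# Dual (Lagrangian) sketch for capacity-constrained quadratic allocation:
# bisection on the capacity price, independent per-user subproblems inside.
import numpy as np

t = np.array([3.0, 1.0, 4.0, 2.5, 0.5])     # users' desired allocations
C = 6.0                                      # shared capacity

def demand(lam):
    # per-user subproblem: minimize (x - t_i)^2 + lam * x  subject to x >= 0
    return np.maximum(t - lam / 2.0, 0.0)

lo, hi = 0.0, 2.0 * t.max()                  # bracket for the price
for _ in range(60):                          # bisection on the dual variable
    lam = 0.5 * (lo + hi)
    if demand(lam).sum() > C:
        lo = lam                             # too much demand: raise price
    else:
        hi = lam
x = demand(lam)
print("allocation:", x.round(3), "total:", round(x.sum(), 3))
```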
Systematic exploration of unsupervised methods for mapping behavior
NASA Astrophysics Data System (ADS)
Todd, Jeremy G.; Kain, Jamey S.; de Bivort, Benjamin L.
2017-02-01
To fully understand the mechanisms giving rise to behavior, we need to be able to precisely measure it. When coupled with large behavioral data sets, unsupervised clustering methods offer the potential of unbiased mapping of behavioral spaces. However, unsupervised techniques to map behavioral spaces are in their infancy, and there have been few systematic considerations of all the methodological options. We compared the performance of seven distinct mapping methods in clustering a wavelet-transformed data set consisting of the x- and y-positions of the six legs of individual flies. Legs were automatically tracked by small pieces of fluorescent dye, while the fly was tethered and walking on an air-suspended ball. We find that there is considerable variation in the performance of these mapping methods, and that better performance is attained when clustering is done in higher dimensional spaces (which are otherwise less preferable because they are hard to visualize). High dimensionality means that some algorithms, including the non-parametric watershed cluster assignment algorithm, cannot be used. We developed an alternative watershed algorithm which can be used in high-dimensional spaces when a probability density estimate can be computed directly. With these tools in hand, we examined the behavioral space of fly leg postural dynamics and locomotion. We find a striking division of behavior into modes involving the fore legs and modes involving the hind legs, with few direct transitions between them. By computing behavioral clusters using the data from all flies simultaneously, we show that this division appears to be common to all flies. We also identify individual-to-individual differences in behavior and behavioral transitions. Lastly, we suggest a computational pipeline that can achieve satisfactory levels of performance without the taxing computational demands of a systematic combinatorial approach.
Nam, Julia EunJu; Mueller, Klaus
2013-02-01
Gaining a true appreciation of high-dimensional space remains difficult since all of the existing high-dimensional space exploration techniques serialize the space travel in some way. This is not so foreign to us since we, when traveling, also experience the world in a serial fashion. But we typically have access to a map to help with positioning, orientation, navigation, and trip planning. Here, we propose a multivariate data exploration tool that compares high-dimensional space navigation with a sightseeing trip. It decomposes this activity into five major tasks: 1) Identify the sights: use a map to identify the sights of interest and their location; 2) Plan the trip: connect the sights of interest along a specifiable path; 3) Go on the trip: travel along the route; 4) Hop off the bus: experience the location, look around, zoom into detail; and 5) Orient and localize: regain bearings in the map. We describe intuitive and interactive tools for all of these tasks, both global navigation within the map and local exploration of the data distributions. For the latter, we describe a polygonal touchpad interface which enables users to smoothly tilt the projection plane in high-dimensional space to produce multivariate scatterplots that best convey the data relationships under investigation. Motion parallax and illustrative motion trails aid in the perception of these transient patterns. We describe the use of our system within two applications: 1) the exploratory discovery of data configurations that best fit a personal preference in the presence of tradeoffs and 2) interactive cluster analysis via cluster sculpting in N-D.
Swarm v2: highly-scalable and high-resolution amplicon clustering.
Mahé, Frédéric; Rognes, Torbjørn; Quince, Christopher; de Vargas, Colomban; Dunthorn, Micah
2015-01-01
Previously we presented Swarm v1, a novel and open source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked with an initial phase that used iterative single-linkage with a local clustering threshold (d), followed by a phase that used the internal abundance structures of clusters to break chained OTUs. Here we present Swarm v2, which has two important novel features: (1) a new algorithm for d = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and (2) the new fastidious option that reduces under-grouping by grafting low-abundance OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with d = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks.
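The linear-time d = 1 idea can be sketched as a hash-based flood fill: rather than comparing all pairs of amplicons, generate each sequence's single-substitution variants and look them up in a set. Indel handling, abundance-based chain breaking, and the fastidious option are omitted; the toy reads are illustrative.

```python
# Sketch of d = 1 single-linkage amplicon clustering in the spirit of
# Swarm v2: hash lookups of one-substitution neighbours instead of
# all-pairs distance computations (indels omitted for brevity).
from collections import deque

def neighbors(seq, alphabet="ACGT"):
    for i, ch in enumerate(seq):
        for b in alphabet:
            if b != ch:
                yield seq[:i] + b + seq[i + 1:]

def swarm_d1(seqs):
    unvisited = set(seqs)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, queue = [seed], deque([seed])
        while queue:                     # grow by single-difference links
            s = queue.popleft()
            for v in neighbors(s):
                if v in unvisited:
                    unvisited.remove(v)
                    cluster.append(v)
                    queue.append(v)
        clusters.append(cluster)
    return clusters

reads = ["ACGT", "ACGA", "ACGG", "TTTT", "TTTA"]
print([sorted(c) for c in swarm_d1(reads)])   # two clusters expected
```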
Uniform high order spectral methods for one and two dimensional Euler equations
NASA Technical Reports Server (NTRS)
Cai, Wei; Shu, Chi-Wang
1991-01-01
Uniform high order spectral methods to solve multi-dimensional Euler equations for gas dynamics are discussed. Uniform high order spectral approximations with spectral accuracy in smooth regions of solutions are constructed by introducing the idea of the Essentially Non-Oscillatory (ENO) polynomial interpolations into the spectral methods. The authors present numerical results for the inviscid Burgers' equation, and for the one dimensional Euler equations including the interactions between a shock wave and density disturbance, Sod's and Lax's shock tube problems, and the blast wave problem. The interaction between a Mach 3 two dimensional shock wave and a rotating vortex is simulated.
Unsupervised spike sorting based on discriminative subspace learning.
Keshtkaran, Mohammad Reza; Yang, Zhi
2014-01-01
Spike sorting is a fundamental preprocessing step for many neuroscience studies that rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses a histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering, learning a discriminative one-dimensional subspace for clustering at each level of the hierarchy until an almost unimodal distribution is achieved in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in a lower-dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.
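The first algorithm's coupling of projection learning and clustering can be caricatured by alternating k-means labels with an LDA projection until the labels stabilize. This is a simplified stand-in for the paper's method, not its actual formulation; it assumes scikit-learn, and the function name is ours.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def discriminative_subspace_clustering(X, n_clusters, n_iter=10, seed=0):
    """Alternate k-means labeling and LDA projection until labels stabilize."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    Z = X
    for _ in range(n_iter):
        lda = LinearDiscriminantAnalysis(n_components=n_clusters - 1)
        Z = lda.fit_transform(X, labels)   # discriminative low-dimensional subspace
        new_labels = KMeans(n_clusters=n_clusters, n_init=10,
                            random_state=seed).fit_predict(Z)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, Z
```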
Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.
Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K
2013-03-01
Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.
NASA Astrophysics Data System (ADS)
Yevsyukov, N. N.
1985-09-01
An approximate algorithm for the isolation of multidimensional clusters is developed and applied in the construction of a three-dimensional diagram of the optical characteristics of the lunar surface. The method is somewhat analogous to that of Koontz and Fukunaga (1972) and involves isolating two-dimensional clusters, adding a new characteristic, and linearizing, a cycle which is repeated a limited number of times. The lunar-surface parameters analyzed are the 620-nm albedo, the 620/380-nm color index, and the 950/620-nm index. The results are presented graphically; the reliability of the cluster-isolation process is discussed; and some correspondences between known lunar morphology and the cluster maps are indicated.
Wang, Xueyi
2012-02-08
The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses k-means clustering and the triangle inequality to accelerate the search for nearest neighbors in a high-dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-trees, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10^6 records and 10^4 dimensions, kMkNN shows a 2- to 80-fold reduction in distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significantly better than a kd-tree-based k-NN algorithm for all datasets and better than a ball-tree-based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high-dimensional spaces.
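The two stages translate directly into code: a k-means buildup, then a search that visits clusters in order of centroid distance and skips any point whose triangle-inequality lower bound |d(q,c) − d(x,c)| cannot beat the current k-th best distance. A minimal sketch under those assumptions (the class name and parameters are ours, and scikit-learn stands in for the paper's own k-means step):

```python
import numpy as np
from sklearn.cluster import KMeans

class KMkNNIndex:
    """Exact k-NN search accelerated by k-means partitioning and the triangle inequality."""

    def __init__(self, X, n_clusters=32, seed=0):
        self.X = np.asarray(X, dtype=float)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(self.X)
        self.centroids, self.assign = km.cluster_centers_, km.labels_
        # Each training point's distance to its own centroid, precomputed once.
        self.d_to_c = np.linalg.norm(self.X - self.centroids[self.assign], axis=1)
        self.members = [np.where(self.assign == c)[0] for c in range(n_clusters)]

    def query(self, q, k=1):
        d_q_c = np.linalg.norm(self.centroids - q, axis=1)
        best = []                                  # (distance, index), ascending
        for c in np.argsort(d_q_c):                # visit nearest clusters first
            for i in self.members[c]:
                # Triangle-inequality lower bound: d(q, x_i) >= |d(q, c) - d(x_i, c)|
                if len(best) == k and abs(d_q_c[c] - self.d_to_c[i]) >= best[-1][0]:
                    continue                       # cannot enter the current top k
                d = np.linalg.norm(self.X[i] - q)
                if len(best) < k or d < best[-1][0]:
                    best = sorted(best + [(d, i)])[:k]
        return best
```

Usage: `index = KMkNNIndex(X); index.query(q, k=5)` returns the k (distance, index) pairs. The answer is exact because the bound only ever skips points that provably cannot improve the result.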
ERIC Educational Resources Information Center
Kamp-Becker, Inge; Smidt, Judith; Ghahreman, Mardjan; Heinzel-Gutenbrunner, Monika; Becker, Katja; Remschmidt, Helmut
2010-01-01
There is an ongoing debate whether a differentiation of autistic subtypes, especially between Asperger Syndrome (AS) and high-functioning-autism (HFA) is possible and if so, whether it is a categorical or dimensional one. The aim of this study was to examine the possible clustering of responses in different symptom domains without making any…
Discriminative clustering on manifold for adaptive transductive classification.
Zhang, Zhao; Jia, Lei; Zhang, Min; Li, Bing; Zhang, Li; Li, Fanzhang
2017-10-01
In this paper, we propose a novel adaptive transductive label propagation approach that performs joint discriminative clustering on manifolds for representing and classifying high-dimensional data. Our framework seamlessly combines unsupervised manifold learning, discriminative clustering, and adaptive classification into a unified model. Our method also incorporates adaptive graph weight construction with label propagation. Specifically, our method is capable of propagating label information using adaptive weights over low-dimensional manifold features, which differs from most existing studies that predict the labels and construct the weights in the original Euclidean space. For transductive classification, we first perform joint discriminative K-means clustering and manifold learning to capture the low-dimensional nonlinear manifolds. Then, we construct adaptive weights over the learned manifold features, calculated through joint minimization of the reconstruction errors over features and soft labels, so that the graph weights can be jointly optimal for data representation and classification. Using the adaptive weights, we can easily estimate the unknown labels of samples. After that, our method returns the updated weights for further updating the manifold features. Extensive simulations on image classification and segmentation show that our proposed algorithm delivers state-of-the-art performance on several public datasets. Copyright © 2017 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yi; Jakeman, John; Gittelson, Claude
2015-01-08
In this paper we present a localized polynomial chaos expansion for partial differential equations (PDEs) with random inputs. In particular, we focus on time-independent linear stochastic problems with high-dimensional random inputs, where the traditional polynomial chaos methods, and most of the existing methods, incur prohibitively high simulation cost. The local polynomial chaos method employs a domain decomposition technique to approximate the stochastic solution locally. In each subdomain, a subdomain problem is solved independently and, more importantly, in a much lower-dimensional random space. In a postprocessing stage, accurate samples of the original stochastic problems are obtained from the samples of the local solutions by enforcing the correct stochastic structure of the random inputs and the coupling conditions at the interfaces of the subdomains. Overall, the method is able to solve stochastic PDEs in very large dimensions by solving a collection of low-dimensional local problems, and it can be highly efficient. We present the general mathematical framework of the methodology and use numerical examples to demonstrate the properties of the method.
Joint spatial-spectral hyperspectral image clustering using block-diagonal amplified affinity matrix
NASA Astrophysics Data System (ADS)
Fan, Lei; Messinger, David W.
2018-03-01
The large number of spectral channels in a hyperspectral image (HSI) produces a fine spectral resolution to differentiate between materials in a scene. However, difficult classes that have similar spectral signatures are often confused when only information in the spectral domain is exploited. Therefore, in addition to spectral characteristics, the spatial relationships inherent in HSIs should also be incorporated into classifiers. The growing availability of high spectral and spatial resolution from remote sensors provides rich information for image clustering. Besides the discriminating power of the rich spectrum, contextual information can be extracted from the spatial domain, such as the size and shape of the structure to which one pixel belongs. In recent years, spectral clustering has gained popularity over other clustering methods due to the difficulty of accurate statistical modeling of data in high-dimensional space. Joint spatial-spectral information can be effectively incorporated into the proximity graph for the spectral clustering approach, which provides a better data representation by discovering the inherent lower dimensionality of the input space. We embedded both spectral and spatial information into our proposed local density adaptive affinity matrix, which is able to handle multiscale data by automatically selecting the scale of analysis for every pixel according to its neighborhood of correlated pixels. Furthermore, we explored the "conductivity method," which aims at amplifying the block-diagonal structure of the affinity matrix to further improve the performance of spectral clustering on HSI datasets.
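The local-scale idea can be sketched with a self-tuning affinity in the style of Zelnik-Manor and Perona, in which each pixel's kernel width is its distance to its k-th nearest neighbor. The paper's block-diagonal "conductivity" amplification and its joint spatial-spectral features are not reproduced here; parameter values are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import SpectralClustering

def adaptive_affinity(X, k=7):
    """Affinity matrix with a per-point local scale sigma_i (self-tuning kernel)."""
    D = squareform(pdist(X))
    # Each sorted row starts with the zero self-distance, so index k is the
    # distance to the k-th nearest neighbor of that point.
    sigma = np.sort(D, axis=1)[:, k]
    A = np.exp(-D**2 / (sigma[:, None] * sigma[None, :] + 1e-12))
    np.fill_diagonal(A, 0.0)
    return A

# labels = SpectralClustering(n_clusters=5,
#                             affinity='precomputed').fit_predict(adaptive_affinity(X))
```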
On the three-quarter view advantage of familiar object recognition.
Nonose, Kohei; Niimi, Ryosuke; Yokosawa, Kazuhiko
2016-11-01
A three-quarter view, i.e., an oblique view, of familiar objects often leads to a higher subjective goodness rating when compared with other orientations. What is the source of the high goodness of oblique views? First, we confirmed that object recognition performance was also best for oblique views around 30°, even when the foreshortening disadvantage of front and side views was minimized (Experiments 1 and 2). In Experiment 3, we measured subjective ratings of view goodness and two possible determinants of view goodness: familiarity of view, and subjective impression of three-dimensionality. Three-dimensionality was measured as the subjective saliency of visual depth information. The oblique views were rated best, most familiar, and as having the greatest three-dimensionality on average; however, the cluster analyses showed that the "best" orientation varied systematically among objects. We found three clusters of objects: front-preferred objects, oblique-preferred objects, and side-preferred objects. Interestingly, recognition performance and the three-dimensionality rating were higher for oblique views irrespective of the clusters. It appears that recognition efficiency is not the major source of the three-quarter view advantage. There are multiple determinants and variability among objects. This study suggests that the classical idea that a canonical view has a unique advantage in object perception requires further discussion.
Luechtefeld, Thomas; Maertens, Alexandra; McKim, James M; Hartung, Thomas; Kleensang, Andre; Sá-Rocha, Vanessa
2015-11-01
Supervised learning methods promise to improve integrated testing strategies (ITS), but must be adjusted to handle high dimensionality and dose-response data. ITS approaches are currently fueled by the increasing mechanistic understanding of adverse outcome pathways (AOP) and the development of tests reflecting these mechanisms. Simple approaches to combine skin sensitization data sets, such as weight of evidence, fail due to problems in information redundancy and high dimensionality. The problem is further amplified when potency information (dose/response) of hazards would be estimated. Skin sensitization currently serves as the foster child for AOP and ITS development, as legislative pressures combined with a very good mechanistic understanding of contact dermatitis have led to test development and relatively large high-quality data sets. We curated such a data set and combined a recursive variable selection algorithm to evaluate the information available through in silico, in chemico and in vitro assays. Chemical similarity alone could not cluster chemicals' potency, and in vitro models consistently ranked high in recursive feature elimination. This allows reducing the number of tests included in an ITS. Next, we analyzed with a hidden Markov model that takes advantage of an intrinsic inter-relationship among the local lymph node assay classes, i.e. the monotonous connection between local lymph node assay and dose. The dose-informed random forest/hidden Markov model was superior to the dose-naive random forest model on all data sets. Although balanced accuracy improvement may seem small, this obscures the actual improvement in misclassifications as the dose-informed hidden Markov model strongly reduced "false-negatives" (i.e. extreme sensitizers as non-sensitizer) on all data sets. Copyright © 2015 John Wiley & Sons, Ltd.
The formation of magnetic silicide Fe3Si clusters during ion implantation
NASA Astrophysics Data System (ADS)
Balakirev, N.; Zhikharev, V.; Gumarov, G.
2014-05-01
A simple two-dimensional model of the formation of magnetic silicide Fe3Si clusters during high-dose Fe ion implantation into silicon is proposed, and the cluster growth process has been simulated numerically. The model takes into account the interaction between the cluster magnetization and the magnetic moments of Fe atoms random-walking in the implanted layer. If the clusters are formed in the presence of an external magnetic field parallel to the implanted layer, the model predicts elongation of the growing cluster in the field direction. It is proposed that the cluster elongation results in the uniaxial magnetic anisotropy in the plane of the implanted layer, which is observed in iron silicide films ion-beam synthesized in an external magnetic field.
Somatotyping using 3D anthropometry: a cluster analysis.
Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur
2013-01-01
Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means cluster analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using those scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.
NASA Astrophysics Data System (ADS)
Roverso, Davide
2003-08-01
Many-class learning is the problem of training a classifier to discriminate among a large number of target classes. Together with the problem of dealing with high-dimensional patterns (i.e., a high-dimensional input space), the many-class problem (i.e., a high-dimensional output space) is a major obstacle to be faced when scaling up classifier systems and algorithms from small pilot applications to large full-scale applications. The Autonomous Recursive Task Decomposition (ARTD) algorithm is proposed here as a solution to the problem of many-class learning. Example applications of ARTD to neural classifier training are also presented. In these examples, improvements in training time are shown to range from 4-fold to more than 30-fold in pattern classification tasks of both static and dynamic character.
Cannistraci, Carlo Vittorio; Ravasi, Timothy; Montevecchi, Franco Maria; Ideker, Trey; Alessio, Massimo
2010-09-15
Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology and pose problems for investigation. Unsupervised hybrid two-phase (H2P) procedures, specifically dimension reduction (DR) coupled with clustering, provide valuable assistance, not only for unsupervised data classification, but also for visualization of the patterns hidden in high-dimensional feature space. 'Minimum Curvilinearity' (MC) is a principle that, for small datasets, suggests the approximation of curvilinear sample distances in the feature space by pair-wise distances over their minimum spanning tree (MST), and thus avoids the introduction of any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering. Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin. MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal. https://sites.google.com/site/carlovittoriocannistraci/home.
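The MC principle itself is a one-liner over the minimum spanning tree: curvilinear distances are approximated by path lengths on the MST, with no tuning parameter. The sketch below pairs that with classical MDS for visualization; MCE proper differs in its embedding details, so this is an illustration of the principle, not the published algorithm.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform

def minimum_curvilinear_distances(X):
    """Approximate curvilinear distances as path lengths over the MST."""
    D = squareform(pdist(X))
    mst = minimum_spanning_tree(D)          # sparse, parameter-free
    return shortest_path(mst, method='D', directed=False)

def classical_mds(D, n_components=2):
    """Embed a distance matrix by double-centering and eigendecomposition."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D**2) @ J               # Gram matrix from squared distances
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:n_components]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# embedding = classical_mds(minimum_curvilinear_distances(X))
```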
Self-assembled three-dimensional chiral colloidal architecture.
Ben Zion, Matan Yah; He, Xiaojin; Maass, Corinna C; Sha, Ruojie; Seeman, Nadrian C; Chaikin, Paul M
2017-11-03
Although stereochemistry has been a central focus of the molecular sciences since Pasteur, its province has previously been restricted to the nanometric scale. We have programmed the self-assembly of micron-sized colloidal clusters with structural information stemming from a nanometric arrangement. This was done by combining DNA nanotechnology with colloidal science. Using the functional flexibility of DNA origami in conjunction with the structural rigidity of colloidal particles, we demonstrate the parallel self-assembly of three-dimensional microconstructs, evincing highly specific geometry that includes control over position, dihedral angles, and cluster chirality. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
NASA Technical Reports Server (NTRS)
Chan, S. T. K.; Lee, C. H.; Brashears, M. R.
1975-01-01
A finite element algorithm for solving unsteady, three-dimensional high velocity impact problems is presented. A computer program was developed based on the Eulerian hydroelasto-viscoplastic formulation and the utilization of the theorem of weak solutions. The equations solved consist of conservation of mass, momentum, and energy, equation of state, and appropriate constitutive equations. The solution technique is a time-dependent finite element analysis utilizing three-dimensional isoparametric elements, in conjunction with a generalized two-step time integration scheme. The developed code was demonstrated by solving one-dimensional as well as three-dimensional impact problems for both the inviscid hydrodynamic model and the hydroelasto-viscoplastic model.
Finite-volume application of high order ENO schemes to multi-dimensional boundary-value problems
NASA Technical Reports Server (NTRS)
Casper, Jay; Dorrepaal, J. Mark
1990-01-01
The finite volume approach to developing multi-dimensional, high-order accurate essentially non-oscillatory (ENO) schemes is considered. In particular, a two-dimensional extension is proposed for the Euler equations of gas dynamics. This requires a spatial reconstruction operator that attains formal high order of accuracy in two dimensions by taking account of cross gradients. Given a set of cell averages in two spatial variables, polynomial interpolation of a two-dimensional primitive function is employed in order to extract high-order pointwise values on cell interfaces. These points are appropriately chosen so that correspondingly high-order flux integrals are obtained through each interface by quadrature, with the flux contribution at each point calculated in an upwind fashion. The solution-in-the-small of Riemann's initial value problem (IVP) that is required for this pointwise flux computation is achieved using Roe's approximate Riemann solver. Issues considered in this two-dimensional extension include the implementation of boundary conditions and application to general curvilinear coordinates. Results of numerical experiments are presented for qualitative and quantitative examination. These results contain the first successful application of ENO schemes to boundary value problems with solid walls.
A cluster-analytic study of substance problems and mental health among street youths.
Adlaf, E M; Zdanowicz, Y M
1999-11-01
Based on a cluster analysis of 211 street youths aged 13-24 years interviewed in 1992 in Toronto, Ontario, Canada, we describe the configuration of mental health and substance use outcomes. Eight clusters were suggested: Entrepreneurs (n = 19) were frequently involved in delinquent activity and were highly entrenched in the street lifestyle; Drifters (n = 35) had infrequent social contact, displayed lower than average family dysfunction, and were not highly entrenched in the street lifestyle; Partiers (n = 40) were distinguished by their recreational motivation for alcohol and drug use and their below average entrenchment in the street lifestyle; Retreatists (n = 32) were distinguished by their high coping motivation for substance use; Fringers (n = 48) were involved marginally in the street lifestyle and showed lower than average family dysfunction; Transcenders (n = 21), despite above average physical and sexual abuse, reported below average mental health or substance use problems; Vulnerables (n = 12) were characterized by high family dysfunction (including physical and sexual abuse), elevated mental health outcomes, and use of alcohol and other drugs motivated by coping and escapism; Sex Workers (n = 4) were highly entrenched in the street lifestyle and reported frequent commercial sexual work, above average sexual abuse, and extensive use of crack cocaine. The results showed that distress, self-esteem, psychotic thoughts, attempted suicide, alcohol problems, drug problems, dual substance problems, and dual disorders varied significantly among the eight clusters. Overall, the findings suggest the need for differential programming. The data showed that risk factors, mental health, and substance use outcomes vary among this population. Also, for some the web of mental health and substance use problems is inseparable.
NASA Astrophysics Data System (ADS)
Kohno, Masanori
2018-05-01
The single-particle spectral properties of the two-dimensional t-J model with next-nearest-neighbor hopping are investigated near the Mott transition by using cluster perturbation theory. The spectral features are interpreted by considering the effects of the next-nearest-neighbor hopping on the shift of the spectral-weight distribution of the two-dimensional t-J model. Various anomalous features observed in hole-doped and electron-doped high-temperature cuprate superconductors are collectively explained in the two-dimensional t-J model with next-nearest-neighbor hopping near the Mott transition.
Exploring multicollinearity using a random matrix theory approach.
Feher, Kristen; Whelan, James; Müller, Samuel
2012-01-01
Clustering of gene expression data is often done with the latent aim of dimension reduction: finding groups of genes that have a common response to potentially unknown stimuli. However, what is poorly understood to date is the behaviour of a low-dimensional signal embedded in high dimensions. This paper introduces a multicollinear model based on random matrix theory results, and shows its potential for characterising a gene cluster's correlation matrix. The model projects a one-dimensional signal into many dimensions and builds on the spiked covariance model, but characterises the behaviour of the corresponding correlation matrix instead. The eigenspectrum of the correlation matrix is examined empirically by simulation, under the addition of noise to the original signal. The simulation results are then used to propose a procedure for estimating the dimension of clusters from data. Moreover, the simulation results warn against considering pairwise correlations in isolation, as the model provides a mechanism whereby a pair of genes with "low" correlation may simply result from the interaction of high dimension and noise. Instead, collective information about all the variables is given by the eigenspectrum.
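The model is easy to simulate: project a one-dimensional signal into p dimensions, add noise, and compare the correlation-matrix eigenspectrum with the Marchenko-Pastur bulk edge for pure noise. An illustrative sketch with arbitrary sizes (not the paper's simulation design):

```python
import numpy as np

def spiked_correlation_spectrum(n=50, p=200, noise=1.0, seed=0):
    """Project a 1-D signal into p dimensions, add noise, and return eigenvalues."""
    rng = np.random.default_rng(seed)
    signal = rng.standard_normal(n)                    # one latent factor
    loadings = rng.standard_normal(p)                  # projection into p dims
    X = np.outer(signal, loadings) + noise * rng.standard_normal((n, p))
    R = np.corrcoef(X, rowvar=False)                   # p x p correlation matrix
    return np.sort(np.linalg.eigvalsh(R))[::-1]

eigs = spiked_correlation_spectrum()
# Marchenko-Pastur upper bulk edge for pure noise at aspect ratio gamma = p/n:
gamma = 200 / 50
mp_edge = (1 + np.sqrt(gamma)) ** 2
print(eigs[:3], mp_edge)   # the leading eigenvalue should separate from the bulk
```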
A cluster analysis investigation of workaholism as a syndrome.
Aziz, Shahnaz; Zickar, Michael J
2006-01-01
Workaholism has been conceptualized as a syndrome although there have been few tests that explicitly consider its syndrome status. The authors analyzed a three-dimensional scale of workaholism developed by Spence and Robbins (1992) using cluster analysis. The authors identified three clusters of individuals, one of which corresponded to Spence and Robbins's profile of the workaholic (high work involvement, high drive to work, low work enjoyment). Consistent with previously conjectured relations with workaholism, individuals in the workaholic cluster were more likely to label themselves as workaholics, more likely to have acquaintances label them as workaholics, and more likely to have lower life satisfaction and higher work-life imbalance. The importance of considering workaholism as a syndrome and the implications for effective interventions are discussed. Copyright 2006 APA.
Two-and three-dimensional unsteady lift problems in high-speed flight
NASA Technical Reports Server (NTRS)
Lomax, Harvard; Heaslet, Max A; Fuller, Franklyn B; Sluder, Loma
1952-01-01
The problem of transient lift on two- and three-dimensional wings flying at high speeds is discussed as a boundary-value problem for the classical wave equation. Kirchhoff's formula is applied so that the analysis is reduced, just as in the steady state, to an investigation of sources and doublets. The applications include the evaluation of indicial lift and pitching-moment curves for two-dimensional sinking and pitching wings flying at Mach numbers of 0, 0.8, 1.0, 1.2 and 2.0. Results for the sinking case are also given for a Mach number of 0.5. In addition, the indicial functions for supersonic-edged triangular wings in both forward and reverse flow are presented and compared with the two-dimensional values.
Shape component analysis: structure-preserving dimension reduction on biological shape spaces.
Lee, Hao-Chih; Liao, Tao; Zhang, Yongjie Jessica; Yang, Ge
2016-03-01
Quantitative shape analysis is required by a wide range of biological studies across diverse scales, ranging from molecules to cells and organisms. In particular, high-throughput and systems-level studies of biological structures and functions have started to produce large volumes of complex high-dimensional shape data. Analysis and understanding of high-dimensional biological shape data require dimension-reduction techniques. We have developed a technique for non-linear dimension reduction of 2D and 3D biological shape representations on their Riemannian spaces. A key feature of this technique is that it preserves distances between different shapes in an embedded low-dimensional shape space. We demonstrate an application of this technique by combining it with non-linear mean-shift clustering on the Riemannian spaces for unsupervised clustering of shapes of cellular organelles and proteins. Source code and data for reproducing results of this article are freely available at https://github.com/ccdlcmu/shape_component_analysis_Matlab. The implementation was made in MATLAB and is supported on MS Windows, Linux and Mac OS. Contact: geyang@andrew.cmu.edu. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Xue, Mianqiang; Zhou, Liang; Kojima, Naoya; Dos Muchangos, Leticia Sarmento; Machimura, Takashi; Tokai, Akihiro
2018-05-01
Increasing manufacture and usage of chemicals have not been matched by an increase in our understanding of their risks. The pollutant release and transfer register (PRTR) is becoming a popular measure for collecting chemical data and enhancing the public right to know. However, these data are usually high-dimensional, which restricts their wider use. The present study partitions Japanese PRTR chemicals into five fuzzy clusters by fuzzy c-means clustering (FCM) to explore the implicit information. Each chemical belongs to every cluster with a membership degree. Cluster I features high releases from non-listed industries and the household sector and high environmental toxicity. Cluster II is characterized by high reported releases and transfers from the 24 listed industries above the threshold, mutagenicity, and high environmental toxicity. Chemicals in cluster III are characterized by high releases from non-listed industries and low toxicity. Cluster IV is characterized by high reported releases and transfers from the 24 listed industries above the threshold and extremely high environmental toxicity. Cluster V is characterized by low releases yet mutagenicity and high carcinogenicity. Chemicals with the highest membership degree were identified as representatives of each cluster. For the highest membership degree, half of the chemicals have a value higher than 0.74. If we look at both the highest and the second-highest membership degrees simultaneously, about 94% of the chemicals have a value higher than 0.5. FCM can serve as an approach to uncover the implicit information in a highly complex chemical dataset, which subsequently supports strategy development for efficient and effective chemical management. Copyright © 2017 Elsevier B.V. All rights reserved.
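Fuzzy c-means alternates two closed-form updates, weighted centroids from memberships raised to the fuzzifier m and memberships from inverse relative distances, which is exactly why every chemical receives a membership degree in every cluster. A minimal NumPy sketch (the cluster count and fuzzifier here are illustrative, not the study's settings):

```python
import numpy as np

def fuzzy_c_means(X, c=5, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard FCM: alternate membership and centroid updates (fuzzifier m)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # each row: membership degrees
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1))
        U_new = 1.0 / (d ** (2 / (m - 1)) *
                       np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, centers
```

The per-cluster representatives in the study then correspond to `np.argmax` over each column of U.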
Charge carrier localised in zero-dimensional (CH3NH3)3Bi2I9 clusters.
Ni, Chengsheng; Hedley, Gordon; Payne, Julia; Svrcek, Vladimir; McDonald, Calum; Jagadamma, Lethy Krishnan; Edwards, Paul; Martin, Robert; Jain, Gunisha; Carolan, Darragh; Mariotti, Davide; Maguire, Paul; Samuel, Ifor; Irvine, John
2017-08-01
A metal-organic hybrid perovskite (CH3NH3PbI3) with a three-dimensional framework of metal-halide octahedra has been reported as a low-cost, solution-processable absorber for a thin-film solar cell with a power-conversion efficiency over 20%. Low-dimensional layered perovskites with metal-halide slabs separated by insulating organic layers are reported to show higher stability, but the efficiencies of the solar cells are limited by the confinement of excitons. In order to explore the confinement and transport of excitons in zero-dimensional metal-organic hybrid materials, a highly orientated film of (CH3NH3)3Bi2I9 with nanometre-sized core clusters of Bi2I9^3- surrounded by insulating CH3NH3+ was prepared via solution processing. The (CH3NH3)3Bi2I9 film shows highly anisotropic photoluminescence emission and excitation due to the large proportion of localised excitons coupled with delocalised excitons from intercluster energy transfer. The abrupt increase in photoluminescence quantum yield at excitation energies above twice the band gap could indicate quantum cutting due to the low dimensionality. Understanding the confinement and transport of excitons in low-dimensional systems will aid the development of next-generation photovoltaics. Via photophysical studies, Ni et al. observe 'quantum cutting' in 0D metal-organic hybrid materials based on methylammonium bismuth halide, (CH3NH3)3Bi2I9.
Sturm-Liouville eigenproblems with an interior pole
NASA Technical Reports Server (NTRS)
Boyd, J. P.
1981-01-01
The eigenvalues and eigenfunctions of self-adjoint Sturm-Liouville problems with a simple pole in the interior of an interval are investigated. Three general theorems are proved, and it is shown that as n approaches infinity, the eigenfunctions more and more closely resemble those of an ordinary Sturm-Liouville problem. The low-order modes differ significantly from those of a nonsingular eigenproblem in that both the eigenvalues and eigenfunctions are complex, and the eigenvalues for small n may cluster about a common value, in contrast to the widely separated eigenvalues of the corresponding nonsingular problem. In addition, the WKB approximation is shown to be accurate for all n, and all eigenvalues of a normal one-dimensional Sturm-Liouville equation with nonperiodic boundary conditions are well separated.
Online clustering algorithms for radar emitter classification.
Liu, Jun; Lee, Jim P Y; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max
2005-08-01
Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.
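The competitive-learning variant, in its simplest online form, moves the nearest prototype a small step toward each arriving pulse sample; the paper's MDL-based model selection is not shown here, and the learning rate and unit count are assumptions.

```python
import numpy as np

def online_competitive_clustering(stream, n_units, lr=0.05):
    """Winner-take-all online clustering of a stream of pulse feature vectors."""
    it = iter(stream)
    # Seed the prototype units with the first few samples.
    units = np.array([next(it) for _ in range(n_units)], dtype=float)
    for x in it:
        x = np.asarray(x, dtype=float)
        w = np.argmin(np.linalg.norm(units - x, axis=1))  # winning unit
        units[w] += lr * (x - units[w])                   # move winner toward sample
    return units
```

Because each sample is processed once and discarded, memory stays constant, which is what makes such algorithms suitable for online emitter classification.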
NASA Astrophysics Data System (ADS)
Tian, Ye; Wang, Tong; Liu, Wenyan; Xin, Huolin L.; Li, Huilin; Ke, Yonggang; Shih, William M.; Gang, Oleg
2015-07-01
Three-dimensional mesoscale clusters that are formed from nanoparticles spatially arranged in pre-determined positions can be thought of as mesoscale analogues of molecules. These nanoparticle architectures could offer tailored properties due to collective effects, but developing a general platform for fabricating such clusters is a significant challenge. Here, we report a strategy for assembling three-dimensional nanoparticle clusters that uses a molecular frame designed with encoded vertices for particle placement. The frame is a DNA origami octahedron and can be used to fabricate clusters with various symmetries and particle compositions. Cryo-electron microscopy is used to uncover the structure of the DNA frame and to reveal that the nanoparticles are spatially coordinated in the prescribed manner. We show that the DNA frame and one set of nanoparticles can be used to create nanoclusters with different chiroptical activities. We also show that the octahedra can serve as programmable interparticle linkers, allowing one- and two-dimensional arrays to be assembled with designed particle arrangements.
Information extraction from dynamic PS-InSAR time series using machine learning
NASA Astrophysics Data System (ADS)
van de Kerkhof, B.; Pankratius, V.; Chang, L.; van Swol, R.; Hanssen, R. F.
2017-12-01
Due to the increasing number of SAR satellites, with shorter repeat intervals and higher resolutions, SAR data volumes are exploding. Time series analyses of SAR data, i.e. Persistent Scatterer (PS) InSAR, enable deformation monitoring of the built environment at an unprecedented scale, with hundreds of scatterers per km², updated weekly. Potential hazards, e.g. due to failure of aging infrastructure, can be detected at an early stage. Yet, this requires operational data processing of billions of measurement points over hundreds of epochs, updating this data set dynamically as new data come in, and testing whether points (start to) behave in an anomalous way. Moreover, the quality of PS-InSAR measurements is ambiguous and heterogeneous, which will yield false positives and false negatives. Such analyses are numerically challenging. Here we extract relevant information from PS-InSAR time series using machine learning algorithms. We cluster (group together) time series with similar behaviour, even though they may not be spatially close, so that the results can be used for further analysis. First we reduce the dimensionality of the dataset in order to be able to cluster the data, since applying clustering techniques to high-dimensional datasets often yields unsatisfactory results. Our approach is to apply t-distributed Stochastic Neighbor Embedding (t-SNE), a machine learning algorithm for dimensionality reduction of high-dimensional data to a 2D or 3D map, and to cluster this result using Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The results show that we are able to detect and cluster time series with similar behaviour, which is the starting point for more extensive analysis of the underlying driving mechanisms. The results of the methods are compared to conventional hypothesis testing as well as to a Self-Organising Map (SOM) approach. Hypothesis testing is robust and takes the stochastic nature of the observations into account, but is time consuming. Therefore, we apply our machine learning approach in succession with the hypothesis testing approach, in order to benefit both from the reduced computation time of the machine learning approach and from the robust quality metrics of hypothesis testing. We acknowledge support from NASA AIST NNX15AG84G (PI V. Pankratius).
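The embedding-then-clustering pipeline maps directly onto scikit-learn; in the sketch below, the perplexity, eps, and min_samples values are illustrative settings, not those used in the study.

```python
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

def cluster_time_series(X, perplexity=30, eps=2.0, min_samples=10, seed=0):
    """Embed high-dimensional time series in 2-D with t-SNE, then cluster
    the embedding with DBSCAN. X has one row per scatterer time series."""
    emb = TSNE(n_components=2, perplexity=perplexity,
               random_state=seed).fit_transform(X)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(emb)
    return emb, labels   # label -1 marks noise points
```

DBSCAN's noise label is convenient here: scatterers whose deformation history matches no cluster are flagged rather than forced into a group.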
NASA Astrophysics Data System (ADS)
Ness, M.; Rix, H.-W.; Hogg, David W.; Casey, A. R.; Holtzman, J.; Fouesneau, M.; Zasowski, G.; Geisler, D.; Shetrone, M.; Minniti, D.; Frinchaboy, Peter M.; Roman-Lopes, Alexandre
2018-02-01
We explore to what extent stars within Galactic disk open clusters resemble each other in the high-dimensional space of their photospheric element abundances and contrast this with pairs of field stars. Our analysis is based on abundances for 20 elements, homogeneously derived from APOGEE spectra (with carefully quantified uncertainties of typically 0.03 dex). We consider 90 red giant stars in seven open clusters and find that most stars within a cluster have abundances in most elements that are indistinguishable (in a χ²-sense) from those of the other members, as expected for stellar birth siblings. An analogous analysis among pairs of >1000 field stars shows that highly significant abundance differences in the 20-dimensional space can be established for the vast majority of these pairs, and that the APOGEE-based abundance measurements have high discriminating power. However, pairs of field stars whose abundances are indistinguishable even at 0.03 dex precision exist: ∼0.3% of all field star pairs and ∼1.0% of field star pairs at the same (solar) metallicity [Fe/H] = 0 ± 0.02. Most of these pairs are presumably not birth siblings from the same cluster, but rather doppelgängers. Our analysis implies that “chemical tagging” in the strict sense, identifying birth siblings for typical disk stars through their abundance similarity alone, will not work with such data. However, our approach shows that abundances have extremely valuable information for probabilistic chemo-orbital modeling, and combined with velocities, we have identified new cluster members from the field.
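The pairwise comparison reduces to a χ² statistic over the abundance differences of two stars, weighted by their measurement uncertainties. The sketch below illustrates that test on simulated values; the study's exact statistic may differ in detail.

```python
import numpy as np
from scipy.stats import chi2

def abundance_chi2(x, y, sx, sy):
    """Chi-squared comparison of two stars' abundance vectors with uncertainties."""
    stat = np.sum((x - y) ** 2 / (sx ** 2 + sy ** 2))
    p = chi2.sf(stat, df=len(x))    # P(chi2 >= stat) under "same abundances"
    return stat, p

# Illustrative pair: 20 elements, 0.03 dex uncertainties, drawn as true siblings.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.1, 20)
y = x + rng.normal(0.0, 0.03 * np.sqrt(2), 20)
print(abundance_chi2(x, y, np.full(20, 0.03), np.full(20, 0.03)))
```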
Computational unsteady aerodynamics for lifting surfaces
NASA Technical Reports Server (NTRS)
Edwards, John W.
1988-01-01
Two-dimensional problems are solved using numerical techniques. The Navier-Stokes equations are studied both in the vorticity-stream function formulation, which appears to be the optimal choice for two-dimensional problems using a storage approach, and in the velocity-pressure formulation, which minimizes the number of unknowns in three-dimensional problems. Analysis shows that compact centered conservative second-order schemes for the vorticity equation are the most robust for high-Reynolds-number flows. Serious difficulties remain in the choice of turbulence models to keep reasonable CPU efficiency.
NASA Astrophysics Data System (ADS)
Bogiatzis, P.; Ishii, M.; Davis, T. A.
2016-12-01
Seismic tomography inverse problems are among the largest high-dimensional parameter estimation tasks in Earth science. We show how combinatorics and graph theory can be used to analyze the structure of such problems, and to effectively decompose them into smaller ones that can be solved efficiently by means of the least squares method. In combination with recent high performance direct sparse algorithms, this reduction in dimensionality allows for an efficient computation of the model resolution and covariance matrices using limited resources. Furthermore, we show that a new sparse singular value decomposition method can be used to obtain the complete spectrum of the singular values. This procedure provides the means for more objective regularization and further dimensionality reduction of the problem. We apply this methodology to a moderate size, non-linear seismic tomography problem to image the structure of the crust and the upper mantle beneath Japan using local deep earthquakes recorded by the High Sensitivity Seismograph Network stations.
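As a toy illustration of spectrum-based analysis on a sparse sensitivity matrix, the sketch below computes a truncated sparse SVD with scipy.sparse.linalg.svds and forms the corresponding regularized solution. Note two assumptions: the matrix here is random rather than a real tomographic operator, and the paper's method recovers the complete singular spectrum, which svds does not.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
G = sp.random(2000, 500, density=0.01, random_state=0, format='csr')  # rays x cells
d = rng.standard_normal(2000)                                         # travel-time data

# Truncated SVD: keep the k largest singular triplets (svds returns them ascending).
U, s, Vt = svds(G, k=50)
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order]

m = Vt.T @ ((U.T @ d) / s)   # truncated-SVD solution of G m ≈ d
```

Discarding small singular values is the standard way to damp the poorly constrained directions of the model space; inspecting the full spectrum, as the paper advocates, makes that truncation choice objective.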
Liu, L L; Liu, M J; Ma, M
2015-09-28
The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One approach to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared to conventional clustering algorithms, self-organizing maps (SOMs) identify the nonlinear aspects of the gene-to-medium relationship by mapping the input space into another, higher-dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, the Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies, and are built specifically for each information granule (a function cluster of genes), intelligently partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, only the gene functions, and not their expression values, are considered in the fuzzy clustering algorithm of DAVID. Compared to the clustering algorithm of DAVID, the experimental results show a marked improvement in classification accuracy with the application of FCSOMs. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.
PCA based clustering for brain tumor segmentation of T1w MRI images.
Kaya, Irem Ersöz; Pehlivanlı, Ayça Çakmak; Sekizkardeş, Emine Gezmez; Ibrikci, Turgay
2017-03-01
Medical images are huge collections of information that are difficult to store and process, consuming extensive computing time. Therefore, reduction techniques are commonly used as a data pre-processing step to make the image data less complex, so that high-dimensional data can be identified by an appropriate low-dimensional representation. PCA is one of the most popular multivariate methods for data reduction. This paper is focused on T1-weighted MRI image clustering for brain tumor segmentation with dimension reduction by different common Principal Component Analysis (PCA) algorithms. Our primary aim is to present a comparison between different variations of PCA algorithms on MRIs for two cluster methods. The five most common PCA algorithms, namely conventional PCA, Probabilistic Principal Component Analysis (PPCA), Expectation Maximization Based Principal Component Analysis (EM-PCA), the Generalized Hebbian Algorithm (GHA), and Adaptive Principal Component Extraction (APEX), were applied to reduce dimensionality in advance of two clustering algorithms, K-Means and Fuzzy C-Means. In the study, T1-weighted MRI images of the human brain with brain tumor were used for clustering. In addition to the original size of 512 lines and 512 pixels per line, three more sizes, 256 × 256, 128 × 128 and 64 × 64, were included in the study to examine their effect on the methods. The obtained results were compared in terms of both the reconstruction errors and the Euclidean distance errors among the clustered images containing the same number of principal components. According to the findings, PPCA obtained the best results among all others. Furthermore, the EM-PCA and PPCA assisted the K-Means algorithm to accomplish the best clustering performance in the majority of cases, as well as achieving significant results with both clustering algorithms for all sizes of T1w MRI images. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
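The reduce-then-cluster pipeline is compact in scikit-learn. The sketch below uses conventional PCA with K-Means and reports the relative reconstruction error; treating image rows as observations is one simplification among several possible arrangements, and the component and cluster counts are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def pca_kmeans_segment(image, n_components=8, n_clusters=3, seed=0):
    """Reduce a 2-D image with PCA, cluster in component space, and report
    the relative PCA reconstruction error."""
    pca = PCA(n_components=n_components, random_state=seed)
    Z = pca.fit_transform(image.astype(float))     # rows as observations
    recon = pca.inverse_transform(Z)
    err = np.linalg.norm(image - recon) / np.linalg.norm(image)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(Z)
    return labels, err
```

Swapping KMeans for a fuzzy c-means implementation reproduces the paper's second clustering arm; the PCA variants (PPCA, EM-PCA, GHA, APEX) differ only in how the components are estimated.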
Selvaraj, S.; Gromiha, M. Michael
2003-01-01
Analysis of the three-dimensional structures of (α/β)8 barrel proteins sheds light on the factors that are responsible for directing and maintaining their common fold. In this work, hydrophobically enriched clusters are identified in 92% of the considered (α/β)8 barrel proteins. The residue segments with hydrophobic clusters have high thermal stability. Further, these clusters are formed and stabilized through long-range interactions. Specifically, a network of long-range contacts connects adjacent β-strands of the (α/β)8 barrel domain and the hydrophobic clusters. The implications of hydrophobic clusters and long-range networks in providing a feasible common mechanism for the folding of (α/β)8 barrel proteins are proposed. PMID:12609894
NASA Technical Reports Server (NTRS)
Liou, Meng-Sing
1995-01-01
A unique formulation for describing fluid motion is presented. The method, referred to as the 'extended Lagrangian method,' is interesting from both theoretical and numerical points of view. The formulation offers accuracy in the numerical solution by avoiding the numerical diffusion that results from the mixing of fluxes in the Eulerian description. The present method and the Arbitrary Lagrangian-Eulerian (ALE) method are similar in spirit: both eliminate cross-streamline numerical diffusion. For this purpose, we suggest a simple grid constraint condition and utilize an accurate discretization procedure. This grid constraint is applied only to the transverse cell face parallel to the local stream velocity, and hence our method for steady-state problems naturally reduces to the streamline-curvature method, without explicitly solving the steady stream-coordinate equations formulated a priori. Unlike the Lagrangian method proposed by Loh and Hui, which is valid only for steady supersonic flows, the present method is general and capable of treating subsonic and supersonic flows as well as unsteady flows, simply by invoking in the same code an appropriate grid constraint suggested in this paper. The approach is found to be robust and stable. It automatically adapts to flow features without resorting to clustering, thereby maintaining rather uniform grid spacing throughout and a large time step. Moreover, the method is shown to resolve multi-dimensional discontinuities with a high level of accuracy, similar to that found in one-dimensional problems.
The degree-related clustering coefficient and its application to link prediction
NASA Astrophysics Data System (ADS)
Liu, Yangyang; Zhao, Chengli; Wang, Xiaojie; Huang, Qiangjuan; Zhang, Xue; Yi, Dongyun
2016-07-01
Link prediction plays a significant role in explaining the evolution of networks, yet it remains a challenging problem that in recent years has been addressed using topological information alone. Based on the belief that network nodes with a great number of common neighbors are more likely to be connected, many similarity indices have achieved considerable accuracy and efficiency. Motivated by the natural assumption that the effect of missing links on the estimation of a node's clustering ability may be related to node degree, in this paper we propose a degree-related clustering coefficient to quantify the clustering ability of nodes. Unlike the classical clustering coefficient, the new coefficient is highly robust when the observed bias of links is considered. Furthermore, we propose a degree-related clustering ability path (DCP) index, which applies the proposed coefficient to the link prediction problem. Experiments on 12 real-world networks show that our proposed method is highly accurate and robust compared with four common-neighbor-based similarity indices (Common Neighbors (CN), Adamic-Adar (AA), Resource Allocation (RA), and Preferential Attachment (PA)) and the recently introduced clustering ability (CA) index.
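For context, the four baseline common-neighbor indices cited above are each one line over the neighborhood intersection; the sketch below assumes networkx, and the paper's degree-related coefficient and DCP index themselves are not reproduced.

```python
import math
import networkx as nx

def common_neighbor_scores(G, u, v):
    """Classical similarity indices for a candidate link (u, v)."""
    cn = list(nx.common_neighbors(G, u, v))
    return {
        'CN': len(cn),                                     # Common Neighbors
        'AA': sum(1 / math.log(G.degree(w)) for w in cn),  # Adamic-Adar
        'RA': sum(1 / G.degree(w) for w in cn),            # Resource Allocation
        'PA': G.degree(u) * G.degree(v),                   # Preferential Attachment
    }

G = nx.karate_club_graph()
print(common_neighbor_scores(G, 0, 33))
```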
NASA Astrophysics Data System (ADS)
Zwart, Christine M.; Venkatesan, Ragav; Frakes, David H.
2012-10-01
Interpolation is an essential and broadly employed function of signal processing. Accordingly, considerable development has focused on advancing interpolation algorithms toward optimal accuracy. Such development has motivated a clear shift in the state of the art from classical interpolation to more intelligent and resourceful approaches, registration-based interpolation for example. As a natural result, many of the most accurate current algorithms are highly complex, specific, and computationally demanding. However, the diverse hardware destinations for interpolation algorithms present unique constraints that often preclude use of the most accurate available options. For example, while computationally demanding interpolators may be suitable for highly equipped image processing platforms (e.g., computer workstations and clusters), only more efficient interpolators may be practical for less well equipped platforms (e.g., smartphones and tablet computers). The latter examples of consumer electronics present a design tradeoff in this regard: high-accuracy interpolation benefits the consumer experience, but computing capabilities are limited. It follows that interpolators with favorable combinations of accuracy and efficiency are of great practical value to the consumer electronics industry. We address multidimensional interpolation-based image processing problems that are common to consumer electronic devices through a decomposition approach. The multidimensional problems are first broken down into multiple, independent, one-dimensional (1-D) interpolation steps that are then executed with a newly modified registration-based one-dimensional control grid interpolator. The proposed approach, decomposed multidimensional control grid interpolation (DMCGI), combines the accuracy of registration-based interpolation with the simplicity, flexibility, and computational efficiency of a 1-D interpolation framework. Results demonstrate that DMCGI provides improved interpolation accuracy (and other benefits) in image resizing, color sample demosaicing, and video deinterlacing applications, at a computational cost that is manageable or reduced in comparison to popular alternatives.
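The decomposition strategy is easy to demonstrate: a two-dimensional resize becomes two independent passes of 1-D interpolation, one along each axis. In the sketch below, plain linear interpolation stands in for the modified registration-based 1-D control grid interpolator, so this shows only the decomposition, not DMCGI's accuracy.

```python
import numpy as np

def resize_separable(img, new_h, new_w):
    """Resize a 2-D image via two independent 1-D interpolation passes."""
    h, w = img.shape
    # Pass 1: interpolate each row to the new width.
    x_old, x_new = np.arange(w), np.linspace(0, w - 1, new_w)
    rows = np.array([np.interp(x_new, x_old, r) for r in img])
    # Pass 2: interpolate each column to the new height.
    y_old, y_new = np.arange(h), np.linspace(0, h - 1, new_h)
    return np.array([np.interp(y_new, y_old, c) for c in rows.T]).T
```

Because each pass is one-dimensional, any improved 1-D interpolator can be dropped in without touching the surrounding pipeline, which is the efficiency argument made above.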
Sibling relationship patterns and their associations with child competence and problem behavior.
Buist, Kirsten L; Vermande, Marjolijn
2014-08-01
The present study is the first to examine patterns in sibling relationship quality and the associations of these patterns with internalizing and externalizing problem behavior, as well as self-perceived competence, in middle childhood. Self-report questionnaires (e.g., Sibling Relationship Questionnaire, Self-Perception Profile for Children, Youth Self Report) were administered among 1,670 Dutch children (Mage = 11.40 years, SD = .83) attending 51 different Dutch schools. Three sibling relationship clusters were found: a conflictual cluster (low on warmth, high on conflict), an affect-intense cluster (above average on warmth and conflict), and a harmonious cluster (high on warmth, low on conflict). Sister pairs were underrepresented in the conflictual cluster and overrepresented in the harmonious cluster. Children with conflictual sibling relationships reported significantly more internalizing and externalizing problems, and lower academic and social competence and global self-worth, than children with harmonious sibling relationships. Children with affect-intense sibling relationships reported less aggression and better social competence than children with conflictual sibling relationships. Our findings indicate that it is fruitful to combine indices of sibling warmth and conflict to examine sibling relationship types. Relationship types differed significantly concerning internalizing and externalizing problems, but also concerning self-perceived competence. These findings extend our knowledge about sibling relationship types and their impact on different aspects of child adjustment. Whereas harmonious sibling relationships are the most beneficial for adjustment, sibling conflict mainly has a negative effect on adjustment in combination with lack of sibling warmth. Implications and future directions are discussed.
Stress moderates the relationships between problem-gambling severity and specific psychopathologies.
Ronzitti, Silvia; Kraus, Shane W; Hoff, Rani A; Potenza, Marc N
2018-01-01
The purpose of this study was to examine the extent to which stress moderated the relationships between problem-gambling severity and psychopathologies. We analyzed Wave-1 data from 41,869 participants of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). Logistic regression showed that, as compared to a non-gambling (NG) group, individuals with at-risk gambling (ARG) and problem/pathological gambling (PPG) demonstrated higher odds of multiple Axis-I and Axis-II disorders in both high- and low-stress groups. Interaction odds ratios were statistically significant for stress moderating the relationships between at-risk gambling (versus non-gambling) and Any Axis-I and Any Axis-II disorder, with substance-use and Cluster-A and Cluster-B disorders contributing significantly. Some similar patterns were observed for pathological gambling (versus non-gambling), with stress moderating relationships with Cluster-B disorders. In all cases, a stronger relationship was observed between problem-gambling severity and psychopathology in the low-stress versus high-stress groups. The findings suggest that perceived stress accounts for some of the variance in the relationship between problem-gambling severity and specific forms of psychopathology, particularly with respect to lower intensity, subsyndromal levels of gambling. Findings suggest that stress may be particularly important to consider in the relationships between problem-gambling severity and substance use and Cluster-B disorders. Published by Elsevier B.V.
NASA Astrophysics Data System (ADS)
Anokhina, Ekaterina V.
Low-dimensional and open-framework materials containing transition metals have a wide range of applications in redox catalysis, solid-state batteries, and electronic and magnetic devices. This dissertation reports on research carried out with the goal of developing a strategy for the preparation of low-dimensional and open-framework materials using octahedral metal clusters as building blocks. Our approach takes its roots from crystal engineering principles, where the desired framework topologies are achieved through building block design. The key idea of this work is to induce directional bonding preferences in the cluster units using a combination of ligands with a large difference in charge density. This investigation led to the preparation and characterization of a new family of niobium oxychloride cluster compounds with original structure types exhibiting low-dimensional or open-framework character. Most of these materials have framework topologies unprecedented in compounds containing octahedral clusters. Comparative analysis of their structural features indicates that the novel cluster connectivity patterns in these systems are the result of complex interplay between the effects of anisotropic ligand arrangement in the cluster unit and optimization of ligand-counterion electrostatic interactions. The important role played by these factors sets niobium oxychloride systems apart from cluster compounds with one ligand type or statistical ligand distribution, where the main structure-determining factor is the total number of ligands. These results provide a blueprint for expanding the ligand combination strategy to other transition metal cluster systems and for the future rational design of cluster-based materials.
Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis
2015-01-01
... algorithms we proposed improve the time efficiency significantly for large-scale datasets. In the last chapter, we also propose an incremental reseeding ... plume detection in hyper-spectral video data.
High-dimensional vector semantics
NASA Astrophysics Data System (ADS)
Andrecut, M.
In this paper we explore the “vector semantics” problem from the perspective of “almost orthogonal” property of high-dimensional random vectors. We show that this intriguing property can be used to “memorize” random vectors by simply adding them, and we provide an efficient probabilistic solution to the set membership problem. Also, we discuss several applications to word context vector embeddings, document sentences similarity, and spam filtering.
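The "memorize by adding" idea admits a very small numerical demonstration. The sketch below, with arbitrary choices of dimension and threshold, stores random ±1 vectors by superposition and tests set membership with a dot product; it illustrates the near-orthogonality property, not the paper's specific construction.

```python
# Minimal numpy illustration of the "almost orthogonal" property: store a
# set of random ±1 vectors by summing them, then test membership with a
# dot-product threshold. Dimension and threshold are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
d, n_stored = 10_000, 50
stored = rng.choice([-1.0, 1.0], size=(n_stored, d))
memory = stored.sum(axis=0)                    # superposition "memorizes" the set

probe_in = stored[7]                           # a member
probe_out = rng.choice([-1.0, 1.0], size=d)    # a random non-member

# For a member, probe.memory is about d; for a non-member it is
# O(sqrt(n_stored * d)), so a threshold of d/2 separates them w.h.p.
for name, p in [("member", probe_in), ("non-member", probe_out)]:
    print(name, p @ memory, p @ memory > d / 2)
```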
On the calculation of dynamic and heat loads on a three-dimensional body in a hypersonic flow
NASA Astrophysics Data System (ADS)
Bocharov, A. N.; Bityurin, V. A.; Evstigneev, N. M.; Fortov, V. E.; Golovin, N. N.; Petrovskiy, V. P.; Ryabkov, O. I.; Teplyakov, I. O.; Shustov, A. A.; Solomonov, Yu S.
2018-01-01
We consider a three-dimensional body in a hypersonic flow at zero angle of attack. Our aim is to estimate heat and aerodynamic loads on specific body elements. We use a previously developed code to solve the coupled heat- and mass-transfer problem. The change of the surface shape is taken into account through an iterative process for wall-material ablation. The solution is carried out on a multi-graphics-processing-unit (multi-GPU) cluster. Five Mach number points are considered, namely M = 20-28. For each point we estimate the body shape after surface ablation, the heat loads on the surface, and the aerodynamic loads on the whole body and its elements; the latter is done using Gauss-type quadrature on the surface of the body. The results for the different Mach numbers are compared. We also estimate the efficiency of the Navier-Stokes code on multi-GPU and central-processing-unit architectures for the coupled heat- and mass-transfer problem.
A three-dimensional structured/unstructured hybrid Navier-Stokes method for turbine blade rows
NASA Technical Reports Server (NTRS)
Tsung, F.-L.; Loellbach, J.; Kwon, O.; Hah, C.
1994-01-01
A three-dimensional viscous structured/unstructured hybrid scheme has been developed for the numerical computation of high-Reynolds-number turbomachinery flows. The procedure allows an efficient structured solver to be employed in the densely clustered, high-aspect-ratio grid around the viscous regions near solid surfaces, while employing an unstructured solver elsewhere in the flow domain to add flexibility in mesh generation. Test results for an inviscid flow over an external transonic wing and a Navier-Stokes flow for an internal annular cascade are presented.
Blöchliger, Nicolas; Caflisch, Amedeo; Vitalis, Andreas
2015-11-10
Data mining techniques depend strongly on how the data are represented and how distance between samples is measured. High-dimensional data often contain a large number of irrelevant dimensions (features) for a given query. These features act as noise and obfuscate relevant information. Unsupervised approaches to mine such data require distance measures that can account for feature relevance. Molecular dynamics simulations produce high-dimensional data sets describing molecules observed in time. Here, we propose to globally or locally weight simulation features based on effective rates. This emphasizes, in a data-driven manner, slow degrees of freedom that often report on the metastable states sampled by the molecular system. We couple this idea to several unsupervised learning protocols. Our approach unmasks slow side chain dynamics within the native state of a miniprotein and reveals additional metastable conformations of a protein. The approach can be combined with most algorithms for clustering or dimensionality reduction.
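One plausible rendering of rate-based feature weighting (not the authors' exact estimator) is to weight each feature by its total variance divided by its mean squared frame-to-frame displacement, so that slow, drifting degrees of freedom receive large weights, and then use those weights in a weighted distance for clustering. All names and the synthetic trajectory below are illustrative.

```python
# Hypothetical rate-based weighting sketch: slow features (small
# frame-to-frame motion relative to total variance) get large weights.
import numpy as np

rng = np.random.default_rng(1)
T, F = 2000, 5
X = rng.normal(size=(T, F))                  # 4 fast, noise-like features
X[:, 0] = np.cumsum(rng.normal(size=T)) / 10.0  # 1 slow degree of freedom

var = X.var(axis=0)
msd = np.mean(np.diff(X, axis=0) ** 2, axis=0)  # fast features: msd ~ 2*var
w = var / msd                                    # slow feature dominates
w /= w.sum()
print(np.round(w, 3))

def weighted_dist(a, b):
    return np.sqrt(np.sum(w * (a - b) ** 2))

print(weighted_dist(X[0], X[100]))
```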
Zhang, Wei; Zhang, Xiaolong; Qiang, Yan; Tian, Qi; Tang, Xiaoxian
2017-01-01
The fast and accurate segmentation of lung nodule image sequences is the basis of subsequent processing and diagnostic analyses. However, previous research investigating nodule segmentation algorithms cannot entirely segment cavitary nodules, and the segmentation of juxta-vascular nodules is inaccurate and inefficient. To solve these problems, we propose a new method for the segmentation of lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise (DBSCAN). First, our method uses three-dimensional computed tomography image features of the average intensity projection combined with multi-scale dot enhancement for preprocessing. Hexagonal clustering and morphologically optimized sequential linear iterative clustering (HMSLIC) for sequence image oversegmentation is then proposed to obtain superpixel blocks. An adaptive weight coefficient is then constructed to calculate the distance between superpixels, to achieve precise lung nodule positioning and to obtain the starting block for subsequent clustering. Moreover, by fitting the distance curve and detecting the change in slope, an accurate clustering threshold is obtained. Thereafter, a fast DBSCAN superpixel sequence clustering algorithm, optimized by a strategy of clustering only the lung nodules and by the adaptive threshold, is used to obtain lung nodule mask sequences. Finally, the lung nodule image sequences are obtained. The experimental results show that our method rapidly, completely and accurately segments various types of lung nodule image sequences. PMID:28880916
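Two ingredients named in the abstract, an adaptive threshold read off the slope change of a sorted distance curve and DBSCAN over superpixel features, can be sketched generically (this is not the HMSLIC pipeline, and the synthetic "superpixel" features are invented for illustration):

```python
# Generic sketch: pick the DBSCAN eps at the largest jump in the sorted
# k-NN distance curve, then cluster superpixel feature vectors
# (mean intensity, centroid x, centroid y).
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(2)
# Fake superpixel features: two nodule-like blobs plus sparse background.
blob1 = rng.normal([0.9, 20, 20], [0.05, 1, 1], size=(40, 3))
blob2 = rng.normal([0.8, 60, 55], [0.05, 1, 1], size=(40, 3))
bg = rng.uniform([0, 0, 0], [0.3, 100, 100], size=(80, 3))
X = np.vstack([blob1, blob2, bg])

k = 4
d_k = np.sort(NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)[0][:, -1])
eps = d_k[np.argmax(np.diff(d_k))]     # threshold at the slope change
labels = DBSCAN(eps=eps, min_samples=k).fit_predict(X)
print("eps =", round(float(eps), 3), "clusters:", set(labels))
```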
Zhang, Yuchi; Wu, Yuanhua; He, Xin; Ma, Junhan; Shen, Xuan; Zhu, Dunru
2018-03-01
Using polynuclear metal clusters as nodes, many high-symmetry, high-connectivity nets, like 8-connected bcu and 12-connected fcu, have been attained in metal-organic frameworks (MOFs). However, construction of low-symmetry high-connected MOFs with a novel topology still remains a big challenge. For example, a uninodal 8-connected lsz network, observed in inorganic ZrSiO4, has not been topologically identified in MOFs. Using 2,2'-difluorobiphenyl-4,4'-dicarboxylic acid (H2L) as a new linker and 1,2,4-triazole (Htrz) as a coligand, a novel three-dimensional Cd(II) MOF, namely poly[tetrakis(μ4-2,2'-difluorobiphenyl-4,4'-dicarboxylato-κ5O1,O1':O1':O4:O4')tetrakis(N,N-dimethylformamide-κO)tetrakis(μ3-1,2,4-triazolato-κ3N1:N2:N4)hexacadmium(II)], [Cd6(C14H6F2O4)4(C2H2N3)4(C3H7NO)4]n, (I), has been prepared. Single-crystal structure analysis indicates that six different Cd(II) ions co-exist in (I) and each Cd(II) ion displays a distorted [CdO4N2] octahedral geometry with four equatorial O atoms and two axial N atoms. Three Cd(II) ions are connected by four carboxylate groups and four trz^- ligands to form a linear trinuclear [Cd3(COO)4(trz)4] cluster, as are the other three Cd(II) ions. Two Cd3 clusters are linked by trz^- ligands in a μ1,2,4-bridging mode to produce a two-dimensional Cd(II)-triazolate layer with (6,3) topology in the ab plane. These two-dimensional layers are further pillared by the L^2- ligands along the c axis to generate a complicated three-dimensional framework. Topologically, regarding the Cd3 cluster as an 8-connected node, the whole architecture of (I) is a uninodal 8-connected lsz framework with the Schläfli symbol (4^22·6^6). Complex (I) was further characterized by elemental analysis, IR spectroscopy, powder X-ray diffraction, thermogravimetric analysis and a photoluminescence study. MOF (I) has high thermal and water stability.
A variational Bayes spatiotemporal model for electromagnetic brain mapping.
Nathoo, F S; Babul, A; Moiseev, A; Virji-Babul, N; Beg, M F
2014-03-01
In this article, we present a new variational Bayes approach for solving the neuroelectromagnetic inverse problem arising in studies involving electroencephalography (EEG) and magnetoencephalography (MEG). This high-dimensional spatiotemporal estimation problem involves the recovery of time-varying neural activity at a large number of locations within the brain, from electromagnetic signals recorded at a relatively small number of external locations on or near the scalp. Framing this problem within the context of spatial variable selection for an underdetermined functional linear model, we propose a spatial mixture formulation where the profile of electrical activity within the brain is represented through location-specific spike-and-slab priors based on a spatial logistic specification. The prior specification accommodates spatial clustering in brain activation, while also allowing for the inclusion of auxiliary information derived from alternative imaging modalities, such as functional magnetic resonance imaging (fMRI). We develop a variational Bayes approach for computing estimates of neural source activity, and incorporate a nonparametric bootstrap for interval estimation. The proposed methodology is compared with several alternative approaches through simulation studies, and is applied to the analysis of a multimodal neuroimaging study examining the neural response to face perception using EEG, MEG, and fMRI. © 2013, The International Biometric Society.
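The prior structure described, location-specific spike-and-slab with inclusion probabilities following a spatial logistic model informed by fMRI, can be sampled generatively in a few lines. This is an illustrative sketch only, not the authors' variational scheme; the coefficients and covariate are invented.

```python
# Generative sketch of a spike-and-slab prior with logistic inclusion
# probabilities driven by an auxiliary fMRI covariate (all values assumed).
import numpy as np

rng = np.random.default_rng(3)
n_loc = 500
fmri = rng.uniform(0, 1, size=n_loc)           # auxiliary covariate per location

beta0, beta1 = -3.0, 4.0                       # hypothetical logistic coefficients
p_active = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * fmri)))

gamma = rng.random(n_loc) < p_active           # "spike": inclusion indicators
activity = np.where(gamma, rng.normal(0.0, 2.0, n_loc), 0.0)  # Gaussian "slab"

print("active sources:", int(gamma.sum()), "of", n_loc)
```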
CELFE/NASTRAN Code for the Analysis of Structures Subjected to High Velocity Impact
NASA Technical Reports Server (NTRS)
Chamis, C. C.
1978-01-01
CELFE (Coupled Eulerian-Lagrangian Finite Element)/NASTRAN is a three-dimensional finite element code with the capability for analyzing structures subjected to high-velocity impact. The local response is predicted by CELFE and, for large problems, the far-field impact response is predicted by NASTRAN. The coupling of the CELFE code with NASTRAN (the CELFE/NASTRAN code) and the application of the code to selected three-dimensional high-velocity impact problems are described.
Reliability enhancement of Navier-Stokes codes through convergence acceleration
NASA Technical Reports Server (NTRS)
Merkle, Charles L.; Dulikravich, George S.
1995-01-01
Methods for enhancing the reliability of Navier-Stokes computer codes by improving their convergence characteristics are presented. Improving these characteristics decreases the likelihood of code unreliability and of user interventions in a design environment. The problem referred to as 'stiffness' in the governing equations for propulsion-related flowfields is investigated, particularly in regard to common sources of equation stiffness that lead to convergence degradation of CFD algorithms. Von Neumann stability theory is employed as a tool to study the convergence difficulties involved. Based on the stability results, improved algorithms are devised to ensure efficient convergence in different situations. A number of test cases are considered to confirm a correlation between stability theory and numerical convergence. Examples of turbulent and reacting flow are presented, and a generalized form of the preconditioning matrix is derived to handle these problems, i.e., problems involving additional differential equations describing the transport of turbulent kinetic energy, dissipation rate and chemical species. Algorithms for unsteady computations are considered. The extension of the preconditioning techniques and algorithms derived for Navier-Stokes computations to three-dimensional flow problems is discussed. New methods to accelerate the convergence of iterative schemes for the numerical integration of systems of partial differential equations are developed, with a special emphasis on the acceleration of convergence on highly clustered grids.
An Evaluation of Database Solutions to Spatial Object Association
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumar, V S; Kurc, T; Saltz, J
2008-06-24
Object association is a common problem encountered in many applications. Spatial object association, also referred to as crossmatch of spatial datasets, is the problem of identifying and comparing objects in two datasets based on their positions in a common spatial coordinate system; one of the datasets may correspond to a catalog of objects observed over time in a multi-dimensional domain, while the other dataset may consist of objects observed in a snapshot of the domain at a time point. The use of database management systems to solve the object association problem provides portability across different platforms and also greater flexibility. Increasing dataset sizes in today's applications, however, have made object association a data/compute-intensive problem that requires targeted optimizations for efficient execution. In this work, we investigate how database-based crossmatch algorithms can be deployed on different database system architectures and evaluate the deployments to understand the impact of architectural choices on crossmatch performance and associated trade-offs. We investigate the execution of two crossmatch algorithms on (1) a parallel database system with active-disk-style processing capabilities, (2) a high-throughput network database (MySQL Cluster), and (3) shared-nothing databases with replication. We have conducted our study in the context of a large-scale astronomy application with real use-case scenarios.
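The core crossmatch computation, independent of the database architectures studied here, can be sketched with a simple grid-binning scheme: bin one catalog into cells the size of the match radius and probe only the 3x3 neighborhood of each object in the other catalog. The catalogs and radius below are synthetic.

```python
# Minimal grid-binned spatial crossmatch sketch (pure Python, no DBMS).
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(4)
cat_a = rng.uniform(0, 100, size=(5000, 2))
cat_b = cat_a[:2500] + rng.normal(0, 0.01, size=(2500, 2))  # perturbed copies
radius = 0.05

grid = defaultdict(list)
for j, (x, y) in enumerate(cat_b):
    grid[(int(x / radius), int(y / radius))].append(j)

matches = []
for i, (x, y) in enumerate(cat_a):
    cx, cy = int(x / radius), int(y / radius)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid[(cx + dx, cy + dy)]:
                if np.hypot(*(cat_a[i] - cat_b[j])) <= radius:
                    matches.append((i, j))
print(len(matches), "matches")
```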
Nguyen, Huyen T.; Jia, Guang; Shah, Zarine K.; Pohar, Kamal; Mortazavi, Amir; Zynger, Debra L.; Wei, Lai; Yang, Xiangyu; Clark, Daniel; Knopp, Michael V.
2015-01-01
Purpose To apply k-means clustering of two pharmacokinetic parameters derived from 3T DCE-MRI to predict chemotherapeutic response in bladder cancer at the mid-cycle time-point. Materials and Methods With the pre-determined number of 3 clusters, k-means clustering was performed on non-dimensionalized Amp and kep estimates of each bladder tumor. Three cluster volume fractions (VFs) were calculated for each tumor at baseline and mid-cycle. The changes of the three cluster VFs from baseline to mid-cycle were correlated with the tumor's chemotherapeutic response. Receiver operating characteristic (ROC) curve analysis was used to evaluate the performance of each cluster VF change as a biomarker of chemotherapeutic response in bladder cancer. Results k-means clustering partitioned each bladder tumor into cluster 1 (low kep and low Amp), cluster 2 (low kep and high Amp), and cluster 3 (high kep and low Amp). The changes of all three cluster VFs were found to be associated with bladder tumor response to chemotherapy. The VF change of cluster 2 presented the highest area-under-the-curve value (0.96) and the highest sensitivity/specificity/accuracy (96%/100%/97%) with a selected cutoff value. Conclusion k-means clustering of the two DCE-MRI pharmacokinetic parameters can characterize the complex microcirculatory changes within a bladder tumor to enable early prediction of the tumor's chemotherapeutic response. PMID:24943272
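The pipeline described (3-cluster k-means on non-dimensionalized Amp and kep, then cluster volume fraction changes) is easy to sketch on synthetic data; the gamma-distributed parameter values below are invented and the scaling convention is an assumption.

```python
# Sketch of the described pipeline on synthetic voxel data: 3-cluster
# k-means on non-dimensionalized (Amp, kep), then per-cluster volume
# fraction (VF) change from baseline to mid-cycle.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
base = np.column_stack([rng.gamma(2.0, 1.0, 800), rng.gamma(2.0, 0.5, 800)])
mid = np.column_stack([rng.gamma(1.5, 1.0, 800), rng.gamma(2.5, 0.5, 800)])

pooled = np.vstack([base, mid])
scale = pooled.mean(axis=0)                       # non-dimensionalization
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pooled / scale)

def vfs(X):
    labels = km.predict(X / scale)
    return np.bincount(labels, minlength=3) / len(labels)

print("VF change per cluster:", np.round(vfs(mid) - vfs(base), 3))
```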
Dark matter and cosmological nucleosynthesis
NASA Technical Reports Server (NTRS)
Schramm, D. N.
1986-01-01
Existing dark matter problems, i.e., dynamics, galaxy formation and inflation, are considered, along with a model which proposes dark baryons as the bulk of the missing matter in a fractal universe. It is shown that no combination of dark, nonbaryonic matter can either provide a cosmological density parameter value near unity or, as in the case of high-energy neutrinos, allow formation of condensed matter at epochs when quasars already existed. The possibility that correlations among galactic clusters are scale-free is discussed. Such a distribution of matter would yield a fractal dimension of 1.2, close to a one-dimensional universe. Biasing, cosmic superstrings, and percolated explosions and hot dark matter are theoretical approaches that would satisfy the D = 1.2 fractal model of the large-scale structure of the universe and which would also allow sufficient dark matter in halos to close the universe.
The Intrinsic Ferromagnetism in a MnO2 Monolayer.
Kan, M; Zhou, J; Sun, Q; Kawazoe, Y; Jena, P
2013-10-17
The Mn atom, because of its special electronic configuration of 3d^5 4s^2, has been widely used as a dopant in various two-dimensional (2D) monolayers such as graphene, BN, silicene and transition metal dichalcogenides (TMDs). The distributions of doped Mn atoms in these systems are highly sensitive to the synthesis process and conditions, and thus suffer from problems of low solubility and surface clustering. Here we show for the first time that the MnO2 monolayer, synthesized 10 years ago, where Mn ions are individually held at specific sites, exhibits intrinsic ferromagnetism with a Curie temperature of 140 K, comparable to the highest T_C value achieved experimentally for Mn-doped GaAs. The well-defined atomic configuration and the intrinsic ferromagnetism of the MnO2 monolayer suggest that it is superior to other magnetic monolayer materials.
Three-dimensional particle tracking velocimetry algorithm based on tetrahedron vote
NASA Astrophysics Data System (ADS)
Cui, Yutong; Zhang, Yang; Jia, Pan; Wang, Yuan; Huang, Jingcong; Cui, Junlei; Lai, Wing T.
2018-02-01
A particle tracking velocimetry algorithm based on tetrahedron vote, named TV-PTV, is proposed to overcome the limited selection of effective algorithms for 3D flow visualisation. In this new cluster-matching algorithm, tetrahedra produced by the Delaunay tessellation are used as the basic units for inter-frame matching, which results in a simple algorithmic structure with only two independent preset parameters. Test results obtained using the synthetic test image data from the Visualisation Society of Japan show that TV-PTV presents accuracy comparable to that of the classical algorithm based on the new relaxation method (NRX). Compared with NRX, TV-PTV requires fewer loops in programming and thus a shorter computing time, especially for large particle displacements and high particle concentrations. TV-PTV is confirmed to be practically effective on an actual 3D wake flow.
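The tessellation step the algorithm builds on is directly available in scipy; the sketch below generates the tetrahedra for a synthetic 3-D particle set and adds a trivial vote accumulator. The actual inter-frame matching criterion of TV-PTV is not reproduced here.

```python
# Sketch of the tessellation step: Delaunay tetrahedra of a 3-D particle
# set, plus a toy per-vertex vote counter.
import numpy as np
from scipy.spatial import Delaunay
from collections import Counter

rng = np.random.default_rng(6)
pts = rng.uniform(0, 10, size=(60, 3))       # particle positions, one frame
tets = Delaunay(pts).simplices               # (n_tet, 4) vertex indices

votes = Counter()
for tet in tets:                             # each tetrahedron casts one
    for i in tet:                            # vote per vertex it contains
        votes[int(i)] += 1
print(tets.shape, "most-voted particle:", votes.most_common(1))
```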
Gritsenko, Valeriya; Hardesty, Russell L; Boots, Mathew T; Yakovenko, Sergiy
2016-01-01
Neural control of movement can only be realized through the interaction between the mechanical properties of the limb and the environment. Thus, a fundamental question is whether anatomy has evolved to simplify neural control by shaping these interactions in a beneficial way. This inductive, data-driven study analyzed the patterns of muscle actions across multiple joints using a musculoskeletal model of the human upper limb. The model was used to calculate muscle lengths across the full range of motion of the arm and to examine the correlations between these values for all pairs of muscles. Musculoskeletal coupling was quantified using hierarchical clustering analysis. Muscle lengths between multiple pairs of muscles across multiple postures were highly correlated. These correlations broadly formed two groups, where proximal muscles of the arm were correlated with each other and distal muscles of the arm and hand were correlated with each other, but not across groups. Using hierarchical clustering, between 11 and 14 reliable muscle groups were identified. This shows that musculoskeletal anatomy does indeed shape the mechanical interactions by grouping muscles into functional clusters that generally match the functional repertoire of the human arm. Together, these results support the idea that the structure of the musculoskeletal system is tuned to solve the movement complexity problem by reducing the dimensionality of available solutions.
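The analysis pattern (correlate muscle-length curves across postures, then hierarchically cluster on the correlation structure) can be sketched with synthetic stand-ins for the musculoskeletal model; the "muscles" and gains below are invented.

```python
# Sketch: hierarchical clustering of muscles using 1 - |correlation| of
# their length curves across sampled postures as the distance.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(7)
postures = rng.uniform(-1, 1, size=(300, 2))          # shoulder, elbow angles
# Three "proximal" muscles track the shoulder, three "distal" the elbow.
lengths = np.column_stack(
    [postures[:, 0] * g + rng.normal(0, 0.1, 300) for g in (1.0, 0.8, -0.9)]
    + [postures[:, 1] * g + rng.normal(0, 0.1, 300) for g in (1.0, -0.7, 0.9)]
)

dist = 1 - np.abs(np.corrcoef(lengths.T))             # muscle-by-muscle distance
Z = linkage(squareform(dist, checks=False), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))         # two functional groups
```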
NASA Astrophysics Data System (ADS)
Liu, Changying; Wu, Xinyuan
2017-07-01
In this paper we explore arbitrarily high-order Lagrange collocation-type time-stepping schemes for effectively solving high-dimensional nonlinear Klein-Gordon equations with different boundary conditions. We begin with one-dimensional periodic boundary problems and first formulate an abstract ordinary differential equation (ODE) on a suitable infinite-dimensional function space based on operator spectrum theory. We then introduce an operator-variation-of-constants formula which is essential for the derivation of our arbitrarily high-order Lagrange collocation-type time-stepping schemes for the nonlinear abstract ODE. The nonlinear stability and convergence are rigorously analysed once the spatial differential operator is approximated by an appropriate positive semi-definite matrix, under suitable smoothness assumptions. For two-dimensional Dirichlet or Neumann boundary problems, our new time-stepping schemes coupled with discrete fast sine/cosine transforms can be applied to simulate the two-dimensional nonlinear Klein-Gordon equations effectively. All essential features of the methodology are present in the one-dimensional and two-dimensional cases, and the schemes analysed extend with equal ease to higher-dimensional cases. The numerical simulation is implemented, and the numerical results clearly demonstrate the advantage and effectiveness of our new schemes in comparison with existing numerical methods for solving nonlinear Klein-Gordon equations in the literature.
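For orientation, a minimal reference integrator for the 1-D periodic problem (not the authors' collocation schemes) combines a Fourier spectral Laplacian with a simple leapfrog step; the equation u_tt = u_xx - u - u^3 and all parameters below are assumptions chosen for illustration.

```python
# Baseline only: 1-D periodic nonlinear Klein-Gordon u_tt = u_xx - u - u^3,
# Fourier spectral in space, leapfrog in time.
import numpy as np

N, L, dt, steps = 128, 2 * np.pi, 1e-3, 2000
x = np.linspace(0, L, N, endpoint=False)
k = np.fft.fftfreq(N, d=L / N) * 2 * np.pi   # integer wavenumbers for L = 2*pi

u_prev = 0.5 * np.cos(x)                     # zero initial velocity
u = u_prev.copy()

def accel(u):
    u_xx = np.fft.ifft(-(k ** 2) * np.fft.fft(u)).real
    return u_xx - u - u ** 3

for _ in range(steps):
    u_next = 2 * u - u_prev + dt ** 2 * accel(u)
    u_prev, u = u, u_next

print("max |u| after", steps, "steps:", float(np.abs(u).max()))
```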
Cluster redshifts in five suspected superclusters
NASA Technical Reports Server (NTRS)
Ciardullo, R.; Ford, H.; Harms, R.
1985-01-01
Redshift surveys for rich superclusters were carried out in five regions of the sky containing surface-density enhancements of Abell clusters. While several superclusters are identified, projection effects dominate each field, and no system contains more than five rich clusters. Two systems are found to be especially interesting. The first, field 0136 10, is shown to contain a superposition of at least four distinct superclusters, with the richest system possessing a small velocity dispersion. The second system, 2206 - 22, though a region of exceedingly high Abell cluster surface density, appears to be a remarkable superposition of 23 rich clusters almost uniformly distributed in redshift space between 0.08 and 0.24. The new redshifts significantly increase the three-dimensional information available for the distance class 5 and 6 Abell clusters and allow the spatial correlation function around rich superclusters to be estimated.
Thermodynamics of confined gallium clusters.
Chandrachud, Prachi
2015-11-11
We report the results of ab initio molecular dynamics simulations of Ga13 and Ga17 clusters confined inside carbon nanotubes with different diameters. The cluster-tube interaction is simulated by the Lennard-Jones (LJ) potential. We discuss the geometries, the nature of the bonding and the thermodynamics under confinement. The geometries as well as the isomer spectra of both the clusters are significantly affected. The degree of confinement decides the dimensionality of the clusters. We observe that a number of low-energy isomers appear under moderate confinement while some isomers seen in the free space disappear. Our finite-temperature simulations bring out interesting aspects, namely that the heat capacity curve is flat, even though the ground state is symmetric. Such a flat nature indicates that the phase change is continuous. This effect is due to the restricted phase space available to the system. These observations are supported by the mean square displacement of individual atoms, which are significantly smaller than in free space. The nature of the bonding is found to be approximately jellium-like. Finally we note the relevance of the work to the problem of single file diffusion for the case of the highest confinement.
NASA Astrophysics Data System (ADS)
Lyubushin, Alexey
2016-04-01
The problem of estimating current seismic danger by monitoring seismic noise properties from the broadband seismic network F-net in Japan (84 stations) is considered. Variations of the following seismic noise parameters are analyzed: multifractal singularity spectrum support width, generalized Hurst exponent, minimum Hölder-Lipschitz exponent and minimum normalized entropy of squared orthogonal wavelet coefficients. These parameters are estimated within adjacent time windows of length 1 day for seismic noise waveforms from each station. Calculating daily median values of these parameters over all stations provides a 4-dimensional time series which describes integral properties of the seismic noise in the region covered by the network. Cluster analysis is applied to the sequence of clouds of 4-dimensional vectors within a moving time window of length 365 days with mutual shift 3 days, starting from the beginning of 1997 up to the current time. The purpose of the cluster analysis is to find the best number of clusters (BNC) among probe numbers varying from 1 up to the maximum value 40. The BNC is found from the maximum of the pseudo-F-statistic (PFS). A 2D map can be created which presents the dependence of the PFS on the tested probe number of clusters and on the right-hand end of the moving time window, rather similar to the usual spectral time-frequency diagrams. In the paper [1] it was shown that the BNC before the Tohoku mega-earthquake on March 11, 2011, had a strongly chaotic regime, with jumps from minimum up to maximum values, in the time interval 1 year before the event, and this time interval was characterized by high PFS values. The PFS map is proposed as a method for extracting time intervals with high current seismic danger. The next danger time interval after the Tohoku mega-earthquake began at the end of 2012 and finished in the middle of 2013. Starting from the middle of 2015, the high PFS values and the chaotic regime of BNC variations returned. This could be interpreted as an increase in the danger of the next mega-earthquake in Japan in the region of the Nankai Trough [1] in the first half of 2016. References 1. Lyubushin, A., 2013. How soon would the next mega-earthquake occur in Japan? // Natural Science, 5 (8A1), 1-7. http://dx.doi.org/10.4236/ns.2013.58A1001
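The BNC selection step can be sketched with scikit-learn, taking the Calinski-Harabasz criterion as a stand-in for the pseudo-F statistic (the two are closely related, but this is an assumption about the paper's exact definition). Note the score requires at least 2 clusters, so the scan starts at 2 rather than 1; the 4-D data below are synthetic.

```python
# Sketch of BNC selection: scan probe cluster counts and keep the one
# maximizing a pseudo-F statistic (Calinski-Harabasz score here).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(8)
# Stand-in for a year of daily 4-D noise-parameter vectors (3 regimes).
X = np.vstack([rng.normal(c, 0.3, size=(120, 4)) for c in (0.0, 2.0, 4.0)])

scores = {}
for kk in range(2, 41):
    labels = KMeans(n_clusters=kk, n_init=5, random_state=0).fit_predict(X)
    scores[kk] = calinski_harabasz_score(X, labels)
print("best number of clusters:", max(scores, key=scores.get))
```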
Two-dimensional and three-dimensional Coulomb clusters in parabolic traps
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'yachkov, L. G., E-mail: dyachk@mail.ru; Myasnikov, M. I., E-mail: miasnikovmi@mail.ru; Petrov, O. F.
2014-09-15
We consider the shell structure of Coulomb clusters in an axially symmetric parabolic trap exhibiting a confining potential U_c(ρ,z) = (mω²/2)(ρ² + αz²). Assuming an anisotropy parameter α = 4 (corresponding to experiments employing a cusp magnetic trap under microgravity conditions), we have calculated cluster configurations for particle numbers N = 3 to 30. We have shown that clusters with N ≤ 12 initially remain flat, transitioning to three-dimensional configurations as N increases. For N = 8, we have calculated the configurations of minimal potential energy for all values of α and found the points of configuration transitions. For N = 13 and 23, we discuss the influence of both the shielding and the anisotropy parameter on potential energy, cluster size, and shell structure.
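In dimensionless units the minimal-energy configurations can be approximated numerically by direct minimization of the trap-plus-Coulomb energy; the sketch below does this for N = 8 and α = 4 with a bare 1/r interaction (the paper's shielded case is not reproduced).

```python
# Sketch (dimensionless units): minimize U = sum_i (x_i^2 + y_i^2
# + a*z_i^2)/2 + sum_{i<j} 1/|r_i - r_j| for N = 8, anisotropy a = 4.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

N, a = 8, 4.0

def energy(flat):
    r = flat.reshape(N, 3)
    trap = 0.5 * np.sum(r[:, 0] ** 2 + r[:, 1] ** 2 + a * r[:, 2] ** 2)
    return trap + np.sum(1.0 / pdist(r))

rng = np.random.default_rng(9)
best = min(
    (minimize(energy, rng.normal(size=3 * N), method="BFGS") for _ in range(5)),
    key=lambda res: res.fun,
)
z_extent = np.abs(best.x.reshape(N, 3)[:, 2]).max()
print("E_min =", round(best.fun, 4), "max|z| =", round(z_extent, 3))
```

For this α the optimized configuration should come out nearly flat (small max|z|), consistent with the flat-to-3D transition the abstract describes for small N.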
Comparative study of feature selection with ensemble learning using SOM variants
NASA Astrophysics Data System (ADS)
Filali, Ameni; Jlassi, Chiraz; Arous, Najet
2017-03-01
Ensemble learning has improved stability and clustering accuracy, but its runtime prohibits scaling up to real-world applications. This study addresses the problem of selecting a subset of the most pertinent features for every cluster from a dataset. The proposed method is an extension of the Random Forests approach using self-organizing map (SOM) variants for unlabeled data, which estimates the out-of-bag feature importance from a set of partitions. Every partition is created using a different bootstrap sample and a random subset of the features. We then show that the internal estimates used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. The approach aims at dimensionality reduction, visualization and cluster characterization at the same time. We provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvements in clustering accuracy over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach shows promise for very broad domains.
ICM: a web server for integrated clustering of multi-dimensional biomedical data.
He, Song; He, Haochen; Xu, Wenjian; Huang, Xin; Jiang, Shuai; Li, Fei; He, Fuchu; Bo, Xiaochen
2016-07-08
Large-scale efforts for parallel acquisition of multi-omics profiling continue to generate extensive amounts of multi-dimensional biomedical data. Thus, integrated clustering of multiple types of omics data is essential for developing individual-based treatments and precision medicine. However, while rapid progress has been made, methods for integrated clustering lack an intuitive web interface that facilitates use by biomedical researchers without sufficient programming skills. Here, we present a web tool, named Integrated Clustering of Multi-dimensional biomedical data (ICM), that provides an interface from which to fuse, cluster and visualize multi-dimensional biomedical data and knowledge. With ICM, users can explore the heterogeneity of a disease or a biological process by identifying subgroups of patients. The results obtained can then be interactively modified by using an intuitive user interface. Researchers can also exchange the results from ICM with collaborators via a web link containing a Project ID number that will directly pull up the analysis results being shared. ICM also supports incremental clustering, which allows users to add new sample data to the data of a previous study and obtain an updated clustering result. Currently, the ICM web server is available with no login requirement and at no cost at http://biotech.bmi.ac.cn/icm/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Nakano, Takashi; Otsuka, Makoto; Yoshimoto, Junichiro; Doya, Kenji
2015-01-01
A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. PMID:25734662
Energy Aware Cluster-Based Routing in Flying Ad-Hoc Networks.
Aadil, Farhan; Raza, Ali; Khan, Muhammad Fahad; Maqsood, Muazzam; Mehmood, Irfan; Rho, Seungmin
2018-05-03
Flying ad-hoc networks (FANETs) are a very vibrant research area nowadays. They have many military and civil applications. Limited battery energy and the high mobility of micro unmanned aerial vehicles (UAVs) represent their two main problems, i.e., short flight time and inefficient routing. In this paper, we try to address both of these problems by means of efficient clustering. First, we adjust the transmission power of the UAVs by anticipating their operational requirements. An optimal transmission range will have a minimum packet loss ratio (PLR) and better link quality, which ultimately saves the energy consumed during communication. Second, we use a variant of the K-Means density clustering algorithm for the selection of cluster heads. Optimal cluster heads enhance the cluster lifetime and reduce the routing overhead. The proposed model outperforms state-of-the-art artificial intelligence techniques such as the Ant Colony Optimization-based clustering algorithm and the Grey Wolf Optimization-based clustering algorithm. The performance of the proposed algorithm is evaluated in terms of the number of clusters, cluster building time, cluster lifetime and energy consumption.
"Divide-and-conquer" semiclassical molecular dynamics: An application to water clusters
NASA Astrophysics Data System (ADS)
Di Liberto, Giovanni; Conte, Riccardo; Ceotto, Michele
2018-03-01
We present an investigation of vibrational features in water clusters performed by means of our recently established divide-and-conquer semiclassical approach [M. Ceotto, G. Di Liberto, and R. Conte, Phys. Rev. Lett. 119, 010401 (2017)]. This technique allows us to simulate quantum vibrational spectra of high-dimensional systems starting from full-dimensional classical trajectories and projection of the semiclassical propagator onto a set of lower dimensional subspaces. The potential energy surface employed is a many-body representation up to three-body terms, in which monomers and two-body interactions are described by the high level Wang-Huang-Braams-Bowman (WHBB) water potential, while, for three-body interactions, calculations adopt a fast permutationally invariant ab initio surface at the same level of theory of the WHBB 3-body potential. Applications range from the water dimer up to the water decamer, a system made of 84 vibrational degrees of freedom. Results are generally in agreement with previous variational estimates in the literature. This is particularly true for the bending and the high-frequency stretching motions, while estimates of modes strongly influenced by hydrogen bonding are red shifted, in a few instances even substantially, as a consequence of the dynamical and global picture provided by the semiclassical approach.
The void spectrum in two-dimensional numerical simulations of gravitational clustering
NASA Technical Reports Server (NTRS)
Kauffmann, Guinevere; Melott, Adrian L.
1992-01-01
An algorithm for deriving a spectrum of void sizes from two-dimensional high-resolution numerical simulations of gravitational clustering is tested, and it is verified that it produces the correct results where those results can be anticipated. The method is used to study the growth of voids as clustering proceeds. It is found that the most stable indicator of the characteristic void 'size' in the simulations is the mean fractional area covered by voids of diameter d, in a density field smoothed at its correlation length. Very accurate scaling behavior is found in power-law numerical models as they evolve. Eventually, this scaling breaks down as the nonlinearity reaches larger scales. It is shown that this breakdown is a manifestation of the undesirable effect of boundary conditions on simulations, even with the very large dynamic range possible here. A simple criterion is suggested for deciding when simulations with modest large-scale power may systematically underestimate the frequency of larger voids.
Tillfors, Maria; Furmark, Tomas; Carlbring, Per; Andersson, Gerhard
2015-06-01
In social anxiety disorder (SAD) co-morbid depressive symptoms as well as avoidance behaviors have been shown to predict insufficient treatment response. It is likely that subgroups of individuals with different profiles of risk factors for poor treatment response exist. This study aimed to identify subgroups of social avoidance and depressive symptoms in a clinical sample (N = 167) with SAD before and after guided internet-delivered CBT, and to compare these groups on diagnostic status and social anxiety. We further examined individual movement between subgroups over time. Using cluster analysis we identified four subgroups, including a high-problem cluster at both time-points. Individuals in this cluster showed less remission after treatment, exhibited higher levels of social anxiety at both assessments, and typically remained in the high-problem cluster after treatment. Thus, in individuals with SAD, high levels of social avoidance and depressive symptoms constitute a risk profile for poor treatment response. Copyright © 2015 Elsevier Ltd. All rights reserved.
Portuguese Lexical Clusters and CVC Sequences in Speech Perception and Production.
Cunha, Conceição
2015-01-01
This paper investigates similarities between lexical consonant clusters and CVC sequences differing in the presence or absence of a lexical vowel in speech perception and production in two Portuguese varieties. The frequent high vowel deletion in the European variety (EP) and the realization of intervening vocalic elements between lexical clusters in Brazilian Portuguese (BP) may minimize the contrast between lexical clusters and CVC sequences in the two Portuguese varieties. In order to test this hypothesis we present a perception experiment with 72 participants and a physiological analysis of 3-dimensional movement data from 5 EP and 4 BP speakers. The perceptual results confirmed a gradual confusion of lexical clusters and CVC sequences in EP, which corresponded roughly to the gradient consonantal overlap found in production. © 2015 S. Karger AG, Basel.
Kinetics of binary nucleation of vapors in size and composition space.
Fisenko, Sergey P; Wilemski, Gerald
2004-11-01
We reformulate the kinetic description of binary nucleation in the gas phase using two natural independent variables: the total number of molecules g and the molar composition x of the cluster. The resulting kinetic equation can be viewed as a two-dimensional Fokker-Planck equation describing the simultaneous Brownian motion of the clusters in size and composition space. Explicit expressions for the Brownian diffusion coefficients in cluster size and composition space are obtained. For characterization of binary nucleation in gases three criteria are established. These criteria establish the relative importance of the rate processes in cluster size and composition space for different gas phase conditions and types of liquid mixtures. The equilibrium distribution function of the clusters is determined in terms of the variables g and x. We obtain an approximate analytical solution for the steady-state binary nucleation rate that has the correct limit in the transition to unary nucleation. To further illustrate our description, the nonequilibrium steady-state cluster concentrations are found by numerically solving the reformulated kinetic equation. For the reformulated transient problem, the relaxation or induction time for binary nucleation was calculated using Galerkin's method. This relaxation time is affected by processes in both size and composition space, but the contributions from each process can be separated only approximately.
Repair of clustered DNA damage caused by high LET radiation in human fibroblasts
NASA Technical Reports Server (NTRS)
Rydberg, B.; Lobrich, M.; Cooper, P. K.; Chatterjee, A. (Principal Investigator)
1998-01-01
It has recently been demonstrated experimentally that DNA damage induced by high-LET radiation in mammalian cells is non-randomly distributed along the DNA molecule in the form of clusters of various sizes. The sizes of such clusters range from a few base pairs to at least 200 kilobase pairs. The high biological efficiency of high-LET radiation for induction of relevant biological endpoints is probably a consequence of this clustering, although the exact mechanisms by which the clustering affects the biological outcome are not known. We discuss here results for induction and repair of base damage, single-strand breaks and double-strand breaks for low- and high-LET radiations. These results are discussed in the context of clustering. Of particular interest is to determine how clustering at different scales affects overall rejoining and the fidelity of rejoining of DNA double-strand breaks. However, existing methods for measuring repair of DNA strand breaks are unable to resolve breaks that are close together in a cluster. This causes problems in the interpretation of current results from high-LET radiation and will require new methods to be developed.
Radio jet propagation and wide-angle tailed radio sources in merging galaxy cluster environments
NASA Technical Reports Server (NTRS)
Loken, Chris; Roettiger, Kurt; Burns, Jack O.; Norman, Michael
1995-01-01
The intracluster medium (ICM) within merging clusters of galaxies is likely to be in a violent or turbulent dynamical state which may have a significant effect on the evolution of cluster radio sources. We present results from a recent gas + N-body simulation of a cluster merger, suggesting that mergers can result in long-lived, supersonic bulk flows, as well as shocks, within a few hundred kiloparsecs of the core of the dominant cluster. These results have motivated our new two-dimensional and three-dimensional simulations of jet propagation in such environments. The first set of simulations models the ISM/ICM transition as a contact discontinuity with a strong velocity shear. A supersonic (M(sub j) = 6) jet crossing this discontinuity into an ICM with a transverse, supersonic wind bends continuously, becomes 'naked' on the upwind side, and forms a distended cocoon on the downwind side. In the case of a mildly supersonic jet (M(sub j) = 3), however, a shock is driven into the ISM and ISM material is pulled along with the jet into the ICM. Instabilities excited at the ISM/ICM interface result in the jet repeatedly pinching off and reestablishing itself in a series of 'disconnection events.' The second set of simulations deals with a jet encountering a shock in the merging cluster environment. A series of relatively high-resolution two-dimensional calculations is used to confirm earlier analysis predicting that the jet will not disrupt when the jet Mach number is greater than the shock Mach number. A jet which survives the encounter with the shock will decrease in radius and disrupt shortly thereafter as a result of the growth of Kelvin-Helmholtz instabilities. We also find, in disagreement with predictions, that the jet flaring angle decreases with increasing jet density. Finally, a three-dimensional simulation of a jet crossing an oblique shock gives rise to a morphology which resembles a wide-angle tailed radio source with the jet flaring at the shock and disrupting to form a long, turbulent tail which is dragged downstream by the preshock wind.
Kinetic energy distribution of multiply charged ions in Coulomb explosion of Xe clusters.
Heidenreich, Andreas; Jortner, Joshua
2011-02-21
We report on calculations of kinetic energy distribution (KED) functions of multiply charged, high-energy ions in the Coulomb explosion (CE) of an assembly of elemental Xe_n clusters (average size ⟨n⟩ = 200-2171) driven by ultra-intense, near-infrared, Gaussian laser fields (peak intensities 10^15 - 4 × 10^16 W cm^-2, pulse lengths 65-230 fs). In this cluster size and pulse parameter domain, outer ionization is incomplete/vertical, incomplete/nonvertical, or complete/nonvertical, with CE occurring in the presence of nanoplasma electrons. The KEDs were obtained from double averaging of single-trajectory molecular dynamics simulation ion kinetic energies. The KEDs were doubly averaged over a log-normal cluster size distribution and over the laser intensity distribution of a spatial Gaussian beam, which constitutes either a two-dimensional (2D) or a three-dimensional (3D) profile, with the 3D profile (when the cluster beam radius is larger than the Rayleigh length) usually being experimentally realized. The general features of the doubly averaged KEDs manifest the smearing out of the structure corresponding to the distribution of ion charges, a marked increase of the KEDs at very low energies due to the contribution from the persistent nanoplasma, a distortion of the KEDs and of the average energies toward lower energy values, and the appearance of long, low-intensity, high-energy tails caused by the admixture of contributions from large clusters by size averaging. The doubly averaged simulation results account reasonably well (within 30%) for the experimental data for the cluster-size dependence of the CE energetics and for its dependence on the laser pulse parameters, as well as for the anisotropy in the angular distribution of the energies of the Xe^q+ ions. Possible applications of this computational study include control of the ion kinetic energies by the choice of the laser intensity profile (2D/3D) in the laser-cluster interaction volume.
Quantum states and optical responses of low-dimensional electron hole systems
NASA Astrophysics Data System (ADS)
Ogawa, Tetsuo
2004-09-01
Quantum states and their optical responses of low-dimensional electron-hole systems in photoexcited semiconductors and/or metals are reviewed from a theoretical viewpoint, stressing the electron-hole Coulomb interaction, the excitonic effects, the Fermi-surface effects and the dimensionality. Recent progress of theoretical studies is stressed and important problems to be solved are introduced. We cover not only single-exciton problems but also few-exciton and many-exciton problems, including electron-hole plasma situations. Dimensionality of the Wannier exciton is clarified in terms of its linear and nonlinear responses. We also discuss a biexciton system, exciton bosonization technique, high-density degenerate electron-hole systems, gas-liquid phase separation in an excited state and the Fermi-edge singularity due to a Mahan exciton in a low-dimensional metal.
Optimum Particle Size for Gold-Catalyzed CO Oxidation
2018-01-01
The structure sensitivity of gold-catalyzed CO oxidation is presented by analyzing in detail the dependence of the CO oxidation rate on particle size. Clusters with fewer than 14 gold atoms adopt a planar structure, whereas larger ones adopt a three-dimensional structure. The CO and O2 adsorption properties depend strongly on particle structure and size. All of the reaction barriers relevant to CO oxidation display linear scaling relationships with CO and O2 binding strengths as the main reactivity descriptors. Planar and three-dimensional gold clusters exhibit different linear scaling relationships due to different surface topologies and different coordination numbers of the surface atoms. On the basis of these linear scaling relationships, first-principles microkinetics simulations were conducted to determine CO oxidation rates and the possible rate-determining step of Au particles. Planar Au9 and three-dimensional Au79 clusters present the highest CO oxidation rates for planar and three-dimensional clusters, respectively. The planar Au9 cluster is much more active than the optimum Au79 cluster. A common feature of optimum CO oxidation performance is intermediate binding strengths of CO and O2, resulting in intermediate coverages of CO, O2, and O. Both optimum particles fall short of the maximum Sabatier performance, indicating that there is sufficient room for improvement of gold catalysts for CO oxidation. PMID:29707098
An Autonomous Sensor Tasking Approach for Large Scale Space Object Cataloging
NASA Astrophysics Data System (ADS)
Linares, R.; Furfaro, R.
The field of Space Situational Awareness (SSA) has progressed over the last few decades with new sensors coming online, the development of new approaches for making observations, and new algorithms for processing them. Although there has been success in the development of new approaches, a missing piece is the translation of SSA goals into sensor and resource allocation, otherwise known as the Sensor Management Problem (SMP). This work solves the SMP using an artificial intelligence approach called Deep Reinforcement Learning (DRL). Stable methods for training DRL approaches based on neural networks exist, but most of these approaches are not suitable for high dimensional systems. The Asynchronous Advantage Actor-Critic (A3C) method is a recently developed and effective approach for high dimensional systems, and this work leverages these results and applies the approach to decision making in SSA. The decision space for SSA problems can be high dimensional, even for the tasking of a single telescope. Since the number of SOs in space is relatively high, each sensor will have a large number of possible actions at a given time. Therefore, efficient DRL approaches are required when solving the SMP for SSA. This work develops an A3C-based method for DRL applied to SSA sensor tasking. One of the key benefits of DRL approaches is the ability to handle high dimensional data: DRL methods have been applied to image processing for the autonomous-car application, where a 256x256 RGB image has 196,608 input values (256*256*3 = 196608), and deep learning approaches routinely take such images as inputs. Therefore, when applied to the whole catalog, the DRL approach offers the ability to solve this high dimensional problem. This work has the potential to, for the first time, solve the non-myopic sensor tasking problem for the whole SO catalog (over 22,000 objects), providing a truly revolutionary result.
Auditing Management Practices in Schools: Recurring Communication Problems and Solutions
ERIC Educational Resources Information Center
Zwijze-Koning, Karen H.; de Jong, Menno D. T.
2009-01-01
Purpose: Over the past ten years, most Dutch high schools have been confronted with mergers, curriculum reforms, and managerial changes. As a result, the pressure on the schools' communication systems has increased and several problems have emerged. This paper aims to examine recurring clusters of communication problems in high schools.…
Generalizing MOND to explain the missing mass in galaxy clusters
NASA Astrophysics Data System (ADS)
Hodson, Alistair O.; Zhao, Hongsheng
2017-02-01
Context. MOdified Newtonian Dynamics (MOND) is a gravitational framework designed to explain astronomical observations in the Universe without the inclusion of particle dark matter. MOND, in its current form, cannot explain the missing mass in galaxy clusters without the inclusion of some extra mass, be it in the form of neutrinos or non-luminous baryonic matter. We investigate whether the MOND framework can be generalized to account for the missing mass in galaxy clusters by boosting gravity in regions of high gravitational potential. We examine and review Extended MOND (EMOND), which was designed to increase the MOND acceleration scale in high-potential regions, thereby boosting the gravity in clusters. Aims: We seek to investigate galaxy cluster mass profiles in the context of MOND, with the primary aim of explaining the missing mass problem fully without the need for dark matter. Methods: Under the assumption that the clusters are in hydrostatic equilibrium, we can compute the dynamical mass of each cluster and compare the result to the mass predicted by the EMOND formalism. Results: We find that EMOND has some success in fitting some clusters but, overall, has issues when trying to explain the mass deficit fully. We also investigate an empirical relation to solve the cluster problem, which is found by analysing the cluster data and is based on the MOND paradigm. We discuss its limitations in the text.
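The hydrostatic-equilibrium step lends itself to a short worked example: for an ideal intracluster gas, the dynamical mass within radius r follows from the gas density and temperature profiles. The beta-model profile and mu = 0.61 below are illustrative assumptions, not the paper's data.

```python
import numpy as np

G, k_B, m_p = 6.674e-11, 1.381e-23, 1.673e-27    # SI constants
mu = 0.61                                        # assumed mean molecular weight

def hydrostatic_mass(r, n_gas, T):
    """M(<r) = -(k_B T r / (G mu m_p)) * (dln n/dln r + dln T/dln r)."""
    lnr = np.log(r)
    dln_n = np.gradient(np.log(n_gas), lnr)
    dln_T = np.gradient(np.log(T), lnr)
    return -(k_B * T * r) / (G * mu * m_p) * (dln_n + dln_T)

r = np.logspace(20.5, 22.5, 64)                  # radii in metres (~10 kpc to ~1 Mpc)
n = 1e4 * (1 + (r / 3e21) ** 2) ** -1.5          # toy beta-model density, m^-3
T = np.full_like(r, 5e7)                         # isothermal 5e7 K gas
M_sun = 1.989e30
print(f"M(<r_max) ~ {hydrostatic_mass(r, n, T)[-1] / M_sun:.2e} M_sun")
```

With these toy numbers the enclosed mass comes out near 10^14 to 10^15 solar masses, the right scale for a rich cluster.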
Self consistency grouping: a stringent clustering method
2012-01-01
Background Numerous types of clustering, such as single linkage and K-means, have been widely studied and applied to a variety of scientific problems. However, the existing methods are not readily applicable to problems that demand high stringency. Methods Our method, self consistency grouping (SCG), yields clusters whose members are closer in rank to each other than to any member outside the cluster. We do not define a distance metric; we use the best known distance metric and presume that it measures the correct distance. SCG does not impose any restriction on the size or the number of the clusters that it finds. The boundaries of clusters are determined by the inconsistencies in the ranks. In addition to the direct implementation that finds the complete structure of the (sub)clusters, we implemented two faster versions. The fastest version is guaranteed to find only the clusters that are not subclusters of any other clusters, and the other version yields the same output as the direct implementation but does so more efficiently. Results Our tests, in which we deliberately introduced errors into the distance measurements, demonstrated that SCG yields very few false positives. Clustering of protein domain representatives by structural similarity showed that SCG could recover homologous groups with high precision. Conclusions SCG has potential for finding biological relationships under stringent conditions. PMID:23320864
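A small sketch of the rank-consistency criterion as stated in the abstract: every member of a valid cluster must rank all fellow members ahead of every outside point. The helper names and toy data are mine, not the authors' implementation.

```python
import numpy as np

def rank_matrix(D):
    """ranks[i, j] = rank of point j in the sorted distance list of point i."""
    order = np.argsort(D, axis=1)
    ranks = np.empty_like(order)
    rows = np.arange(D.shape[0])[:, None]
    ranks[rows, order] = np.arange(D.shape[1])
    return ranks

def is_self_consistent(cluster, ranks):
    """True if every member ranks all members ahead of all non-members."""
    cluster = np.asarray(cluster)
    outside = np.setdiff1d(np.arange(ranks.shape[0]), cluster)
    for i in cluster:
        if ranks[i, cluster].max() > ranks[i, outside].min():
            return False
    return True

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
D = np.linalg.norm(X[:, None] - X[None], axis=-1)
print(is_self_consistent(range(5), rank_matrix(D)))   # True for a tight group
```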
A density functional global optimisation study of neutral 8-atom Cu-Ag and Cu-Au clusters
NASA Astrophysics Data System (ADS)
Heard, Christopher J.; Johnston, Roy L.
2013-02-01
The effect of doping on the energetics and dimensionality of eight-atom coinage metal subnanometre particles is fully resolved using a genetic algorithm in tandem with on-the-fly density functional theory calculations to determine the global minima (GM) for CunAg(8-n) and CunAu(8-n) clusters. Comparisons are made to previous ab initio work on mono- and bimetallic clusters, with excellent agreement found. Charge transfer and geometric arguments are considered to rationalise the stability of the particular permutational isomers found. An interesting transition between three-dimensional and two-dimensional GM structures is observed for copper-gold clusters, which is sharper and appears earlier in the doping series than is known for gold-silver particles.
Clustering and assembly dynamics of a one-dimensional microphase former.
Hu, Yi; Charbonneau, Patrick
2018-05-23
Both ordered and disordered microphases ubiquitously form in suspensions of particles that interact through competing short-range attraction and long-range repulsion (SALR). While ordered microphases are more appealing materials targets, understanding the rich structural and dynamical properties of their disordered counterparts is essential to controlling their mesoscale assembly. Here, we study the disordered regime of a one-dimensional (1D) SALR model, whose simplicity enables detailed analysis by transfer matrices and Monte Carlo simulations. We first characterize the signature of the clustering process on macroscopic observables, and then assess the equilibration dynamics of various simulation algorithms. We notably find that cluster moves markedly accelerate the mixing time, but that event chains are of limited help in the clustering regime. These insights will inspire further study of three-dimensional microphase formers.
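A minimal Metropolis sketch of a 1D SALR lattice gas, assuming nearest-neighbour attraction and repulsion out to four sites with invented couplings and a simple occupancy-flip move; the paper's transfer-matrix analysis and cluster/event-chain moves are not reproduced here.

```python
import numpy as np
from itertools import groupby

rng = np.random.default_rng(2)
L, beta = 200, 2.0
eps_attr, eps_rep, rep_range = 1.0, 0.35, 4      # assumed SALR couplings
s = (rng.random(L) < 0.3).astype(int)            # 0/1 site occupancy

def site_energy(s, i):
    """Interaction energy of site i: attract at distance 1, repel at 2..rep_range."""
    if not s[i]:
        return 0.0
    e = -eps_attr * (s[(i + 1) % L] + s[(i - 1) % L])
    for d in range(2, rep_range + 1):
        e += eps_rep * (s[(i + d) % L] + s[(i - d) % L])
    return e

for step in range(100 * L):
    i = rng.integers(L)
    e_old = site_energy(s, i)
    s[i] ^= 1                                    # trial: flip occupancy
    dE = site_energy(s, i) - e_old
    if dE > 0 and rng.random() >= np.exp(-beta * dE):
        s[i] ^= 1                                # reject

# Cluster-size statistics: runs of consecutive occupied sites.
sizes = [sum(1 for _ in g) for k, g in groupby(s) if k == 1]
print("clusters:", len(sizes), "largest:", max(sizes, default=0))
```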
Cosmic-Ray Feedback Heating of the Intracluster Medium
NASA Astrophysics Data System (ADS)
Ruszkowski, Mateusz; Yang, H.-Y. Karen; Reynolds, Christopher S.
2017-07-01
Active galactic nuclei (AGNs) play a central role in solving the decades-old cooling-flow problem. Although there is consensus that AGNs provide the energy to prevent catastrophically large star formation, one major problem remains: How is the AGN energy thermalized in the intracluster medium (ICM)? We perform a suite of three-dimensional magnetohydrodynamical adaptive mesh refinement simulations of AGN feedback in a cool core cluster including cosmic rays (CRs). CRs are supplied to the ICM via collimated AGN jets and subsequently disperse in the magnetized ICM via streaming, and interact with the ICM via hadronic, Coulomb, and streaming instability heating. We find that CR transport is an essential model ingredient at least within the context of the physical model considered here. When streaming is included, (I) CRs come into contact with the ambient ICM and efficiently heat it, (II) streaming instability heating dominates over Coulomb and hadronic heating, (III) the AGN is variable and the atmosphere goes through low-/high-velocity dispersion cycles, and, importantly, (IV) CR pressure support in the cool core is very low and does not demonstrably violate observational constraints. However, when streaming is ignored, CR energy is not efficiently spent on the ICM heating and CR pressure builds up to a significant level, creating tension with the observations. Overall, we demonstrate that CR heating is a viable channel for the AGN energy thermalization in clusters and likely also in ellipticals, and that CRs play an important role in determining AGN intermittency and the dynamical state of cool cores.
Topic modeling for cluster analysis of large biological and medical datasets
2014-01-01
Background The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and hyper-dimensional datasets. Topic modeling is an active research field in machine learning and has mainly been used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection, and feature extraction, are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy and effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets. PMID:25350106
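A sketch of the "highest probable topic assignment" idea with off-the-shelf LDA: fit a topic model to a count matrix, then label each sample by its most probable topic; clustering in the reduced topic space illustrates the "feature extraction" variant. The Poisson matrix and parameter choices are placeholders, not the paper's datasets or settings.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X = rng.poisson(0.3, size=(200, 500))        # stand-in count matrix (samples x features)

lda = LatentDirichletAllocation(n_components=8, random_state=0)
theta = lda.fit_transform(X)                 # per-sample topic proportions

labels_hpta = theta.argmax(axis=1)           # highest probable topic assignment
labels_fx = KMeans(n_clusters=8, n_init=10,  # "feature extraction": cluster in
                   random_state=0).fit_predict(theta)  # the latent topic space
print(np.bincount(labels_hpta), np.bincount(labels_fx))
```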
Unsteady three-dimensional thermal field prediction in turbine blades using nonlinear BEM
NASA Technical Reports Server (NTRS)
Martin, Thomas J.; Dulikravich, George S.
1993-01-01
A time- and space-accurate, computationally efficient, fully three-dimensional unsteady temperature field analysis code has been developed for truly arbitrary configurations. It uses a boundary element method (BEM) formulation based on an unsteady Green's function approach, multi-point Gaussian quadrature for spatial integration on each panel, and a highly clustered time-step integration. The code accepts either temperatures or heat fluxes as boundary conditions, which can vary in time on a point-by-point basis. Comparisons of the BEM numerical results with known analytical unsteady results for simple shapes demonstrate the very high accuracy and reliability of the algorithm. An example of computed three-dimensional temperature and heat flux fields in a realistically shaped, internally cooled turbine blade is also discussed.
M-Isomap: Orthogonal Constrained Marginal Isomap for Nonlinear Dimensionality Reduction.
Zhang, Zhao; Chow, Tommy W S; Zhao, Mingbo
2013-02-01
Isomap is a well-known nonlinear dimensionality reduction (DR) method, aiming at preserving geodesic distances of all similarity pairs for delivering highly nonlinear manifolds. Isomap is efficient in visualizing synthetic data sets, but it usually delivers unsatisfactory results in benchmark cases. This paper incorporates the pairwise constraints into Isomap and proposes a marginal Isomap (M-Isomap) for manifold learning. The pairwise Cannot-Link and Must-Link constraints are used to specify the types of neighborhoods. M-Isomap computes the shortest path distances over constrained neighborhood graphs and guides the nonlinear DR through separating the interclass neighbors. As a result, large margins between both inter- and intraclass clusters are delivered and enhanced compactness of intracluster points is achieved at the same time. The validity of M-Isomap is examined by extensive simulations over synthetic, University of California, Irvine, and benchmark real Olivetti Research Library, YALE, and CMU Pose, Illumination, and Expression databases. The data visualization and clustering power of M-Isomap are compared with those of six related DR methods. The visualization results show that M-Isomap is able to deliver more separate clusters. Clustering evaluations also demonstrate that M-Isomap delivers comparable or even better results than some state-of-the-art DR algorithms.
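In the spirit of the constrained neighborhood graphs described above, this sketch edits a kNN graph with Must-Link/Cannot-Link pairs before computing shortest-path distances; the specific edge weights are illustrative simplifications, and M-Isomap's margin-enlarging objective is not reproduced.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def constrained_geodesics(X, must_link=(), cannot_link=(), k=8):
    W = kneighbors_graph(X, k, mode='distance').toarray()
    W = np.maximum(W, W.T)              # symmetrise; zero entries mean "no edge"
    for i, j in must_link:
        W[i, j] = W[j, i] = 1e-6        # Must-Link: pin the pair together
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0         # Cannot-Link: sever any direct edge
    return shortest_path(W, method='D', directed=False)

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 10))
D = constrained_geodesics(X, must_link=[(0, 1)], cannot_link=[(0, 59)])
print(D[0, 1], D[0, 59])
```

Classical MDS applied to the returned distance matrix would then give the low-dimensional embedding.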
Distributed Computing Architecture for Image-Based Wavefront Sensing and 2 D FFTs
NASA Technical Reports Server (NTRS)
Smith, Jeffrey S.; Dean, Bruce H.; Haghani, Shadan
2006-01-01
Image-based wavefront sensing (WFS) provides significant advantages over interferometric wavefront sensors, such as optical design simplicity and stability. However, the image-based approach is computationally intensive, and therefore specialized high-performance computing architectures are required in applications utilizing it. The development and testing of these high-performance computing architectures are essential to such missions as the James Webb Space Telescope (JWST), Terrestrial Planet Finder-Coronagraph (TPF-C and CorSpec), and Spherical Primary Optical Telescope (SPOT). These specialized computing architectures require numerous two-dimensional Fourier transforms, which necessitate an all-to-all communication when applied on a distributed computational architecture. Several solutions for distributed computing are presented, with an emphasis on a 64-node cluster of DSPs, multiple DSP FPGAs, and an application of low-diameter graph theory. Timing results and performance analysis are presented. The solutions offered could be applied to other all-to-all communication problems and computationally complex scientific problems.
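The all-to-all communication arises from the transpose step of the row-column decomposition of a 2-D FFT. Below is a single-process sketch of that decomposition, checked against np.fft.fft2, with the distributed exchange indicated only in comments.

```python
import numpy as np

def fft2_row_column(a):
    """2-D FFT as two 1-D passes with a transpose between them; on a
    distributed machine the transpose is the all-to-all exchange."""
    step1 = np.fft.fft(a, axis=1)       # each node FFTs its own rows
    step2 = step1.T                     # global transpose = all-to-all exchange
    step3 = np.fft.fft(step2, axis=1)   # FFT the (former) columns
    return step3.T

a = np.random.default_rng(5).normal(size=(64, 64))
assert np.allclose(fft2_row_column(a), np.fft.fft2(a))
print("row-column decomposition matches np.fft.fft2")
```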
Checking the possibility of controlling fuel element by X-ray computerized tomography
NASA Astrophysics Data System (ADS)
Trinh, V. B.; Zhong, Y.; Osipov, S. P.; Batranin, A. V.
2017-08-01
The article considers the possibility of inspecting fuel elements by X-ray computerized tomography. The inspection tasks are based on the detection of particles of active material, evaluation of the heterogeneity of the distribution of uranium salts, and the detection of clusters of uranium particles. First, a scanning scheme that improves the performance and the quality of the resulting three-dimensional images of the internal structure is determined. The main part then experimentally demonstrates the possibility of detecting clusters of uranium particles with a volume of 1 mm^3 and of measuring the coordinates of such clusters in the middle layer with an accuracy within a voxel size (about 80 μm for the experiments considered). Finally, the problem of estimating the heterogeneity of the distribution of the active material in the middle layer and of detecting particles of active material with a nominal diameter of 0.1 mm in the "blank" is solved.
Spot detection and image segmentation in DNA microarray data.
Qin, Li; Rueda, Luis; Ali, Adnan; Ngom, Alioune
2005-01-01
Following the invention of microarrays in 1994, the development and applications of this technology have grown exponentially. The numerous applications of microarray technology include clinical diagnosis and treatment, drug design and discovery, tumour detection, and environmental health research. One of the key issues in the experimental approaches utilising microarrays is to extract quantitative information from the spots, which represent genes in a given experiment. For this process, the initial stages are important and they influence future steps in the analysis. Identifying the spots and separating the background from the foreground is a fundamental problem in DNA microarray data analysis. In this review, we present an overview of state-of-the-art methods for microarray image segmentation. We discuss the foundations of the circle-shaped approach, adaptive shape segmentation, histogram-based methods and the recently introduced clustering-based techniques. We analytically show that clustering-based techniques are equivalent to the one-dimensional, standard k-means clustering algorithm that utilises the Euclidean distance.
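A toy rendition of the closing observation: two-class, one-dimensional k-means on pixel intensities separates spot foreground from background. The synthetic patch and initialization are assumptions for illustration.

```python
import numpy as np

def kmeans_1d(intensities, iters=50):
    """Two-class 1-D k-means on pixel intensities: background vs. spot."""
    x = intensities.ravel().astype(float)
    c = np.array([x.min(), x.max()])           # initial centroids
    for _ in range(iters):
        labels = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                c[k] = x[labels == k].mean()
    return labels.reshape(intensities.shape), c

rng = np.random.default_rng(6)
patch = rng.normal(100, 10, (32, 32))          # background intensities
patch[8:24, 8:24] += 150                       # bright spot (foreground)
mask, centroids = kmeans_1d(patch)
print(centroids, int(mask.sum()), "foreground pixels")
```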
Synchronous parallel spatially resolved stochastic cluster dynamics
Dunn, Aaron; Dingreville, Rémi; Martínez, Enrique; ...
2016-04-23
In this work, a spatially resolved stochastic cluster dynamics (SRSCD) model for radiation damage accumulation in metals is implemented using a synchronous parallel kinetic Monte Carlo algorithm. The parallel algorithm is shown to significantly increase the size of representative volumes achievable in SRSCD simulations of radiation damage accumulation. Additionally, weak scaling performance of the method is tested in two cases: (1) an idealized case of Frenkel pair diffusion and annihilation, and (2) a characteristic example problem including defect cluster formation and growth in α-Fe. For the latter case, weak scaling is tested using both Frenkel pair and displacement cascade damage. To improve scaling of simulations with cascade damage, an explicit cascade implantation scheme is developed for cases in which fast-moving defects are created in displacement cascades. For the first time, simulation of radiation damage accumulation in nanopolycrystals can be achieved with a three-dimensional rendition of the microstructure, allowing demonstration of the effect of grain size on defect accumulation in Frenkel pair-irradiated α-Fe.
Analysis of a municipal wastewater treatment plant using a neural network-based pattern analysis
Hong, Y.-S.T.; Rosen, Michael R.; Bhamidimarri, R.
2003-01-01
This paper addresses the problem of how to capture the complex relationships that exist between process variables and to diagnose the dynamic behaviour of a municipal wastewater treatment plant (WTP). Because of the complex biological reaction mechanisms and the highly time-varying, multivariable nature of the real WTP, diagnosis of the WTP is still difficult in practice. The application of intelligent techniques, which can analyse multi-dimensional process data using a sophisticated visualisation technique, can be useful for analysing and diagnosing the activated-sludge WTP. In this paper, the Kohonen Self-Organising Feature Map (KSOFM) neural network is applied to analyse the multi-dimensional process data and to diagnose the inter-relationships of the process variables in a real activated-sludge WTP. By using component planes, some detailed local relationships between the process variables, e.g., responses of the process variables under different operating conditions, as well as the global information, are discovered. The operating conditions and the inter-relationships among the process variables in the WTP have been diagnosed and extracted from the information obtained by clustering analysis of the maps. It is concluded that the KSOFM technique provides an effective analysing and diagnosing tool to understand the system behaviour and to extract knowledge contained in multi-dimensional data of a large-scale WTP. © 2003 Elsevier Science Ltd. All rights reserved.
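A compact sketch of the Kohonen SOM update underlying this analysis; the grid size, decay schedules, and random stand-in for normalized process data are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(7)
grid, dim = 10, 6                          # 10x10 map, six process variables
W = rng.normal(size=(grid, grid, dim))     # codebook (weight) vectors
ii, jj = np.meshgrid(np.arange(grid), np.arange(grid), indexing='ij')

def train(W, X, epochs=20, lr0=0.5, sigma0=3.0):
    t, t_max = 0, epochs * len(X)
    for _ in range(epochs):
        for x in X:
            lr = lr0 * (1 - t / t_max)
            sigma = max(sigma0 * (1 - t / t_max), 0.5)
            d = np.linalg.norm(W - x, axis=2)
            bi, bj = np.unravel_index(d.argmin(), d.shape)  # best-matching unit
            h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
            W += lr * h[..., None] * (x - W)                # neighbourhood update
            t += 1

X = rng.normal(size=(300, dim))            # stand-in for normalised sensor data
train(W, X)
# Each "component plane" is W[:, :, v] for process variable v; clustering the
# trained codebook vectors then groups the plant's operating conditions.
```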
SciSpark's SRDD : A Scientific Resilient Distributed Dataset for Multidimensional Data
NASA Astrophysics Data System (ADS)
Palamuttam, R. S.; Wilson, B. D.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; McGibbney, L. J.; Ramirez, P.
2015-12-01
Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF), making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We have developed SciSpark, a robust Big Data framework, that extends Apache Spark for scaling scientific computations. Apache Spark improves the map-reduce implementation in Apache Hadoop for parallel computing on a cluster, by emphasizing in-memory computation, "spilling" to disk only as needed, and relying on lazy evaluation. Central to Spark is the Resilient Distributed Dataset (RDD), an in-memory distributed data structure that extends the functional paradigm provided by the Scala programming language. However, RDDs are ideal for tabular or unstructured data, and not for highly dimensional data. The SciSpark project introduces the Scientific Resilient Distributed Dataset (sRDD), a distributed-computing array structure which supports iterative scientific algorithms for multidimensional data. SciSpark processes data stored in NetCDF and HDF files by partitioning them across time or space and distributing the partitions among a cluster of compute nodes. We show usability and extensibility of SciSpark by implementing distributed algorithms for geospatial operations on large collections of multi-dimensional grids. In particular we address the problem of scaling an automated method for finding Mesoscale Convective Complexes. SciSpark provides a tensor interface to support the pluggability of different matrix libraries; we evaluate the performance of various matrix libraries, such as Nd4j and Breeze, in distributed pipelines. We detail the architecture and design of SciSpark, our efforts to integrate climate science algorithms, and parallel ingest and partitioning (sharding) of A-Train satellite observations from model grids. These solutions are encompassed in SciSpark, an open-source software framework for distributed computing on scientific data.
Nonlinear dimensionality reduction of data lying on the multicluster manifold.
Meng, Deyu; Leung, Yee; Fung, Tung; Xu, Zongben
2008-08-01
A new method, called the decomposition-composition (D-C) method, is proposed for the nonlinear dimensionality reduction (NLDR) of data lying on a multicluster manifold. The main idea is first to decompose a given data set into clusters and independently calculate the low-dimensional embeddings of each cluster by the decomposition procedure. Based on the intercluster connections, the embeddings of all clusters are then composed into their proper positions and orientations by the composition procedure. Different from other NLDR methods for multicluster data, which consider the intracluster and intercluster information jointly, the D-C method capitalizes on the separate employment of the intracluster neighborhood structures and the intercluster topologies for effective dimensionality reduction. This, on the one hand, isometrically preserves the rigid-body shapes of the clusters in the embedding process and, on the other hand, guarantees the proper locations and orientations of all clusters. The theoretical arguments are supported by a series of experiments performed on synthetic and real-life data sets. In addition, the computational complexity of the proposed method is analyzed, and its efficiency is theoretically analyzed and experimentally demonstrated. Related strategies for automatic parameter selection are also examined.
Impact of network topology on self-organized criticality
NASA Astrophysics Data System (ADS)
Hoffmann, Heiko
2018-02-01
The general mechanisms behind self-organized criticality (SOC) are still unknown. Several microscopic and mean-field theory approaches have been suggested, but they do not explain the dependence of the exponents on the underlying network topology of the SOC system. Here, we first report the phenomenon that, in the Bak-Tang-Wiesenfeld (BTW) model, sites inside an avalanche area largely return to their original state after the passing of an avalanche, forming, effectively, critically arranged clusters of sites. Then, we hypothesize that SOC relies on the formation process of these clusters, and present a model of such formation. For low-dimensional networks, we show theoretically and in simulation that the exponent of the cluster-size distribution is proportional to the ratio of the fractal dimension of the cluster boundary to the dimensionality of the network. For the BTW model, in our simulations, the exponent of the avalanche-area distribution matched approximately our prediction based on this ratio for two-dimensional networks, but deviated for higher dimensions. We hypothesize a transition from cluster formation to the mean-field theory process with increasing dimensionality. This work sheds light onto the mechanisms behind SOC, particularly the impact of the network topology.
Rigidity of transmembrane proteins determines their cluster shape
NASA Astrophysics Data System (ADS)
Jafarinia, Hamidreza; Khoshnood, Atefeh; Jalali, Mir Abbas
2016-01-01
Protein aggregation in cell membrane is vital for the majority of biological functions. Recent experimental results suggest that transmembrane domains of proteins such as α-helices and β-sheets have different structural rigidities. We use molecular dynamics simulation of a coarse-grained model of protein-embedded lipid membranes to investigate the mechanisms of protein clustering. For a variety of protein concentrations, our simulations under thermal equilibrium conditions reveal that the structural rigidity of transmembrane domains dramatically affects interactions and changes the shape of the cluster. We have observed stable large aggregates even in the absence of hydrophobic mismatch, which has been previously proposed as the mechanism of protein aggregation. According to our results, semiflexible proteins aggregate to form two-dimensional clusters, while rigid proteins, by contrast, form one-dimensional string-like structures. By assuming two probable scenarios for the formation of a two-dimensional triangular structure, we calculate the lipid density around protein clusters and find that the difference in lipid distribution around rigid and semiflexible proteins determines the one- or two-dimensional nature of aggregates. It is found that lipids move faster around semiflexible proteins than rigid ones. The aggregation mechanism suggested in this paper can be tested by current state-of-the-art experimental facilities.
Chemistry and Processing of Nanostructured Materials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fox, G A; Baumann, T F; Hope-Weeks, L J
2002-01-18
Nanostructured materials can be formed through the sol-gel polymerization of inorganic or organic monomer systems. For example, a two-step polymerization of tetramethoxysilane (TMOS) was developed such that silica aerogels with densities as low as 3 kg/m^3 (about two times the density of air) could be achieved. Organic aerogels based upon resorcinol-formaldehyde and melamine-formaldehyde can also be prepared using the sol-gel process. Materials of this type have received significant attention at LLNL due to their ultrafine cell sizes, continuous porosity, high surface area and low mass density. For both types of aerogels, sol-gel polymerization depends upon the transformation of these monomers into nanometer-sized clusters followed by cross-linking into a 3-dimensional gel network. While sol-gel chemistry provides the opportunity to synthesize new material compositions, it suffers from the inability to separate the process of cluster formation from gelation. This limitation results in structural deficiencies in the gel that impact the physical properties of the aerogel, xerogel or nanocomposite. In order to control the properties of the resultant gel, one should be able to regulate the formation of the clusters and their subsequent cross-linking. Towards this goal, we are utilizing dendrimer chemistry to separate cluster formation from gelation so that new nanostructured materials can be produced. Dendrimers are three-dimensional, highly branched macromolecules that are prepared in such a way that their size, shape and surface functionality are readily controlled. The dendrimers will be used as pre-formed clusters of known size that can be cross-linked to form an ordered gel network.
Synaptic Bistability Due to Nucleation and Evaporation of Receptor Clusters
NASA Astrophysics Data System (ADS)
Burlakov, V. M.; Emptage, N.; Goriely, A.; Bressloff, P. C.
2012-01-01
We introduce a bistability mechanism for long-term synaptic plasticity based on switching between two metastable states that contain significantly different numbers of synaptic receptors. One state is characterized by a two-dimensional gas of mobile interacting receptors and is stabilized against clustering by a high nucleation barrier. The other state contains a receptor gas in equilibrium with a large cluster of immobile receptors, which is stabilized by the turnover rate of receptors into and out of the synapse. Transitions between the two states can be initiated by either an increase (potentiation) or a decrease (depotentiation) of the net receptor flux into the synapse. This changes the saturation level of the receptor gas and triggers nucleation or evaporation of receptor clusters.
Swarm Intelligence in Text Document Clustering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cui, Xiaohui; Potok, Thomas E
2008-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to traditional algorithms, swarm algorithms are usually flexible, robust, decentralized and self-organized. These characteristics make swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge of today's information society is that users are overwhelmed with information on any topic they search for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize this overwhelming amount of information. In this chapter, we introduce three nature-inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant food foraging.
Nonlinear Conservation Laws and Finite Volume Methods
NASA Astrophysics Data System (ADS)
Leveque, Randall J.
Contents:
Introduction: Software; Notation; Classification of Differential Equations; Derivation of Conservation Laws; The Euler Equations of Gas Dynamics; Dissipative Fluxes; Source Terms; Radiative Transfer and Isothermal Equations; Multi-dimensional Conservation Laws; The Shock Tube Problem.
Mathematical Theory of Hyperbolic Systems: Scalar Equations; Linear Hyperbolic Systems; Nonlinear Systems; The Riemann Problem for the Euler Equations.
Numerical Methods in One Dimension: Finite Difference Theory; Finite Volume Methods; Importance of Conservation Form - Incorrect Shock Speeds; Numerical Flux Functions; Godunov's Method; Approximate Riemann Solvers; High-Resolution Methods; Other Approaches; Boundary Conditions.
Source Terms and Fractional Steps: Unsplit Methods; Fractional Step Methods; General Formulation of Fractional Step Methods; Stiff Source Terms; Quasi-stationary Flow and Gravity.
Multi-dimensional Problems: Dimensional Splitting; Multi-dimensional Finite Volume Methods; Grids and Adaptive Refinement.
Computational Difficulties: Low-Density Flows; Discrete Shocks and Viscous Profiles; Start-Up Errors; Wall Heating; Slow-Moving Shocks; Grid Orientation Effects; Grid-Aligned Shocks.
Magnetohydrodynamics: The MHD Equations; One-Dimensional MHD; Solving the Riemann Problem; Nonstrict Hyperbolicity; Stiffness; The Divergence of B; Riemann Problems in Multi-dimensional MHD; Staggered Grids; The 8-Wave Riemann Solver.
Relativistic Hydrodynamics: Conservation Laws in Spacetime; The Continuity Equation; The 4-Momentum of a Particle; The Stress-Energy Tensor; Finite Volume Methods; Multi-dimensional Relativistic Flow; Gravitation and General Relativity.
References.
Computing and visualizing time-varying merge trees for high-dimensional data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oesterling, Patrick; Heine, Christian; Weber, Gunther H.
2017-06-03
We introduce a new method that identifies and tracks features in arbitrary dimensions using the merge tree -- a structure for identifying topological features based on thresholding in scalar fields. This method analyzes the evolution of features of the function by tracking changes in the merge tree and relates features by matching subtrees between consecutive time steps. Using the time-varying merge tree, we present a structural visualization of the changing function that illustrates both features and their temporal evolution. We demonstrate the utility of our approach by applying it to temporal cluster analysis of high-dimensional point clouds.
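A minimal union-find sketch of the merge tree construction for superlevel sets of a scalar field on a graph: sweep vertices from high to low value and record where branches join. The event format is my own; the paper's time-varying step would then match these subtrees between consecutive time steps.

```python
import numpy as np
from collections import defaultdict

def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]      # path compression
        i = parent[i]
    return i

def merge_tree_events(values, edges):
    """Sweep vertices from high to low scalar value and union components of
    the superlevel set; each union is a saddle where branches of the merge
    tree join."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    order = np.argsort(-np.asarray(values))
    parent, events = {}, []
    for v in order:
        parent[v] = v
        roots = {find(parent, u) for u in adj[v] if u in parent}
        if len(roots) > 1:
            events.append((values[v], sorted(roots)))   # branches merge at v
        for r in roots:
            parent[r] = v                               # v becomes the new root
    return events

vals = [5.0, 1.0, 4.0, 0.5, 3.0]           # scalar field on a path graph
print(merge_tree_events(vals, [(0, 1), (1, 2), (2, 3), (3, 4)]))
# Two maxima (vertices 0 and 2) merge at the saddle value 1.0, and that
# component later absorbs the branch of vertex 4 at value 0.5.
```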
NASA Astrophysics Data System (ADS)
Chiu, I.-Non; Umetsu, Keiichi; Sereno, Mauro; Ettori, Stefano; Meneghetti, Massimo; Merten, Julian; Sayers, Jack; Zitrin, Adi
2018-06-01
We perform a three-dimensional triaxial analysis of 16 X-ray regular and 4 high-magnification galaxy clusters selected from the CLASH survey by combining two-dimensional weak-lensing and central strong-lensing constraints. In a Bayesian framework, we constrain the intrinsic structure and geometry of each individual cluster assuming a triaxial Navarro–Frenk–White halo with arbitrary orientations, characterized by the mass M_200c, halo concentration c_200c, and triaxial axis ratios (q_a ≤ q_b), and investigate scaling relations between these halo structural parameters. From triaxial modeling of the X-ray-selected subsample, we find that the halo concentration decreases with increasing cluster mass, with a mean concentration of c_200c = 4.82 ± 0.30 at the pivot mass M_200c = 10^15 M_sun h^-1. This is consistent with the result from spherical modeling, c_200c = 4.51 ± 0.14. Independently of the priors, the minor-to-major axis ratio q_a of our full sample exhibits a clear deviation from the spherical configuration (q_a = 0.52 ± 0.04 at 10^15 M_sun h^-1 with uniform priors), with a weak dependence on the cluster mass. Combining all 20 clusters, we obtain a joint ensemble constraint on the minor-to-major axis ratio of q_a = 0.652 (+0.162/-0.078) and a lower bound on the intermediate-to-major axis ratio of q_b > 0.63 at the 2σ level from an analysis with uniform priors. Assuming priors on the axis ratios derived from numerical simulations, we constrain the degree of triaxiality for the full sample to be T = 0.79 ± 0.03 at 10^15 M_sun h^-1, indicating a preference for a prolate geometry of cluster halos. We find no statistical evidence for an orientation bias (f_geo = 0.93 ± 0.07), which is insensitive to the priors and in agreement with the theoretical expectation for the CLASH clusters.
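For reference, the triaxiality quoted above is a one-line function of the axis ratios, assuming the standard definition T = (1 - q_b^2)/(1 - q_a^2) for q_a ≤ q_b; the inputs below are illustrative, not the paper's posterior values.

```python
def triaxiality(q_a, q_b):
    """T -> 1: prolate; T -> 0: oblate (standard definition, assumed here)."""
    return (1.0 - q_b ** 2) / (1.0 - q_a ** 2)

# Illustrative axis ratios near the constraints quoted above:
print(triaxiality(0.52, 0.70))   # ~0.70, in the prolate-leaning regime
```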
Subspace Dimensionality: A Tool for Automated QC in Seismic Array Processing
NASA Astrophysics Data System (ADS)
Rowe, C. A.; Stead, R. J.; Begnaud, M. L.
2013-12-01
Because of the great resolving power of seismic arrays, the application of automated processing to array data is critically important in treaty verification work. A significant problem in array analysis is the inclusion of bad sensor channels in the beamforming process. We are testing an approach to automated, on-the-fly quality control (QC) to aid in the identification of poorly performing sensor channels prior to beam-forming in routine event detection or location processing. The idea stems from methods used for large computer servers, where monitoring traffic at enormous numbers of nodes is impractical on a node-by-node basis, so the dimensionality of the node traffic is instead monitored for anomalies that could represent malware, cyber-attacks or other problems. The technique relies upon the use of subspace dimensionality or principal components of the overall system traffic. The subspace technique is not new to seismology, but its most common application has been limited to comparing waveforms to an a priori collection of templates for detecting highly similar events in a swarm or seismic cluster. In the established template application, a detector functions in a manner analogous to waveform cross-correlation, applying a statistical test to assess the similarity of the incoming data stream to known templates for events of interest. In our approach, we seek not to detect matching signals; instead, we examine the signal subspace dimensionality in much the same way that the method addresses node traffic anomalies in large computer systems. Signal anomalies recorded on seismic arrays affect the dimensional structure of the array-wide time series. We have shown previously that this observation is useful in identifying real seismic events, either by looking at the raw signal or derivatives thereof (entropy, kurtosis). Here we explore the effects of malfunctioning channels on the dimension of the data and its derivatives, and how to leverage this effect to identify bad array elements through a jackknifing process that isolates the anomalous channels, so that an automated analysis system might discard them prior to FK analysis and beamforming on events of interest.
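A sketch of the dimensionality-plus-jackknife idea: estimate the PCA subspace dimension of the array-wide time series, then drop one channel at a time and flag channels whose removal changes that dimension. The 95% variance threshold and synthetic data are assumptions for illustration.

```python
import numpy as np

def effective_dimension(X, var=0.95):
    """Number of principal components capturing `var` of the variance of the
    array-wide multichannel time series (channels x samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    s = np.linalg.svd(Xc, compute_uv=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(frac, var) + 1)

def jackknife_channels(X):
    """Drop one channel at a time; a channel whose removal shrinks the
    subspace dimension is a candidate bad element."""
    base = effective_dimension(X)
    return {ch: effective_dimension(np.delete(X, ch, axis=0)) - base
            for ch in range(X.shape[0])}

rng = np.random.default_rng(8)
common = rng.normal(size=(1, 2000))                  # coherent array-wide signal
X = np.repeat(common, 9, axis=0) + 0.05 * rng.normal(size=(9, 2000))
X[4] = rng.normal(size=2000)                         # channel 4 broken: pure noise
print(effective_dimension(X), jackknife_channels(X)) # removal of ch. 4 drops the dim
```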
Spectral-clustering approach to Lagrangian vortex detection.
Hadjighasem, Alireza; Karrasch, Daniel; Teramoto, Hiroshi; Haller, George
2016-06-01
One of the ubiquitous features of real-life turbulent flows is the existence and persistence of coherent vortices. Here we show that such coherent vortices can be extracted as clusters of Lagrangian trajectories. We carry out the clustering on a weighted graph, with the weights measuring pairwise distances of fluid trajectories in the extended phase space of positions and time. We then extract coherent vortices from the graph using tools from spectral graph theory. Our method locates all coherent vortices in the flow simultaneously, thereby showing high potential for automated vortex tracking. We illustrate the performance of this technique by identifying coherent Lagrangian vortices in several two- and three-dimensional flows.
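A toy rendition of the pipeline: time-averaged pairwise trajectory distances in the extended phase space define graph weights, and spectral clustering extracts the coherent groups. The flow field, kernel scale, and cluster count below are invented for illustration.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(9)
T = 50
t = np.linspace(0, 2 * np.pi, T)

# Toy 2-D trajectories: two co-rotating "vortices" plus a background jet.
def orbit(cx, cy, phase):
    return np.stack([cx + 0.3 * np.cos(t + phase),
                     cy + 0.3 * np.sin(t + phase)], axis=1)

traj = np.array(
    [orbit(-1, 0, rng.uniform(0, 2 * np.pi)) for _ in range(20)]
    + [orbit(1, 0, rng.uniform(0, 2 * np.pi)) for _ in range(20)]
    + [np.stack([np.linspace(-2, 2, T),
                 rng.normal(1.5, 0.05) * np.ones(T)], axis=1) for _ in range(20)])

# Pairwise distance = time-averaged separation of fluid trajectories.
D = np.linalg.norm(traj[:, None] - traj[None], axis=-1).mean(axis=-1)
W = np.exp(-(D / D.mean()) ** 2)               # similarity weights on the graph

labels = SpectralClustering(n_clusters=3, affinity='precomputed',
                            random_state=0).fit_predict(W)
print(np.bincount(labels))                     # two vortices and the jet
```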
Clustering method for counting passengers getting in a bus with single camera
NASA Astrophysics Data System (ADS)
Yang, Tao; Zhang, Yanning; Shao, Dapei; Li, Ying
2010-03-01
Automatic counting of passengers is very important for both business and security applications. We present a single-camera-based vision system that is able to count passengers in a highly crowded situation at the entrance of a traffic bus. The unique characteristics of the proposed system include the following. First, it uses a novel feature-point-tracking and online-clustering-based passenger counting framework, which performs much better than background-modeling- and foreground-blob-tracking-based methods. Second, a simple and highly accurate clustering algorithm is developed that projects the high-dimensional feature point trajectories into a 2-D feature space by their appearance and disappearance times and counts the number of people through online clustering. Finally, all test video sequences in the experiment were captured from a real traffic bus in Shanghai, China. The results show that the system can process two 320×240 video sequences at a frame rate of 25 fps simultaneously, and can count passengers reliably in various difficult scenarios with complex interaction and occlusion among people. The method achieves high accuracy rates up to 96.5%.
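The projection step is simple enough to sketch: each tracked feature point becomes a 2-D point (appearance time, disappearance time), and clustering those points counts passengers. DBSCAN stands in for the paper's online clustering here, and the synthetic trajectories are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(10)
# Each passenger sheds several tracked feature points with similar
# appearance/disappearance frames; three passengers are simulated here.
events = []
for t_in in (10, 40, 75):                        # entry frames of 3 passengers
    n_pts = rng.integers(8, 15)
    appear = t_in + rng.normal(0, 1.5, n_pts)
    vanish = t_in + 20 + rng.normal(0, 1.5, n_pts)
    events.append(np.stack([appear, vanish], axis=1))
P = np.vstack(events)                            # (points, 2) in time-time space

labels = DBSCAN(eps=5.0, min_samples=4).fit_predict(P)
n_passengers = len(set(labels) - {-1})           # ignore noise label -1
print("estimated passengers:", n_passengers)     # expect 3
```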
Supporting Dynamic Quantization for High-Dimensional Data Analytics.
Guzun, Gheorghi; Canahuate, Guadalupe
2017-05-01
Similarity searches are at the heart of exploratory data analysis tasks. Distance metrics are typically used to characterize the similarity between data objects represented as feature vectors. However, when the dimensionality of the data increases and the number of features is large, traditional distance metrics fail to distinguish between the closest and furthest data points. Localized distance functions have been proposed as an alternative to traditional distance metrics. These functions only consider dimensions close to the query to compute the distance/similarity. Furthermore, in order to enable interactive explorations of high-dimensional data, indexing support for ad-hoc queries is needed. In this work we set out to investigate whether bit-sliced indices can be used for exploratory analytics such as similarity searches and data clustering for high-dimensional big data. We also propose a novel dynamic quantization called Query-dependent Equi-Depth (QED) quantization and show its effectiveness in characterizing high-dimensional similarity. When applying QED, we observe improvements in kNN classification accuracy over traditional distance functions.
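A guess at the flavor of query-dependent equi-depth binning, for illustration only: re-center each attribute on the query value before quantile binning, so that code resolution concentrates near the query. This is not the paper's exact QED scheme or its bit-sliced index layout.

```python
import numpy as np

def equi_depth_edges(col, bits=3):
    """Equi-depth bin edges: each of the 2**bits bins holds ~equally many points."""
    qs = np.linspace(0, 1, 2 ** bits + 1)[1:-1]
    return np.quantile(col, qs)

def query_dependent_codes(col, q, bits=3):
    """Quantize |col - q| with equi-depth bins: small codes mean close to the
    query in this attribute (an illustration of the idea only)."""
    shifted = np.abs(col - q)
    return np.searchsorted(equi_depth_edges(shifted, bits), shifted)

rng = np.random.default_rng(11)
data = rng.normal(size=(10000, 32))              # high-dimensional feature vectors
q = rng.normal(size=32)                          # the query point
codes = np.stack([query_dependent_codes(data[:, j], q[j]) for j in range(32)],
                 axis=1)
approx_knn = np.argsort(codes.sum(axis=1))[:10]  # crude code-based ranking
print(approx_knn)
```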
Chen, X C; Liu, H; Li, H; Cheng, Y; Yang, L; Liu, Y F
2016-06-27
In this study, a dynamic three-dimensional cell culture technology was used to expand and differentiate rat pancreatic duct-derived stem cells (PDSCs) into islet-like cell clusters that can secrete insulin. PDSCs were isolated from rat pancreatic tissues by in situ collagenase digestion and density gradient centrifugation. Using a dynamic three-dimensional culture technique, the cells were expanded and differentiated into functional islet-like cell clusters, which were characterized by morphological and phenotypic analyses. After maintaining 1 × 10^8 isolated rat PDSCs in dynamic three-dimensional cell culture for 7 days, 1.5 × 10^9 cells could be harvested. Passaged PDSCs expressed markers of pancreatic endocrine progenitors, including CD29 (86.17%), CD73 (90.73%), CD90 (84.13%), CD105 (78.28%), and Pdx-1. Following 14 additional days of culture in serum-free medium with nicotinamide, keratinocyte growth factor (KGF), and basic fibroblast growth factor (bFGF), the cells were differentiated into islet-like cell clusters (ICCs). The ICC morphology reflected that of fused cell clusters. During the late stage of differentiation, representative clusters were non-adherent and expressed insulin, as indicated by dithizone (DTZ)-positive staining. Insulin was detected in the extracellular fluid and cytoplasm of ICCs after 14 days of differentiation. Additionally, insulin levels were significantly higher at this time than those exhibited by PDSCs before differentiation (P < 0.01). By using a dynamic three-dimensional cell culture system, PDSCs can be expanded in vitro and differentiated into functional islet-like cell clusters.
Semisupervised kernel marginal Fisher analysis for face recognition.
Wang, Ziqiang; Sun, Xia; Sun, Lijun; Huang, Yuchun
2013-01-01
Dimensionality reduction is a key problem in face recognition due to the high dimensionality of face images. To effectively cope with this problem, a novel dimensionality reduction algorithm called semisupervised kernel marginal Fisher analysis (SKMFA) for face recognition is proposed in this paper. SKMFA can make use of both labeled and unlabeled samples to learn the projection matrix for nonlinear dimensionality reduction. Meanwhile, it can successfully avoid the singularity problem by not calculating the matrix inverse. In addition, in order to make the nonlinear structure captured by the data-dependent kernel consistent with the intrinsic manifold structure, a manifold-adaptive nonparametric kernel is incorporated into the learning process of SKMFA. Experimental results on three face image databases demonstrate the effectiveness of our proposed algorithm.
OpenCluster: A Flexible Distributed Computing Framework for Astronomical Data Processing
NASA Astrophysics Data System (ADS)
Wei, Shoulin; Wang, Feng; Deng, Hui; Liu, Cuiyin; Dai, Wei; Liang, Bo; Mei, Ying; Shi, Congming; Liu, Yingbo; Wu, Jingping
2017-02-01
The volume of data generated by modern astronomical telescopes is extremely large and rapidly growing. However, current high-performance data processing architectures/frameworks are not well suited for astronomers because of their limitations and programming difficulties. In this paper, we therefore present OpenCluster, an open-source distributed computing framework to support rapidly developing high-performance processing pipelines of astronomical big data. We first detail the OpenCluster design principles and implementations and present the APIs facilitated by the framework. We then demonstrate a case in which OpenCluster is used to resolve complex data processing problems for developing a pipeline for the Mingantu Ultrawide Spectral Radioheliograph. Finally, we present our OpenCluster performance evaluation. Overall, OpenCluster provides not only high fault tolerance and simple programming interfaces, but also a flexible means of scaling up the number of interacting entities. OpenCluster thereby provides an easily integrated distributed computing framework for quickly developing a high-performance data processing system of astronomical telescopes and for significantly reducing software development expenses.
Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data
Király, András; Abonyi, János
2014-01-01
During the last decade various algorithms have been developed and proposed for discovering overlapping clusters in high-dimensional data. The two most prominent application fields in this research, proposed independently, are frequent itemset mining (developed for market basket data) and biclustering (applied to gene expression data analysis). The common limitation of both methodologies is their limited applicability to very large binary data sets. In this paper we propose a novel and efficient method to find both frequent closed itemsets and biclusters in high-dimensional binary data. The method is based on simple but very powerful matrix and vector multiplication approaches that ensure that all patterns can be discovered in a fast manner. The proposed algorithm has been implemented in the commonly used MATLAB environment and is freely available to researchers. PMID:24616651
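A numpy sketch of the bit-table idea: with a Boolean matrix, itemset support is a bitwise AND plus a count, and all pairwise co-occurrence counts come from a single matrix multiplication; a bicluster is then a supporting row set together with its itemset. The data and threshold are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(12)
X = rng.random((300, 20)) < 0.25            # binary data: rows x items (bit-table)

def support(itemset):
    """Fraction of rows containing every item in the set (bitwise AND)."""
    return X[:, list(itemset)].all(axis=1).mean()

# All pairwise co-occurrence counts in one multiplication:
co = X.T.astype(int) @ X.astype(int)        # co[i, j] = rows containing i and j
min_count = int(0.05 * len(X))
pairs = np.argwhere(np.triu(co, 1) >= min_count)
print(len(pairs), "frequent pairs;",
      "support of first:", support(pairs[0]) if len(pairs) else None)
```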
Hierarchical Discriminant Analysis.
Lu, Di; Ding, Chuntao; Xu, Jinliang; Wang, Shangguang
2018-01-18
The Internet of Things (IoT) generates large amounts of high-dimensional sensor data. The processing of high-dimensional data (e.g., data visualization and data classification) is very difficult, so it requires excellent subspace learning algorithms that learn a latent subspace preserving the intrinsic structure of the high-dimensional data and abandon the least useful information in the subsequent processing. In this context, many subspace learning algorithms have been presented. However, in the process of transforming high-dimensional data into the low-dimensional space, the huge difference between the sum of inter-class distances and the sum of intra-class distances for distinct data may cause a bias problem: the impact of intra-class distance is overwhelmed. To address this problem, we propose a novel algorithm called Hierarchical Discriminant Analysis (HDA). It minimizes the sum of intra-class distances first, and then maximizes the sum of inter-class distances. This proposed method balances the bias from the inter-class and that from the intra-class to achieve better performance. Extensive experiments are conducted on several benchmark face datasets. The results reveal that HDA obtains better performance than other dimensionality reduction algorithms.
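A schematic of the two scatter sums that HDA balances, computed in the classical LDA way on synthetic data; the two-stage optimisation itself is only indicated in comments and is a simplification of the method described above.

```python
import numpy as np

rng = np.random.default_rng(13)
X = np.vstack([rng.normal(m, 1.0, (50, 10)) for m in (0, 3, 6)])
y = np.repeat([0, 1, 2], 50)

def scatter_matrices(X, y):
    """Within-class (S_w) and between-class (S_b) scatter, as in classical LDA."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_w, S_b = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)
        S_b += len(Xc) * np.outer(mc - mu, mc - mu)
    return S_w, S_b

S_w, S_b = scatter_matrices(X, y)
# HDA's hierarchy (as described above): first pick directions minimising the
# intra-class sum tr(W.T S_w W), then, among those, maximise tr(W.T S_b W),
# rather than optimising a single Fisher ratio as LDA does.
W1 = np.linalg.eigh(S_w)[1][:, :3]         # smallest-S_w directions (stage 1)
print(np.trace(W1.T @ S_w @ W1), np.trace(W1.T @ S_b @ W1))
```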
Feature extraction and classification algorithms for high dimensional data
NASA Technical Reports Server (NTRS)
Lee, Chulhee; Landgrebe, David
1993-01-01
Feature extraction and classification algorithms for high dimensional data are investigated. Developments with regard to sensors for Earth observation are moving in the direction of providing much higher dimensional multispectral imagery than is now possible. In analyzing such high dimensional data, processing time becomes an important factor. With large increases in dimensionality and the number of classes, processing time will increase significantly. To address this problem, a multistage classification scheme is proposed which reduces the processing time substantially by eliminating unlikely classes from further consideration at each stage. Several truncation criteria are developed and the relationship between thresholds and the error caused by the truncation is investigated. Next, an approach to feature extraction for classification is proposed based directly on the decision boundaries. It is shown that all the features needed for classification can be extracted from the decision boundaries. A characteristic of the proposed method arises by noting that only a portion of the decision boundary is effective in discriminating between classes, and the concept of the effective decision boundary is introduced. The proposed feature extraction algorithm has several desirable properties: it predicts the minimum number of features necessary to achieve the same classification accuracy as in the original space for a given pattern recognition problem, and it finds the necessary feature vectors. The proposed algorithm does not deteriorate under the circumstances of equal means or equal covariances as some previous algorithms do. In addition, the decision boundary feature extraction algorithm can be used both for parametric and non-parametric classifiers. Finally, some problems encountered in analyzing high dimensional data are studied and possible solutions are proposed. First, the increased importance of the second-order statistics in analyzing high dimensional data is recognized; by investigating the characteristics of high dimensional data, the reason why the second-order statistics must be taken into account is suggested. Recognizing this importance creates a need to represent the second-order statistics, and a method to visualize statistics using a color code is proposed. By representing statistics with color coding, one can easily extract and compare the first- and second-order statistics.
Tian, Ye; Wang, Tong; Liu, Wenyan; ...
2015-05-25
Three-dimensional mesoscale clusters that are formed from nanoparticles spatially arranged in pre-determined positions can be thought of as mesoscale analogues of molecules. These nanoparticle architectures could offer tailored properties due to collective effects, but developing a general platform for fabricating such clusters is a significant challenge. Here, we report a strategy for assembling 3D nanoparticle clusters that uses a molecular frame designed with encoded vertices for particle placement. The frame is a DNA origami octahedron and can be used to fabricate clusters with various symmetries and particle compositions. Cryo-electron microscopy is used to uncover the structure of the DNA frame and to reveal that the nanoparticles are spatially coordinated in the prescribed manner. We show that the DNA frame and one set of nanoparticles can be used to create nanoclusters with different chiroptical activities. We also show that the octahedra can serve as programmable interparticle linkers, allowing one- and two-dimensional arrays to be assembled that have designed particle arrangements.
Nonconventional screening of the Coulomb interaction in FexOy clusters: An ab initio study
NASA Astrophysics Data System (ADS)
Peters, L.; Şaşıoǧlu, E.; Rossen, S.; Friedrich, C.; Blügel, S.; Katsnelson, M. I.
2017-04-01
From microscopic point-dipole model calculations of the screening of the Coulomb interaction in nonpolar systems by polarizable atoms, it is known that screening strongly depends on dimensionality. For example, in one-dimensional systems, the short-range interaction is screened, while the long-range interaction is antiscreened. This antiscreening is also observed in some zero-dimensional structures, i.e., molecular systems. By means of ab initio calculations in conjunction with the random-phase approximation (RPA) within the FLAPW method, we study screening of the Coulomb interaction in FexOy clusters. For completeness, these results are compared with their bulk counterpart magnetite. It appears that the on-site Coulomb interaction is very well screened both in the clusters and bulk. On the other hand, for the intersite Coulomb interaction, the important observation is made that it is almost constant throughout the clusters, while for the bulk it is almost completely screened. More precisely and interestingly, in the clusters antiscreening is observed by means of ab initio calculations.
Hypergraph-based anomaly detection of high-dimensional co-occurrences.
Silva, Jorge; Willett, Rebecca
2009-03-01
This paper addresses the problem of detecting anomalous multivariate co-occurrences using a limited number of unlabeled training observations. A novel method based on using a hypergraph representation of the data is proposed to deal with this very high-dimensional problem. Hypergraphs constitute an important extension of graphs which allow edges to connect more than two vertices simultaneously. A variational Expectation-Maximization algorithm for detecting anomalies directly on the hypergraph domain without any feature selection or dimensionality reduction is presented. The resulting estimate can be used to calculate a measure of anomalousness based on the False Discovery Rate. The algorithm has O(np) computational complexity, where n is the number of training observations and p is the number of potential participants in each co-occurrence event. This efficiency makes the method ideally suited for very high-dimensional settings, and requires no tuning, bandwidth or regularization parameters. The proposed approach is validated on both high-dimensional synthetic data and the Enron email database, where p > 75,000, and it is shown that it can outperform other state-of-the-art methods.
The Three-Dimensional Power Spectrum Of Galaxies from the Sloan Digital Sky Survey
2004-05-10
…aspects of the three-dimensional clustering of a much larger data set involving over 200,000 galaxies with redshifts. This paper is focused on measuring… papers, we will constrain galaxy bias empirically by using clustering measurements on smaller scales (e.g., I. Zehavi et al. 2004, in preparation)… minimum-variance measurements in 22 k-bands of both the clustering power and its anisotropy due to redshift-space distortions, with narrow and well…
Structures of undecagold clusters: Ligand effect
NASA Astrophysics Data System (ADS)
Spivey, Kasi; Williams, Joseph I.; Wang, Lichang
2006-12-01
The most stable structure of undecagold, or Au11, clusters was predicted from our DFT calculations to be planar [L. Xiao, L. Wang, Chem. Phys. Lett. 392 (2004) 452; L. Xiao, B. Tollberg, X. Hu, L. Wang, J. Chem. Phys. 124 (2005) 114309]. The structures of ligand-protected undecagold clusters were shown experimentally to be three-dimensional. In this work, we used DFT calculations to study the ligand effect on the structures of Au11 clusters. Our results show that the most stable structure of Au11 is in fact three-dimensional when SCH3 ligands are attached. This indicates that the structures of small gold clusters are altered substantially in the presence of ligands.
Parallelized Bayesian inversion for three-dimensional dental X-ray imaging.
Kolehmainen, Ville; Vanne, Antti; Siltanen, Samuli; Järvenpää, Seppo; Kaipio, Jari P; Lassas, Matti; Kalke, Martti
2006-02-01
Diagnostic and operational tasks based on dental radiology often require three-dimensional (3-D) information that is not available in a single X-ray projection image. Comprehensive 3-D information about tissues can be obtained by computerized tomography (CT) imaging. However, in dental imaging a conventional CT scan may not be available or practical because of the high radiation dose, low resolution, or the cost of the CT scanner equipment. In this paper, we consider a novel type of 3-D imaging modality for dental radiology. We consider situations in which projection images of the teeth are taken from a few sparsely distributed projection directions using the dentist's regular (digital) X-ray equipment and the 3-D X-ray attenuation function is reconstructed. A complication in these experiments is that the reconstruction of the 3-D structure based on a few projection images becomes an ill-posed inverse problem. Bayesian inversion is a well suited framework for reconstruction from such incomplete data. In Bayesian inversion, the ill-posed reconstruction problem is formulated in a well-posed probabilistic form in which a priori information is used to compensate for the incomplete information of the projection data. In this paper we propose a Bayesian method for 3-D reconstruction in dental radiology. The method is partially based on Kolehmainen et al. (2003). The prior model for dental structures consists of a weighted l1 prior and a total variation (TV) prior, together with a positivity prior. The inverse problem is stated as finding the maximum a posteriori (MAP) estimate. To make the 3-D reconstruction computationally feasible, a parallelized version of an optimization algorithm is implemented for a Beowulf cluster computer. The method is tested with projection data from dental specimens and patient data. Tomosynthetic reconstructions are given as a reference for the proposed method.
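As a rough illustration of the kind of objective being minimized, here is a sketch only: the paper works in 3-D with a weighted l1 term and a positivity constraint enforced by the optimizer, whereas the operator A, the weights, and the 1-D TV term below are simplifying assumptions.

```python
import numpy as np

def map_objective(x, A, y, alpha, beta, eps=1e-8):
    """Negative log-posterior sketch: data misfit + l1 sparsity + total
    variation; positivity would be enforced by the optimizer (not shown)."""
    misfit = 0.5 * np.sum((A @ x - y) ** 2)
    sparsity = alpha * np.sum(np.abs(x))
    tv = beta * np.sum(np.sqrt(np.diff(x) ** 2 + eps))  # smoothed 1-D TV
    return misfit + sparsity + tv

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 100))       # stand-in for a sparse projection operator
x_true = np.zeros(100); x_true[40:60] = 1.0
y = A @ x_true + 0.01 * rng.normal(size=30)
print(map_objective(x_true, A, y, alpha=0.1, beta=0.5))
```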
Relevance feedback-based building recognition
NASA Astrophysics Data System (ADS)
Li, Jing; Allinson, Nigel M.
2010-07-01
Building recognition is a nontrivial task in computer vision research that can be utilized in robot localization, mobile navigation, etc. However, existing building recognition systems usually encounter two problems: (1) extracted low-level features cannot reveal the true semantic concepts; and (2) they usually involve high-dimensional data, which incur heavy computational and memory costs. Relevance feedback (RF), widely applied in multimedia information retrieval, is able to bridge the gap between low-level visual features and high-level concepts, while dimensionality reduction methods can mitigate the high-dimensionality problem. In this paper, we propose a building recognition scheme that integrates RF and subspace learning algorithms. Experimental results on our own building database show that the proposed scheme appreciably enhances recognition accuracy.
Manifold Learning by Preserving Distance Orders.
Ataer-Cansizoglu, Esra; Akcakaya, Murat; Orhan, Umut; Erdogmus, Deniz
2014-03-01
Nonlinear dimensionality reduction is essential for the analysis and interpretation of high-dimensional data sets. In this manuscript, we propose a distance-order-preserving manifold learning algorithm that extends the basic mean-squared-error cost function used mainly in multidimensional scaling (MDS)-based methods. We develop a constrained optimization problem by imposing explicit constraints on the order of distances in the low-dimensional space. In this optimization problem, as a generalization of MDS, instead of forcing a linear relationship between the distances in the high-dimensional original space and the low-dimensional projection space, we learn a non-decreasing relation approximated by radial basis functions. We compare the proposed method with existing manifold learning algorithms on synthetic datasets, using the commonly used residual variance metric and the proposed percentage-of-violated-distance-orders metric. We also perform experiments on a retinal image dataset used in Retinopathy of Prematurity (ROP) diagnosis.
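A closely related, off-the-shelf baseline is non-metric MDS, which likewise replaces the linear distance relationship with a learned monotonic one, via isotonic regression rather than the paper's radial basis functions. A minimal sketch on standard synthetic data:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X, _ = make_swiss_roll(n_samples=300, random_state=0)
D = pairwise_distances(X)

# metric=False asks MDS to preserve only the *order* of the dissimilarities,
# fitting a non-decreasing transform instead of the distances themselves.
mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
          n_init=4, random_state=0)
Y = mds.fit_transform(D)
print(Y.shape, round(mds.stress_, 4))
```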
On the complexity of some quadratic Euclidean 2-clustering problems
NASA Astrophysics Data System (ADS)
Kel'manov, A. V.; Pyatkin, A. V.
2016-03-01
Some problems of partitioning a finite set of points of Euclidean space into two clusters are considered. In these problems, the following criteria are minimized: (1) the sum over both clusters of the sums of squared pairwise distances between the elements of the cluster and (2) the sum of the (multiplied by the cardinalities of the clusters) sums of squared distances from the elements of the cluster to its geometric center, where the geometric center (or centroid) of a cluster is defined as the mean value of the elements in that cluster. Additionally, another problem close to (2) is considered, where the desired center of one of the clusters is given as input, while the center of the other cluster is unknown (is the variable to be optimized) as in problem (2). Two variants of the problems are analyzed, in which the cardinalities of the clusters are (1) parts of the input or (2) optimization variables. It is proved that all the considered problems are strongly NP-hard and that, in general, there is no fully polynomial-time approximation scheme for them (unless P = NP).
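For concreteness, here is a direct computation of both criteria (a sketch; each cluster is assumed to be an array of points as rows). The classical identity sum_{x,y in C} ||x - y||^2 = 2|C| * sum_{x in C} ||x - c(C)||^2, with c(C) the centroid, shows that criterion (1), taken over unordered pairs, coincides with criterion (2):

```python
import numpy as np

def pairwise_criterion(clusters):
    """Criterion (1): per-cluster sum of squared pairwise distances
    (unordered pairs), summed over both clusters."""
    total = 0.0
    for C in clusters:
        diffs = C[:, None, :] - C[None, :, :]
        total += 0.5 * (diffs ** 2).sum()      # halve: ordered -> unordered
    return total

def centroid_criterion(clusters):
    """Criterion (2): sum over clusters of |C| times the sum of squared
    distances from the cluster's points to its centroid."""
    return sum(len(C) * ((C - C.mean(axis=0)) ** 2).sum() for C in clusters)

rng = np.random.default_rng(0)
A, B = rng.normal(0, 1, (5, 3)), rng.normal(4, 1, (7, 3))
print(pairwise_criterion([A, B]), centroid_criterion([A, B]))  # equal values
```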
Visual exploration of high-dimensional data through subspace analysis and dynamic projections
Liu, S.; Wang, B.; Thiagarajan, J. J.; ...
2015-06-01
Here, we introduce a novel interactive framework for visualizing and exploring high-dimensional datasets based on subspace analysis and dynamic projections. We assume the high-dimensional dataset can be represented by a mixture of low-dimensional linear subspaces with mixed dimensions, and provide a method to reliably estimate the intrinsic dimension and linear basis of each subspace extracted from the subspace clustering. Subsequently, we use these bases to define unique 2D linear projections as viewpoints from which to visualize the data. To understand the relationships among the different projections and to discover hidden patterns, we connect these projections through dynamic projections that create smooth animated transitions between pairs of projections. We introduce the view transition graph, which provides flexible navigation among these projections to facilitate an intuitive exploration. Finally, we provide detailed comparisons with related systems, and use real-world examples to demonstrate the novelty and usability of our proposed framework.
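A stripped-down version of the pipeline's first two stages can be sketched with k-means standing in for the subspace clustering and per-cluster PCA providing each cluster's linear basis; the data generator and all names below are illustrative assumptions, not the authors' system.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def plane(n, d=10, seed=0):
    """Noisy 2-D plane embedded in d dimensions, standing in for a subspace."""
    r = np.random.default_rng(seed)
    basis = np.linalg.qr(r.normal(size=(d, 2)))[0]
    return r.normal(size=(n, 2)) @ basis.T + 0.01 * r.normal(size=(n, d))

X = np.vstack([plane(200, seed=1), plane(200, seed=2) + 3.0])

# Stand-in for subspace clustering: k-means labels, then a PCA basis per
# cluster serves as that cluster's 2-D linear "viewpoint".
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
viewpoints = [PCA(n_components=2).fit(X[labels == k]) for k in (0, 1)]
proj = viewpoints[0].transform(X)    # the whole dataset seen from viewpoint 0
print(proj.shape)
```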
Coulomb double helical structure
NASA Astrophysics Data System (ADS)
Kamimura, Tetsuo; Ishihara, Osamu
2012-01-01
Structures of Coulomb clusters formed by dust particles in a plasma are studied by numerical simulation. Our study reveals the presence of various types of self-organized structures of a cluster confined in a prolate spheroidal electrostatic potential. The stable configurations depend on a prolateness parameter for the confining potential as well as on the number of dust particles in a cluster. One-dimensional string, two-dimensional zigzag structure and three-dimensional double helical structure are found as a result of the transition controlled by the prolateness parameter. The formation of stable double helical structures resulted from the transition associated with the instability of angular perturbations on double strings. Analytical perturbation study supports the findings of numerical simulations.
Scaling Properties of Dimensionality Reduction for Neural Populations and Network Models
Cowley, Benjamin R.; Doiron, Brent; Kohn, Adam
2016-01-01
Recent studies have applied dimensionality reduction methods to understand how the multi-dimensional structure of neural population activity gives rise to brain function. It is unclear, however, how the results obtained from dimensionality reduction generalize to recordings with larger numbers of neurons and trials or how these results relate to the underlying network structure. We address these questions by applying factor analysis to recordings in the visual cortex of non-human primates and to spiking network models that self-generate irregular activity through a balance of excitation and inhibition. We compared the scaling trends of two key outputs of dimensionality reduction—shared dimensionality and percent shared variance—with neuron and trial count. We found that the scaling properties of networks with non-clustered and clustered connectivity differed, and that the in vivo recordings were more consistent with the clustered network. Furthermore, recordings from tens of neurons were sufficient to identify the dominant modes of shared variability that generalize to larger portions of the network. These findings can help guide the interpretation of dimensionality reduction outputs in regimes of limited neuron and trial sampling and help relate these outputs to the underlying network structure. PMID:27926936
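In the factor-analysis setting, the two tracked outputs are easy to compute from the fitted loadings L and noise variances Psi: the shared covariance is LL^T, percent shared variance is tr(LL^T)/tr(LL^T + Psi), and shared dimensionality is often taken as the number of eigenvalues of LL^T needed to reach a variance cutoff. A sketch on synthetic data; the 95% cutoff and the generator are assumptions, not the paper's exact settings.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Synthetic "population activity": 3 latent factors drive 40 neurons.
Z = rng.normal(size=(1000, 3))
L = rng.normal(size=(3, 40))
X = Z @ L + 0.5 * rng.normal(size=(1000, 40))

fa = FactorAnalysis(n_components=10).fit(X)
shared = fa.components_.T @ fa.components_        # LL^T, shared covariance
percent_shared = np.trace(shared) / (np.trace(shared) + fa.noise_variance_.sum())

# Shared dimensionality: factors needed for 95% of the shared variance.
evals = np.sort(np.linalg.eigvalsh(shared))[::-1]
d_shared = int(np.searchsorted(np.cumsum(evals) / evals.sum(), 0.95)) + 1
print(round(percent_shared, 3), d_shared)
```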
Planar CoB18- Cluster: a New Motif for Hetero- and Metallo-Borophenes
NASA Astrophysics Data System (ADS)
Chen, Teng-Teng; Jian, Tian; Lopez, Gary; Li, Wan-Lu; Chen, Xin; Li, Jun; Wang, Lai-Sheng
2016-06-01
Combined photoelectron spectroscopy (PES) and theoretical calculations have shown that anionic boron clusters (Bn-) are planar or quasi-planar up to B25-. Recent works show that anionic pure boron clusters continue to be planar at B27-, B30-, B35-, and B36-. B35- and B36- provide the first experimental evidence for the viability of two-dimensional (2D) boron sheets (borophene). The 2D to three-dimensional (3D) transitions are shown to happen at B40-, B39-, and B28-, which possess cage-like structures; these fullerene-like boron cage clusters are named borospherenes. Recently, borophenes or similar structures have been claimed to be synthesized by several groups. Following an electronic design principle, a series of transition-metal-doped boron clusters (M©Bn-, n = 8-10) are found to possess monocyclic wheel structures. Meanwhile, CoB12- and RhB12- are revealed to adopt half-sandwich-type structures with a quasi-planar B12 moiety similar to the B12- cluster. Very recently, we showed that the CoB16- cluster possesses a highly symmetric cobalt-centered drum-like structure, with a record coordination number of 16. Here we report that the CoB18- cluster possesses a unique planar structure, in which the Co atom is doped into the network of a planar boron cluster. PES reveals that CoB18- is a highly stable electronic system with a first adiabatic detachment energy (ADE) of 4.0 eV. Global minimum searches along with high-level quantum calculations show that the global minimum of CoB18- is perfectly planar and closed shell (1A1) with C2v symmetry. The Co atom is bonded to 7 boron atoms in the closest coordination shell and to the other 11 boron atoms in the outer coordination shell. The calculated vertical detachment energy (VDE) values match our experimental results well. Chemical bonding analysis by the Adaptive Natural Density Partitioning (AdNDP) method shows that the CoB18- cluster is π-aromatic, with four 4-center-2-electron (4c-2e) π bonds and one 19-center-2-electron (19c-2e) π bond, 10 π electrons in total. This perfectly planar structure reveals the viability of creating a new class of hetero-borophenes and metallo-borophenes by doping metal atoms into the plane of monolayer boron atoms, and suggests a new approach to designing prospective hetero-borophene and metallo-borophene materials with tunable chemical, magnetic, and optical properties.
NASA Astrophysics Data System (ADS)
Caplan, R. M.
2013-04-01
We present a simple-to-use yet powerful code package called NLSEmagic to numerically integrate the nonlinear Schrödinger equation in one, two, and three dimensions. NLSEmagic is a high-order finite-difference code package which utilizes graphics processing unit (GPU) parallel architectures. The codes running on the GPU are many times faster than their serial counterparts, and are much cheaper to run than on standard parallel clusters. The codes are developed with usability and portability in mind, and therefore are written to interface with MATLAB utilizing custom GPU-enabled C codes with the MEX-compiler interface. The packages are freely distributed, including user manuals and set-up files. Catalogue identifier: AEOJ_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOJ_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 124453 No. of bytes in distributed program, including test data, etc.: 4728604 Distribution format: tar.gz Programming language: C, CUDA, MATLAB. Computer: PC, MAC. Operating system: Windows, MacOS, Linux. Has the code been vectorized or parallelized?: Yes. Number of processors used: Single CPU, number of GPU processors dependent on chosen GPU card (max is currently 3072 cores on GeForce GTX 690). Supplementary material: Setup guide, Installation guide. RAM: Highly dependent on dimensionality and grid size. For typical medium-large problem size in three dimensions, 4GB is sufficient. Keywords: Nonlinear Schrödinger Equation, GPU, high-order finite difference, Bose-Einstein condensates. Classification: 4.3, 7.7. Nature of problem: Integrate solutions of the time-dependent one-, two-, and three-dimensional cubic nonlinear Schrödinger equation. Solution method: The integrators utilize a fully-explicit fourth-order Runge-Kutta scheme in time and both second- and fourth-order differencing in space. The integrators are written to run on NVIDIA GPUs and are interfaced with MATLAB including built-in visualization and analysis tools. Restrictions: The main restriction for the GPU integrators is the amount of RAM on the GPU as the code is currently only designed for running on a single GPU. Unusual features: Ability to visualize real-time simulations through the interaction of MATLAB and the compiled GPU integrators. Additional comments: Setup guide and Installation guide provided. Program has a dedicated web site at www.nlsemagic.com. Running time: A three-dimensional run with a grid dimension of 87×87×203 for 3360 time steps (100 non-dimensional time units) takes about one and a half minutes on a GeForce GTX 580 GPU card.
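The numerical core (explicit fourth-order Runge-Kutta in time, central differencing in space) is compact enough to sketch in a few lines. This is a plain NumPy illustration of the scheme, not NLSEmagic's CUDA code, and the soliton test and parameters are assumptions:

```python
import numpy as np

# Cubic NLSE, i u_t + u_xx + s|u|^2 u = 0, on a periodic 1-D grid.
def rhs(u, dx, s=2.0):
    uxx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2   # 2nd-order stencil
    return 1j * (uxx + s * np.abs(u) ** 2 * u)

def rk4_step(u, dt, dx):
    k1 = rhs(u, dx)
    k2 = rhs(u + 0.5 * dt * k1, dx)
    k3 = rhs(u + 0.5 * dt * k2, dx)
    k4 = rhs(u + dt * k3, dx)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

x = np.linspace(-20, 20, 512, endpoint=False)
dx = x[1] - x[0]
u = 1 / np.cosh(x) * np.exp(0.5j * x)     # boosted bright soliton (s = 2)
for _ in range(2000):
    u = rk4_step(u, dt=1e-3, dx=dx)
print(round(np.abs(u).max(), 3))           # amplitude should stay near 1
```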
Exact solution of a one-dimensional model of strained epitaxy on a periodically modulated substrate
NASA Astrophysics Data System (ADS)
Tokar, V. I.; Dreyssé, H.
2005-03-01
We consider a one-dimensional lattice gas model of strained epitaxy with the elastic strain accounted for through a finite number of cluster interactions comprising contiguous atomic chains. Interactions of this type arise in models of strained epitaxy based on the Frenkel-Kontorova model. Furthermore, the deposited atoms interact with the substrate via an arbitrary periodic potential of period L. This model is solved exactly with the use of an appropriately adapted technique developed recently in the theory of protein folding. The advantage of the proposed approach over the standard transfer-matrix method is that it reduces the problem to finding the largest eigenvalue of a matrix of size L instead of 2^(L-1), which is vital in the case of nanostructures, where L may measure in the hundreds of interatomic distances. Our major conclusion is that the substrate modulation always facilitates the size calibration of self-assembled nanoparticles in one- and two-dimensional systems.
Sparse subspace clustering for data with missing entries and high-rank matrix completion.
Fan, Jicong; Chow, Tommy W S
2017-09-01
Many methods have recently been proposed for subspace clustering, but they are often unable to handle incomplete data because of missing entries. Using matrix completion methods to recover missing entries is a common way to solve the problem. Conventional matrix completion methods require that the matrix should be of low-rank intrinsically, but most matrices are of high-rank or even full-rank in practice, especially when the number of subspaces is large. In this paper, a new method called Sparse Representation with Missing Entries and Matrix Completion is proposed to solve the problems of incomplete-data subspace clustering and high-rank matrix completion. The proposed algorithm alternately computes the matrix of sparse representation coefficients and recovers the missing entries of a data matrix. The proposed algorithm recovers missing entries through minimizing the representation coefficients, representation errors, and matrix rank. Thorough experimental study and comparative analysis based on synthetic data and natural images were conducted. The presented results demonstrate that the proposed algorithm is more effective in subspace clustering and matrix completion compared with other existing methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
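The alternation at the heart of the method can be prototyped compactly. The sketch below swaps the paper's joint objective, which also penalizes rank, for a plain Lasso self-representation step followed by a refill step, so it illustrates the idea rather than the published algorithm; all names and parameters are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sr_complete(X, mask, n_iter=10, alpha=0.01):
    """Alternate (a) sparse self-representation X ~= X C (diag(C) = 0) and
    (b) refilling missing entries from the representation."""
    Xf = np.where(mask, X, 0.0)
    n = X.shape[1]
    for _ in range(n_iter):
        C = np.zeros((n, n))
        for j in range(n):
            idx = [k for k in range(n) if k != j]
            lasso = Lasso(alpha=alpha, max_iter=5000)
            lasso.fit(Xf[:, idx], Xf[:, j])
            C[idx, j] = lasso.coef_
        Xf = np.where(mask, X, Xf @ C)   # keep observed entries, refill the rest
    return Xf, C

rng = np.random.default_rng(0)
# Columns drawn from two 2-D subspaces of R^20; 20% of entries missing.
U1, U2 = rng.normal(size=(20, 2)), rng.normal(size=(20, 2))
X = np.hstack([U1 @ rng.normal(size=(2, 30)), U2 @ rng.normal(size=(2, 30))])
mask = rng.random(X.shape) > 0.2
Xhat, C = sr_complete(X, mask)
print(np.linalg.norm((Xhat - X)[~mask]) / np.linalg.norm(X[~mask]))
```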
Clustering Millions of Faces by Identity.
Otto, Charles; Wang, Dayong; Jain, Anil K
2018-02-01
Given a large collection of unlabeled face images, we address the problem of clustering faces into an unknown number of identities. This problem is of interest in social media, law enforcement, and other applications, where the number of faces can be of the order of hundreds of millions, while the number of identities (clusters) can range from a few thousand to millions. To address the challenges of run-time complexity and cluster quality, we present an approximate Rank-Order clustering algorithm that performs better than popular clustering algorithms (k-means and spectral). Our experiments include clustering up to 123 million face images into over 10 million clusters. Clustering results are analyzed in terms of external (known face labels) and internal (unknown face labels) quality measures, and run-time. Our algorithm achieves an F-measure of 0.87 on the LFW benchmark (13K faces of 5,749 individuals), which drops to 0.27 on the largest dataset considered (13K faces in LFW + 123M distractor images). Additionally, we show that frames in the YouTube benchmark can be clustered with an F-measure of 0.71. An internal per-cluster quality measure is developed to rank individual clusters for manual exploration of high-quality clusters that are compact and isolated.
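The exact rank-order distance that the approximation builds on is straightforward to state in code. This sketch computes it from full neighbor orderings; the paper's contribution is an approximate version using only top-k neighbor lists, which this is not.

```python
import numpy as np

def rank_order_distance(a, b, order):
    """Symmetric rank-order distance: sum the ranks, in a's neighbor list,
    of everything b ranks before a (and vice versa), normalized by the
    smaller of the two cross-ranks."""
    def asym(x, y):
        rank_in_x = {v: r for r, v in enumerate(order[x])}
        o_xy = rank_in_x[y]                      # where y sits in x's list
        return sum(rank_in_x[v] for v in order[y][:o_xy]), o_xy
    d_ab, o_ab = asym(a, b)
    d_ba, o_ba = asym(b, a)
    return (d_ab + d_ba) / max(min(o_ab, o_ba), 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
order = [list(np.argsort(D[i])) for i in range(len(X))]   # each list: self first
print(round(rank_order_distance(0, 1, order), 3))
```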
Correlation Functions in Two-Dimensional Critical Systems with Conformal Symmetry
NASA Astrophysics Data System (ADS)
Flores, Steven Miguel
This thesis presents a study of certain conformal field theory (CFT) correlation functions that describe physical observables in conformally invariant two-dimensional critical systems. These are typically continuum limits of critical lattice models in a domain within the complex plane and with a boundary. Certain clusters, called
Search Techniques for Self-Organizing Systems
1975-07-01
according to their associated function values. The classes need not have equal function value ranges (i.e., the ...). The Mucciardi-Gose ... Gose, "An Automatic Clustering Algorithm and Its Properties in High-Dimensional Spaces," IEEE Trans. Systems, Man and Cybernetics, Vol. SMC-2
Schouteden, Koen; Lauwaet, Koen; Janssens, Ewald; Barcaro, Giovanni; Fortunelli, Alessandro; Van Haesendonck, Chris; Lievens, Peter
2014-02-21
Preformed Co clusters with an average diameter of 2.5 nm are produced in the gas phase and are deposited under controlled ultra-high vacuum conditions onto a thin insulating NaCl film on Au(111). Relying on a combined experimental and theoretical investigation, we demonstrate visualization of the three-dimensional atomic structure of the Co clusters by high-resolution scanning tunneling microscopy (STM) using a Cl functionalized STM tip that can be obtained on the NaCl surface. More generally, use of a functionalized STM tip may allow for systematic atomic structure determination with STM of nanoparticles that are deposited on metal surfaces.
Jeong, Ji-Wook; Chae, Seung-Hoon; Chae, Eun Young; Kim, Hak Hee; Choi, Young-Wook; Lee, Sooyeul
2016-01-01
We propose a computer-aided detection (CADe) algorithm for microcalcification (MC) clusters in reconstructed digital breast tomosynthesis (DBT) images. The algorithm consists of prescreening, MC detection, clustering, and false-positive (FP) reduction steps. The DBT images containing MC-like objects were enhanced by a multiscale Hessian-based three-dimensional (3D) objectness response function, and a connected-component segmentation method was applied to extract the cluster seed objects as potential clustering centers of MCs. Secondly, a signal-to-noise ratio (SNR) enhanced image was also generated to detect the individual MC candidates and prescreen the MC-like objects. Each cluster seed candidate was prescreened by counting the individual MC candidates near the cluster seed object according to several microcalcification clustering criteria. Next, we introduced bounding boxes for the accepted seed candidates, clustered all the overlapping boxes, and examined them. After the FP reduction step, the average number of FPs per case was estimated to be 2.47 per DBT volume, with a sensitivity of 83.3%.
A Localized Ensemble Kalman Smoother
NASA Technical Reports Server (NTRS)
Butala, Mark D.
2012-01-01
Numerous geophysical inverse problems prove difficult because the available measurements are indirectly related to the underlying unknown dynamic state and the physics governing the system may involve imperfect models or unobserved parameters. Data assimilation addresses these difficulties by combining the measurements and physical knowledge. The main challenge in such problems usually involves their high dimensionality and the standard statistical methods prove computationally intractable. This paper develops and addresses the theoretical convergence of a new high-dimensional Monte-Carlo approach called the localized ensemble Kalman smoother.
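The analysis step underlying such methods is a (localized) ensemble Kalman update; the smoother extends it across time. Below is a minimal perturbed-observation sketch with a Gaussian localization taper. The taper, shapes, and test setup are assumptions; operational codes typically use a Gaspari-Cohn function and a full smoother recursion, neither of which is shown.

```python
import numpy as np

def lenkf_update(E, y, H, R, dist, loc_radius):
    """Ensemble Kalman analysis with covariance localization.
    E: (n, N) ensemble; y: (m,) obs; H: (m, n) operator; R: (m, m) obs cov;
    dist: (n, m) state-to-observation distances for the localization taper."""
    n, N = E.shape
    Xp = E - E.mean(axis=1, keepdims=True)        # ensemble perturbations
    Pxy = Xp @ (H @ Xp).T / (N - 1)               # state-obs cross covariance
    Pyy = (H @ Xp) @ (H @ Xp).T / (N - 1) + R
    rho = np.exp(-((dist / loc_radius) ** 2))     # smooth localization taper
    K = (rho * Pxy) @ np.linalg.inv(Pyy)          # localized Kalman gain
    obs_pert = np.random.default_rng(0).multivariate_normal(y, R, size=N).T
    return E + K @ (obs_pert - H @ E)

n, N, m = 40, 20, 10
rng = np.random.default_rng(1)
E = rng.normal(size=(n, N))
H = np.zeros((m, n)); H[np.arange(m), np.arange(0, n, 4)] = 1.0
dist = np.abs(np.arange(n)[:, None] - np.arange(0, n, 4)[None, :])
Ea = lenkf_update(E, np.ones(m), H, 0.1 * np.eye(m), dist, loc_radius=5.0)
print(Ea.shape)
```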
NASA Technical Reports Server (NTRS)
Makivic, Miloje S.
1996-01-01
This is the final technical report for the project entitled: "High-Performance Computing and Four-Dimensional Data Assimilation: The Impact on Future and Current Problems", funded at NPAC by the DAO at NASA/GSFC. First, the motivation for the project is given in the introductory section, followed by the executive summary of major accomplishments and the list of project-related publications. Detailed analysis and description of research results is given in subsequent chapters and in the Appendix.
Asymptotic analysis of the narrow escape problem in dendritic spine shaped domain: three dimensions
NASA Astrophysics Data System (ADS)
Li, Xiaofei; Lee, Hyundae; Wang, Yuliang
2017-08-01
This paper deals with the three-dimensional narrow escape problem in a dendritic-spine-shaped domain, which is composed of a relatively big head and a thin neck. The narrow escape problem is to compute the mean first passage time of Brownian particles traveling from inside the head to the end of the neck. The original model is to solve a mixed Dirichlet-Neumann boundary value problem for the Poisson equation in the composite domain, which is computationally challenging. In this paper we transform the original problem into a mixed Robin-Neumann boundary value problem by dropping the thin neck part, and rigorously derive the asymptotic expansion of the mean first passage time with higher-order terms. This study is a nontrivial three-dimensional generalization of the work in Li (2014 J. Phys. A: Math. Theor. 47 505202), where a two-dimensional analogue domain is considered.
Classification Order of Surface-Confined Intermixing at Epitaxial Interface
NASA Astrophysics Data System (ADS)
Michailov, M.
The self-organization phenomena at the epitaxial interface attract special attention in contemporary materials science. Being relevant to the fundamental physical problem of competing long-range and short-range atomic interactions in systems with reduced dimensionality, these phenomena have attracted considerable academic interest. They are also of great technological importance for their ability to bring about the spontaneous formation of regular nanoscale surface patterns and superlattices with exotic properties. The basic phenomenon involved in this process is surface diffusion, which is the motivation behind the present study of the diffusion scenarios that control the fine atomic structure of the epitaxial interface. Containing surface imperfections (terraces, steps, kinks, and vacancies), the interface offers a variety of barriers to surface diffusion. Therefore, adatoms and clusters need a certain critical energy to overcome the corresponding diffusion barriers. In the most general case, the critical energies can be attained by varying the system temperature. Hence, their values define the temperature limits of the energy gaps associated with different diffusion scenarios. This systematization implies a classification order of surface alloying: blocked, incomplete, and complete. Against that background, two diffusion problems related to the atomic-scale surface morphology are discussed. The first problem deals with the diffusion of atomic clusters on an atomically smooth interface. On flat domains, far from terraces and steps, we analyzed the impact of size, shape, and cluster/substrate lattice misfit on the diffusion behavior of atomic clusters (islands). We found that the lattice constant of small clusters depends on the number N of building atoms for 1 < N ≤ 10. In heteroepitaxy, this effect of a variable lattice constant originates from the enhanced charge transfer and the strong influence of the surface potential on the cluster's atomic arrangement. At constant temperature, the variation of the lattice constant leads to a variable misfit, which affects the island migration. The cluster/substrate commensurability influences the oscillatory behavior of the diffusion coefficient caused by variation in the cluster shape. We discuss the results within a physical model that implies cluster diffusion with size-dependent cluster/substrate misfit. The second problem is devoted to diffusion phenomena in the vicinity of atomic terraces on stepped or vicinal surfaces. Here, we develop a computational model that refines important details of the diffusion behavior of adatoms, accounting for the energy barriers at specific atomic sites (smooth domains, terraces, and steps) on the crystal surface. The dynamic competition between the energy gained by mixing and the substrate strain energy results in a diffusion scenario in which adatoms form alloyed islands and alloyed stripes in the vicinity of terrace edges. Being in agreement with recent experimental findings, the observed effect of stripe and island alloy formation opens up a way for regular surface patterns to be configured at different atomic levels on the crystal surface. The complete surface alloying of the entire interface layer is also briefly discussed, with a critical analysis and classification of experimental findings and simulation data.
Herrero-Herrero, María; García-Massó, Xavier; Martínez-Corralo, Carlos; Prades-Piñón, Josep; Sanchis-Alfonso, Vicente
2017-09-01
The aim of this study was to determine whether the most physically active adolescents have better lower limb control. 31 high school students (12 males and 19 females) participated in this study. The Anterior Knee Pain Scale was used to find any cases of knee pain; only subjects with high scores were selected, to exclude those with knee pain or lower limb injuries. Single Leg Squat and Tuck Jump Assessment were used to evaluate movements with two cameras in a two-dimensional assessment. The IPAQ questionnaire was used to score physical activity and to classify it into MET total, MET moderate activity, MET vigorous activity, and MET walking. These scores were related to knee angle at landing, age, and body mass index by self-organizing map analysis. The subjects were classified into 4 clusters, and the descriptive statistics of the different clusters were determined to find any differences. The subjects in cluster 3 were classified as those with the highest risk factors for suffering lower limb musculoskeletal disorders or knee pain, even though injuries do not depend only on quality of movement. Physical activity was not related to healthy movements during the jump and single leg squat. Physical activity alone cannot be an indicator of good-quality lower limb movement: the knee valgus angle plays a determining role and could also depend on neuromuscular control and anatomical characteristics. The analytical method described in the study could be used by physical education teachers to detect potential risk factors for musculoskeletal problems in the lower limbs, especially in the knees.
High Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion
NASA Technical Reports Server (NTRS)
Felippa, C. A.; Farhat, C.; Lanteri, S.; Maman, N.; Piperno, S.; Gumaste, U.
1994-01-01
In order to predict the dynamic response of a flexible structure in a fluid flow, the equations of motion of the structure and the fluid must be solved simultaneously. In this paper, we present several partitioned procedures for time-integrating this coupled problem and discuss their merits in terms of accuracy, stability, heterogeneous computing, I/O transfers, subcycling, and parallel processing. All theoretical results are derived for a one-dimensional piston model problem with a compressible flow, because the complete three-dimensional aeroelastic problem is difficult to analyze mathematically. However, the insight gained from the analysis of the coupled piston problem and the conclusions drawn from its numerical investigation are confirmed with the numerical simulation of the two-dimensional transient aeroelastic response of a flexible panel in a transonic nonlinear Euler flow regime.
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Fromm, F. R.; Northouse, R. A.
1974-01-01
A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
Health-related needs of people with multiple chronic diseases: differences and underlying factors.
Hopman, Petra; Schellevis, François G; Rijken, Mieke
2016-03-01
To examine the health-related needs of people with multiple chronic diseases in the Netherlands compared to people with one chronic disease, and to identify different subgroups of multimorbid patients based on differences in their health problems. Participants were 1092 people with one or more chronic diseases of a nationwide prospective panel study on the consequences of chronic illness in the Netherlands. They completed the EQ-6D, a multi-dimensional questionnaire on health problems (October 2013). Chi-square tests and analyses of variance were performed to test for differences between multimorbid patients and patients with one chronic disease. To identify subgroups of multimorbid patients, cluster analysis was performed and differences in EQ-6D scores between clusters were tested with Chi-square tests. Multimorbid patients (51 % of the total sample) experience more problems in most health domains than patients with one chronic disease. Almost half (44 %) of the multimorbid people had many health problems in different domains. These people were more often female, had a smaller household size, had a lower health literacy, and suffered from more chronic diseases. Remarkably, a small subgroup of multimorbid patients (4 %, mostly elderly males) is characterized by all having cognitive problems. Based on the problems they experience, we conclude that patients with multimorbidity have relatively many and diverse health-related needs. Extensive health-related needs among people with multimorbidity may relate not only to the number of chronic diseases they suffer from, but also to their patient characteristics. This should be taken into account, when identifying target groups for comprehensive support programmes.
Shaffer, Patrick; Valsson, Omar; Parrinello, Michele
2016-01-01
The capabilities of molecular simulations have been greatly extended by a number of widely used enhanced sampling methods that facilitate escaping from metastable states and crossing large barriers. Despite these developments there are still many problems which remain out of reach for these methods which has led to a vigorous effort in this area. One of the most important problems that remains unsolved is sampling high-dimensional free-energy landscapes and systems that are not easily described by a small number of collective variables. In this work we demonstrate a new way to compute free-energy landscapes of high dimensionality based on the previously introduced variationally enhanced sampling, and we apply it to the miniprotein chignolin. PMID:26787868
Percolation analyses of observed and simulated galaxy clustering
NASA Astrophysics Data System (ADS)
Bhavsar, S. P.; Barrow, J. D.
1983-11-01
A percolation cluster analysis is performed on equivalent regions of the CFA redshift survey of galaxies and the 4000-body simulations of gravitational clustering made by Aarseth, Gott and Turner (1979). The observed and simulated percolation properties are compared and, unlike correlation and multiplicity function analyses, favour high-density (Omega = 1) models with n = -1 initial data. The present results show that the three-dimensional data are consistent with the degree of filamentary structure present in isothermal models of galaxy formation at the level of percolation analysis. It is also found that the percolation structure of the CFA data is a function of depth. Percolation structure does not appear to be a sensitive probe of intrinsic filamentary structure.
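Percolation (friends-of-friends) clustering of point sets is easy to reproduce: join any two points closer than a linking length and take connected components as clusters. A sketch with an invented uniform point set; sweeping the linking length shows the largest cluster growing until it percolates.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def friends_of_friends(points, linking_length):
    """Friends-of-friends clusters: points closer than the linking length
    are joined; clusters are the connected components of that graph."""
    tree = cKDTree(points)
    pairs = tree.query_pairs(r=linking_length, output_type="ndarray")
    n = len(points)
    adj = csr_matrix((np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])),
                     shape=(n, n))
    return connected_components(adj, directed=False)

rng = np.random.default_rng(0)
pts = rng.random((2000, 3))
for b in (0.02, 0.05, 0.08):
    n_comp, labels = friends_of_friends(pts, b)
    largest = np.bincount(labels).max()
    print(f"b={b}: {n_comp} clusters, largest fraction {largest / len(pts):.2f}")
```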
Ensemble based on static classifier selection for automated diagnosis of Mild Cognitive Impairment.
Nanni, Loris; Lumini, Alessandra; Zaffonato, Nicolò
2018-05-15
Alzheimer's disease (AD) is the most common cause of neurodegenerative dementia in the elderly population. Scientific research is very active in the challenge of designing automated approaches to achieve an early and certain diagnosis. Recently an international competition among AD predictors was organized: "A Machine learning neuroimaging challenge for automated diagnosis of Mild Cognitive Impairment" (MLNeCh). This competition is based on pre-processed sets of T1-weighted Magnetic Resonance Images (MRI) to be classified into four categories: stable AD, individuals with MCI who converted to AD, individuals with MCI who did not convert to AD, and healthy controls. In this work, we propose a method to perform early diagnosis of AD, which is evaluated on the MLNeCh dataset. Since the automatic classification of AD is based on feature vectors of high dimensionality, different techniques of feature selection/reduction are compared in order to avoid the curse-of-dimensionality problem; the classification method is then obtained as the combination of Support Vector Machines trained using different clusters of data extracted from the whole training set. The multi-classifier approach proposed in this work outperforms all the stand-alone methods tested in our experiments. The final ensemble is based on a set of classifiers, each trained on a different cluster of the training data. The proposed ensemble has the great advantage of performing well using a very reduced version of the data (the reduction factor is more than 90%). The MATLAB code for the ensemble of classifiers will be made publicly available to other researchers for future comparisons. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
D'Amore, M.; Le Scaon, R.; Helbert, J.; Maturilli, A.
2017-12-01
Machine learning (ML) has achieved unprecedented results in high-dimensional data processing tasks, with wide applications in various fields. Due to the growing number of complex nonlinear systems that have to be investigated in science and the sheer size of the data nowadays available, ML offers the unique ability to extract knowledge regardless of the specific application field. Examples are image segmentation, supervised/unsupervised/semi-supervised classification, feature extraction, and data dimensionality analysis/reduction. The MASCS instrument has mapped the surface of Mercury in the 400-1145 nm wavelength range during orbital observations by the MESSENGER spacecraft. We have conducted k-means unsupervised hierarchical clustering to identify and characterize spectral units from MASCS observations. The results display a dichotomy: polar and equatorial units, possibly linked to compositional differences or weathering due to irradiation. To explore possible relations between composition and spectral behavior, we have compared the spectral provinces with elemental abundance maps derived from MESSENGER's X-Ray Spectrometer (XRS). For the Vesta application on the DAWN Visible and InfraRed spectrometer (VIR) data, we explored several machine learning techniques: an image segmentation method, a streaming algorithm, and hierarchical clustering. The algorithm successfully separates the olivine outcrops around two craters on Vesta's surface [1]. New maps summarizing the spectral and chemical signature of the surface could be produced automatically. We conclude that instead of hand-digging in data, scientists could choose a subset of algorithms with well-known features (i.e., efficacy on the particular problem, speed, accuracy) and focus their effort on understanding what the important characteristics of the groups found in the data mean. [1] E. Ammannito et al. "Olivine in an unexpected location on Vesta's surface". In: Nature 504.7478 (2013), pp. 122-125.
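In its simplest form, the unsupervised step for the MASCS case reduces to k-means over per-pixel spectra. A toy sketch; the synthetic "endmember" shapes, noise level, and wavelength grid are invented for illustration and are not MASCS data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
wav = np.linspace(400, 1145, 128)               # nm, MASCS-like range
# Two invented endmember spectra stand in for, e.g., polar/equatorial units.
e1 = 0.20 + 0.30 * (wav - 400) / 745            # red-sloped continuum
e2 = 0.25 + 0.20 * np.exp(-((wav - 900) / 150) ** 2)
spectra = np.vstack([e + 0.01 * rng.normal(size=wav.size)
                     for e in [e1] * 300 + [e2] * 300])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(spectra)
print(np.bincount(labels))                      # two units of ~300 pixels each
```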
Morphology of size-selected Ptn clusters on CeO2(111)
NASA Astrophysics Data System (ADS)
Shahed, Syed Mohammad Fakruddin; Beniya, Atsushi; Hirata, Hirohito; Watanabe, Yoshihide
2018-03-01
Supported Pt catalysts and ceria are well known for their application in automotive exhaust catalysts. Size-selected Pt clusters supported on a CeO2(111) surface exhibit distinct physical and chemical properties. We investigated the morphology of the size-selected Ptn (n = 5-13) clusters on a CeO2(111) surface using scanning tunneling microscopy at room temperature. Ptn clusters prefer a two-dimensional morphology for n = 5 and a three-dimensional (3D) morphology for n ≥ 6. We further observed the preference for a 3D tri-layer structure when n ≥ 10. For each cluster size, we quantitatively estimated the relative fraction of the clusters for each type of morphology. Size-dependent morphology of the Ptn clusters on the CeO2(111) surface was attributed to the Pt-Pt interaction in the cluster and the Pt-O interaction between the cluster and CeO2(111) surface. The results obtained herein provide a clear understanding of the size-dependent morphology of the Ptn clusters on a CeO2(111) surface.
High-resolution two dimensional advective transport
Smith, P.E.; Larock, B.E.
1989-01-01
The paper describes a two-dimensional high-resolution scheme for advective transport that is based on an Eulerian-Lagrangian method with a flux limiter. The scheme is applied to the problem of pure advection of a rotated Gaussian hill and shown to preserve the monotonicity property of the governing conservation law.
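The Eulerian-Lagrangian core (trace each grid node's characteristic backward, interpolate the transported field there) is a few lines of code. This sketch uses monotone linear interpolation in place of the paper's flux limiter, so it illustrates the transport step, not the high-resolution correction; the rotating-hill test mirrors the paper's benchmark but all parameters are assumed.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def semi_lagrangian_step(c, u, v, dt, dx):
    """One Eulerian-Lagrangian advection step: trace characteristics back
    from each grid node and interpolate the concentration there."""
    ny, nx = c.shape
    jj, ii = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    dep_i = ii - u * dt / dx          # departure points, in index units
    dep_j = jj - v * dt / dx
    return map_coordinates(c, [dep_j, dep_i], order=1, mode="nearest")

n = 128; dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
X, Y = np.meshgrid(x, x)
u = -(Y - 0.5); v = (X - 0.5)                    # solid-body rotation
c = np.exp(-((X - 0.7) ** 2 + (Y - 0.5) ** 2) / (2 * 0.05 ** 2))
for _ in range(400):                             # one full rotation
    c = semi_lagrangian_step(c, u, v, dt=2 * np.pi / 400, dx=dx)
print(round(c.min(), 4), round(c.max(), 4))      # monotone: no new extrema
```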
Spatiotemporal analysis of the agricultural drought risk in Heilongjiang Province, China
NASA Astrophysics Data System (ADS)
Pei, Wei; Fu, Qiang; Liu, Dong; Li, Tian-xiao; Cheng, Kun; Cui, Song
2017-06-01
Droughts are natural disasters that pose significant threats to agricultural production as well as living conditions, and a spatial-temporal difference analysis of agricultural drought risk can help determine the spatial distribution and temporal variation of the drought risk within a region. Moreover, this type of analysis can provide a theoretical basis for the identification, prevention, and mitigation of drought disasters. In this study, the overall dispersion and local aggregation of projection points were based on the work of Friedman and Tukey (IEEE Trans. Computers 23:881-890, 1974). High-dimensional samples were grouped by cluster analysis, and the clustering results were represented by a clustering matrix, which determined the local density in the projection index. This approach avoids the problem of determining a cutoff radius. An improved projection pursuit model is proposed that combines cluster analysis and the projection pursuit model, which offer advantages for classification and assessment, respectively. The improved model was applied to analyze the agricultural drought risk of 13 cities in Heilongjiang Province over 6 years (2004, 2006, 2008, 2010, 2012, and 2014). The risk of an agricultural drought disaster was characterized by 14 indicators and the following four aspects: hazard, exposure, sensitivity, and resistance capacity. The spatial distribution and temporal variation characteristics of the agricultural drought risk in Heilongjiang Province were analyzed. The spatial distribution results indicated that Suihua, Qiqihar, Daqing, Harbin, and Jiamusi are located in high-risk areas, Daxing'anling and Yichun are located in low-risk areas, and the differences among the regions were primarily caused by the exposure and resistance capacity aspects. The temporal variation results indicated that the risk of agricultural drought in most areas presented an initially increasing and then decreasing trend. A higher value for the exposure aspect increased the risk of drought, whereas a higher value for the resistance capacity aspect reduced it. Over the long term, the exposure level of the region presented limited increases, whereas the resistance capacity increased considerably. Therefore, the risk of agricultural drought in Heilongjiang Province will continue to exhibit a decreasing trend.
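The Friedman-Tukey index referenced here multiplies a trimmed spread term by a local-aggregation term over the projected points. The sketch below evaluates the classic index for two candidate projection directions; the radius, trimming fraction, and test data are assumptions, and the study's cluster-matrix variant of the density term is not reproduced.

```python
import numpy as np

def friedman_tukey_index(z, R=0.1, trim=0.1):
    """Classic Friedman-Tukey index for 1-D projected data z: a trimmed
    spread term times a pairwise local-density term; large values indicate
    a projection with clustered structure."""
    z = np.sort(z)
    n = len(z)
    k = int(trim * n)
    spread = z[k:n - k].std()                        # trimmed dispersion
    d = np.abs(z[:, None] - z[None, :])
    local = (R - d)[(d < R) & (d > 0)].sum()         # close-pair aggregation
    return spread * local

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.3, (200, 5)), rng.normal(2, 0.3, (200, 5))])
a_good = np.ones(5) / np.sqrt(5)                     # separates the clusters
a_bad = np.array([1, -1, 0, 0, 0]) / np.sqrt(2)      # merges them
# The clustered projection scores markedly higher than the merged one.
print(friedman_tukey_index(X @ a_good), friedman_tukey_index(X @ a_bad))
```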
A self-organizing Lagrangian particle method for adaptive-resolution advection-diffusion simulations
NASA Astrophysics Data System (ADS)
Reboux, Sylvain; Schrader, Birte; Sbalzarini, Ivo F.
2012-05-01
We present a novel adaptive-resolution particle method for continuous parabolic problems. In this method, particles self-organize in order to adapt to local resolution requirements. This is achieved by pseudo forces that are designed so as to guarantee that the solution is always well sampled and that no holes or clusters develop in the particle distribution. The particle sizes are locally adapted to the length scale of the solution. Differential operators are consistently evaluated on the evolving set of irregularly distributed particles of varying sizes using discretization-corrected operators. The method does not rely on any global transforms or mapping functions. After presenting the method and its error analysis, we demonstrate its capabilities and limitations on a set of two- and three-dimensional benchmark problems. These include advection-diffusion, the Burgers equation, the Buckley-Leverett five-spot problem, and curvature-driven level-set surface refinement.
Cosmic-Ray Feedback Heating of the Intracluster Medium
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruszkowski, Mateusz; Yang, H.-Y. Karen; Reynolds, Christopher S., E-mail: mateuszr@umich.edu, E-mail: hsyang@astro.umd.edu, E-mail: chris@astro.umd.edu
2017-07-20
Active galactic nuclei (AGNs) play a central role in solving the decades-old cooling-flow problem. Although there is consensus that AGNs provide the energy to prevent catastrophically large star formation, one major problem remains: How is the AGN energy thermalized in the intracluster medium (ICM)? We perform a suite of three-dimensional magnetohydrodynamical adaptive mesh refinement simulations of AGN feedback in a cool core cluster including cosmic rays (CRs). CRs are supplied to the ICM via collimated AGN jets and subsequently disperse in the magnetized ICM via streaming, and interact with the ICM via hadronic, Coulomb, and streaming instability heating. We find that CR transport is an essential model ingredient at least within the context of the physical model considered here. When streaming is included, (i) CRs come into contact with the ambient ICM and efficiently heat it, (ii) streaming instability heating dominates over Coulomb and hadronic heating, (iii) the AGN is variable and the atmosphere goes through low-/high-velocity dispersion cycles, and, importantly, (iv) CR pressure support in the cool core is very low and does not demonstrably violate observational constraints. However, when streaming is ignored, CR energy is not efficiently spent on the ICM heating and CR pressure builds up to a significant level, creating tension with the observations. Overall, we demonstrate that CR heating is a viable channel for the AGN energy thermalization in clusters and likely also in ellipticals, and that CRs play an important role in determining AGN intermittency and the dynamical state of cool cores.
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.; Henson, Robert
2012-01-01
A measure of "clusterability" serves as the basis of a new methodology designed to preserve cluster structure in a reduced dimensional space. Similar to principal component analysis, which finds the direction of maximal variance in multivariate space, principal cluster axes find the direction of maximum clusterability in multivariate space.…
A direct method for the solution of unsteady two-dimensional incompressible Navier-Stokes equations
NASA Technical Reports Server (NTRS)
Ghia, K. N.; Osswald, G. A.; Ghia, U.
1983-01-01
The unsteady incompressible Navier-Stokes equations are formulated in terms of vorticity and stream function in generalized curvilinear orthogonal coordinates to facilitate analysis of flow configurations with general geometries. The numerical method developed solves the conservative form of the transport equation using the alternating-direction implicit method, whereas the stream-function equation is solved by direct block Gaussian elimination. The method is applied to a model problem of flow over a back-step in a doubly infinite channel, using clustered conformal coordinates. One-dimensional stretching functions, dependent on the Reynolds number and the asymptotic behavior of the flow, are used to provide suitable grid distribution in the separation and reattachment regions, as well as in the inflow and outflow regions. The optimum grid distribution selected attempts to honor the multiple length scales of the separated-flow model problem. The asymptotic behavior of the finite-differenced transport equation near infinity is examined and the numerical method is carefully developed so as to lead to spatially second-order accurate wiggle-free solutions, i.e., with minimum dispersive error. Results have been obtained in the entire laminar range for the backstep channel and are in good agreement with the available experimental data for this flow problem.
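A typical one-dimensional stretching function of the kind described clusters grid points near a region of interest (a wall, separation, or reattachment point). A small tanh-based sketch; the specific form and the value of beta are illustrative, not the paper's Reynolds-number-dependent functions.

```python
import numpy as np

def tanh_stretch(n, beta):
    """Map uniform xi in [0, 1] to x in [0, 1] so that grid points cluster
    near x = 0; larger beta packs points more tightly at that end."""
    xi = np.linspace(0.0, 1.0, n)
    return 1.0 + np.tanh(beta * (xi - 1.0)) / np.tanh(beta)

x = tanh_stretch(33, beta=3.0)
print(np.diff(x)[:3])    # fine spacing near x = 0
print(np.diff(x)[-3:])   # coarse spacing near x = 1
```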
[Application Progress of Three-dimensional Laser Scanning Technology in Medical Surface Mapping].
Zhang, Yonghong; Hou, He; Han, Yuchuan; Wang, Ning; Zhang, Ying; Zhu, Xianfeng; Wang, Mingshi
2016-04-01
The booming three-dimensional laser scanning technology can efficiently and effectively acquire the spatial three-dimensional coordinates of a detected object's surface and reconstruct the image at high speed, with high precision and a large capacity of information. Being non-radiative and non-contact, and offering visualization, it has become increasingly popular in three-dimensional medical surface mapping. This paper reviews the applications and developments of three-dimensional laser scanning technology in the medical field, especially in stomatology, plastic surgery, and orthopedics. Furthermore, the paper also discusses its future application prospects as well as the biomedical engineering problems it will encounter.
[Autism Spectrum Disorder and DSM-5: Spectrum or Cluster?].
Kienle, Xaver; Freiberger, Verena; Greulich, Heide; Blank, Rainer
2015-01-01
Within the new DSM-5, the previously differentiated subgroups of "Autistic Disorder" (299.0), "Asperger's Disorder" (299.80) and "Pervasive Developmental Disorder" (299.80) are replaced by the more general "Autism Spectrum Disorder". With regard to patient-oriented and expedient therapy planning, however, the question of an empirically reproducible and clinically feasible differentiation into subgroups must still be raised. Based on two autism rating scales (ASDS and FSK), an exploratory two-step cluster analysis was conducted with N = 103 children (age: 5-18) seen in our social-pediatric health care centre to examine potentially autistic symptoms. In the two-cluster solution of both rating scales, mainly the problems in social communication grouped the children into a cluster "with communication problems" (51% and 41%) and a cluster "without communication problems". Within the three-cluster solution of the ASDS, sensory hypersensitivity, clinging to routines, and social-communicative problems generated an "autistic" subgroup (22%). The children of the second cluster ("communication problems", 35%) were described only by social-communicative problems, and the third group did not show any problems (38%). In the three-cluster solution of the FSK, the "autistic cluster" of the two-cluster solution split into a subgroup with mainly social-communicative problems (cluster 1) and a second subgroup described by restrictive, repetitive behavior. The different cluster solutions are discussed with a view to the new DSM-5 diagnostic criteria; for subsequent studies, a further specification of some of the ASDS and FSK items could be helpful.
NASA Astrophysics Data System (ADS)
López-López, J. M.; Moncho-Jordá, A.; Schmitt, A.; Hidalgo-Álvarez, R.
2005-09-01
Binary diffusion-limited cluster-cluster aggregation processes are studied as a function of the relative concentration of the two species. Both short- and long-time behaviors are investigated by means of three-dimensional off-lattice Brownian dynamics simulations. At short aggregation times, the validity of the Hogg-Healy-Fuerstenau approximation is shown. At long times, a single large cluster containing all initial particles is found to be formed when the relative concentration of the minority particles lies above a critical value. Below that value, stable aggregates remain in the system. These stable aggregates are composed of a few minority particles that are highly covered by majority ones. Our off-lattice simulations reveal a value of approximately 0.15 for the critical relative concentration. A qualitative explanation scheme for the formation and growth of the stable aggregates is developed. The simulations also explain the phenomenon of monomer discrimination that was observed recently in single-cluster light scattering experiments.
Craig, Hugh; Berretta, Regina; Moscato, Pablo
2016-01-01
In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in a graph by using the Jensen-Shannon distance, a dissimilarity measure originating in information theory. Moreover, we use graph-theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high-quality clusters which reflect the commonalities in the literary style of the plays. PMID:27571416
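The pipeline (word-frequency profiles, Jensen-Shannon distances, a proximity graph, then modularity maximization) can be prototyped directly. The sketch below swaps the paper's memetic iMA-Net optimizer for NetworkX's greedy modularity heuristic, and the toy "documents" are invented, so this illustrates the workflow rather than the published method.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base 2)."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a[a > 0] * np.log2(a[a > 0] / b[a > 0]))
    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

rng = np.random.default_rng(0)
# Toy word-frequency profiles: 20 "plays" drawn around two stylistic bases.
base1, base2 = rng.dirichlet(np.ones(50)), rng.dirichlet(np.ones(50))
docs = np.vstack([rng.dirichlet(200 * (base1 if i < 10 else base2))
                  for i in range(20)])

# Proximity graph: connect each document to its 3 nearest JS neighbors.
D = np.array([[js_distance(a, b) for b in docs] for a in docs])
G = nx.Graph()
for i in range(len(docs)):
    for j in np.argsort(D[i])[1:4]:
        G.add_edge(i, int(j), weight=1.0 - D[i, j])

communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])   # recovers the two stylistic groups
```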
An adaptive ANOVA-based PCKF for high-dimensional nonlinear inverse modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Weixuan, E-mail: weixuan.li@usc.edu; Lin, Guang, E-mail: guang.lin@pnnl.gov; Zhang, Dongxiao, E-mail: dxz@pku.edu.cn
2014-02-01
The probabilistic collocation-based Kalman filter (PCKF) is a recently developed approach for solving inverse problems. It resembles the ensemble Kalman filter (EnKF) in every aspect—except that it represents and propagates model uncertainty by polynomial chaos expansion (PCE) instead of an ensemble of model realizations. Previous studies have shown PCKF is a more efficient alternative to EnKF for many data assimilation problems. However, the accuracy and efficiency of PCKF depends on an appropriate truncation of the PCE series. Having more polynomial chaos basis functions in the expansion helps to capture uncertainty more accurately but increases computational cost. Selection of basis functions is particularly important for high-dimensional stochastic problems because the number of polynomial chaos basis functions required to represent model uncertainty grows dramatically as the number of input parameters (random dimensions) increases. In classic PCKF algorithms, the PCE basis functions are pre-set based on users' experience. Also, for sequential data assimilation problems, the basis functions kept in PCE expression remain unchanged in different Kalman filter loops, which could limit the accuracy and computational efficiency of classic PCKF algorithms. To address this issue, we present a new algorithm that adaptively selects PCE basis functions for different problems and automatically adjusts the number of basis functions in different Kalman filter loops. The algorithm is based on adaptive functional ANOVA (analysis of variance) decomposition, which approximates a high-dimensional function with the summation of a set of low-dimensional functions. Thus, instead of expanding the original model into PCE, we implement the PCE expansion on these low-dimensional functions, which is much less costly. We also propose a new adaptive criterion for ANOVA that is more suited for solving inverse problems. The new algorithm was tested with different examples and demonstrated great effectiveness in comparison with non-adaptive PCKF and EnKF algorithms.
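The ANOVA idea itself, approximating a high-dimensional function by a constant plus low-dimensional component functions, is easy to illustrate. The sketch fits only first-order components with 1-D polynomial regression on Monte Carlo samples; the paper's adaptive algorithm instead selects components and PCE basis functions on the fly, so this is a conceptual illustration under assumed settings.

```python
import numpy as np

def anova_first_order(f, d, deg=4, n_samp=4000, seed=0):
    """First-order ANOVA surrogate f(x) ~ f0 + sum_i f_i(x_i) on [0,1]^d,
    with each f_i fit as a 1-D polynomial to the centered samples."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_samp, d))
    y = f(X)
    f0 = y.mean()
    comps = []
    for i in range(d):
        c = np.polyfit(X[:, i], y - f0, deg)
        mi = np.polyval(c, X[:, i]).mean()   # center under the input measure
        comps.append((c, mi))
    def surrogate(Xq):
        out = np.full(len(Xq), f0)
        for i, (c, mi) in enumerate(comps):
            out += np.polyval(c, Xq[:, i]) - mi
        return out
    return surrogate

f = lambda X: np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * X[:, 2]
s = anova_first_order(f, d=8)
Xtest = np.random.default_rng(1).random((1000, 8))
print(np.sqrt(np.mean((s(Xtest) - f(Xtest)) ** 2)))  # small: f is additive
```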
Tuo, Shouheng; Yong, Longquan; Deng, Fang’an; Li, Yanhai; Lin, Yong; Lu, Qiuju
2017-01-01
Harmony Search (HS) and Teaching-Learning-Based Optimization (TLBO), as new swarm intelligent optimization algorithms, have received much attention in recent years. Both have shown outstanding performance in solving NP-hard optimization problems, yet both also suffer dramatic performance degradation on some complex high-dimensional optimization problems. Through extensive experiments, we find that HS and TLBO are strongly complementary to each other: HS has strong global exploration power but a low convergence speed, whereas TLBO converges much faster but is easily trapped in local optima. In this work, we propose a hybrid search algorithm named HSTLBO that merges the two algorithms to synergistically solve complex optimization problems using a self-adaptive selection strategy. In HSTLBO, both HS and TLBO are modified with the aim of balancing global exploration and exploitation abilities, where HS aims mainly to explore unknown regions and TLBO aims to rapidly exploit high-precision solutions in known regions. Our experimental results demonstrate better performance and faster speed than five state-of-the-art HS variants and better exploration power than five strong TLBO variants at similar run time, which illustrates that our method is promising for solving complex high-dimensional optimization problems. Experiments on portfolio optimization problems also demonstrate that HSTLBO is effective in solving complex real-world applications. PMID:28403224
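A minimal sketch of such a hybrid appears below. It does not reproduce the paper's exact HS/TLBO modifications; the success-driven operator probability, step sizes, and sphere objective are assumptions for illustration.

```python
# Minimal HS/TLBO hybrid with a self-adaptive operator choice. This does not
# reproduce the paper's exact HSTLBO modifications; the success-driven
# probability update, step sizes, and sphere objective are illustrative.
import numpy as np

def hstlbo(f, dim=10, pop=20, iters=5000, lb=-10.0, ub=10.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (pop, dim))
    fit = np.array([f(x) for x in X])
    p_hs = 0.5                                   # probability of taking an HS move
    for _ in range(iters):
        use_hs = rng.random() < p_hs
        if use_hs:                               # HS: recombine memory, pitch-adjust
            trial = X[rng.integers(pop, size=dim), np.arange(dim)]
            mask = rng.random(dim) < 0.3
            trial[mask] += 0.01 * (ub - lb) * rng.uniform(-1, 1, mask.sum())
            target = int(np.argmax(fit))         # candidate replaces worst harmony
        else:                                    # TLBO: teacher phase on one learner
            target = int(rng.integers(pop))
            tf = rng.integers(1, 3)              # teaching factor in {1, 2}
            trial = X[target] + rng.random(dim) * (X[np.argmin(fit)] - tf * X.mean(0))
        trial = np.clip(trial, lb, ub)
        f_trial = f(trial)
        success = f_trial < fit[target]
        if success:
            X[target], fit[target] = trial, f_trial
        # self-adaptation: shift probability toward whichever operator succeeds
        delta = 0.01 if success else -0.01
        p_hs = float(np.clip(p_hs + (delta if use_hs else -delta), 0.1, 0.9))
    return X[np.argmin(fit)], float(fit.min())

best, val = hstlbo(lambda x: float(np.sum(x ** 2)))   # sphere test function
print(val)
```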
A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance
Song, Ge
2014-01-01
Textual stream classification has become a realistic and challenging issue because large-scale, high-dimensional, non-stationary streams with class imbalance are widely used in various real-life applications. Given the characteristics of textual streams, their classification is technically difficult, especially in imbalanced environments. In this paper, we propose a new ensemble framework, clustering forest, for learning from textual imbalanced streams with concept drift (CFIM). The CFIM is based on ensemble learning and integrates a set of clustering trees (CTs). An adaptive selection method, which flexibly chooses useful CTs according to the properties of the stream, is presented in CFIM. In particular, to deal with the problem of class imbalance, we collect and reuse both rare-class instances and misclassified instances from the historical chunks. Compared to most existing approaches, it is worth pointing out that our approach assumes that both the majority class and the rare class may suffer from concept drift, so that the distribution of resampled instances is similar to the current concept. The effectiveness of CFIM is examined on five real-world textual streams under an imbalanced non-stationary environment. Experimental results demonstrate that CFIM achieves better performance than four state-of-the-art ensemble models. PMID:24982961
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo M.; Jarman, Kristin H.; Harvey, Scott D.
2005-05-28
A fundamental problem in the analysis of highly multivariate spectral or chromatographic data is the reduction of dimensionality. Principal components analysis (PCA), concerned with explaining the variance-covariance structure of the data, is a commonly used approach to dimension reduction. Recently an attractive alternative to PCA, sequential projection pursuit (SPP), has been introduced. Designed to elicit clustering tendencies in the data, SPP may be more appropriate when performing clustering or classification analysis. However, the existing genetic algorithm (GA) implementation of SPP has two shortcomings: computation time and the inability to determine the number of factors necessary to explain the majority of the structure in the data. We address both shortcomings. First, we introduce a new SPP algorithm, a random scan sampling algorithm (RSSA), that significantly reduces computation time. We compare the computational burden of the RSSA and GA implementations of SPP on a dataset containing Raman spectra of twelve organic compounds. Second, we propose a Bayes factor criterion, BFC, as an effective measure for selecting the number of factors needed to explain the majority of the structure in the data. We compare SPP to PCA on two datasets varying in type, size, and difficulty; in both cases SPP achieves a higher accuracy with a lower number of latent variables.
Data Mining Technologies Inspired from Visual Principle
NASA Astrophysics Data System (ADS)
Xu, Zongben
In this talk we review recent work done by our group on data mining (DM) technologies deduced from simulating visual principles. By viewing a DM problem as a cognition problem and treating a data set as an image with a light point located at each datum position, we developed a series of highly efficient algorithms for clustering, classification, and regression via mimicking visual principles. In pattern recognition, human eyes seem to possess a singular aptitude for grouping objects and finding important structure in an efficient way. Thus, a DM algorithm simulating the visual system may solve some basic problems in DM research. From this point of view, we proposed a new approach to data clustering by modeling the blurring effect of lateral retinal interconnections based on scale-space theory. In this approach, as the data image blurs, smaller light blobs merge into larger ones until the whole image becomes a single light blob at a sufficiently low level of resolution. By identifying each blob with a cluster, the blurring process then generates a family of clusterings along the hierarchy. The proposed approach provides unique solutions to many long-standing problems in clustering, such as cluster validity and sensitivity to initialization. We extended this approach to classification and regression problems by jointly employing Weber's law from physiology and facts about cell-response classification. The resulting classification and regression algorithms are proven to be very efficient and solve the problems of model selection and applicability to huge data sets in DM technologies. We finally applied a similar idea to the difficult parameter-setting problem in support vector machines (SVMs). Viewing the parameter-setting problem as a recognition problem of choosing a visual scale at which the global and local structures of a data set can be preserved, and the difference between the two structures maximized in the feature space, we derived a direct parameter-setting formula for the Gaussian SVM. The simulations and applications show that the suggested formula significantly outperforms known model selection methods in terms of efficiency and precision.
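The blurring idea can be sketched by binning points into a density image, smoothing with a Gaussian of growing scale, and counting the bright blobs that survive at each scale. The bin count, threshold, and toy data below are assumptions for illustration, not the talk's actual algorithm.

```python
# Sketch: treat the data set as an image, blur it at increasing scales, and
# count the surviving bright blobs; blobs merge as the scale grows, giving a
# hierarchy of clusterings. Bins, threshold, and toy data are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter, label

rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(c, 0.5, (100, 2)) for c in ((0, 0), (4, 4), (0, 4))])
img, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=64, range=[[-2, 6], [-2, 6]])

for sigma in (0.5, 1, 2, 4, 8, 16):
    blurred = gaussian_filter(img, sigma)
    _, n = label(blurred > 0.5 * blurred.max())   # bright blobs at this scale
    print(f"sigma={sigma:>4}: {n} blob(s)")
```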
Single exposure three-dimensional imaging of dusty plasma clusters.
Hartmann, Peter; Donkó, István; Donkó, Zoltán
2013-02-01
We have worked out the details of a single-camera, single-exposure method to perform three-dimensional imaging of a finite particle cluster. The procedure is based on the plenoptic imaging principle and utilizes a commercial Lytro light field still camera. We demonstrate the capabilities of our technique on a single-layer particle cluster in a dusty plasma, where the camera is aligned and inclined at a small angle to the particle layer. The reconstruction of the third coordinate (depth) is found to be accurate, and even shadowing particles can be identified.
NASA Astrophysics Data System (ADS)
Bucheli, D.; Caprara, S.; Castellani, C.; Grilli, M.
2013-02-01
Motivated by recent experimental data on thin film superconductors and oxide interfaces, we propose a random-resistor network apt to describe the occurrence of a metal-superconductor transition in a two-dimensional electron system with disorder on the mesoscopic scale. We consider low-dimensional (e.g. filamentary) structures of a superconducting cluster embedded in the two-dimensional network and we explore the separate effects and the interplay of the superconducting structure and of the statistical distribution of local critical temperatures. The thermal evolution of the resistivity is determined by a numerical calculation of the random-resistor network and, for comparison, a mean-field approach called effective medium theory (EMT). Our calculations reveal the relevance of the distribution of critical temperatures for clusters with low connectivity. In addition, we show that the presence of spatial correlations requires a modification of standard EMT to give qualitative agreement with the numerical results. Applying the present approach to an LaTiO3/SrTiO3 oxide interface, we find that the measured resistivity curves are compatible with a network of spatially dense but loosely connected superconducting islands.
A Selective Overview of Variable Selection in High Dimensional Feature Space
Fan, Jianqing
2010-01-01
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. Questions of what limits of dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976
Deep linear autoencoder and patch clustering-based unified one-dimensional coding of image and video
NASA Astrophysics Data System (ADS)
Li, Honggui
2017-09-01
This paper proposes a unified one-dimensional (1-D) coding framework for image and video, which builds on a deep neural network and image patch clustering. First, an improved K-means clustering algorithm for image patches is employed to obtain compact inputs for the deep artificial neural network. Second, for the purpose of best reconstructing the original image patches, a deep linear autoencoder (DLA), a linear version of the classical deep nonlinear autoencoder, is introduced to achieve the 1-D representation of image blocks. Under the circumstances of 1-D representation, DLA is capable of attaining zero reconstruction error, which is impossible for the classical nonlinear dimensionality reduction methods. Third, a unified 1-D coding infrastructure for image, intraframe, interframe, multiview video, three-dimensional (3-D) video, and multiview 3-D video is built by incorporating the different categories of video into the inputs of the patch clustering algorithm. Finally, simulation experiments show that the proposed methods simultaneously attain a higher compression ratio and peak signal-to-noise ratio than state-of-the-art methods at low transmission bitrates.
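A rough sketch of the first two stages follows, under simplifying assumptions: patches from a toy image are clustered with K-means, and the per-cluster linear autoencoder is realized by truncated SVD, to which an optimal linear autoencoder is equivalent. Patch size, cluster count, and code length are illustrative.

```python
# Sketch of the first two stages on a toy image: K-means over 8x8 patches,
# then one linear autoencoder per cluster, realized by truncated SVD (an
# optimal linear autoencoder is equivalent to PCA on the cluster).
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def extract_patches(img, p=8):
    h, w = img.shape
    return np.array([img[i:i + p, j:j + p].ravel()
                     for i in range(0, h - p + 1, p)
                     for j in range(0, w - p + 1, p)])

rng = np.random.default_rng(0)
img = rng.random((64, 64))                     # toy "image"
patches = extract_patches(img)                 # 64 non-overlapping 64-dim patches

km = MiniBatchKMeans(n_clusters=4, n_init=3, random_state=0).fit(patches)
code_len = 16                                  # length of the 1-D code per patch
for c in range(4):
    P = patches[km.labels_ == c]
    if len(P) == 0:
        continue
    mean = P.mean(0)
    _, _, Vt = np.linalg.svd(P - mean, full_matrices=False)
    W = Vt[:code_len]                          # encoder; decoder is W.T (tied weights)
    recon = (P - mean) @ W.T @ W + mean        # decode(encode(P))
    print(f"cluster {c}: {len(P)} patches, MSE {np.mean((recon - P) ** 2):.4f}")
```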
Numerical aerodynamic simulation facility. [for flows about three-dimensional configurations
NASA Technical Reports Server (NTRS)
Bailey, F. R.; Hathaway, A. W.
1978-01-01
Critical to the advancement of computational aerodynamics capability is the ability to simulate flows about three-dimensional configurations that contain both compressible and viscous effects, including turbulence and flow separation at high Reynolds numbers. Analyses were conducted of two solution techniques for solving the Reynolds averaged Navier-Stokes equations describing the mean motion of a turbulent flow with certain terms involving the transport of turbulent momentum and energy modeled by auxiliary equations. The first solution technique is an implicit approximate factorization finite-difference scheme applied to three-dimensional flows that avoids the restrictive stability conditions when small grid spacing is used. The approximate factorization reduces the solution process to a sequence of three one-dimensional problems with easily inverted matrices. The second technique is a hybrid explicit/implicit finite-difference scheme which is also factored and applied to three-dimensional flows. Both methods are applicable to problems with highly distorted grids and a variety of boundary conditions and turbulence models.
Atlas-guided cluster analysis of large tractography datasets.
Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer
2013-01-01
Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information from a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic, anatomically correct, and reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment.
NASA Astrophysics Data System (ADS)
Yannouleas, Constantine; Brandt, Benedikt B.; Landman, Uzi
2016-07-01
Advances with trapped ultracold atoms have intensified interest in simulating complex physical phenomena, including quantum magnetism and transitions from itinerant to non-itinerant behavior. Here we show formation of antiferromagnetic ground states of few ultracold fermionic atoms in single and double well (DW) traps, through microscopic Hamiltonian exact diagonalization for two DW arrangements: (i) two linearly oriented one-dimensional, 1D, wells, and (ii) two coupled parallel wells, forming a trap of two-dimensional, 2D, nature. The spectra and spin-resolved conditional probabilities reveal for both cases, under strong repulsion, atomic spatial localization at extemporaneously created sites, forming quantum molecular magnetic structures with non-itinerant character. These findings usher in future theoretical and experimental explorations of the highly correlated behavior of ultracold strongly repelling fermionic atoms in higher dimensions, beyond the fermionization physics that is strictly applicable only in the 1D case. The results for four atoms are well described with finite Heisenberg spin-chain and cluster models. The numerical simulations of three fermionic atoms in symmetric DWs reveal the emergent appearance of coupled resonating 2D Heisenberg clusters, whose emulation requires the use of a t-J-like model, akin to that used in investigations of high-Tc superconductivity. The highly entangled states discovered in the microscopic and model calculations of controllably detuned, asymmetric, DWs suggest three-cold-atom DW quantum computing qubits.
NASA Astrophysics Data System (ADS)
Schran, Christoph; Uhl, Felix; Behler, Jörg; Marx, Dominik
2018-03-01
The design of accurate helium-solute interaction potentials for the simulation of chemically complex molecules solvated in superfluid helium has long been a cumbersome task due to the rather weak but strongly anisotropic nature of the interactions. We show that this challenge can be met by using a combination of an effective pair potential for the He-He interactions and a flexible high-dimensional neural network potential (NNP) for describing the complex interaction between helium and the solute in a pairwise additive manner. This approach yields an excellent agreement with a mean absolute deviation as small as 0.04 kJ mol-1 for the interaction energy between helium and both hydronium and Zundel cations compared with coupled cluster reference calculations with an energetically converged basis set. The construction and improvement of the potential can be performed in a highly automated way, which opens the door for applications to a variety of reactive molecules to study the effect of solvation on the solute as well as the solute-induced structuring of the solvent. Furthermore, we show that this NNP approach yields very convincing agreement with the coupled cluster reference for properties like many-body spatial and radial distribution functions. This holds for the microsolvation of the protonated water monomer and dimer by a few helium atoms up to their solvation in bulk helium as obtained from path integral simulations at about 1 K.
Modal Ring Method for the Scattering of Electromagnetic Waves
NASA Technical Reports Server (NTRS)
Baumeister, Kenneth J.; Kreider, Kevin L.
1993-01-01
The modal ring method for electromagnetic scattering from perfectly electrically conducting (PEC) symmetrical bodies is presented. The scattering body is represented by a line of finite elements (triangular) on its outer surface. The infinite computational region surrounding the body is represented analytically by an eigenfunction expansion. The modal ring method effectively reduces the two-dimensional scattering problem to a one-dimensional problem similar to the method of moments. The method is capable of handling very high frequency scattering because it has a highly banded solution matrix.
Iterative Stable Alignment and Clustering of 2D Transmission Electron Microscope Images
Yang, Zhengfan; Fang, Jia; Chittuluru, Johnathan; Asturias, Francisco J.; Penczek, Pawel A.
2012-01-01
Identification of homogeneous subsets of images in a macromolecular electron microscopy (EM) image data set is a critical step in single-particle analysis. The task is handled by iterative algorithms, whose performance is compromised by the compounded limitations of image alignment and K-means clustering. Here we describe an approach, iterative stable alignment and clustering (ISAC), that, relying on a new clustering method and on the concepts of stability and reproducibility, can extract validated, homogeneous subsets of images. ISAC requires only a small number of simple parameters and, with minimal human intervention, can eliminate bias from two-dimensional image clustering and maximize the quality of group averages that can be used for ab initio three-dimensional structural determination and analysis of macromolecular conformational variability. Repeated testing of the stability and reproducibility of a solution within ISAC eliminates heterogeneous or incorrect classes and introduces critical validation to the process of EM image clustering. PMID:22325773
NASA Astrophysics Data System (ADS)
Ghattas, O.; Petra, N.; Cui, T.; Marzouk, Y.; Benjamin, P.; Willcox, K.
2016-12-01
Model-based projections of the dynamics of the polar ice sheets play a central role in anticipating future sea level rise. However, a number of mathematical and computational challenges place significant barriers on improving predictability of these models. One such challenge is caused by the unknown model parameters (e.g., in the basal boundary conditions) that must be inferred from heterogeneous observational data, leading to an ill-posed inverse problem and the need to quantify uncertainties in its solution. In this talk we discuss the problem of estimating the uncertainty in the solution of (large-scale) ice sheet inverse problems within the framework of Bayesian inference. Computing the general solution of the inverse problem--i.e., the posterior probability density--is intractable with current methods on today's computers, due to the expense of solving the forward model (3D full Stokes flow with nonlinear rheology) and the high dimensionality of the uncertain parameters (which are discretizations of the basal sliding coefficient field). To overcome these twin computational challenges, it is essential to exploit problem structure (e.g., sensitivity of the data to parameters, the smoothing property of the forward model, and correlations in the prior). To this end, we present a data-informed approach that identifies low-dimensional structure in both parameter space and the forward model state space. This approach exploits the fact that the observations inform only a low-dimensional parameter space and allows us to construct a parameter-reduced posterior. Sampling this parameter-reduced posterior still requires multiple evaluations of the forward problem; therefore, we also aim to identify a low-dimensional state space to reduce the computational cost. To this end, we apply a proper orthogonal decomposition (POD) approach to approximate the state using a low-dimensional manifold constructed using "snapshots" from the parameter-reduced posterior, and the discrete empirical interpolation method (DEIM) to approximate the nonlinearity in the forward problem. We show that using only a limited number of forward solves, the resulting subspaces lead to an efficient method to explore the high-dimensional posterior.
High Performance Computing of Meshless Time Domain Method on Multi-GPU Cluster
NASA Astrophysics Data System (ADS)
Ikuno, Soichiro; Nakata, Susumu; Hirokawa, Yuta; Itoh, Taku
2015-01-01
High-performance computing of the Meshless Time Domain Method (MTDM) on multiple GPUs of the supercomputer HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) at the University of Tsukuba is investigated. Generally, the finite-difference time domain (FDTD) method is adopted for the numerical simulation of electromagnetic wave propagation phenomena. However, the numerical domain must be divided into rectangular meshes, which makes it difficult to apply the method to problems with complex domains. On the other hand, MTDM can easily handle such problems because it does not require meshes. In the present study, we implement MTDM on a multi-GPU cluster to speed up the method, and numerically investigate its performance on the cluster. To reduce the computation time, the communication between the decomposed subdomains is hidden behind the perfectly matched layer (PML) calculation procedure. The computational results show that MTDM on 128 GPUs is 173 times faster than a single-CPU calculation.
NASA Astrophysics Data System (ADS)
Akinin, M. V.; Akinina, N. V.; Klochkov, A. Y.; Nikiforov, M. B.; Sokolova, A. V.
2015-05-01
This report reviews the fuzzy c-means algorithm for image segmentation, estimates the quality of its results using the Xie-Beni criterion, and presents experimental studies of the algorithm in the context of producing detailed two-dimensional maps with unmanned aerial vehicles. The experimental results support the applicability of the algorithm to interpreting images obtained by aerial photography. The algorithm can partition the original image into a large number of segments (clusters) in a relatively short time, which is achieved by modifying the original k-means algorithm to operate in a fuzzy setting.
On Multi-Dimensional Unstructured Mesh Adaption
NASA Technical Reports Server (NTRS)
Wood, William A.; Kleb, William L.
1999-01-01
Anisotropic unstructured mesh adaption is developed for a truly multi-dimensional upwind fluctuation splitting scheme, as applied to scalar advection-diffusion. The adaption is performed locally using edge swapping, point insertion/deletion, and nodal displacements. Comparisons are made versus the current state of the art for aggressive anisotropic unstructured adaption, which is based on a posteriori error estimates. Demonstrations of both schemes on model problems, with features representative of compressible gas dynamics, show the present method to be superior to the a posteriori adaption for linear advection. The performance of the two methods is more similar when applied to nonlinear advection, with a difference in the treatment of shocks. The a posteriori adaption can excessively cluster points to a shock, while the present multi-dimensional scheme tends to merely align with a shock, using fewer nodes. As a consequence of this alignment tendency, an implementation of eigenvalue limiting for the suppression of expansion shocks is developed for the multi-dimensional distribution scheme. The differences in the treatment of shocks by the adaption schemes, along with the inherently low levels of artificial dissipation in the fluctuation splitting solver, suggest the present method is a strong candidate for applications to compressible gas dynamics.
A local search for a graph clustering problem
NASA Astrophysics Data System (ADS)
Navrotskaya, Anna; Il'ev, Victor
2016-10-01
In clustering problems one has to partition a given set of objects (a data set) into subsets (called clusters) taking into consideration only the similarity of the objects. One of the most visual formalizations of clustering is graph clustering, that is, grouping the vertices of a graph into clusters taking into consideration the edge structure of the graph, whose vertices are objects and whose edges represent similarities between the objects. In the graph k-clustering problem the number of clusters does not exceed k, and the goal is to minimize the number of edges between clusters plus the number of missing edges within clusters. This problem is NP-hard for any k ≥ 2. We propose a polynomial-time (2k-1)-approximation algorithm for graph k-clustering. We then apply a local search procedure to the feasible solution found by this algorithm and conduct an experimental study of the resulting heuristics.
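The local search phase can be sketched with the cost named above, counting inter-cluster edges plus missing intra-cluster edges. The (2k-1)-approximation used to seed the search is not reproduced, so this sketch starts from a random partition.

```python
# Sketch of the local search phase for graph k-clustering: move single
# vertices between clusters while the cost (inter-cluster edges plus missing
# intra-cluster edges) keeps decreasing. Seeded randomly, not by the paper's
# (2k-1)-approximation.
import itertools
import random
import networkx as nx

def cost(G, part):
    c = 0
    for u, v in itertools.combinations(G.nodes, 2):
        same, edge = part[u] == part[v], G.has_edge(u, v)
        c += (same and not edge) or (not same and edge)
    return c

def local_search(G, k, seed=0):
    rng = random.Random(seed)
    part = {v: rng.randrange(k) for v in G.nodes}
    improved = True
    while improved:
        improved = False
        for v in G.nodes:                       # best single-vertex move
            best = min(range(k), key=lambda c: cost(G, {**part, v: c}))
            if best != part[v]:
                part[v], improved = best, True
    return part

G = nx.planted_partition_graph(3, 8, 0.9, 0.05, seed=1)  # 3 hidden groups of 8
part = local_search(G, k=3)
print("cost:", cost(G, part))
```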
Robust MST-Based Clustering Algorithm.
Liu, Qidong; Zhang, Ruisheng; Zhao, Zhili; Wang, Zhenghai; Jiao, Mengyao; Wang, Guangjing
2018-06-01
Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. This grouping principle yields superior clustering results when mining arbitrarily-shaped clusters in data. However, it is not robust against noise and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters would be regarded as two parts of one cluster. In order to solve such problems, we propose a robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which each element denotes a supernode formed by combining a set of nodes. Then a greedy method is presented to partition those supernodes by working on the low-rank matrix. Instead of removing the longest edges from the MST, our algorithm groups the data set based on minimax similarity. Finally, the assignment of all data points can be achieved through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms competing clustering algorithms.
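For contrast, the classic MST baseline that the letter improves upon, cutting the k-1 longest MST edges, fits in a few lines; the proposed density-based coarsening and minimax grouping are not reproduced here.

```python
# Baseline sketch: classic MST clustering (cut the k-1 longest tree edges),
# the approach the letter argues against; toy data are illustrative.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import squareform, pdist

def mst_clusters(X, k):
    D = squareform(pdist(X))
    T = minimum_spanning_tree(D).toarray()
    edges = np.argwhere(T > 0)
    order = np.argsort(T[T > 0])[::-1]          # longest MST edges first
    for i, j in edges[order[:k - 1]]:
        T[i, j] = 0                             # cut the k-1 longest edges
    _, labels = connected_components(T, directed=False)
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in ((0, 0), (3, 0), (0, 3))])
print(np.bincount(mst_clusters(X, 3)))          # three groups of ~50 points
```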
Zone-Based Routing Protocol for Wireless Sensor Networks
Venkateswarlu Kumaramangalam, Muni; Adiyapatham, Kandasamy; Kandasamy, Chandrasekaran
2014-01-01
Extensive research across the globe has established the importance of wireless sensor networks (WSNs) in present-day applications. In the recent past, various routing algorithms have been proposed to extend WSN network lifetime. Clustering mechanisms are highly successful in conserving energy resources for network activities and have become a promising field of research. However, the problem of unbalanced energy consumption is still open because cluster head activities are tightly coupled with the role and location of a particular node in the network. Several unequal clustering algorithms have been proposed to solve this wireless sensor network multihop hot spot problem. Current unequal clustering mechanisms consider only intra- and intercluster communication cost. Proper organization of a wireless sensor network into clusters enables efficient utilization of limited resources and enhances the lifetime of deployed sensor nodes. This paper considers a novel network organization scheme, an energy-efficient edge-based network partitioning scheme, to organize sensor nodes into clusters of equal size. It also proposes a cluster-based routing algorithm, called the zone-based routing protocol (ZBRP), for extending sensor network lifetime. Experimental results show that ZBRP outperforms existing protocols in terms of network lifetime and energy conservation, with its uniform energy consumption among the cluster heads. PMID:27437455
NASA Astrophysics Data System (ADS)
Qian, Elaine A.; Wixtrom, Alex I.; Axtell, Jonathan C.; Saebi, Azin; Jung, Dahee; Rehak, Pavel; Han, Yanxiao; Moully, Elamar Hakim; Mosallaei, Daniel; Chow, Sylvia; Messina, Marco S.; Wang, Jing Yang; Royappa, A. Timothy; Rheingold, Arnold L.; Maynard, Heather D.; Král, Petr; Spokoyny, Alexander M.
2017-04-01
The majority of biomolecules are intrinsically atomically precise, an important characteristic that enables rational engineering of their recognition and binding properties. However, imparting a similar precision to hybrid nanoparticles has been challenging because of the inherent limitations of existing chemical methods and building blocks. Here we report a new approach to form atomically precise and highly tunable hybrid nanomolecules with well-defined three-dimensionality. Perfunctionalization of atomically precise clusters with pentafluoroaryl-terminated linkers produces size-tunable rigid cluster nanomolecules. These species are amenable to facile modification with a variety of thiol-containing molecules and macromolecules. Assembly proceeds at room temperature within hours under mild conditions, and the resulting nanomolecules exhibit high stabilities because of their full covalency. We further demonstrate how these nanomolecules grafted with saccharides can exhibit dramatically improved binding affinity towards a protein. Ultimately, the developed strategy allows the rapid generation of precise molecular assemblies to investigate multivalent interactions.
Variance-Based Cluster Selection Criteria in a K-Means Framework for One-Mode Dissimilarity Data.
Vera, J Fernando; Macías, Rodrigo
2017-06-01
One of the main problems in cluster analysis is that of determining the number of groups in the data. In general, the approach taken depends on the cluster method used. For K-means, some of the most widely employed criteria are formulated in terms of the decomposition of the total point scatter, regarding a two-mode data set of N points in p dimensions, which are optimally arranged into K classes. This paper addresses the formulation of criteria to determine the number of clusters, in the general situation in which the available information for clustering is a one-mode N × N dissimilarity matrix describing the objects. In this framework, p and the coordinates of points are usually unknown, and the application of criteria originally formulated for two-mode data sets is dependent on their possible reformulation in the one-mode situation. The decomposition of the variability of the clustered objects is proposed in terms of the corresponding block-shaped partition of the dissimilarity matrix. Within-block and between-block dispersion values for the partitioned dissimilarity matrix are derived, and variance-based criteria are subsequently formulated in order to determine the number of groups in the data. A Monte Carlo experiment was carried out to study the performance of the proposed criteria. For simulated clustered points in p dimensions, greater efficiency in recovering the number of clusters is obtained when the criteria are calculated from the related Euclidean distances instead of the known two-mode data set, in general, for unequal-sized clusters and for low dimensionality situations. For simulated dissimilarity data sets, the proposed criteria always outperform the results obtained when these criteria are calculated from their original formulation, using dissimilarities instead of distances.
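The one-mode construction can be sketched via the standard identity that a cluster's scatter equals its sum of squared within-cluster dissimilarities divided by twice the cluster size. The Calinski-Harabasz-style ratio below is one variance-based criterion of this kind, not necessarily the paper's exact formulation.

```python
# Sketch: within- and between-block dispersion computed directly from a
# one-mode dissimilarity matrix, using scatter = (sum of squared within-
# cluster dissimilarities) / (2 * cluster size); the Calinski-Harabasz-style
# ratio is one such variance-based criterion, not necessarily the paper's.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def within_between(D, labels):
    total = (D ** 2).sum() / (2 * len(D))
    within = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        within += (D[np.ix_(idx, idx)] ** 2).sum() / (2 * len(idx))
    return within, total - within

def ch_index(D, labels):
    K, N = len(np.unique(labels)), len(D)
    W, B = within_between(D, labels)
    return (B / (K - 1)) / (W / (N - K))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.4, (40, 3)) for c in ((0, 0, 0), (3, 3, 3), (6, 6, 6))])
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)    # Euclidean dissimilarities
Z = linkage(squareform(D), method="average")
for K in (2, 3, 4, 5):                                  # pick K maximizing the index
    print(K, round(ch_index(D, fcluster(Z, K, criterion="maxclust")), 1))
```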
Asymptotics of empirical eigenstructure for high dimensional spiked covariance.
Wang, Weichen; Fan, Jianqing
2017-06-01
We derive the asymptotic distributions of the spiked eigenvalues and eigenvectors under a generalized and unified asymptotic regime, which takes into account the magnitude of spiked eigenvalues, sample size, and dimensionality. This regime allows high dimensionality and diverging eigenvalues and provides new insights into the roles that the leading eigenvalues, sample size, and dimensionality play in principal component analysis. Our results are a natural extension of those in Paul (2007) to a more general setting and solve the rates of convergence problems in Shen et al. (2013). They also reveal the biases of estimating leading eigenvalues and eigenvectors by using principal component analysis, and lead to a new covariance estimator for the approximate factor model, called shrinkage principal orthogonal complement thresholding (S-POET), that corrects the biases. Our results are successfully applied to outstanding problems in estimation of risks of large portfolios and false discovery proportions for dependent test statistics and are illustrated by simulation studies.
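The bias being corrected is simple to reproduce numerically. The sketch below assumes a single spike with unit noise eigenvalues and compares the leading sample eigenvalue with the classic spiked-covariance prediction lambda*(1 + gamma/(lambda - 1)), gamma = p/n; S-POET itself is not implemented.

```python
# Empirical illustration of the eigenvalue bias that S-POET corrects: with
# one spike lam and unit noise eigenvalues, the leading sample eigenvalue
# concentrates near lam * (1 + gamma / (lam - 1)), gamma = p / n, not lam.
# Classic spiked-covariance prediction; S-POET itself is not implemented.
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 200, 400, 10.0
gamma = p / n

cov_eigs = np.ones(p)
cov_eigs[0] = lam                                        # single spike
tops = []
for _ in range(50):
    X = rng.standard_normal((n, p)) * np.sqrt(cov_eigs)  # diagonal covariance
    tops.append(np.linalg.eigvalsh(X.T @ X / n)[-1])     # leading sample eigenvalue

print("true spike:          ", lam)
print("mean sample spike:   ", round(float(np.mean(tops)), 2))
print("predicted biased top:", round(lam * (1 + gamma / (lam - 1)), 2))
```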
Atomic clusters and atomic surfaces in icosahedral quasicrystals.
Quiquandon, Marianne; Portier, Richard; Gratias, Denis
2014-05-01
This paper presents the basic tools commonly used to describe the atomic structures of quasicrystals, with a specific focus on the icosahedral phases. After a brief review of the main properties of quasiperiodic objects, two simple physical rules are discussed that eventually lead one to a surprisingly small number of atomic structures as ideal quasiperiodic models for real quasicrystals. This is due to the fact that the atomic surfaces (ASs) used to describe all known icosahedral phases are located on high-symmetry special points in six-dimensional space. The first rule is maximizing the density using simple polyhedral ASs; it leads to two possible sets of ASs according to the value of the six-dimensional lattice parameter A, between 0.63 and 0.79 nm. The second rule is maximizing the number of complete orbits of high symmetry, to construct atomic clusters that are as large as possible, similar to those observed in complex intermetallic structures and approximant phases. The practical use of these two rules together is demonstrated on two typical examples of icosahedral phases, i-AlMnSi and i-CdRE (RE = Gd, Ho, Tm).
NASA Astrophysics Data System (ADS)
Franck, I. M.; Koutsourelakis, P. S.
2017-01-01
This paper is concerned with the numerical solution of model-based, Bayesian inverse problems. We are particularly interested in cases where the cost of each likelihood evaluation (forward-model call) is expensive and the number of unknown (latent) variables is high. This is the setting in many problems in computational physics where forward models with nonlinear PDEs are used and the parameters to be calibrated involve spatio-temporally varying coefficients, which upon discretization give rise to a high-dimensional vector of unknowns. One of the consequences of the well-documented ill-posedness of inverse problems is the possibility of multiple solutions. While such information is contained in the posterior density in Bayesian formulations, the discovery of a single mode, let alone multiple, poses a formidable computational task. The goal of the present paper is two-fold. On one hand, we propose approximate, adaptive inference strategies using mixture densities to capture multi-modal posteriors. On the other, we extend our work in [1] with regard to effective dimensionality reduction techniques that reveal low-dimensional subspaces where the posterior variance is mostly concentrated. We validate the proposed model by employing Importance Sampling which confirms that the bias introduced is small and can be efficiently corrected if the analyst wishes to do so. We demonstrate the performance of the proposed strategy in nonlinear elastography where the identification of the mechanical properties of biological materials can inform non-invasive, medical diagnosis. The discovery of multiple modes (solutions) in such problems is critical in achieving the diagnostic objectives.
High dimensional biological data retrieval optimization with NoSQL technology.
Wang, Shicai; Pandis, Ioannis; Wu, Chao; He, Sijin; Johnson, David; Emam, Ibrahim; Guitton, Florian; Guo, Yike
2014-01-01
High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, performance is poor when relational databases are queried for hundreds of different patients' gene expression records. Non-relational data models, such as the key-value model implemented in NoSQL databases, promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. In this paper we introduce a new data model better suited for high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase over MongoDB. The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data.
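The kind of composite row key that makes such scans fast can be sketched with a sorted map standing in for HBase. The abstract does not give the actual tranSMART/HBase schema, so the key layout and field values below are purely illustrative.

```python
# Sketch of a composite row key and prefix scan, with a sorted map standing
# in for HBase; the real tranSMART/HBase schema is not in the abstract, so
# the key layout (trial # patient # probe) is purely illustrative.
from bisect import bisect_left, bisect_right

store = {}                                     # rowkey -> expression value

def put(trial, patient, probe, value):
    store[f"{trial}#{patient:>08}#{probe}"] = value

def scan_patient(trial, patient):
    """All probes for one patient come back as a single contiguous key range,
    the access pattern HBase serves efficiently."""
    keys = sorted(store)                       # HBase keeps row keys sorted natively
    prefix = f"{trial}#{patient:>08}#"
    lo = bisect_left(keys, prefix)
    hi = bisect_right(keys, prefix + "\xff")
    return {k: store[k] for k in keys[lo:hi]}

put("TRIAL-A", "P1", "204531_s_at", 8.91)
put("TRIAL-A", "P1", "202763_at", 5.02)
put("TRIAL-A", "P2", "204531_s_at", 7.33)
print(scan_patient("TRIAL-A", "P1"))           # two rows, one range scan
```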
Nagaoka, Tomoaki; Watanabe, Soichi
2012-01-01
Electromagnetic simulation with anatomically realistic computational human models using the finite-difference time domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the computational human model, we adapt three-dimensional FDTD code to a multi-GPU cluster environment with Compute Unified Device Architecture and Message Passing Interface. Our multi-GPU cluster system consists of three nodes, with seven GPU boards (NVIDIA Tesla C2070) mounted on each node. We examined the performance of the FDTD calculation in this multi-GPU cluster environment. We confirmed that the FDTD calculation on the multi-GPU cluster is faster than on a single multi-GPU workstation, and we also found that the GPU cluster system calculates faster than a vector supercomputer. In addition, our GPU cluster system allowed us to perform large-scale FDTD calculations because we were able to use over 100 GB of GPU memory.
Liu, Lichen; Díaz, Urbano; Arenal, Raul; Agostini, Giovanni; Concepción, Patricia; Corma, Avelino
2017-01-01
Single metal atoms and metal clusters have attracted much attention thanks to their advantageous capabilities as heterogeneous catalysts. However, the generation of stable single atoms and clusters on a solid support is still challenging. Herein, we report a new strategy for the generation of single Pt atoms and Pt clusters with exceptionally high thermal stability, formed within purely siliceous MCM-22 during the growth of a two-dimensional zeolite into three dimensions. These subnanometric Pt species are stabilized by MCM-22, even after treatment in air up to 540 °C. Furthermore, these stable Pt species confined within internal framework cavities show size-selective catalysis for the hydrogenation of alkenes. High-temperature oxidation-reduction treatments result in the growth of encapsulated Pt species to small nanoparticles in the approximate size range of 1 to 2 nm. The stability and catalytic activity of encapsulated Pt species is also reflected in the dehydrogenation of propane to propylene.
The resistance of an n-dimensional tetrahedron
NASA Astrophysics Data System (ADS)
Griffiths, Martin
2013-01-01
We consider here a problem that is suitable for introducing high-school students to the notion of generalizing shapes and solids to n dimensions. In particular, we calculate the effective resistance between any two vertices of an n-dimensional tetrahedron whose edges are each 1-Ω resistors. This leads, in a natural way, to more demanding problems, and indeed ideas for more advanced work in this area are also suggested.
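The stated result follows from superposition, since an n-dimensional tetrahedron with unit resistors is the complete graph K_{n+1}. Superpose (i) 1 A injected at vertex a and extracted equally at the other n vertices with (ii) 1 A extracted at vertex b and injected equally at the other n vertices:

```latex
% Each configuration sends a current of 1/n through the direct edge ab,
% so the edge carries I_ab = 2/n, while the net source current at a is
% 1 + 1/n = (n+1)/n. With unit resistors, V_ab = (2/n) * 1 Ohm, hence
\[
R_{\mathrm{eff}} \;=\; \frac{V_{ab}}{I_{\mathrm{total}}}
   \;=\; \frac{2/n}{(n+1)/n}\,\Omega \;=\; \frac{2}{n+1}\,\Omega .
\]
% Sanity checks: n=1 gives 1 Ohm (a single resistor); n=2 gives 2/3 Ohm
% (one resistor in parallel with two in series); n=3 gives 1/2 Ohm.
```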
The Vainshtein mechanism in the cosmic web
DOE Office of Scientific and Technical Information (OSTI.GOV)
Falck, Bridget; Koyama, Kazuya; Zhao, Gong-bo
We investigate the dependence of the Vainshtein screening mechanism on the cosmic web morphology of both dark matter particles and halos as determined by ORIGAMI. Unlike chameleon and symmetron screening, which come into effect in regions of high density, Vainshtein screening instead depends on the dimensionality of the system, and screened bodies can still feel external fields. ORIGAMI is well-suited to this problem because it defines morphologies according to the dimensionality of the collapsing structure and does not depend on a smoothing scale or density threshold parameter. We find that halo particles are screened while filament, wall, and void particles are unscreened, and this is independent of the particle density. However, after separating halos according to their large scale cosmic web environment, we find no difference in the screening properties of halos in filaments versus halos in clusters. We find that the fifth force enhancement of dark matter particles in halos is greatest well outside the virial radius. We confirm the theoretical expectation that even if the internal field is suppressed by the Vainshtein mechanism, the object still feels the fifth force generated by the external fields, by measuring peculiar velocities and velocity dispersions of halos. Finally, we investigate the morphology and gravity model dependence of halo spins, concentrations, and shapes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahn, Surl-Hee; Grate, Jay W.; Darve, Eric F.
Molecular dynamics (MD) simulations are useful in obtaining thermodynamic and kinetic properties of bio-molecules but are limited by the timescale barrier, i.e., we may be unable to efficiently obtain properties because we need to run microseconds or longer simulations using femtosecond time steps. While there are several existing methods to overcome this timescale barrier and efficiently sample thermodynamic and/or kinetic properties, problems remain in regard to being able to sample unknown systems, deal with high-dimensional spaces of collective variables, and focus the computational effort on slow timescales. Hence, a new sampling method, called the "Concurrent Adaptive Sampling (CAS) algorithm," has been developed to tackle these three issues and efficiently obtain conformations and pathways. The method is not constrained to use only one or two collective variables, unlike most reaction coordinate-dependent methods. Instead, it can use a large number of collective variables and uses macrostates (a partition of the collective variable space) to enhance the sampling. The exploration is done by running a large number of short simulations, and a clustering technique is used to accelerate the sampling. In this paper, we introduce the new methodology and show results from two-dimensional models and bio-molecules, such as penta-alanine and triazine polymer.
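A toy version of the explore-cluster-reseed loop can look as follows; the CAS algorithm's macrostate weighting and merging rules are not reproduced, and the double-well potential, walker count, and macrostate number are assumptions for illustration.

```python
# Toy sketch in the spirit of CAS: many short Brownian-dynamics runs on a
# double-well potential, k-means macrostates from the endpoints, and uniform
# re-seeding of walkers across macrostates to push exploration toward rare
# states. The actual CAS weighting and merging rules are not reproduced.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
grad = lambda x: 4 * x * (x ** 2 - 1)            # V(x) = (x^2 - 1)^2

def short_run(x, steps=100, dt=1e-3, beta=3.0):
    for _ in range(steps):                       # overdamped Langevin dynamics
        x += -grad(x) * dt + np.sqrt(2 * dt / beta) * rng.standard_normal()
    return x

walkers = np.full(100, -1.0)                     # all start in the left well
k = 4                                            # number of macrostates
for sweep in range(20):
    ends = np.array([short_run(x) for x in walkers])
    labels = KMeans(n_clusters=k, n_init=5, random_state=0).fit_predict(
        ends.reshape(-1, 1))
    # uniform allocation: the same number of walkers per macrostate
    walkers = np.concatenate([rng.choice(ends[labels == c], size=len(walkers) // k)
                              for c in range(k)])
print("fraction in right well:", np.mean(walkers > 0))
```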
Kimizuka, Hajime; Kurokawa, Shu; Yamaguchi, Akihiro; Sakai, Akira; Ogata, Shigenobu
2014-01-01
Predicting the equilibrium ordered structures at internal interfaces, especially in the case of nanometer-scale chemical heterogeneities, is an ongoing challenge in materials science. In this study, we established an ab-initio coarse-grained modeling technique for describing the phase-like behavior of a close-packed stacking-fault-type interface containing solute nanoclusters, which undergo a two-dimensional disorder-order transition, depending on the temperature and composition. Notably, this approach can predict the two-dimensional medium-range ordering in the nanocluster arrays realized in Mg-based alloys, in a manner consistent with scanning tunneling microscopy-based measurements. We predicted that the repulsively interacting solute-cluster system undergoes a continuous evolution into a highly ordered densely packed morphology while maintaining a high degree of six-fold orientational order, which is attributable mainly to an entropic effect. The uncovered interaction-dependent ordering properties may be useful for the design of nanostructured materials utilizing the self-organization of two-dimensional nanocluster arrays in the close-packed interfaces. PMID:25471232
Three-Dimensional Anisotropic Acoustic and Elastic Full-Waveform Seismic Inversion
NASA Astrophysics Data System (ADS)
Warner, M.; Morgan, J. V.
2013-12-01
Three-dimensional full-waveform inversion is a high-resolution, high-fidelity, quantitative, seismic imaging technique that has advanced rapidly within the oil and gas industry. The method involves the iterative improvement of a starting model using a series of local linearized updates to solve the full non-linear inversion problem. During the inversion, forward modeling employs the full two-way three-dimensional heterogeneous anisotropic acoustic or elastic wave equation to predict the observed raw field data, wiggle-for-wiggle, trace-by-trace. The method is computationally demanding; it is highly parallelized, and runs on large multi-core multi-node clusters. Here, we demonstrate what can be achieved by applying this newly practical technique to several high-density 3D seismic datasets that were acquired to image four contrasting sedimentary targets: a gas cloud above an oil reservoir, a radially faulted dome, buried fluvial channels, and collapse structures overlying an evaporite sequence. We show that the resulting anisotropic p-wave velocity models match in situ measurements in deep boreholes, reproduce detailed structure observed independently on high-resolution seismic reflection sections, accurately predict the raw seismic data, simplify and sharpen reverse-time-migrated reflection images of deeper horizons, and flatten Kirchhoff-migrated common-image gathers. We also show that full-elastic 3D full-waveform inversion of pure pressure data can generate a reasonable shear-wave velocity model for one of these datasets. For two of the four datasets, the inclusion of significant transversely isotropic anisotropy with a vertical axis of symmetry was necessary in order to fit the kinematics of the field data properly. For the faulted dome, the full-waveform-inversion p-wave velocity model recovers the detailed structure of every fault that can be seen on coincident seismic reflection data. Some of the individual faults represent high-velocity zones, some represent low-velocity zones, some have more-complex internal structure, and some are visible merely as offsets between two regions with contrasting velocity. Although this has not yet been demonstrated quantitatively for this dataset, it seems likely that at least some of this fine structure in the recovered velocity model is related to the detailed lithology, strain history and fluid properties within the individual faults. We have here applied this technique to seismic data that were acquired by the extractive industries; however, this inversion scheme is immediately scalable and applicable to a much wider range of problems given sufficient quality and density of observed data. Potential targets range from shallow magma chambers beneath active volcanoes, through whole-crustal sections across plate boundaries, to regional and whole-Earth models.
Pentsak, E. O.; Kashin, A. S.; Polynski, M. V.; Kvashnina, K. O.; Glatzel, P.
2015-01-01
Gaining insight into Pd/C catalytic systems by locating reactive centers on carbon surfaces, revealing their properties, and estimating the number of reactive centers presents a challenging problem. In the present study state-of-the-art experimental techniques involving ultra high resolution SEM/STEM microscopy (1 Å resolution), high brilliance X-ray absorption spectroscopy and theoretical calculations on truly nanoscale systems were utilized to reveal the role of carbon centers in the formation and nature of Pd/C catalytic materials. Generation of Pd clusters in solution from the easily available Pd2dba3 precursor and the unique reactivity of the Pd clusters opened an excellent opportunity to develop an efficient procedure for the imaging of a carbon surface. Defect sites and reactivity centers of a carbon surface were mapped in three-dimensional space with high resolution and excellent contrast using a user-friendly nanoscale imaging procedure. The proposed imaging approach takes advantage of the specific interactions of reactive carbon centers with Pd clusters, which allows spatial information about chemical reactivity across the Pd/C system to be obtained using a microscopy technique. Mapping the reactivity centers with Pd markers provided unique information about the reactivity of the graphene layers and showed that >2000 reactive centers can be located per 1 μm2 of the surface area of the carbon material. A computational study at a PBE-D3-GPW level differentiated the relative affinity of the Pd2 species to the reactive centers of graphene. These findings emphasized the spatial complexity of the carbon material at the nanoscale and indicated the importance of the surface defect nature, which exhibited substantial gradients and variations across the surface area. The findings show the crucial role of the structure of the carbon support, which governs the formation of Pd/C systems and their catalytic activity. PMID:29511504
Parsons, Jeffrey T; Rendina, H Jonathon; Ventuneac, Ana; Cook, Karon F; Grov, Christian; Mustanski, Brian
2013-12-01
The Hypersexual Disorder Screening Inventory (HDSI) was designed as an instrument for the screening of hypersexuality by the American Psychiatric Association's taskforce for the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders. Our study sought to conduct a psychometric analysis of the HDSI, including an investigation of its underlying structure and reliability utilizing item response theory (IRT) modeling, and an examination of its polythetic scoring criteria in comparison to a standard dimensionally based cutoff score. We examined a diverse group of 202 highly sexually active gay and bisexual men in New York City. We conducted psychometric analyses of the HDSI, including both confirmatory factor analysis of its structure and IRT analysis of the item and scale reliabilities. We utilized the HDSI. The HDSI adequately fit a single-factor solution, although there was evidence that two of the items may measure a second factor that taps into sex as a form of coping. The scale showed evidence of strong reliability across much of the continuum of hypersexuality, and results suggested that, in addition to the proposed polythetic scoring criteria, a cutoff score of 20 on the severity index might be used for preliminary classification of HD. The HDSI was found to be highly reliable, and results suggested that a unidimensional, quantitative conception of hypersexuality with a clinically relevant cutoff score may be more appropriate than a qualitative syndrome comprised of multiple distinct clusters of problems. However, we also found preliminary evidence that three clusters of symptoms may constitute an HD syndrome as opposed to the two clusters initially proposed. Future research is needed to determine which of these issues are characteristic of the hypersexuality and HD constructs themselves and which are more likely to be methodological artifacts of the HDSI. © 2013 International Society for Sexual Medicine.
Detection of Subtle Context-Dependent Model Inaccuracies in High-Dimensional Robot Domains.
Mendoza, Juan Pablo; Simmons, Reid; Veloso, Manuela
2016-12-01
Autonomous robots often rely on models of their sensing and actions for intelligent decision making. However, when operating in unconstrained environments, the complexity of the world makes it infeasible to create models that are accurate in every situation. This article addresses the problem of using potentially large and high-dimensional sets of robot execution data to detect situations in which a robot model is inaccurate; that is, detecting context-dependent model inaccuracies in a high-dimensional context space. To find inaccuracies tractably, the robot conducts an informed search through low-dimensional projections of execution data to find parametric Regions of Inaccurate Modeling (RIMs). Empirical evidence from two robot domains shows that this approach significantly enhances the detection power of existing RIM-detection algorithms in high-dimensional spaces.
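As a rough illustration of searching low-dimensional projections for parametric regions of inaccurate modeling, the sketch below grids every pair of context dimensions and flags cells whose mean residual deviates strongly from the global mean. This conveys only the flavor of the search, not the authors' RIM-detection algorithm; the inputs contexts (execution contexts) and residuals (model prediction errors) are hypothetical.

    import numpy as np
    from itertools import combinations

    def find_rims(contexts, residuals, bins=8, z_thresh=3.0, min_pts=10):
        """Scan 2-D projections of execution data for candidate regions of
        inaccurate modeling: grid cells whose mean residual deviates from
        the global mean by more than z_thresh standard errors."""
        n, d = contexts.shape
        mu, sd = residuals.mean(), residuals.std()
        rims = []
        for i, j in combinations(range(d), 2):  # low-dimensional projections
            ei = np.linspace(contexts[:, i].min(), contexts[:, i].max(), bins)
            ej = np.linspace(contexts[:, j].min(), contexts[:, j].max(), bins)
            xi, xj = np.digitize(contexts[:, i], ei), np.digitize(contexts[:, j], ej)
            for a in range(1, bins + 1):
                for b in range(1, bins + 1):
                    mask = (xi == a) & (xj == b)
                    m = mask.sum()
                    if m >= min_pts:
                        z = (residuals[mask].mean() - mu) / (sd / np.sqrt(m))
                        if abs(z) > z_thresh:
                            rims.append(((i, j), (a, b), z))
        return rims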
Power System Decomposition for Practical Implementation of Bulk-Grid Voltage Control Methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vallem, Mallikarjuna R.; Vyakaranam, Bharat GNVSR; Holzer, Jesse T.
Power system algorithms such as AC optimal power flow and coordinated volt/var control of the bulk power system are computationally intensive and become difficult to solve in operational time frames. The computational time required to run these algorithms increases exponentially as the size of the power system increases. The solution time for multiple subsystems is less than that for solving the entire system simultaneously, and the local nature of the voltage problem lends itself to such decomposition. This paper describes an algorithm that can be used to perform power system decomposition from the point of view of the voltage control problem. Our approach takes advantage of the dominant localized effect of voltage control and is based on clustering buses according to the electrical distances between them. One of the contributions of the paper is to use multidimensional scaling to compute n-dimensional Euclidean coordinates for each bus based on electrical distance to perform algorithms like K-means clustering. A simple coordinated reactive power control of photovoltaic inverters for voltage regulation is used to demonstrate the effectiveness of the proposed decomposition algorithm and its components. The proposed decomposition method is demonstrated on the IEEE 118-bus system.
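The embedding-plus-clustering step can be sketched in a few lines with off-the-shelf tools. This is a hedged illustration only: it assumes a precomputed matrix of electrical distances, and dist below is random stand-in data rather than a real network.

    import numpy as np
    from sklearn.manifold import MDS
    from sklearn.cluster import KMeans

    # dist: (n_bus x n_bus) symmetric matrix of electrical distances
    # (random stand-in; in practice derived from network impedances).
    rng = np.random.default_rng(0)
    dist = rng.random((30, 30))
    dist = (dist + dist.T) / 2
    np.fill_diagonal(dist, 0.0)

    # Embed buses in Euclidean space so pairwise distances approximate the
    # electrical distances, then cluster the coordinates into subsystems.
    coords = MDS(n_components=3, dissimilarity="precomputed",
                 random_state=0).fit_transform(dist)
    zones = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(coords)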
Two-dimensional triangular lattice and its application to lithium-intercalated layered compounds
NASA Astrophysics Data System (ADS)
Decerqueira, R. O.
1982-08-01
Good rechargeable batteries are being sought for use in electric vehicles and in energy storage during off-peak consumption periods and from solar sources. The interest in lithium intercalation compounds has been recently enhanced by the search for such batteries. The process of intercalation of lithium in several transition metal dichalcogenides can provide an emf of several volts. The progress achieved in the last decade in the investigation of these intercalates has been facilitated by the availability of the dichalcogenides as single crystals and by their chemical stability. The transition-metal dichalcogenides and their Li-intercalates are studied, with emphasis on the Li_xTa_yTi_(1-y)S_2 series. The interactions between the Li atoms and the applicability of a lattice gas model to the problem of ordering of these atoms are discussed. A formulation is presented of the cluster-variation approximation to the lattice gas problem. The single-site and the nearest-neighbor triangle basic clusters are considered as models for Li_xTiS_2. Also a theory is presented for the effects of a random distribution of different species of host atoms, as in Ta_yTi_(1-y)S_2.
Zhang, Zhaoyang; Fang, Hua; Wang, Honggang
2016-06-01
Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal, high dimensional with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services.
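For reference, the Xie and Beni index mentioned above has a compact form: total fuzzy within-cluster scatter divided by the sample size times the minimum squared separation between centers. The numpy sketch below is a plain, single-imputation version; under the MIV framework it would be computed per imputation and then synthesized, which is not shown here.

    import numpy as np

    def xie_beni(X, centers, U, m=2.0):
        """Xie-Beni validity index for a fuzzy c-partition.

        X       : (n, d) data matrix
        centers : (c, d) cluster centers
        U       : (c, n) membership matrix, columns summing to 1
        Smaller values indicate a more compact, well-separated partition.
        """
        d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(-1)  # (c, n)
        compact = (U ** m * d2).sum()
        sep = min(((centers[j] - centers[k]) ** 2).sum()
                  for j in range(len(centers))
                  for k in range(len(centers)) if j != k)
        return compact / (len(X) * sep)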
NASA Astrophysics Data System (ADS)
Zhao, Ya-Ru; Zhang, Hai-Rong; Qian, Yu; Duan, Xu-Chao; Hu, Yan-Fei
2016-03-01
Density functional theory has been applied to study the geometric structures, relative stabilities, and electronic properties of cationic [AunRb]+ and Aun+1+ (n = 1-10) clusters. For the lowest energy structures of [AunRb]+ clusters, the planar to three-dimensional transformation is found to occur at cluster size n = 4, and the Rb atom prefers being located at the most highly coordinated position. The trends of the averaged atomic binding energies, fragmentation energies, second-order differences of energies, and energy gaps show pronounced even-odd alternations. This indicates that clusters containing an odd number of atoms maintain greater stability than the clusters in the vicinity. In particular, the [Au6Rb]+ cluster is the most stable isomer among [AunRb]+ clusters in the region of n = 1-10. The charges in [AunRb]+ clusters transfer from the Rb atom to the Aun host. Density of states revealed that the Au-5d, Au-5p, and Rb-4p orbitals hardly participate in bonding. In addition, it is found that the most favourable channel of the [AunRb]+ clusters is Rb+ cation ejection. The electronic localisation function (ELF) analysis of the [AunRb]+ clusters showed that strong interactions are not revealed in this study.
NASA Astrophysics Data System (ADS)
Farsadnia, Farhad; Ghahreman, Bijan
2016-04-01
Hydrologic homogeneous group identification is considered both fundamental and applied research in hydrology. Clustering methods are among the conventional methods used to assess hydrological homogeneous regions. Recently, the Self-Organizing feature Map (SOM) method has been applied in some studies; however, the main problem with this method is the interpretation of its output map. Therefore, SOM is used as input to other clustering algorithms. The aim of this study is to apply a two-level Self-Organizing feature map and Ward hierarchical clustering method to determine the hydrologic homogeneous regions in North and Razavi Khorasan provinces. First, principal component analysis was used to reduce the dimension of the SOM input matrix; then the SOM was used to form a two-dimensional feature map. To determine homogeneous regions for flood frequency analysis, the SOM output nodes were used as input to the Ward method. Generally, the regions identified by clustering algorithms are not statistically homogeneous; consequently, they have to be adjusted to improve their homogeneity. After the regions were adjusted using L-moment homogeneity tests, five hydrologic homogeneous regions were identified. Finally, adjusted regions were created by a two-level SOM, and then the best regional distribution function and associated parameters were selected by the L-moment approach. The results showed that the combination of self-organizing maps and Ward hierarchical clustering with principal components as input is more effective than the hierarchical method with principal components or standardized inputs in achieving hydrologic homogeneous regions.
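A minimal two-level pipeline of this kind can be sketched as below. It assumes the third-party minisom package for the SOM level; the data matrix is a random stand-in for the watershed attributes, and the L-moment homogeneity adjustment is omitted.

    import numpy as np
    from sklearn.decomposition import PCA
    from scipy.cluster.hierarchy import linkage, fcluster
    from minisom import MiniSom  # third-party package, assumed available

    X = np.random.rand(200, 12)                # site-attribute matrix (stand-in)
    Xp = PCA(n_components=5).fit_transform(X)  # reduce SOM input dimension

    som = MiniSom(8, 8, Xp.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
    som.train_random(Xp, 5000)                 # first level: 2-D feature map

    # Second level: Ward clustering of the SOM code-book vectors.
    codebook = som.get_weights().reshape(-1, Xp.shape[1])
    labels = fcluster(linkage(codebook, method="ward"), t=5, criterion="maxclust")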
Evaluation of a Multicore-Optimized Implementation for Tomographic Reconstruction
Agulleiro, Jose-Ignacio; Fernández, José Jesús
2012-01-01
Tomography allows elucidation of the three-dimensional structure of an object from a set of projection images. In life sciences, electron microscope tomography is providing invaluable information about the cell structure at a resolution of a few nanometres. Here, large images are required to combine wide fields of view with high resolution requirements. The computational complexity of the algorithms along with the large image size then turns tomographic reconstruction into a computationally demanding problem. Traditionally, high-performance computing techniques have been applied to cope with such demands on supercomputers, distributed systems and computer clusters. In the last few years, the trend has turned towards graphics processing units (GPUs). Here we present a detailed description and a thorough evaluation of an alternative approach that relies on exploitation of the power available in modern multicore computers. The combination of single-core code optimization, vector processing, multithreading and efficient disk I/O operations succeeds in providing fast tomographic reconstructions on standard computers. The approach turns out to be competitive with the fastest GPU-based solutions thus far. PMID:23139768
Free boundary problems in shock reflection/diffraction and related transonic flow problems
Chen, Gui-Qiang; Feldman, Mikhail
2015-01-01
Shock waves are steep wavefronts that are fundamental in nature, especially in high-speed fluid flows. When a shock hits an obstacle, or a flying body meets a shock, shock reflection/diffraction phenomena occur. In this paper, we show how several long-standing shock reflection/diffraction problems can be formulated as free boundary problems, discuss some recent progress in developing mathematical ideas, approaches and techniques for solving these problems, and present some further open problems in this direction. In particular, these shock problems include von Neumann's problem for shock reflection–diffraction by two-dimensional wedges with concave corner, Lighthill's problem for shock diffraction by two-dimensional wedges with convex corner, and Prandtl-Meyer's problem for supersonic flow impinging onto solid wedges, which are also fundamental in the mathematical theory of multidimensional conservation laws. PMID:26261363
Oluwadare, Oluwatosin; Cheng, Jianlin
2017-11-14
With the development of chromosomal conformation capturing techniques, particularly, the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TAD), i.e., locally packed chromosome regions bound together by intra-chromosomal contacts. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function. Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on simulated Hi-C data. Our method is also largely complementary and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with the chromatin immunoprecipitation (ChIP) sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available here: https://github.com/BDM-Lab/ClusterTAD .
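A stripped-down version of the clustering formulation might look as follows. This is a hedged sketch, not ClusterTAD's actual feature construction: each genomic bin is represented by its row of log-scaled contact counts, bins are clustered, and label changes along the diagonal suggest candidate domain boundaries.

    import numpy as np
    from sklearn.cluster import KMeans

    # contacts: symmetric Hi-C contact matrix for one chromosome (toy data).
    rng = np.random.default_rng(1)
    contacts = rng.poisson(2, (300, 300))
    contacts = contacts + contacts.T

    # Bins that interact with similar partners get similar feature vectors;
    # contiguous runs sharing a cluster label suggest domains.
    features = np.log1p(contacts)
    labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(features)
    boundaries = np.flatnonzero(np.diff(labels)) + 1  # candidate TAD edges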
Implementing Enrichment Clusters in Elementary Schools: Lessons Learned
ERIC Educational Resources Information Center
Fiddyment, Gail E.
2014-01-01
Enrichment clusters offer a way for schools to encourage a high level of learning as students and adults work together to develop a product, service, or performance by applying advanced knowledge and authentic processes to real-world problems. This study utilized a qualitative research design to examine the perceptions and experiences of two…
Interaction Networks: Generating High Level Hints Based on Network Community Clustering
ERIC Educational Resources Information Center
Eagle, Michael; Johnson, Matthew; Barnes, Tiffany
2012-01-01
We introduce a novel data structure, the Interaction Network, for representing interaction data from open problem-solving environment tutors. We show how network community detection techniques can be used to identify sub-goals in problems in a logic tutor. We then use those community structures to generate high-level hints between sub-goals.…
A cubic spline approximation for problems in fluid mechanics
NASA Technical Reports Server (NTRS)
Rubin, S. G.; Graves, R. A., Jr.
1975-01-01
A cubic spline approximation is presented which is suited for many fluid-mechanics problems. This procedure provides a high degree of accuracy, even with a nonuniform mesh, and leads to an accurate treatment of derivative boundary conditions. The truncation errors and stability limitations of several implicit and explicit integration schemes are presented. For two-dimensional flows, a spline-alternating-direction-implicit method is evaluated. The spline procedure is assessed, and results are presented for the one-dimensional nonlinear Burgers' equation, as well as the two-dimensional diffusion equation and the vorticity-stream function system describing the viscous flow in a driven cavity. Comparisons are made with analytic solutions for the first two problems and with finite-difference calculations for the cavity flow.
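The core idea, fitting a spline to the current solution and reading derivatives off it even on a nonuniform mesh, can be illustrated with an explicit step for Burgers' equation. This sketch uses scipy's CubicSpline and toy parameters; it is not the paper's spline-alternating-direction-implicit scheme, and a production code would respect the stability limits the paper analyzes.

    import numpy as np
    from scipy.interpolate import CubicSpline

    # Gently nonuniform mesh and initial profile for u_t = -u u_x + nu u_xx.
    x = np.linspace(0.0, 2 * np.pi, 101)
    x[1:-1] += 0.2 * (x[1] - x[0]) * np.sin(7 * x[1:-1])
    u = np.sin(x)
    nu, dt = 0.05, 1e-3

    for _ in range(500):
        s = CubicSpline(x, u)        # spline fit gives smooth derivatives
        ux, uxx = s(x, 1), s(x, 2)   # first/second derivative at the nodes
        u = u + dt * (-u * ux + nu * uxx)  # explicit Euler update
        u[0] = u[-1] = 0.0           # simple fixed boundary values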
The Effect of Mergers on Galaxy Cluster Mass Estimates
NASA Astrophysics Data System (ADS)
Johnson, Ryan E.; Zuhone, John A.; Thorsen, Tessa; Hinds, Andre
2015-08-01
At vertices within the filamentary structure that describes the universal matter distribution, clusters of galaxies grow hierarchically through merging with other clusters. As such, the most massive galaxy clusters should have experienced many such mergers in their histories. Though we cannot see them evolve over time, these mergers leave lasting, measurable effects in the cluster galaxies' phase space. By simulating several different galaxy cluster mergers here, we examine how the cluster galaxies' kinematics are altered as a result of these mergers. Further, we also examine the effect of our line of sight viewing angle with respect to the merger axis. In projecting the 6-dimensional galaxy phase space onto a 3-dimensional plane, we are able to simulate how these clusters might actually appear to optical redshift surveys. We find that for those optical cluster statistics which are most often used as a proxy for the cluster mass (variants of σv), the uncertainty due to an imprecise or unknown line of sight may alter the derived cluster masses more so than the kinematic disturbance of the merger itself. Finally, by examining these, and several other clustering statistics, we find that significant events (such as pericentric crossings) are identifiable over a range of merger initial conditions and from many different lines of sight.
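Projecting simulated velocities onto a chosen viewing direction and measuring the dispersion is simple to sketch. The snippet below uses an anisotropic toy velocity distribution (hypothetical numbers) to show how σv depends on the line of sight relative to a merger axis.

    import numpy as np

    def sigma_v_los(velocities, line_of_sight):
        """Velocity dispersion seen along one viewing direction.

        velocities    : (n, 3) galaxy velocities from the simulation
        line_of_sight : length-3 vector toward the observer
        """
        v_los = velocities @ (line_of_sight / np.linalg.norm(line_of_sight))
        return v_los.std(ddof=1)

    # Dispersion differs along vs. across a merger axis (here the x-axis).
    rng = np.random.default_rng(3)
    v = rng.normal(0, [900, 600, 600], (500, 3))  # anisotropic toy cluster
    print(sigma_v_los(v, np.array([1.0, 0, 0])),  # looking down merger axis
          sigma_v_los(v, np.array([0, 1.0, 0])))  # perpendicular view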
Peng, Bo; Kowalski, Karol
2017-01-25
In this paper, we apply the reverse Cuthill-McKee (RCM) algorithm to transform two-electron integral tensors into their block diagonal forms. By further applying Cholesky decomposition (CD) on each of the diagonal blocks, we are able to represent the high-dimensional two-electron integral tensors in terms of permutation matrices and low-rank Cholesky vectors. This representation facilitates low-rank factorizations of high-dimensional tensor contractions in post-Hartree-Fock calculations. Finally, we discuss the second-order Møller-Plesset (MP2) method and the linear coupled-cluster model with doubles (L-CCD) as examples to demonstrate the efficiency of this technique in representing the two-electron integrals in a compact form.
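The two ingredients, an RCM reordering followed by a Cholesky factorization of each diagonal block, are both available in scipy/numpy. The sketch below applies them to a random sparse symmetric positive-definite stand-in, not to real two-electron integrals.

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import reverse_cuthill_mckee

    # A: symmetric positive-definite matrix standing in for one diagonal
    # block of the integral tensor (toy data, not real integrals).
    rng = np.random.default_rng(4)
    B = rng.random((50, 50)) * (rng.random((50, 50)) < 0.1)
    A = B @ B.T + 50 * np.eye(50)

    # RCM permutation concentrates nonzeros near the diagonal...
    perm = reverse_cuthill_mckee(csr_matrix(A != 0), symmetric_mode=True)
    Ap = A[np.ix_(perm, perm)]

    # ...and the Cholesky factor of the reordered block supplies the
    # low-rank vectors used to contract integrals in post-HF methods.
    L = np.linalg.cholesky(Ap)
    assert np.allclose(L @ L.T, Ap)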
Tracing Large Scale Structure with a Redshift Survey of Rich Clusters of Galaxies
NASA Astrophysics Data System (ADS)
Batuski, D.; Slinglend, K.; Haase, S.; Hill, J. M.
1993-12-01
Rich clusters of galaxies from Abell's catalog show evidence of structure on scales of 100 Mpc and hold promise of confirming the existence of structure in the more immediate universe on scales corresponding to COBE results (i.e., on the order of 10% or more of the horizon size of the universe). However, most Abell clusters do not as yet have measured redshifts (or, in the case of most low redshift clusters, have only one or two galaxies measured), so present knowledge of their three dimensional distribution has quite large uncertainties. The shortage of measured redshifts for these clusters may also mask a problem of projection effects corrupting the membership counts for the clusters, perhaps even to the point of spurious identifications of some of the clusters themselves. Our approach in this effort has been to use the MX multifiber spectrometer to measure redshifts of at least ten galaxies in each of about 80 Abell cluster fields with richness class R >= 1 and mag10 <= 16.8. This work will result in a somewhat deeper, much more complete (and reliable) sample of positions of rich clusters. Our primary use for the sample is for two-point correlation and other studies of the large scale structure traced by these clusters. We are also obtaining enough redshifts per cluster so that a much better sample of reliable cluster velocity dispersions will be available for other studies of cluster properties. To date, we have collected such data for 40 clusters, and for most of them, we have seven or more cluster members with redshifts, allowing for reliable velocity dispersion calculations. Velocity histograms for several interesting cluster fields are presented, along with summary tables of cluster redshift results. Also, with 10 or more redshifts in most of our cluster fields (30 arcmin square, just about an 'Abell diameter' at z ~ 0.1), we have investigated the extent of projection effects within the Abell catalog in an effort to quantify and understand how this may affect the Abell sample.
Methods of Conceptual Clustering and their Relation to Numerical Taxonomy.
1985-07-22
the conceptual clustering problem is to first solve the aggregation problem, and then the characterization problem. In machine learning, the... clusterings by first generating some number of possible clusterings. For each clustering generated, one calls a learning-from-examples subroutine, which... class 1 from class 2, and vice versa; only the first combination implies a partition over the set of theoretically possible objects. The first
Adham, Manal T; Bentley, Peter J
2016-08-01
This paper proposes and evaluates a solution to the truck redistribution problem prominent in London's Santander Cycle scheme. Due to the complexity of this NP-hard combinatorial optimisation problem, no efficient optimisation techniques are known to solve the problem exactly. This motivates our use of the heuristic Artificial Ecosystem Algorithm (AEA) to find good solutions in a reasonable amount of time. The AEA is designed to take advantage of highly distributed computer architectures and adapt to changing problems. In the AEA a problem is first decomposed into its relative sub-components; they then evolve solution building blocks that fit together to form a single optimal solution. Three variants of the AEA centred on evaluating clustering methods are presented: the baseline AEA, the community-based AEA which groups stations according to journey flows, and the Adaptive AEA which actively modifies clusters to cater for changes in demand. We applied these AEA variants to the redistribution problem prominent in bike share schemes (BSS). The AEA variants are empirically evaluated using historical data from Santander Cycles to validate the proposed approach and prove its potential effectiveness. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
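Grouping stations by journey flows, as the community-based AEA does, can be approximated with standard modularity-based community detection. The snippet below is an illustration on toy flow counts, not the authors' implementation.

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    # Journey counts between docking stations (toy numbers, not scheme data).
    flows = {("A", "B"): 120, ("B", "C"): 95, ("C", "A"): 80,
             ("D", "E"): 110, ("E", "F"): 70, ("F", "D"): 60, ("C", "D"): 5}

    G = nx.Graph()
    for (u, v), w in flows.items():
        G.add_edge(u, v, weight=w)

    # Stations exchanging many journeys land in the same community, giving
    # the groups that a community-based redistribution plan can act on.
    communities = greedy_modularity_communities(G, weight="weight")
    print([sorted(c) for c in communities])  # e.g. [['A','B','C'], ['D','E','F']]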
Ridenour, Ty A; Reynolds, Maureen; Ahlqvist, Ola; Zhai, Zu Wei; Kirisci, Levent; Vanyukov, Michael M; Tarter, Ralph E
2013-05-01
Knowledge of where substance use and other such behavioral problems frequently occur has aided policing, public health, and urban planning strategies to reduce such behaviors. Identifying locales characterized by high childhood neurobehavioral disinhibition (ND), a strong predictor of substance use and consequent disorder (SUD), may likewise improve prevention efforts. The distribution of ND in 10-12-year-olds was mapped to metropolitan Pittsburgh, PA, and tested for clustering within locales. The 738 participating families represented the population in terms of economic status, race, and population distribution. ND was measured using indicators of executive cognitive function, emotion regulation, and behavior control. Innovative geospatial analyses statistically tested clustering of ND within locales while accounting for geographic barriers (large rivers, major highways), parental SUD severity, and neighborhood quality. Clustering of youth with high and low ND occurred in specific locales. Accounting for geographic barriers better delineated where high ND is concentrated, areas which also tended to be characterized by greater parental SUD severity and poorer neighborhood quality. Offering programs that have been demonstrated to improve inhibitory control in locales where youth have high ND on average may reduce youth risk for SUD and other problem behaviors. As demonstrated by the present results, geospatial analysis of youth risk factors, frequently used in community coalition strategies, may be improved with greater statistical and measurement rigor.
Decomposition and model selection for large contingency tables.
Dahinden, Corinne; Kalisch, Markus; Bühlmann, Peter
2010-04-01
Large contingency tables summarizing categorical variables arise in many areas. One example is in biology, where large numbers of biomarkers are cross-tabulated according to their discrete expression level. Interactions of the variables are of great interest and are generally studied with log-linear models. The structure of a log-linear model can be visually represented by a graph from which the conditional independence structure can then be easily read off. However, since the number of parameters in a saturated model grows exponentially in the number of variables, this generally comes with a heavy computational burden. Even if we restrict ourselves to models of lower-order interactions or other sparse structures, we are faced with the problem of a large number of cells which play the role of sample size. This is in sharp contrast to high-dimensional regression or classification procedures because, in addition to a high-dimensional parameter, we also have to deal with the analogue of a huge sample size. Furthermore, high-dimensional tables naturally feature a large number of sampling zeros which often leads to the nonexistence of the maximum likelihood estimate. We therefore present a decomposition approach, where we first divide the problem into several lower-dimensional problems and then combine these to form a global solution. Our methodology is computationally feasible for log-linear interaction models with many categorical variables each or some of them having many levels. We demonstrate the proposed method on simulated data and apply it to a bio-medical problem in cancer research.
Data-Driven Packet Loss Estimation for Node Healthy Sensing in Decentralized Cluster.
Fan, Hangyu; Wang, Huandong; Li, Yong
2018-01-23
Decentralized clustering in modern information technology has been widely adopted in various fields in recent years. One of the main reasons is its high availability and failure tolerance, which can prevent the entire system from breaking down due to the failure of a single point. Recently, toolkits such as Akka have made it easy to build this kind of cluster. However, clusters of this kind, which use Gossip as their membership protocol and rely on link-failure detection, cannot deal with the scenario in which a node stochastically drops packets and corrupts the membership status of the cluster. In this paper, we formulate the problem as evaluating the link quality and finding a maximum clique (an NP-complete problem) in the connectivity graph. We then propose an algorithm consisting of two models, driven by data from the application layer, to solve these two problems respectively. Through simulations with statistical data and on a real-world product, we demonstrate that our algorithm has good performance.
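The healthy-core formulation maps directly onto a maximum-clique computation over the connectivity graph. Exact enumeration is exponential in the worst case but fine at typical cluster sizes; the toy adjacency below is hypothetical.

    import networkx as nx

    # Connectivity graph: an edge means the link between two nodes is judged
    # healthy by the link-quality model (toy adjacency, not real telemetry).
    G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (2, 4), (1, 4), (4, 5)])

    # The largest mutually connected node set is taken as the healthy core;
    # nodes outside it are suspected of stochastically dropping packets.
    healthy = max(nx.find_cliques(G), key=len)
    print(sorted(healthy))  # -> [1, 2, 3, 4]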
Potashev, Konstantin; Sharonova, Natalia; Breus, Irina
2014-07-01
Clustering was employed for the analysis of the obtained experimental data set (42 plants in total) on seed germination in leached chernozem contaminated with kerosene. Among the investigated plants were 31 cultivated plants from 11 families (27 species and 20 varieties) and 11 wild plant species from 7 families, 23 annual and 19 perennial/biannual plant species, 11 monocotyledonous and 31 dicotyledonous plants. A two-dimensional (two-parameter) clustering approach, allowing the estimation of the tolerance of germinating seeds using a pair of independent parameters (C75%, V7%), was found to be most effective. These parameters characterized the ability of seeds both to withstand high concentrations of contaminants without a significant reduction in germination, and to maintain a high germination rate within certain contaminant concentrations. The performed clustering revealed a number of plant features which define the relation of a particular plant to a particular tolerance cluster; it also demonstrated the possibility of generalizing the kerosene results to n-tridecane, which is one of the typical kerosene components. In contrast to the "manual" plant ranking based on the assessment of germination at discrete concentrations of the contaminant, the proposed clustering approach allowed a generalized characterization of the seed tolerance/sensitivity to hydrocarbon contaminants. Copyright © 2014 Elsevier B.V. All rights reserved.
Locating landmarks on high-dimensional free energy surfaces
Chen, Ming; Yu, Tang-Qing; Tuckerman, Mark E.
2015-01-01
Coarse graining of complex systems possessing many degrees of freedom can often be a useful approach for analyzing and understanding key features of these systems in terms of just a few variables. The relevant energy landscape in a coarse-grained description is the free energy surface as a function of the coarse-grained variables, which, despite the dimensional reduction, can still be an object of high dimension. Consequently, navigating and exploring this high-dimensional free energy surface is a nontrivial task. In this paper, we use techniques from multiscale modeling, stochastic optimization, and machine learning to devise a strategy for locating minima and saddle points (termed “landmarks”) on a high-dimensional free energy surface “on the fly” and without requiring prior knowledge of or an explicit form for the surface. In addition, we propose a compact graph representation of the landmarks and connections between them, and we show that the graph nodes can be subsequently analyzed and clustered based on key attributes that elucidate important properties of the system. Finally, we show that knowledge of landmark locations allows for the efficient determination of their relative free energies via enhanced sampling techniques. PMID:25737545
NASA Astrophysics Data System (ADS)
Kel'manov, A. V.; Motkova, A. V.
2018-01-01
A strongly NP-hard problem of partitioning a finite set of points of Euclidean space into two clusters is considered. The solution criterion is the minimum of the sum (over both clusters) of weighted sums of squared distances from the elements of each cluster to its geometric center. The weights of the sums are equal to the cardinalities of the desired clusters. The center of one cluster is given as input, while the center of the other is unknown and is determined as the point of space equal to the mean of the cluster elements. A version of the problem is analyzed in which the cardinalities of the clusters are given as input. A polynomial-time 2-approximation algorithm for solving the problem is constructed.
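In symbols (a hedged transcription of the criterion described above, with the given center denoted c, the unknown center taken as the cluster mean, and the cardinalities |C_1|, |C_2| fixed as part of the input):

    \min_{\mathcal{Y} = C_1 \,\dot\cup\, C_2} \Big[ \,
      |C_1| \sum_{y \in C_1} \lVert y - c \rVert^2
      \; + \; |C_2| \sum_{y \in C_2} \lVert y - \bar{y}(C_2) \rVert^2 \Big],
    \qquad \bar{y}(C_2) = \frac{1}{|C_2|} \sum_{y \in C_2} y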
More than one way to be happy: a typology of marital happiness.
Rauer, Amy; Volling, Brenda
2013-09-01
This study utilized observational and self-report data from 57 happily married couples to explore assumptions regarding marital happiness. Suggesting that happily married couples are not a homogeneous group, cluster analyses revealed the existence of three types of couples based on their observed behaviors in a problem-solving task: (1) mutually engaged couples (characterized by both spouses' higher negative and positive problem-solving); (2) mutually supportive couples (characterized by both spouses' higher positivity and support); and (3) wife compensation couples (characterized by high wife positivity). Although couples in all three clusters were equally happy with and committed to their marriages, these clusters were differentially associated with spouses' evaluations of their marriage. Spouses in the mutually supportive cluster reported greater intimacy and maintenance and less conflict and ambivalence, although this was more consistently the case in comparison to the wife compensation cluster, as opposed to the mutually engaged cluster. The implications of these typologies are discussed as they pertain to practitioners' efforts both to promote marital happiness and to repair marital relations when couples are faced with difficulties. © FPI, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Leiming; Huang, Wei; Wang, Lai S.
The structure and electronic properties of the Al8N- and Al8N clusters were investigated by combined photoelectron spectroscopy and ab initio studies. Congested photoelectron spectra were observed, and experimental evidence was obtained for the presence of multiple isomers of Al8N-. Global minimum searches revealed several structures for Al8N- with close energies. The calculated vertical detachment energies of the two lowest-lying isomers, which are of C2v and Cs symmetry, respectively, were shown to agree well with the experimental data. Unlike the three-dimensional structures of Al6N- and Al7N-, in which the dopant N atom has a high coordination number of 6, the dopant N atom in the two low-lying isomers of Al8N- has a lower coordination number of 4 and 5, respectively. The competition between the Al–Al and Al–N interactions is shown to determine the global minimum structures of the doped aluminum clusters and results in the structural diversity of both Al8N- and Al8N. © 2009 American Institute of Physics
Noise-free accurate count of microbial colonies by time-lapse shadow image analysis.
Ogawa, Hiroyuki; Nasu, Senshi; Takeshige, Motomu; Funabashi, Hisakage; Saito, Mikako; Matsuoka, Hideaki
2012-12-01
Microbial colonies in food matrices could be counted accurately by a novel noise-free method based on time-lapse shadow image analysis. An agar plate containing many clusters of microbial colonies and/or meat fragments was trans-illuminated to project their 2-dimensional (2D) shadow images on a color CCD camera. The 2D shadow images of every cluster distributed within a 3-mm thick agar layer were captured in focus simultaneously by means of a multiple focusing system, and were then converted to 3-dimensional (3D) shadow images. By time-lapse analysis of the 3D shadow images, it was determined whether each cluster comprised single or multiple colonies or a meat fragment. The analytical precision was high enough to distinguish a microbial colony from a meat fragment, to recognize an oval image as two colonies in contact with each other, and to detect microbial colonies hidden under a food fragment. The detection of hidden colonies is an outstanding capability in comparison with other systems. The present system attained accurate counts of fewer than 5 colonies and is therefore of practical importance. Copyright © 2012 Elsevier B.V. All rights reserved.
Geographical Clusters of Rape in the United States: 2000-2012
Amin, Raid; Nabors, Nicole S.; Nelson, Arlene M.; Saqlain, Murshid; Kulldorff, Martin
2016-01-01
Background While rape is a very serious crime and public health problem, no spatial mapping has been attempted for rape on the national scale. This paper addresses the three research questions: (1) Are reported rape cases randomly distributed across the USA, after being adjusted for population density and age, or are there geographical clusters of reported rape cases? (2) Are the geographical clusters of reported rapes still present after adjusting for differences in poverty levels? (3) Are there geographical clusters where the proportion of reported rape cases that lead to an arrest is exceptionally low or exceptionally high? Methods We studied the geographical variation of reported rape events (2003-2012) and rape arrests (2000-2012) in the 48 contiguous states of the USA. The disease surveillance software SaTScan™ with its spatial scan statistic is used to evaluate the spatial variation in rapes. The spatial scan statistic has been widely used as a geographical surveillance tool for diseases, and we used it to identify geographical areas with clusters of reported rape and clusters of arrest rates for rape. Results The spatial scan statistic was used to identify geographical areas with exceptionally high rates of reported rape. The analyses were adjusted for age, and in secondary analyses, for both age and poverty level. We also identified geographical areas with either a low or a high proportion of reported rapes leading to an arrest. Conclusions We have identified geographical areas with exceptionally high (low) rates of reported rape. The geographical problem areas identified are prime candidates for more intensive preventive counseling and criminal prosecution efforts by public health, social service, and law enforcement agencies. Geographical clusters of high rates of reported rape are prime areas in need of expanded implementation of preventive measures, such as changing attitudes in our society toward rape crimes, in addition to having the criminal justice system play an even larger role in preventing rape. PMID:28078318
Engineering two-photon high-dimensional states through quantum interference
Zhang, Yingwen; Roux, Filippus S.; Konrad, Thomas; Agnew, Megan; Leach, Jonathan; Forbes, Andrew
2016-01-01
Many protocols in quantum science, for example, linear optical quantum computing, require access to large-scale entangled quantum states. Such systems can be realized through many-particle qubits, but this approach often suffers from scalability problems. An alternative strategy is to consider a lesser number of particles that exist in high-dimensional states. The spatial modes of light are one such candidate that provides access to high-dimensional quantum states, and thus they increase the storage and processing potential of quantum information systems. We demonstrate the controlled engineering of two-photon high-dimensional states entangled in their orbital angular momentum through Hong-Ou-Mandel interference. We prepare a large range of high-dimensional entangled states and implement precise quantum state filtering. We characterize the full quantum state before and after the filter, and are thus able to determine that only the antisymmetric component of the initial state remains. This work paves the way for high-dimensional processing and communication of multiphoton quantum states, for example, in teleportation beyond qubits. PMID:26933685
Copula based flexible modeling of associations between clustered event times.
Geerdens, Candida; Claeskens, Gerda; Janssen, Paul
2016-07-01
Multivariate survival data are characterized by the presence of correlation between event times within the same cluster. First, we build multi-dimensional copulas with flexible and possibly symmetric dependence structures for such data. In particular, clustered right-censored survival data are modeled using mixtures of max-infinitely divisible bivariate copulas. Second, these copulas are fit by a likelihood approach where the vast amount of copula derivatives present in the likelihood is approximated by finite differences. Third, we formulate conditions for clustered right-censored survival data under which an information criterion for model selection is either weakly consistent or consistent. Several of the familiar selection criteria are included. A set of four-dimensional data on time-to-mastitis is used to demonstrate the developed methodology.
Towards an Autonomic Cluster Management System (ACMS) with Reflex Autonomicity
NASA Technical Reports Server (NTRS)
Truszkowski, Walt; Hinchey, Mike; Sterritt, Roy
2005-01-01
Cluster computing, whereby a large number of simple processors or nodes are combined together to apparently function as a single powerful computer, has emerged as a research area in its own right. The approach offers a relatively inexpensive means of providing a fault-tolerant environment and achieving significant computational capabilities for high-performance computing applications. However, the task of manually managing and configuring a cluster quickly becomes daunting as the cluster grows in size. Autonomic computing, with its vision to provide self-management, can potentially solve many of the problems inherent in cluster management. We describe the development of a prototype Autonomic Cluster Management System (ACMS) that exploits autonomic properties in automating cluster management and its evolution to include reflex reactions via pulse monitoring.
Identify High-Quality Protein Structural Models by Enhanced K-Means.
Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang
2017-01-01
Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means.
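The K-means++ seeding referred to above draws each new centroid with probability proportional to the squared distance from the nearest centroid already chosen, spreading the initial centers across the decoy population. A minimal numpy sketch (not the authors' code):

    import numpy as np

    def kmeanspp_init(X, k, rng=np.random.default_rng(0)):
        """K-means++ seeding: sample each new centroid with probability
        proportional to the squared distance to the nearest existing one."""
        centers = [X[rng.integers(len(X))]]
        for _ in range(k - 1):
            d2 = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        return np.array(centers)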
NASA Astrophysics Data System (ADS)
Pasquato, Mario; Chung, Chul
2016-05-01
Context. Machine learning (ML) solves problems by learning patterns from data with limited or no human guidance. In astronomy, ML is mainly applied to large observational datasets, e.g. for morphological galaxy classification. Aims: We apply ML to gravitational N-body simulations of star clusters that are either formed by merging two progenitors or evolved in isolation, planning to later identify globular clusters (GCs) that may have a history of merging from observational data. Methods: We create mock-observations from simulated GCs, from which we measure a set of parameters (also called features in the machine-learning field). After carrying out dimensionality reduction on the feature space, the resulting datapoints are fed into various classification algorithms. Using repeated random subsampling validation, we check whether the groups identified by the algorithms correspond to the underlying physical distinction between mergers and monolithically evolved simulations. Results: The three algorithms we considered (C5.0 trees, k-nearest neighbour, and support-vector machines) all achieve a test misclassification rate of about 10% without parameter tuning, with support-vector machines slightly outperforming the others. The first principal component of feature space correlates with cluster concentration. If we exclude it from the regression, the performance of the algorithms is only slightly reduced.
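The described pipeline, features to dimensionality reduction to a classifier evaluated by repeated random subsampling, maps onto a few lines of scikit-learn. The data below are random stand-ins for the measured mock-observation features, so the printed rate is meaningless; only the workflow is illustrated.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC
    from sklearn.model_selection import ShuffleSplit, cross_val_score

    # X: feature vectors from mock observations; y: 1 = merger,
    # 0 = monolithic evolution (random stand-ins for the simulated catalog).
    rng = np.random.default_rng(5)
    X, y = rng.normal(size=(200, 15)), rng.integers(0, 2, 200)

    clf = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="rbf"))
    # Repeated random subsampling validation, as in the study's protocol.
    cv = ShuffleSplit(n_splits=20, test_size=0.25, random_state=0)
    print(1 - cross_val_score(clf, X, y, cv=cv).mean())  # misclassification rate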
NASA Technical Reports Server (NTRS)
Kumar, A.
1984-01-01
A computer program NASCRIN has been developed for analyzing two-dimensional flow fields in high-speed inlets. It solves the two-dimensional Euler or Navier-Stokes equations in conservation form by an explicit, two-step finite-difference method. An explicit-implicit method can also be used at the user's discretion for viscous flow calculations. For turbulent flow, an algebraic, two-layer eddy-viscosity model is used. The code is operational on the CDC CYBER 203 computer system and is highly vectorized to take full advantage of the vector-processing capability of the system. It is highly user oriented and is structured in such a way that for most supersonic flow problems, the user has to make only a few changes. Although the code is primarily written for supersonic internal flow, it can be used with suitable changes in the boundary conditions for a variety of other problems.
Using Betweenness Centrality to Identify Manifold Shortcuts
Cukierski, William J.; Foran, David J.
2010-01-01
High-dimensional data presents a challenge to tasks of pattern recognition and machine learning. Dimensionality reduction (DR) methods remove the unwanted variance and make these tasks tractable. Several nonlinear DR methods, such as the well known ISOMAP algorithm, rely on a neighborhood graph to compute geodesic distances between data points. These graphs can contain unwanted edges which connect disparate regions of one or more manifolds. This topological sensitivity is well known [1], [2], [3], yet handling high-dimensional, noisy data in the absence of a priori manifold knowledge remains an open and difficult problem. This work introduces a divisive, edge-removal method based on graph betweenness centrality which can robustly identify manifold-shorting edges. The problem of graph construction in high dimension is discussed and the proposed algorithm is fit into the ISOMAP workflow. ROC analysis is performed and the performance is tested on synthetic and real datasets. PMID:20607142
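The edge-removal step can be sketched with networkx (the sketch assumes a recent networkx providing from_scipy_sparse_array): build a k-NN graph, score edges by betweenness centrality, and prune the top-scoring ones before computing geodesics. The 99th-percentile cutoff is an arbitrary illustrative threshold, not the paper's criterion.

    import numpy as np
    import networkx as nx
    from sklearn.neighbors import kneighbors_graph

    # k-NN graph over the data (random stand-in for noisy manifold samples).
    X = np.random.default_rng(6).normal(size=(150, 10))
    A = kneighbors_graph(X, n_neighbors=8, mode="distance")
    G = nx.from_scipy_sparse_array(A)

    # Shortcut edges funnel many shortest paths between manifold regions,
    # so they score high on edge betweenness; prune the worst offenders
    # before running ISOMAP-style geodesic computations.
    eb = nx.edge_betweenness_centrality(G, weight="weight")
    cutoff = np.quantile(list(eb.values()), 0.99)
    G.remove_edges_from([e for e, b in eb.items() if b > cutoff])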
Aerodynamics of an airfoil with a jet issuing from its surface
NASA Technical Reports Server (NTRS)
Tavella, D. A.; Karamcheti, K.
1982-01-01
A simple, two dimensional, incompressible and inviscid model for the problem posed by a two dimensional wing with a jet issuing from its lower surface is considered and a parametric analysis is carried out to observe how the aerodynamic characteristics depend on the different parameters. The mathematical problem constitutes a boundary value problem where the position of part of the boundary is not known a priori. A nonlinear optimization approach was used to solve the problem, and the analysis reveals interesting characteristics that may help to better understand the physics involved in more complex situations in connection with high lift systems.
Comments on "The multisynapse neural network and its application to fuzzy clustering".
Yu, Jian; Hao, Pengwei
2005-05-01
In the above-mentioned paper, Wei and Fahn proposed a neural architecture, the multisynapse neural network, to solve constrained optimization problems including high-order, logarithmic, and sinusoidal forms, etc. As one of its main applications, a fuzzy bidirectional associative clustering network (FBACN) was proposed for fuzzy-partition clustering according to the objective-functional method. The connection between the objective-functional-based fuzzy c-partition algorithms and FBACN is the Lagrange multiplier approach. Unfortunately, the Lagrange multiplier approach was incorrectly applied, so that FBACN does not equivalently minimize its corresponding constrained objective function. Additionally, Wei and Fahn adopted the traditional definition of fuzzy c-partition, which is not satisfied by FBACN. Therefore, FBACN cannot solve constrained optimization problems either.
Theory of the vortex-clustering transition in a confined two-dimensional quantum fluid
NASA Astrophysics Data System (ADS)
Yu, Xiaoquan; Billam, Thomas P.; Nian, Jun; Reeves, Matthew T.; Bradley, Ashton S.
2016-08-01
Clustering of like-sign vortices in a planar bounded domain is known to occur at negative temperature, a phenomenon that Onsager demonstrated to be a consequence of bounded phase space. In a confined superfluid, quantized vortices can support such an ordered phase, provided they evolve as an almost isolated subsystem containing sufficient energy. A detailed theoretical understanding of the statistical mechanics of such states thus requires a microcanonical approach. Here we develop an analytical theory of the vortex clustering transition in a neutral system of quantum vortices confined to a two-dimensional disk geometry, within the microcanonical ensemble. The choice of ensemble is essential for identifying the correct thermodynamic limit of the system, enabling a rigorous description of clustering in the language of critical phenomena. As the system energy increases above a critical value, the system develops global order via the emergence of a macroscopic dipole structure from the homogeneous phase of vortices, spontaneously breaking the Z2 symmetry associated with invariance under vortex circulation exchange and the rotational SO(2) symmetry due to the disk geometry. The dipole structure emerges through the continuous growth of the macroscopic dipole moment, which serves as a global order parameter, resembling a continuous phase transition. The critical temperature of the transition, and the critical exponent associated with the dipole moment, are obtained exactly within mean-field theory. The clustering transition is shown to be distinct from the final state reached at high energy, known as supercondensation. The dipole moment develops via two macroscopic vortex clusters, and the cluster locations are found analytically, both near the clustering transition and in the supercondensation limit. The microcanonical theory shows excellent agreement with Monte Carlo simulations, and signatures of the transition are apparent even for a modest system of 100 vortices, accessible in current Bose-Einstein condensate experiments.
A curvature-based weighted fuzzy c-means algorithm for point clouds de-noising
NASA Astrophysics Data System (ADS)
Cui, Xin; Li, Shipeng; Yan, Xiutian; He, Xinhua
2018-04-01
In order to remove the noise from three-dimensional scattered point clouds and to smooth the data without damaging sharp geometric features, a novel algorithm is proposed in this paper. A feature-preserving weight is added to the fuzzy c-means algorithm, yielding a curvature-weighted fuzzy c-means clustering algorithm. Firstly, large-scale outliers are removed using statistics of the neighboring points within radius r. Then, the algorithm estimates the curvature of the point cloud data using a conicoid (paraboloid) fitting method and calculates a curvature feature value. Finally, the proposed clustering algorithm is applied to calculate the weighted cluster centers, which are taken as the new points. The experimental results show that this approach is efficient for different scales and intensities of noise in point clouds, achieves high precision, and preserves sharp features at the same time. It is also robust to different noise models.
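The weighted update at the heart of such an algorithm can be sketched in a few lines: the standard fuzzy c-means membership update is kept, while a per-point weight (here a placeholder for the paper's curvature feature value) scales each point's contribution to the cluster centers. A simplified stand-in, not the published algorithm:

```python
import numpy as np

def weighted_fcm(X, w, c=3, m=2.0, iters=100, eps=1e-9, rng=0):
    """Fuzzy c-means where point i carries weight w[i] in the centroid update."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), c, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + eps   # (n, c)
        inv = d ** (-2.0 / (m - 1))                 # standard FCM membership
        u = inv / inv.sum(axis=1, keepdims=True)
        um = w[:, None] * u ** m                    # weight scales each point's pull
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
    return u, centers

X = np.random.rand(1000, 3)                         # noisy point cloud
w = 1.0 / (1.0 + np.random.rand(1000))              # placeholder curvature-based weights
u, centers = weighted_fcm(X, w, c=4)                # centers serve as the de-noised points
```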
Clustering on Magnesium Surfaces - Formation and Diffusion Energies.
Chu, Haijian; Huang, Hanchen; Wang, Jian
2017-07-12
The formation and diffusion energies of atomic clusters on Mg surfaces determine the surface roughness and the formation of faulted structures, which in turn affect the mechanical deformation of Mg. This paper reports first-principles density functional theory (DFT) based quantum mechanics calculation results of atomic clustering on the low-energy surfaces {0001} and [Formula: see text]. In parallel, molecular statics calculations serve to test the validity of two interatomic potentials and to extend the scope of the DFT studies. On a {0001} surface, a compact cluster consisting of fewer than three atoms energetically prefers face-centered-cubic stacking, serving as a nucleus of a stacking fault. On a [Formula: see text] surface, clusters of any size always prefer hexagonal-close-packed stacking. Adatom diffusion on the [Formula: see text] surface is highly anisotropic, while it is isotropic on the (0001) surface. Three-dimensional Ehrlich-Schwoebel barriers converge when the step height reaches three atomic layers or more. Adatom diffusion along steps proceeds via a hopping mechanism, and diffusion down steps via an exchange mechanism.
About Distributed Simulation-based Optimization of Forming Processes using a Grid Architecture
NASA Astrophysics Data System (ADS)
Grauer, Manfred; Barth, Thomas
2004-06-01
Permanently increasing complexity of products and their manufacturing processes combined with a shorter "time-to-market" leads to more and more use of simulation and optimization software systems for product design. Finding a "good" design of a product implies the solution of computationally expensive optimization problems based on the results of simulation. Due to the computational load caused by the solution of these problems, the requirements on the Information&Telecommunication (IT) infrastructure of an enterprise or research facility are shifting from stand-alone resources towards the integration of software and hardware resources in a distributed environment for high-performance computing. Resources can either comprise software systems, hardware systems, or communication networks. An appropriate IT-infrastructure must provide the means to integrate all these resources and enable their use even across a network to cope with requirements from geographically distributed scenarios, e.g. in computational engineering and/or collaborative engineering. Integrating expert's knowledge into the optimization process is inevitable in order to reduce the complexity caused by the number of design variables and the high dimensionality of the design space. Hence, utilization of knowledge-based systems must be supported by providing data management facilities as a basis for knowledge extraction from product data. In this paper, the focus is put on a distributed problem solving environment (PSE) capable of providing access to a variety of necessary resources and services. A distributed approach integrating simulation and optimization on a network of workstations and cluster systems is presented. For geometry generation the CAD-system CATIA is used which is coupled with the FEM-simulation system INDEED for simulation of sheet-metal forming processes and the problem solving environment OpTiX for distributed optimization.
Environmental Monitoring Networks Optimization Using Advanced Active Learning Algorithms
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail; Volpi, Michele; Copa, Loris
2010-05-01
The problem of environmental monitoring network optimization (MNO) is one of the basic and fundamental tasks in spatio-temporal data collection, analysis, and modeling. There are several approaches to this problem, which can be considered as the design or redesign of a monitoring network by applying some optimization criteria. The most developed and widespread methods are based on geostatistics (the family of kriging models, conditional stochastic simulations). In geostatistics the variance is mainly used as an optimization criterion, which has some advantages and drawbacks. In the present research we study the application of advanced techniques derived from statistical learning theory (SLT), namely support vector machines (SVM), and consider the optimization of monitoring networks for a classification problem (data are discrete values/classes: hydrogeological units, soil types, pollution decision levels, etc.). SVM is a universal nonlinear modeling tool for classification problems in high dimensional spaces. The SVM solution maximizes the decision boundary between classes and has a good generalization property for noisy data. The sparse solution of SVM is based on support vectors - data which contribute to the solution with nonzero weights. Fundamentally, MNO for classification problems can be considered as a task of selecting new measurement points which increase the quality of spatial classification and reduce the testing error (the error on new independent measurements). In SLT this is a typical problem of active learning - the selection of new unlabelled points which efficiently reduce the testing error. A classical approach to active learning (margin sampling) is to sample the points closest to the classification boundary. This solution is suboptimal when points (or generally the dataset) are redundant for the same class. In the present research we propose and study two new advanced methods of active learning adapted to the solution of the MNO problem: 1) hierarchical top-down clustering in input space in order to remove redundancy when data are clustered, and 2) a general method (independent of the classifier) which gives posterior probabilities that can be used to define the classifier confidence and corresponding proposals for new measurement points. The basic ideas and procedures are explained using simulated data sets. The real case study deals with the analysis and mapping of soil types, which is a multi-class classification problem. Maps of soil types are important for the analysis and 3D modeling of heavy metal migration in soil and for risk prediction mapping. The results obtained demonstrate the high quality of SVM mapping and the efficiency of monitoring network optimization using active learning approaches. The research was partly supported by SNSF projects No. 200021-126505 and 200020-121835.
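The classical margin-sampling baseline mentioned above (sample the candidate points closest to the SVM decision boundary) takes only a few lines with scikit-learn; the arrays here are synthetic stand-ins for measured locations and candidate monitoring sites:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))                     # locations with known class labels
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_pool = rng.normal(size=(1000, 2))                     # candidate monitoring locations

svm = SVC(kernel="rbf").fit(X_train, y_train)

# Margin sampling: pick candidates with the smallest absolute
# decision-function value, i.e. closest to the class boundary.
margin = np.abs(svm.decision_function(X_pool))
next_points = X_pool[np.argsort(margin)[:10]]           # next 10 points to measure
```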
Atlas-Guided Cluster Analysis of Large Tractography Datasets
Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer
2013-01-01
Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information from a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic, anatomically correct, and reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292
NASA Astrophysics Data System (ADS)
Hu, Yan-Fei; Jiang, Gang; Meng, Da-Qiao
2012-01-01
The density functional method with the relativistic effective core potential has been employed to investigate systematically the geometric structures, relative stabilities, growth-pattern behavior, and electronic properties of small bimetallic AuₙRb (n = 1-10) and pure gold Auₙ (n ≤ 11) clusters. For the geometric structures of the AuₙRb (n = 1-10) clusters, the dominant growth pattern is a Rb-substituted Auₙ₊₁ cluster or one Au atom capped on an Auₙ₋₁Rb cluster, and the turnover point from a two-dimensional to a three-dimensional structure occurs at n = 4. Moreover, the stability of the ground-state structures of these clusters has been examined via an analysis of the average atomic binding energies, fragmentation energies, and the second-order difference of energies as a function of cluster size. The results exhibit a pronounced even-odd alternation phenomenon. The same pronounced even-odd alternations are found for the HOMO-LUMO gap, VIPs, VEAs, and the chemical hardness. In addition, about one electron charge transfers from the Auₙ host to the Rb atom in each corresponding AuₙRb cluster.
Working research codes into fluid dynamics education: a science gateway approach
NASA Astrophysics Data System (ADS)
Mason, Lachlan; Hetherington, James; O'Reilly, Martin; Yong, May; Jersakova, Radka; Grieve, Stuart; Perez-Suarez, David; Klapaukh, Roman; Craster, Richard V.; Matar, Omar K.
2017-11-01
Research codes are effective for illustrating complex concepts in educational fluid dynamics courses; compared to textbook examples, an interactive three-dimensional visualisation can bring a problem to life! Various barriers, however, prevent the adoption of research codes in teaching: codes are typically created for highly-specific `once-off' calculations and, as such, have no user interface and a steep learning curve. Moreover, a code may require access to high-performance computing resources that are not readily available in the classroom. This project allows academics to rapidly work research codes into their teaching via a minimalist `science gateway' framework. The gateway is a simple, yet flexible, web interface allowing students to construct and run simulations, as well as view and share their output. Behind the scenes, the common operations of job configuration, submission, monitoring and post-processing are customisable at the level of shell scripting. In this talk, we demonstrate the creation of an example teaching gateway connected to the Code BLUE fluid dynamics software. Student simulations can be run via a third-party cloud computing provider or a local high-performance cluster. EPSRC, UK, MEMPHIS program Grant (EP/K003976/1), RAEng Research Chair (OKM).
Clustering and variable selection in the presence of mixed variable types and missing data.
Storlie, C B; Myers, S M; Katusic, S K; Weaver, A L; Voigt, R G; Croarkin, P E; Stoeckel, R E; Port, J D
2018-05-17
We consider the problem of model-based clustering in the presence of many correlated, mixed continuous, and discrete variables, some of which may have missing values. Discrete variables are treated with a latent continuous variable approach, and the Dirichlet process is used to construct a mixture model with an unknown number of components. Variable selection is also performed to identify the variables that are most influential for determining cluster membership. The work is motivated by the need to cluster patients thought to potentially have autism spectrum disorder on the basis of many cognitive and/or behavioral test scores. There are a modest number of patients (486) in the data set along with many (55) test score variables (many of which are discrete valued and/or missing). The goal of the work is to (1) cluster these patients into similar groups to help identify those with similar clinical presentation and (2) identify a sparse subset of tests that inform the clusters in order to eliminate unnecessary testing. The proposed approach compares very favorably with other methods via simulation of problems of this type. The results of the autism spectrum disorder analysis suggested 3 clusters to be most likely, while only 4 test scores had high (>0.5) posterior probability of being informative. This will result in much more efficient and informative testing. The need to cluster observations on the basis of many correlated, continuous/discrete variables with missing values is a common problem in the health sciences as well as in many other disciplines. Copyright © 2018 John Wiley & Sons, Ltd.
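As a rough, simplified sketch of the Dirichlet-process mixture idea with an unknown number of components: scikit-learn's variational truncated DP mixture stands in for the paper's model here, and it handles neither the latent-variable treatment of discrete scores, nor missing values, nor variable selection, all of which the paper addresses:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(486, 8))       # placeholder for (imputed) test-score data

# Truncated Dirichlet-process mixture: n_components is only an upper bound;
# components that are not needed receive negligible weight.
dpm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    random_state=0,
).fit(X)

labels = dpm.predict(X)
print("effective number of clusters:", np.sum(dpm.weights_ > 0.01))
```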
Profiles of behavioral problems in children who witness domestic violence.
Spilsbury, James C; Kahana, Shoshana; Drotar, Dennis; Creeden, Rosemary; Flannery, Daniel J; Friedman, Steve
2008-01-01
Unlike previous investigations of shelter-based samples, our study examined whether profiles of adjustment problems occurred in a community-program-based sample of 175 school-aged children exposed to domestic violence. Cluster analysis revealed three stable profiles/clusters. The largest cluster (69%) consisted of children below clinical thresholds for any internalizing or externalizing problem. Children in the next largest cluster (18%) were characterized as having externalizing problems with or without internalizing problems. The smallest cluster (13%) consisted of children with internalizing problems only. Comparison across demographic and violence characteristics revealed that the profiles differed by child gender, mother's education, child's lifetime exposure to violence, and aspects of the event precipitating contact with the community program. Clinical and future research implications of study findings are discussed.
The Relationship Between Problem Gambling and Attention Deficit Hyperactivity Disorder.
Waluk, O R; Youssef, G J; Dowling, N A
2016-06-01
Recent studies indicate that treatment-seeking problem gamblers display elevated rates of ADHD and that adolescents who screen positive for ADHD are more likely to engage in gambling, develop gambling problems, and experience a greater severity in gambling problems. This study aimed to (a) compare the prevalence of ADHD in treatment-seeking problem gamblers to the general population; (b) investigate the relationships between ADHD and problem gambling severity, cluster B personality disorders, motor impulsivity, alcohol use, substance use, gender, and age; and (c) investigate the degree to which these factors moderate the relationship between ADHD and problem gambling severity. Participants included 214 adults (154 males, 58 females, 2 unspecified) who sought treatment for their gambling problems at a specialist gambling agency in Melbourne, Australia. Almost one-quarter (24.9 %) of treatment-seeking problem gamblers screened positively for ADHD, which was significantly higher than the 14 % prevalence in a community sample. ADHD was significantly positively correlated with problem gambling severity, motor impulsivity, and cluster B personality disorders, but was not associated with alcohol and substance use, gender or age. None of the factors significantly moderated the relationship between ADHD and problem gambling severity. These findings suggest that a considerable proportion of treatment-seeking problem gamblers report ADHD and that their clinical profile is complicated by the presence of high impulsivity and cluster B personality disorders. They highlight the need for specialist gambling agencies to develop screening, assessment, and management protocols for co-occurring ADHD to enhance the effectiveness of treatment.
Chao, Ming; Wei, Jie; Narayanasamy, Ganesh; Yuan, Yading; Lo, Yeh-Chi; Peñagarícano, José A
2018-05-01
To investigate three-dimensional cluster structure and its correlation with clinical endpoints in heterogeneous dose distributions from intensity-modulated radiation therapy. Twenty-five clinical plans from twenty-one head and neck (HN) patients were used for a phenomenological study of the cluster structure formed from the dose distributions of organs at risk (OARs) close to the planning target volumes (PTVs). Initially, OAR clusters were examined for pattern consistency among ten HN patients and five clinically similar plans from another HN patient. Second, clusters of the esophagus from another ten HN patients were scrutinized to correlate their sizes with radiobiological parameters. Finally, an extensive Monte Carlo (MC) procedure was implemented to gain deeper insight into the behavioral properties of cluster formation. Clinical studies showed that OAR clusters had drastic differences despite similar PTV coverage among different patients, and the radiobiological parameters failed to correlate positively with the cluster sizes. The MC study demonstrated an inverse relationship between cluster size and cluster connectivity, and nonlinear changes in cluster size with dose thresholds. In addition, the clusters were insensitive to the shape of the OARs. The results demonstrated that cluster size could serve as an insightful index of normal tissue damage: plans with the same dose-volume metrics might nonetheless lead to different clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.
Crevillén-García, D
2018-04-01
Time-consuming numerical simulators for solving groundwater flow and dissolution models of physico-chemical processes in deep aquifers normally require some of the model inputs to be defined in high-dimensional spaces in order to return realistic results. Sometimes the outputs of interest are spatial fields, leading to high-dimensional output spaces. Although Gaussian process emulation has been used satisfactorily for computing faithful and inexpensive approximations of complex simulators, such emulators have mostly been applied to problems defined in low-dimensional input spaces. In this paper, we propose a method for simultaneously reducing the dimensionality of very high-dimensional input and output spaces in Gaussian process emulators for stochastic partial differential equation models while retaining the qualitative features of the original models. This allows us to build a surrogate model for the prediction of spatial fields in such time-consuming simulators. We apply the methodology to a model of convection and dissolution processes occurring during carbon capture and storage.
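One plausible minimal realization of the input/output reduction idea, with PCA standing in for whatever reduction the paper actually uses and one GP per retained output component (all names and sizes illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))          # high-dimensional simulator inputs
Y = rng.normal(size=(60, 500))          # spatial-field outputs (placeholder)

# Reduce both spaces; fit one GP per retained output component.
pca_in, pca_out = PCA(n_components=5), PCA(n_components=3)
Z, W = pca_in.fit_transform(X), pca_out.fit_transform(Y)
gps = [GaussianProcessRegressor(kernel=RBF()).fit(Z, W[:, j])
       for j in range(W.shape[1])]

def emulate(x_new):
    """Cheap surrogate: reduce input, predict reduced output, lift back."""
    z = pca_in.transform(x_new)
    w = np.column_stack([gp.predict(z) for gp in gps])
    return pca_out.inverse_transform(w)  # back to the full spatial field

field = emulate(rng.normal(size=(1, 200)))
```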
A new Lagrangian method for three-dimensional steady supersonic flows
NASA Technical Reports Server (NTRS)
Loh, Ching-Yuen; Liou, Meng-Sing
1993-01-01
In this report, the new Lagrangian method introduced by Loh and Hui is extended to three-dimensional, steady supersonic flow computation. The derivation of the conservation form and the solution of the local Riemann problem using the Godunov and high-resolution TVD (total variation diminishing) schemes are presented. This new approach is accurate and robust, capable of handling complicated geometry and interactions between discontinuous waves. Test problems show that the extended Lagrangian method retains all the advantages of the two-dimensional method (e.g., crisp resolution of a slip surface (contact discontinuity) and automatic grid generation). In this report, we also suggest a novel three-dimensional Riemann problem in which interesting and intricate flow features are present.
NASA Astrophysics Data System (ADS)
Regis, Rommel G.
2014-02-01
This article develops two new algorithms for constrained expensive black-box optimization that use radial basis function surrogates for the objective and constraint functions. These algorithms are called COBRA and Extended ConstrLMSRBF and, unlike previous surrogate-based approaches, they can be used for high-dimensional problems where all initial points are infeasible. They both follow a two-phase approach where the first phase finds a feasible point while the second phase improves this feasible point. COBRA and Extended ConstrLMSRBF are compared with alternative methods on 20 test problems and on the MOPTA08 benchmark automotive problem (D.R. Jones, Presented at MOPTA 2008), which has 124 decision variables and 68 black-box inequality constraints. The alternatives include a sequential penalty derivative-free algorithm, a direct search method with kriging surrogates, and two multistart methods. Numerical results show that COBRA algorithms are competitive with Extended ConstrLMSRBF and they generally outperform the alternatives on the MOPTA08 problem and most of the test problems.
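The surrogate-screening step common to such methods (fit cheap radial basis function surrogates to the expensive objective and constraint functions, then propose the best surrogate-feasible candidate for the next expensive evaluation) can be sketched with SciPy; this is not COBRA or ConstrLMSRBF themselves, just the shared RBF-surrogate idea on a toy problem:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 6))                 # points already evaluated
f = (X ** 2).sum(axis=1)                             # expensive objective (toy)
g = 0.5 - X[:, 0]                                    # one black-box constraint, g <= 0 feasible

f_hat = RBFInterpolator(X, f)                        # objective surrogate
g_hat = RBFInterpolator(X, g)                        # constraint surrogate

# Screen a large batch of random candidates: keep those the surrogate
# predicts feasible, then pick the one with the lowest predicted objective.
C = rng.uniform(-1, 1, size=(5000, 6))
feasible = g_hat(C) <= 0
best = C[feasible][np.argmin(f_hat(C[feasible]))]    # next point to evaluate expensively
```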
An Analysis of Rich Cluster Redshift Survey Data for Large Scale Structure Studies
NASA Astrophysics Data System (ADS)
Slinglend, K.; Batuski, D.; Haase, S.; Hill, J.
1994-12-01
The results from the COBE satellite show the existence of structure on scales on the order of 10% or more of the horizon scale of the universe. Rich clusters of galaxies from Abell's catalog show evidence of structure on scales of 100 Mpc and may hold the promise of confirming structure on the scale of the COBE result. However, many Abell clusters have zero or only one measured redshift, so present knowledge of their three-dimensional distribution has quite large uncertainties. The shortage of measured redshifts for these clusters may also mask a problem of projection effects corrupting the membership counts for the clusters. Our approach in this effort has been to use the MX multifiber spectrometer on the Steward 2.3m to measure redshifts of at least ten galaxies in each of 80 Abell cluster fields with richness class R >= 1 and mag10 <= 16.8 (estimated z <= 0.12) and zero or one measured redshifts. This work will result in a deeper, more complete (and reliable) sample of positions of rich clusters. Our primary intent for the sample is for two-point correlation and other studies of the large scale structure traced by these clusters, in an effort to constrain theoretical models for structure formation. We are also obtaining enough redshifts per cluster so that a much better sample of reliable cluster velocity dispersions will be available for other studies of cluster properties. To date, we have collected such data for 64 clusters, and for most of them we have seven or more cluster members with redshifts, allowing for reliable velocity dispersion calculations. Velocity histograms and stripe density plots for several interesting cluster fields are presented, along with summary tables of cluster redshift results. Also, with 10 or more redshifts in most of our cluster fields (30' square, just about an 'Abell diameter' at z ~ 0.1), we have investigated the extent of projection effects within the Abell catalog in an effort to quantify and understand how they may affect the Abell sample.
Altiparmak, Fatih; Ferhatosmanoglu, Hakan; Erdal, Selnur; Trost, Donald C
2006-04-01
An effective analysis of clinical trials data involves analyzing different types of data, such as heterogeneous and high-dimensional time series data. Current time series analysis methods generally assume that the series at hand have sufficient length to apply statistical techniques to them. Other ideal-case assumptions are that data are collected in equal-length intervals and that, when time series are compared, their lengths are equal to each other. However, these assumptions are not valid for many real data sets, especially for clinical trials data sets. In addition, the data sources are different from each other, the data are heterogeneous, and the sensitivity of the experiments varies by source. Approaches for mining time series data need to be revisited, keeping this wide range of requirements in mind. In this paper, we propose a novel approach for information mining that involves two major steps: applying a data mining algorithm over homogeneous subsets of data, and identifying common or distinct patterns over the information gathered in the first step. Our approach is implemented specifically for heterogeneous and high-dimensional time series clinical trials data. Using this framework, we propose a new way of utilizing frequent itemset mining, as well as clustering and declustering techniques with novel distance metrics for measuring similarity between time series data. By clustering the data, we find groups of analytes (substances in blood) that are most strongly correlated. Most of the relationships found are already known and are verified by the clinical panels; in addition, we identify novel groups that need further biomedical analysis. A slight modification to our algorithm results in an effective declustering of high-dimensional time series data, which is then used for "feature selection." Using industry-sponsored clinical trials data sets, we are able to identify a small set of analytes that effectively models the state of normal health.
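A baseline version of the analyte-grouping step, hierarchical clustering under a correlation-based distance, is shown below; the paper's novel distance metrics for unequal-length, heterogeneous series are replaced here by plain correlation on equal-length synthetic series:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
series = rng.normal(size=(30, 120))       # 30 analytes, 120 time points each

# Correlation-based distance: strongly correlated analytes are "close".
corr = np.corrcoef(series)
dist = 1.0 - np.abs(corr)
np.fill_diagonal(dist, 0.0)

# Average-linkage hierarchical clustering on the condensed distance matrix.
Z = linkage(squareform(dist, checks=False), method="average")
groups = fcluster(Z, t=0.5, criterion="distance")   # analyte group labels
```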
ClueNet: Clustering a temporal network based on topological similarity rather than denseness.
Crawford, Joseph; Milenković, Tijana
2018-01-01
Network clustering is a very popular topic in the network science field. Its goal is to divide (partition) the network into groups (clusters or communities) of "topologically related" nodes, where the resulting topology-based clusters are expected to "correlate" well with node label information, i.e., metadata, such as cellular functions of genes/proteins in biological networks, or age or gender of people in social networks. Even for static data, the problem of network clustering is complex. For dynamic data, the problem is even more complex, due to an additional dimension of the data: their temporal (evolving) nature. Since the problem is computationally intractable, heuristic approaches need to be sought. Existing approaches for dynamic network clustering (DNC) have drawbacks. First, they assume that nodes should be in the same cluster if they are densely interconnected within the network. We hypothesize that in some applications, it might be of interest to cluster nodes that are topologically similar to each other instead of or in addition to requiring the nodes to be densely interconnected. Second, they ignore temporal information in their early steps, and when they do consider this information later on, they do so implicitly. We hypothesize that capturing temporal information earlier in the clustering process and doing so explicitly will improve results. We test these two hypotheses via our new approach called ClueNet. We evaluate ClueNet against six existing DNC methods on both social networks capturing evolving interactions between individuals (such as interactions between students in a high school) and biological networks capturing interactions between biomolecules in the cell at different ages. We find that ClueNet is superior in over 83% of all evaluation tests. As more real-world dynamic data are becoming available, DNC and thus ClueNet will only continue to gain importance.
InCHlib - interactive cluster heatmap for web applications.
Skuta, Ctibor; Bartůněk, Petr; Svozil, Daniel
2014-12-01
Hierarchical clustering is an exploratory data analysis method that reveals the groups (clusters) of similar objects. The result of the hierarchical clustering is a tree structure called a dendrogram that shows the arrangement of individual clusters. To investigate the row/column hierarchical cluster structure of a data matrix, a visualization tool called a 'cluster heatmap' is commonly employed. In the cluster heatmap, the data matrix is displayed as a heatmap, a 2-dimensional array in which the colour of each element corresponds to its value. The rows/columns of the matrix are ordered such that similar rows/columns are near each other. The ordering is given by the dendrogram, which is displayed on the side of the heatmap. We developed InCHlib (Interactive Cluster Heatmap Library), a highly interactive and lightweight JavaScript library for cluster heatmap visualization and exploration. InCHlib enables the user to select individual or clustered heatmap rows, to zoom in and out of clusters, or to flexibly modify heatmap appearance. The cluster heatmap can be augmented with additional metadata displayed in a different colour scale. In addition, to further enhance the visualization, the cluster heatmap can be interconnected with external data sources or analysis tools. Data clustering and the preparation of the input file for InCHlib are facilitated by the Python utility script inchlib_clust. The cluster heatmap is one of the most popular visualizations of large chemical and biomedical data sets originating, e.g., in high-throughput screening, genomics, or transcriptomics experiments. The presented JavaScript library InCHlib is a client-side solution for cluster heatmap exploration. InCHlib can be easily deployed into any modern web application and configured to cooperate with external tools and data sources. Though InCHlib is primarily intended for the analysis of chemical or biological data, it is a versatile tool whose application domain is not limited to the life sciences.
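For orientation, the static construction underlying any cluster heatmap (cluster rows and columns, reorder the matrix by dendrogram leaf order, render as a heatmap) looks roughly like this in Python; this is a generic scipy/matplotlib sketch, not InCHlib's JavaScript API:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(0)
data = rng.normal(size=(40, 12))

# Order rows and columns by the leaf order of their dendrograms,
# so similar rows/columns end up adjacent.
row_order = leaves_list(linkage(data, method="average"))
col_order = leaves_list(linkage(data.T, method="average"))

plt.imshow(data[np.ix_(row_order, col_order)], aspect="auto", cmap="viridis")
plt.colorbar(label="value")
plt.savefig("cluster_heatmap.png")
```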
NASA Astrophysics Data System (ADS)
Fang, Z.; Ward, A. L.; Fang, Y.; Yabusaki, S.
2011-12-01
High-resolution geologic models have proven effective in improving the accuracy of subsurface flow and transport predictions. However, many of the parameters in subsurface flow and transport models cannot be determined directly at the scale of interest and must be estimated through inverse modeling. A major challenge, particularly in vadose zone flow and transport, is the inversion of the highly nonlinear, high-dimensional problem, as current methods are not readily scalable for large-scale, multi-process models. In this paper we describe the implementation of a fully automated approach for addressing complex parameter optimization and sensitivity issues on massively parallel multi- and many-core systems. The approach is based on the integration of PNNL's extreme scale Subsurface Transport Over Multiple Phases (eSTOMP) simulator, which uses the Global Array toolkit, with the Beowulf-cluster inspired parallel nonlinear parameter estimation software BeoPEST in MPI mode. In the eSTOMP/BeoPEST implementation, a pre-processor generates all of the PEST input files based on the eSTOMP input file. Simulation results for comparison with observations are extracted automatically at each time step, eliminating the need for post-process data extraction. The inversion framework was tested with three different experimental data sets: one-dimensional water flow at the Hanford Grass Site; an irrigation and infiltration experiment at the Andelfingen Site; and a three-dimensional injection experiment at Hanford's Sisson and Lu Site. Good agreement between observations and simulations is achieved in all three applications, in both the parameter estimates and the reproduction of water dynamics. Results show that the eSTOMP/BeoPEST approach is highly scalable and can be run efficiently with hundreds or thousands of processors. BeoPEST is fault tolerant, and new nodes can be dynamically added and removed. A major advantage of this approach is the ability to use high-resolution geologic models to preserve the spatial structure in the inverse model, which leads to better parameter estimates and improved predictions when using the inverse-conditioned realizations of parameter fields.
Wilson, Anna C.; Lengua, Liliana J.; Tininenko, Jennifer; Taylor, Adam; Trancik, Anika
2009-01-01
This longitudinal study utilized a community sample of children (N=91, 45% female, 8–11 years at time 1) to investigate physiological responses (heart rate reactivity [HRR] and electrodermal responding [EDR]) during delay of gratification in relation to emotionality, self-regulation, and adjustment problems. Cluster analyses identified three profiles among children who successfully delayed: children who waited easily with low EDR and moderate HRR, children who had difficulty waiting with high EDR and moderate HRR, and children who had difficulty waiting with low EDR and low HRR. The 3 clusters and children who did not wait were compared. Children with low EDR-low HRR had the lowest self-regulation, and like the no-wait group, demonstrated the greatest baseline adjustment problems. The high EDR-moderate HRR group demonstrated highest self-regulation and increases in depression across one year. Distinct profiles among children in delay contexts point to children who are over- and under-regulated with implications for adjustment problems. PMID:20046898
Stable dissipative optical vortex clusters by inhomogeneous effective diffusion.
Li, Huishan; Lai, Shiquan; Qui, Yunli; Zhu, Xing; Xie, Jianing; Mihalache, Dumitru; He, Yingji
2017-10-30
We numerically show the generation of robust vortex clusters embedded in a two-dimensional beam propagating in a dissipative medium described by the generic cubic-quintic complex Ginzburg-Landau equation with an inhomogeneous effective diffusion term, which is asymmetrical in the two transverse directions and periodically modulated in the longitudinal direction. We show the generation of stable optical vortex clusters for different values of the winding number (topological charge) of the input optical beam. We have found that the number of individual vortex solitons that form the robust vortex cluster is equal to the winding number of the input beam. We have obtained the relationships between the amplitudes and oscillation periods of the inhomogeneous effective diffusion and the cubic gain and diffusion (viscosity) parameters, which depict the regions of existence and stability of vortex clusters. The obtained results offer a method to form robust vortex clusters embedded in two-dimensional optical beams, and we envisage potential applications in the area of structured light.
NASA Astrophysics Data System (ADS)
Albirri, E. R.; Sugeng, K. A.; Aldila, D.
2018-04-01
As technology and transportation infrastructure have progressed, almost all cities in the world have become connected, and the various places of the world are easier to visit; this is an impact of transportation technology and highway construction. Cities connected in this way can be represented by a graph. Graph clustering is one way to answer problems represented by graphs, and several graph clustering methods address such problems in specific ways. One of them is the Highly Connected Subgraphs (HCS) method. HCS identifies clusters based on the edge connectivity k(G) of a graph G: a subgraph with k(G) > n/2, where n is the total number of vertices in G, is called highly connected and is taken as a cluster. This research is based on a literature review, complemented by a program simulation. We modified the HCS algorithm to use weighted graphs; the modification is located in the process phase, which cuts the connected graph G into two subgraphs H and H̄. We also implemented the method as a program in Octave (version 4.0.1) and applied it to flight-route mapping data of one of the airlines in Indonesia.
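The unweighted core of HCS is short enough to sketch with networkx: split the graph on a minimum edge cut until every remaining subgraph G' with n' vertices satisfies k(G') > n'/2. A sketch of that recursion (not the authors' Octave program):

```python
import networkx as nx

def hcs(G, clusters=None):
    """Highly Connected Subgraphs: recursively split on a minimum edge cut
    until each remaining subgraph is highly connected (k > n/2)."""
    if clusters is None:
        clusters = []
    n = G.number_of_nodes()
    if n <= 2 or nx.edge_connectivity(G) > n / 2:
        clusters.append(set(G.nodes()))      # highly connected: accept as a cluster
        return clusters
    cut = nx.minimum_edge_cut(G)
    H = G.copy()
    H.remove_edges_from(cut)                 # the Process Phase split
    for comp in nx.connected_components(H):
        hcs(G.subgraph(comp).copy(), clusters)
    return clusters

G = nx.barbell_graph(5, 0)                   # two K5 cliques joined by one edge
print(hcs(G))                                # -> two 5-node clusters
```

For the weighted modification the paper describes, the unweighted minimum cut above could plausibly be swapped for a weighted global minimum cut such as networkx's stoer_wagner, which respects edge weights.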
NASA Astrophysics Data System (ADS)
Khuwaileh, Bassam
High fidelity simulation of nuclear reactors entails large scale applications characterized by high dimensionality and tremendous complexity, where various physics models are integrated in the form of coupled models (e.g. neutronics with thermal-hydraulic feedback). Each of the coupled modules represents a high fidelity formulation of the first principles governing the physics of interest. Therefore, new developments in high fidelity multi-physics simulation and the corresponding sensitivity/uncertainty quantification analysis are paramount to the development and competitiveness of reactors, achieved through enhanced understanding of the design and safety margins. Accordingly, this dissertation introduces efficient and scalable algorithms for performing Uncertainty Quantification (UQ), Data Assimilation (DA), and Target Accuracy Assessment (TAA) for large scale, multi-physics reactor design and safety problems. This dissertation builds upon previous efforts for adaptive core simulation and reduced order modeling algorithms and extends these efforts towards coupled multi-physics models with feedback. The core idea is to recast the reactor physics analysis in terms of reduced order models. This can be achieved by identifying the important/influential degrees of freedom (DoF) via subspace analysis, such that the required analysis can be recast in terms of the important DoF only. In this dissertation, efficient algorithms for lower dimensional subspace construction have been developed for single-physics and multi-physics applications with feedback. The reduced subspace is then used to solve realistic, large scale forward (UQ) and inverse (DA and TAA) problems. Once the elite set of DoF is determined, the uncertainty/sensitivity/target accuracy assessment and data assimilation analysis can be performed accurately and efficiently for large scale, high dimensional multi-physics nuclear engineering applications. Hence, in this work a Karhunen-Loeve (KL) based algorithm previously developed to quantify the uncertainty for single-physics models is extended to large scale multi-physics coupled problems with feedback effects. Moreover, a nonlinear surrogate-based UQ approach is developed, used, and compared with the performance of the KL approach and a brute-force Monte Carlo (MC) approach. On the other hand, an efficient Data Assimilation (DA) algorithm is developed to assess information about the model's parameters: nuclear data cross-sections and thermal-hydraulics parameters. Two improvements are introduced in order to perform DA on high dimensional problems. First, a goal-oriented surrogate model can be used to replace the original models in the depletion sequence (MPACT - COBRA-TF - ORIGEN). Second, approximating the complex and high dimensional solution space with a lower dimensional subspace makes the sampling process necessary for DA feasible for high dimensional problems. Moreover, safety analysis and design optimization depend on the accurate prediction of various reactor attributes, and predictions can be enhanced by reducing the uncertainty associated with the attributes of interest. Accordingly, an inverse problem can be defined and solved to assess the contributions from sources of uncertainty, and experimental effort can subsequently be directed to further improve the uncertainty associated with these sources.
In this dissertation a subspace-based, gradient-free, nonlinear algorithm for inverse uncertainty quantification, namely Target Accuracy Assessment (TAA), has been developed and tested. The ideas proposed in this dissertation were first validated using lattice physics applications simulated with the SCALE 6.1 package (Pressurized Water Reactor (PWR) and Boiling Water Reactor (BWR) lattice models). Ultimately, the algorithms proposed here were applied to perform UQ and DA for assembly-level (CASL Progression Problem Number 6) and core-wide problems representing Watts Bar Nuclear 1 (WBN1) for cycle 1 of depletion (CASL Progression Problem Number 9), modeled and simulated using VERA-CS, which consists of several coupled multi-physics models. The analysis and algorithms developed in this dissertation were encoded and implemented in a newly developed toolkit, the Reduced Order Modeling based Uncertainty/Sensitivity Estimator (ROMUSE).
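The subspace-construction step that underlies this kind of reduced order modeling is often realized as a snapshot POD/Karhunen-Loeve expansion via the SVD; a generic sketch under that assumption (not the dissertation's actual algorithms):

```python
import numpy as np

rng = np.random.default_rng(0)
# Snapshot matrix: each column is one high-dimensional model response
# (e.g. a flux or temperature field) from a perturbed-parameter run.
snapshots = rng.normal(size=(10_000, 50))

U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
r = int(np.searchsorted(energy, 0.99)) + 1     # rank capturing 99% of the variance

basis = U[:, :r]                               # low-dimensional subspace basis
reduced = basis.T @ snapshots                  # r-dimensional coordinates; UQ/DA
                                               # sampling then operates on these
```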
Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems
NASA Astrophysics Data System (ADS)
Chang, Pei-Chann; Fan, Chin-Yuan; Wang, Yen-Wen
Database classification suffers from two well-known difficulties, i.e., the high dimensionality and non-stationary variations within large historical data. This paper presents a hybrid classification model that integrates a case-based reasoning technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system for data classification in various database applications. The model is mainly based on the idea that the historical database can be transformed into a smaller case base together with a group of fuzzy decision rules. As a result, the model can respond more accurately to the current data being classified, using the inductions of these smaller case-based fuzzy decision trees. Hit rate is applied as a performance measure, and the effectiveness of our proposed model is demonstrated by experimental comparison with other approaches on different database classification applications. The average hit rate of our proposed model is the highest among those compared.
Visual analytics of large multidimensional data using variable binned scatter plots
NASA Astrophysics Data System (ADS)
Hao, Ming C.; Dayal, Umeshwar; Sharma, Ratnesh K.; Keim, Daniel A.; Janetzko, Halldór
2010-01-01
The scatter plot is a well-known method of visualizing pairs of continuous variables in two dimensions. Multidimensional data can be depicted in a scatter plot matrix. Scatter plots are intuitive and easy to use, but often have a high degree of overlap, which may occlude a significant portion of the data. In this paper, we propose variable binned scatter plots to allow the visualization of large amounts of data without overlapping. The basic idea is to use a non-uniform (variable) binning of the x and y dimensions and to plot all the data points that fall within each bin into corresponding squares. Further, we map a third attribute to color for visualizing clusters. Analysts are able to interact with individual data points for record-level information. We have applied these techniques to solve real-world problems in credit card fraud and data center energy consumption, visualizing data distributions and cause-effect relationships among multiple attributes. A comparison of our methods with two recent well-known variants of scatter plots is included.
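A rough approximation of the idea, quantile-based (hence variable) bin edges with a third attribute mapped to color, can be sketched with NumPy/matplotlib. Note this toy aggregates each bin to a mean color rather than placing individual points inside bins as the paper does:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.exponential(size=100_000), rng.normal(size=100_000)
z = x + y                                       # third attribute mapped to color

# Variable (non-uniform) bins: quantile edges put roughly equal numbers
# of points in every bin, so dense regions get finer bins.
qx = np.quantile(x, np.linspace(0, 1, 31))
qy = np.quantile(y, np.linspace(0, 1, 31))

# Color each bin by the mean of the third attribute over its points.
zsum, _, _ = np.histogram2d(x, y, bins=[qx, qy], weights=z)
count, _, _ = np.histogram2d(x, y, bins=[qx, qy])
plt.pcolormesh(qx, qy, (zsum / np.maximum(count, 1)).T, cmap="viridis")
plt.colorbar(label="mean of third attribute")
plt.savefig("binned_scatter.png")
```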
Pattern of clustering of menopausal problems: A study with a Bengali Hindu ethnic group.
Dasgupta, Doyel; Pal, Baidyanath; Ray, Subha
2016-01-01
We attempted to find out how menopausal problems cluster with each other. The study was conducted among a group of women belonging to a Bengali-speaking Hindu ethnic group of West Bengal, a state located in Eastern India. We recruited 1,400 participants for the study. Information on sociodemographic aspects and menopausal problems were collected from these participants with the help of a pretested questionnaire. Results of cluster analysis showed that vasomotor, vaginal, and urinary problems cluster together, separately from physical and psychosomatic problems.
Massively Scalable Near Duplicate Detection in Streams of Documents using MDSH
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bogen, Paul Logasa; Symons, Christopher T; McKenzie, Amber T
2013-01-01
In a world where large-scale text collections are not only becoming ubiquitous but also are growing at increasing rates, near duplicate documents are becoming a growing concern that has the potential to hinder many different information filtering tasks. While others have tried to address this problem, prior techniques have only been used on limited collection sizes and static cases. We will briefly describe the problem in the context of Open Source Intelligence (OSINT) along with our additional constraints for performance. In this work we propose two variations on Multi-dimensional Spectral Hash (MDSH) tailored for working on extremely large, growing sets of text documents. We analyze the memory and runtime characteristics of our techniques and provide an informal analysis of the quality of the near-duplicate clusters produced by our techniques.
Silicon decorated cone shaped carbon nanotube clusters for lithium ion battery anodes.
Wang, Wei; Ruiz, Isaac; Ahmed, Kazi; Bay, Hamed Hosseini; George, Aaron S; Wang, Johnny; Butler, John; Ozkan, Mihrimah; Ozkan, Cengiz S
2014-08-27
In this work, we report the synthesis of three-dimensional (3D) cone-shaped CNT clusters (CCC) via chemical vapor deposition (CVD) with subsequent inductively coupled plasma (ICP) treatment. Innovative silicon-decorated cone-shaped CNT clusters (SCCC) are prepared by simply depositing amorphous silicon onto CCC via magnetron sputtering. The seamless connection between the silicon-decorated CNT cones and graphene facilitates charge transfer in the system and suggests a binder-free technique for preparing lithium ion battery (LIB) anodes. Lithium ion batteries based on this novel 3D SCCC architecture demonstrate a high reversible capacity of 1954 mAh g(-1) and excellent cycling stability (>1200 mAh g(-1) capacity with ≈ 100% coulombic efficiency after 230 cycles). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Technical Reports Server (NTRS)
Li, Z. K.
1985-01-01
A specialized program was developed for flow cytometric list-mode data, using a hierarchical tree method for identifying and enumerating individual subpopulations, the method of principal components for a two-dimensional display of a 6-parameter data array, and a standard sorting algorithm for characterizing subpopulations. The program was tested against a published data set subjected to cluster analysis and against experimental data sets from controlled flow cytometry experiments using a Coulter Electronics EPICS V Cell Sorter. A version of the program in compiled BASIC is usable on a 16-bit microcomputer with the MS-DOS operating system. It is specialized for 6 parameters and up to 20,000 cells. Its two-dimensional display of Euclidean distances reveals clusters clearly, as does its 1-dimensional display. The identified subpopulations can, in suitable experiments, be related to functional subpopulations of cells.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Al Qahtani, Hassan S.; Andersson, Gunther G., E-mail: gunther.andersson@flinders.edu.au, E-mail: nakayama.tomonobu@nims.go.jp; Kimoto, Koji
2016-03-21
Triphenylphosphine ligand-protected Au{sub 9} clusters deposited onto titania nanosheets show three different atomic configurations as observed by scanning transmission electron microscopy. The configurations observed are a 3-dimensional structure, corresponding to the previously proposed Au{sub 9} core of the clusters, and two pseudo-2-dimensional (pseudo-2D) structures, newly found by this work. With the help of density functional theory (DFT) calculations, the observed pseudo-2D structures are attributed to the low energy, de-ligated structures formed through interaction with the substrate. The combination of scanning transmission electron microscopy with DFT calculations thus allows identifying whether or not the deposited Au{sub 9} clusters have been de-ligated in the deposition process.
Critical exponents of domain walls in the two-dimensional Potts model
NASA Astrophysics Data System (ADS)
Dubail, Jérôme; Lykke Jacobsen, Jesper; Saleur, Hubert
2010-12-01
We address the geometrical critical behavior of the two-dimensional Q-state Potts model in terms of the spin clusters (i.e. connected domains where the spin takes a constant value). These clusters are different from the usual Fortuin-Kasteleyn clusters, and are separated by domain walls that can cross and branch. We develop a transfer matrix technique enabling the formulation and numerical study of spin clusters even when Q is not an integer. We further identify geometrically the crossing events which give rise to conformal correlation functions. This leads to an infinite series of fundamental critical exponents h_{ℓ1-ℓ2, 2ℓ1}, valid for 0 ≤ Q ≤ 4, that describe the insertion of ℓ1 thin and ℓ2 thick domain walls.
Simulation and Analysis of Converging Shock Wave Test Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramsey, Scott D.; Shashkov, Mikhail J.
2012-06-21
Results and analysis pertaining to the simulation of the Guderley converging shock wave test problem (and associated code verification hydrodynamics test problems involving converging shock waves) in the LANL ASC radiation-hydrodynamics code xRAGE are presented. One-dimensional (1D) spherical and two-dimensional (2D) axisymmetric geometric setups are utilized and evaluated in this study, as is an instantiation of the xRAGE adaptive mesh refinement capability. For the 2D simulations, a 'Surrogate Guderley' test problem is developed and used to obviate subtleties inherent to the true Guderley solution's initialization on a square grid, while still maintaining a high degree of fidelity to the original problem and minimally straining the general credibility of associated analysis and conclusions.
Segmentation of High Angular Resolution Diffusion MRI using Sparse Riemannian Manifold Clustering
Wright, Margaret J.; Thompson, Paul M.; Vidal, René
2015-01-01
We address the problem of segmenting high angular resolution diffusion imaging (HARDI) data into multiple regions (or fiber tracts) with distinct diffusion properties. We use the orientation distribution function (ODF) to represent HARDI data and cast the problem as a clustering problem in the space of ODFs. Our approach integrates tools from sparse representation theory and Riemannian geometry into a graph theoretic segmentation framework. By exploiting the Riemannian properties of the space of ODFs, we learn a sparse representation for each ODF and infer the segmentation by applying spectral clustering to a similarity matrix built from these representations. In cases where regions with similar (resp. distinct) diffusion properties belong to different (resp. same) fiber tracts, we obtain the segmentation by incorporating spatial and user-specified pairwise relationships into the formulation. Experiments on synthetic data evaluate the sensitivity of our method to image noise and the presence of complex fiber configurations, and show its superior performance compared to alternative segmentation methods. Experiments on phantom and real data demonstrate the accuracy of the proposed method in segmenting simulated fibers, as well as white matter fiber tracts of clinical importance in the human brain. PMID:24108748
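The final inference step, spectral clustering applied to a precomputed similarity matrix, has a standard form; a sketch assuming scikit-learn, with a plain Gaussian kernel standing in for the sparse-representation similarities the paper constructs in the Riemannian ODF space:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# Placeholder: one feature vector per voxel ODF; the paper instead derives
# similarities from sparse representations of the ODFs.
odf_features = rng.normal(size=(300, 16))

# Similarity matrix: a simple Gaussian kernel on pairwise distances.
d2 = ((odf_features[:, None] - odf_features[None]) ** 2).sum(-1)
S = np.exp(-d2 / d2.mean())

labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(S)
```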
Autonomic Cluster Management System (ACMS): A Demonstration of Autonomic Principles at Work
NASA Technical Reports Server (NTRS)
Baldassari, James D.; Kopec, Christopher L.; Leshay, Eric S.; Truszkowski, Walt; Finkel, David
2005-01-01
Cluster computing, whereby a large number of simple processors or nodes are combined together to apparently function as a single powerful computer, has emerged as a research area in its own right. The approach offers a relatively inexpensive means of achieving significant computational capabilities for high-performance computing applications, while simultaneously affording the ability to increase that capability simply by adding more (inexpensive) processors. However, the task of manually managing and configuring a cluster quickly becomes impossible as the cluster grows in size. Autonomic computing is a relatively new approach to managing complex systems that can potentially solve many of the problems inherent in cluster management. We describe the development of a prototype Autonomic Cluster Management System (ACMS) that exploits autonomic properties in automating cluster management.
Music Taste Groups and Problem Behavior
ERIC Educational Resources Information Center
Mulder, Juul; ter Bogt, Tom; Raaijmakers, Quinten; Vollebergh, Wilma
2007-01-01
Internalizing and externalizing problems differ by musical tastes. A high school-based sample of 4159 adolescents, representative of Dutch youth aged 12 to 16, reported on their personal and social characteristics, music preferences and social-psychological functioning, measured with the Youth Self-Report (YSR). Cluster analysis on their music…
The quest for inorganic fullerenes
NASA Astrophysics Data System (ADS)
Pietsch, Susanne; Dollinger, Andreas; Strobel, Christoph H.; Park, Eun Ji; Ganteför, Gerd; Seo, Hyun Ook; Kim, Young Dok; Idrobo, Juan-Carlos; Pennycook, Stephen J.
2015-10-01
Experimental results of the search for inorganic fullerenes are presented. MonSm- and WnSm- clusters are generated with a pulsed arc cluster ion source equipped with an annealing stage. This is known to enhance fullerene formation in the case of carbon. Analogous to carbon, the mass spectra of the metal chalcogenide clusters produced in this way exhibit a bimodal structure. The species in the first maximum at low mass are known to be platelets. Here, the structure of the species in the second maximum is studied by anion photoelectron spectroscopy, scanning transmission electron microscopy, and scanning tunneling microscopy. All experimental results indicate a two-dimensional structure of these species and disagree with a three-dimensional fullerene-like geometry. A possible explanation for this preference of two-dimensional structures is the ability of a two-element material to saturate the dangling bonds at the edges of a platelet by excess atoms of one element. A platelet consisting of only a single element cannot do this. Accordingly, graphite and boron might be the only materials forming nano-spheres because they are the only single-element materials assuming two-dimensional structures.
Lee, Seungyeoun; Kim, Yongkang; Kwon, Min-Seok; Park, Taesung
2015-01-01
Genome-wide association studies (GWAS) have extensively analyzed single-SNP effects on a wide variety of common and complex diseases and found many genetic variants associated with diseases. However, a large portion of the genetic variants remains unexplained. This missing heritability problem might be due to the analytical strategy that limits analyses to single SNPs. One possible approach to the missing heritability problem is to identify multi-SNP effects or gene-gene interactions. The multifactor dimensionality reduction (MDR) method has been widely used to detect gene-gene interactions based on constructive induction, classifying high-dimensional genotype combinations into a one-dimensional variable with two attributes, high risk and low risk, for the case-control study. Many modifications of MDR have been proposed, and it has also been extended to the survival phenotype. In this study, we propose several extensions of MDR for the survival phenotype and compare the proposed extensions with earlier MDR through comprehensive simulation studies. PMID:26339630
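The constructive-induction step of MDR described above is simple enough to sketch. In this minimal, hedged version, each multi-locus genotype combination is pooled into a high-risk/low-risk attribute by comparing its case/control ratio with the overall ratio; variable names and the random data are illustrative:

```python
# Minimal sketch of MDR's binarization of genotype combinations.
import numpy as np

def mdr_binarize(genotypes, status):
    """genotypes: (n,) combination labels; status: 1 = case, 0 = control."""
    overall = status.sum() / max(1, (1 - status).sum())  # overall case/control ratio
    risk = {}
    for g in np.unique(genotypes):
        mask = genotypes == g
        cases, controls = status[mask].sum(), (1 - status[mask]).sum()
        risk[g] = 1 if cases / max(1, controls) > overall else 0  # 1 = high risk
    return np.array([risk[g] for g in genotypes])

rng = np.random.default_rng(0)
geno = rng.integers(0, 9, size=500)   # e.g., 3x3 genotype combinations of two SNPs
stat = rng.integers(0, 2, size=500)
print(mdr_binarize(geno, stat)[:10])
```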
Super resolution reconstruction of infrared images based on classified dictionary learning
NASA Astrophysics Data System (ADS)
Liu, Fei; Han, Pingli; Wang, Yi; Li, Xuan; Bai, Lu; Shao, Xiaopeng
2018-05-01
Infrared images always suffer from low-resolution problems resulting from limitations of imaging devices. An economical approach to combat this problem involves reconstructing high-resolution images by reasonable methods without updating devices. Inspired by compressed sensing theory, this study presents and demonstrates a Classified Dictionary Learning method to reconstruct high-resolution infrared images. It classifies features of the samples into several reasonable clusters and trains a dictionary pair for each cluster. The optimal pair of dictionaries is chosen for each image reconstruction, and therefore more satisfactory results are achieved without an increase in computational complexity and time cost. Experiments and results demonstrated that it is a viable method for infrared image reconstruction, since it improves image resolution and recovers detailed information of targets.
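A minimal sketch of the classified-dictionary idea, assuming patch features are already extracted: k-means groups the patches, one dictionary is trained per cluster, and each patch is coded with the dictionary of its nearest cluster. The coupled low/high-resolution training of the actual method is omitted, and all names and sizes are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
patches = rng.normal(size=(2000, 64))            # stand-in for LR patch features
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(patches)
dicts = [MiniBatchDictionaryLearning(n_components=32, random_state=0)
         .fit(patches[km.labels_ == k]) for k in range(4)]

def code_patch(p):
    k = km.predict(p.reshape(1, -1))[0]          # choose the cluster's dictionary
    return dicts[k].transform(p.reshape(1, -1))  # sparse code for reconstruction

print(code_patch(patches[0]).shape)
```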
Choi, Byoung-Ju; Lee, Jung A; Choi, Jae-Sung; Park, Jong-Gyu; Lee, Sang-Ho; Yih, Wonho
2017-04-01
Hydrographic observation and biological samplings were conducted to assess the distribution of the phytoplankton community over the sloping shelf of the eastern Yellow Sea in May 2012. The concentration of chlorophyll a was determined, and phytoplankton was microscopically examined for quantitative and cluster analyses. A cluster analysis of the phytoplankton species and abundance along four observation lines revealed the three-dimensional structure of the phytoplankton community distribution: a coastal group in the mixed region, an offshore upper-layer group preferring a stable water column, and an offshore lower-layer group. The subsurface maximum of phytoplankton abundance and chlorophyll a concentration appeared as far as 64 km away from the tidal front through middle-layer intrusion. The phytoplankton abundance was high on the shoreward side of the tidal front during the spring tide. The phytoplankton abundance was relatively high at 10-m depth in the mixed region, while the concentration of chlorophyll a was high below that depth. The disparity between the profiles of phytoplankton abundance and chlorophyll a concentration in the mixed region was related to a depth-dependent species change accompanied by size fractionation of the phytoplankton community.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gastegger, Michael; Kauffmann, Clemens; Marquetand, Philipp, E-mail: philipp.marquetand@univie.ac.at
Many approaches that have been developed to express the potential energy of large systems exploit the locality of atomic interactions. A prominent example is the fragmentation methods, in which quantum chemical calculations are carried out for overlapping small fragments of a given molecule that are then combined in a second step to yield the system's total energy. Here we compare the accuracy of the systematic molecular fragmentation approach with the performance of high-dimensional neural network (HDNN) potentials introduced by Behler and Parrinello. HDNN potentials are similar in spirit to the fragmentation approach in that the total energy is constructed as a sum of environment-dependent atomic energies, which are derived indirectly from electronic structure calculations. As a benchmark set, we use all-trans alkanes containing up to eleven carbon atoms at the coupled cluster level of theory. These molecules have been chosen because they allow the extrapolation of reliable reference energies for very long chains, enabling an assessment of the energies obtained by both methods for alkanes including up to 10 000 carbon atoms. We find that both methods predict high-quality energies, with the HDNN potentials yielding smaller errors with respect to the coupled cluster reference.
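The sum-of-atomic-energies construction mentioned above is easy to illustrate. Below is a minimal sketch, assuming a fixed-size environment descriptor per atom and a single toy network in place of per-element Behler-Parrinello symmetry functions and networks:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)    # toy per-atom network weights
W2, b2 = rng.normal(size=8), 0.0

def atomic_energy(descriptor):
    h = np.tanh(W1 @ descriptor + b1)            # hidden layer
    return W2 @ h + b2                           # scalar atomic energy

def total_energy(descriptors):
    # HDNN ansatz: total energy = sum of environment-dependent atomic energies
    return sum(atomic_energy(d) for d in descriptors)

print(total_energy(rng.normal(size=(10, 4))))    # 10 atoms, 4-dim environments
```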
Categorical clustering of the neural representation of color.
Brouwer, Gijs Joost; Heeger, David J
2013-09-25
Cortical activity was measured with functional magnetic resonance imaging (fMRI) while human subjects viewed 12 stimulus colors and performed either a color-naming or diverted attention task. A forward model was used to extract lower dimensional neural color spaces from the high-dimensional fMRI responses. The neural color spaces in two visual areas, human ventral V4 (V4v) and VO1, exhibited clustering (greater similarity between activity patterns evoked by stimulus colors within a perceptual category, compared to between-category colors) for the color-naming task, but not for the diverted attention task. Response amplitudes and signal-to-noise ratios were higher in most visual cortical areas for color naming compared to diverted attention. But only in V4v and VO1 did the cortical representation of color change to a categorical color space. A model is presented that induces such a categorical representation by changing the response gains of subpopulations of color-selective neurons. PMID:24068814
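A minimal sketch of a forward-model analysis in the spirit described above, assuming a bank of hypothesized hue-tuned channels: inverting the fitted voxel-by-channel weights maps high-dimensional fMRI patterns into a low-dimensional channel (color) space. The tuning curves, sizes, and random responses are illustrative stand-ins, not the authors' pipeline:

```python
import numpy as np

n_vox, n_chan, n_colors = 200, 6, 12
hues = np.linspace(0, 2 * np.pi, n_colors, endpoint=False)
centers = np.linspace(0, 2 * np.pi, n_chan, endpoint=False)
# hypothesized channel tuning: half-wave rectified cosines raised to a power
C = np.maximum(np.cos(hues[None, :] - centers[:, None]), 0) ** 5  # (chan, color)

rng = np.random.default_rng(0)
R_train = rng.normal(size=(n_vox, n_colors))     # stand-in training responses
W = R_train @ np.linalg.pinv(C)                  # voxel-by-channel weights
R_test = rng.normal(size=(n_vox, n_colors))
C_hat = np.linalg.pinv(W) @ R_test               # estimated low-dim color space
print(C_hat.shape)                               # (n_chan, n_colors)
```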
An Empirical Study of Personality Disorders Among Treatment-Seeking Problem Gamblers.
Brown, M; Oldenhof, E; Allen, J S; Dowling, N A
2016-12-01
The primary aims of this study were to examine the prevalence of personality disorders in problem gamblers, to explore the relationship between personality disorders and problem gambling severity, and to explore the degree to which the psychological symptoms highlighted in the biosocial developmental model of borderline personality disorder (impulsivity, distress tolerance, substance use, PTSD symptoms, psychological distress and work/social adjustment) are associated with problem gambling. A secondary aim was to explore the strength of the relationships between these symptoms and problem gambling severity in problem gamblers with and without personality disorder pathology. Participants were 168 consecutively admitted problem gamblers seeking treatment from a specialist outpatient gambling service in Australia. The prevalence of personality disorders using the self-report version of the Iowa Personality Disorders Screen was 43.3 %. Cluster B personality disorders, but not Cluster A or C personality disorders, were associated with problem gambling severity. All psychological symptoms, except alcohol and drug use, were significantly higher among participants with personality disorder pathology compared to those without. Finally, psychological distress, and work and social adjustment were significantly associated with problem gambling severity for problem gamblers with personality disorder pathology, while impulsivity, psychological distress, and work and social adjustment were significantly associated with problem gambling severity for those without personality disorder pathology. High rates of comorbid personality disorders, particularly Cluster B disorders, necessitate routine screening in gambling treatment services. More complex psychological profiles may complicate treatment for problem gamblers with comorbid personality disorders. Future research should examine the applicability of the biosocial developmental model to problem gambling in community studies.
Three-dimensional discrete-time Lotka-Volterra models with an application to industrial clusters
NASA Astrophysics Data System (ADS)
Bischi, G. I.; Tramontana, F.
2010-10-01
We consider a three-dimensional discrete dynamical system that describes an application to economics of a generalization of the Lotka-Volterra prey-predator model. The dynamic model proposed is used to describe the interactions among industrial clusters (or districts), following a suggestion given by [23]. After studying some local and global properties and bifurcations in bidimensional Lotka-Volterra maps, by numerical explorations we show how some of them can be extended to their three-dimensional counterparts, even if their analytic and geometric characterization becomes much more difficult and challenging. We also show a global bifurcation of the three-dimensional system that has no two-dimensional analogue. Besides the particular economic application considered, the study of the discrete version of Lotka-Volterra dynamical systems turns out to be a quite rich and interesting topic by itself, i.e. from a purely mathematical point of view.
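For readers who want to experiment, the following is a minimal sketch of iterating a three-dimensional discrete-time Lotka-Volterra map of the generic quadratic form x_i(t+1) = x_i(t)(1 + r_i - (Ax(t))_i); the growth rates and interaction matrix are illustrative, not the paper's industrial-cluster calibration:

```python
import numpy as np

r = np.array([0.5, 0.3, 0.4])                    # intrinsic growth rates
A = np.array([[0.9, 0.1, 0.0],                   # illustrative interaction matrix
              [-0.2, 0.8, 0.1],
              [0.0, -0.1, 0.7]])

def step(x):
    return x * (1.0 + r - A @ x)                 # one iteration of the 3D map

x = np.array([0.1, 0.2, 0.1])
for _ in range(1000):                            # iterate past the transient
    x = step(x)
print(x)                                         # settles near a coexistence fixed point
```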
High-frequency modes in a two-dimensional rectangular room with windows
NASA Astrophysics Data System (ADS)
Shabalina, E. D.; Shirgina, N. V.; Shanin, A. V.
2010-07-01
We examine a two-dimensional model problem of architectural acoustics concerning sound propagation in a rectangular room with windows. It is supposed that the walls are ideally flat and hard, and that the windows absorb all energy that falls upon them. We search for the modes of such a room having minimal attenuation indices, which have the pronounced structure of billiard trajectories. The main attenuation mechanism for such modes is diffraction at the edges of the windows. We construct estimates for the attenuation indices of these modes based on the solution of the Weinstein problem. We formulate diffraction problems, similar in statement to the Weinstein problem, that describe the attenuation of billiard modes in more complex situations.
Applications of conformal field theory to problems in 2D percolation
NASA Astrophysics Data System (ADS)
Simmons, Jacob Joseph Harris
This thesis explores critical two-dimensional percolation in bounded regions in the continuum limit. The main method we employ is conformal field theory (CFT). Our specific results follow from the null-vector structure of the c = 0 CFT that applies to critical two-dimensional percolation. We also make use of the duality symmetry obeyed at the percolation point, and the fact that percolation may be understood as the q-state Potts model in the limit q → 1. Our first results describe the correlations between points in the bulk and boundary intervals or points, i.e. the probability that the various points or intervals are in the same percolation cluster. These quantities correspond to order-parameter profiles under the given conditions, or cluster connection probabilities. We consider two specific cases: an anchoring interval, and two anchoring points. We derive results for these and related geometries using the CFT null-vectors for the corresponding boundary condition changing (bcc) operators. In addition, we exhibit several exact relationships between these probabilities. These relations between the various bulk-boundary connection probabilities involve parameters of the CFT called operator product expansion (OPE) coefficients. We then compute several of these OPE coefficients, including those arising in our new probability relations. Beginning with the familiar CFT operator φ1,2, which corresponds to a free-fixed spin boundary change in the q-state Potts model, we then develop physical interpretations of the bcc operators. We argue that, when properly normalized, higher-order bcc operators correspond to successive fusions of multiple φ1,2 operators. Finally, by identifying the derivative of φ1,2 with the operator φ1,4, we derive several new quantities called first crossing densities. These new results are then combined and integrated to obtain the three previously known crossing quantities in a rectangle: the probability of a horizontal crossing cluster, the probability of a cluster crossing both horizontally and vertically, and the expected number of horizontal crossing clusters. These three results were known to be solutions to a certain fifth-order differential equation, but until now no physically meaningful explanation had appeared. This differential equation arises naturally in our derivation.
Vickers, Douglas; Lee, Michael D; Dry, Matthew; Hughes, Peter
2003-10-01
The planar Euclidean version of the traveling salesperson problem requires finding the shortest tour through a two-dimensional array of points. MacGregor and Ormerod (1996) have suggested that people solve such problems by using a global-to-local perceptual organizing process based on the convex hull of the array. We review evidence for and against this idea, before considering an alternative, local-to-global perceptual process, based on the rapid automatic identification of nearest neighbors. We compare these approaches in an experiment in which the effects of number of convex hull points and number of potential intersections on solution performance are measured. Performance worsened with more points on the convex hull and with fewer potential intersections. A measure of response uncertainty was unaffected by the number of convex hull points but increased with fewer potential intersections. We discuss a possible interpretation of these results in terms of a hierarchical solution process based on linking nearest neighbor clusters.
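The local-to-global, nearest-neighbor linking idea discussed above can be illustrated with the classic nearest-neighbor tour construction; this minimal sketch illustrates the kind of neighbor-linking heuristic at issue, not the authors' experimental materials:

```python
import numpy as np

def nearest_neighbor_tour(points):
    """Greedily link each point to its nearest unvisited neighbor."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: np.linalg.norm(points[j] - last))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

rng = np.random.default_rng(0)
pts = rng.random((20, 2))
print(nearest_neighbor_tour(pts))
```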
Least-squares model-based halftoning
NASA Astrophysics Data System (ADS)
Pappas, Thrasyvoulos N.; Neuhoff, David L.
1992-08-01
A least-squares model-based approach to digital halftoning is proposed. It exploits both a printer model and a model for visual perception. It attempts to produce an 'optimal' halftoned reproduction by minimizing the squared error between the response of the cascade of the printer and visual models to the binary image and the response of the visual model to the original gray-scale image. Conventional methods, such as clustered ordered dither, use the properties of the eye only implicitly, and resist printer distortions at the expense of spatial and gray-scale resolution. In previous work we showed that our printer model can be used to modify error diffusion to account for printer distortions. The modified error diffusion algorithm has better spatial and gray-scale resolution than conventional techniques, but produces some well-known artifacts and asymmetries because it does not make use of an explicit eye model. Least-squares model-based halftoning uses explicit eye models and relies on printer models that predict distortions and exploit them to increase, rather than decrease, both spatial and gray-scale resolution. We have shown that the one-dimensional least-squares problem, in which each row or column of the image is halftoned independently, can be solved with the Viterbi algorithm. Unfortunately, no closed-form solution can be found in two dimensions. The two-dimensional least-squares solution is obtained by iterative techniques. Experiments show that least-squares model-based halftoning produces more gray levels and better spatial resolution than conventional techniques. We also show that the least-squares approach eliminates the problems associated with error diffusion. Model-based halftoning can be especially useful in transmission of high-quality documents using high-fidelity gray-scale image encoders. As we have shown, in such cases halftoning can be performed at the receiver, just before printing. Apart from coding efficiency, this approach permits the halftoner to be tuned to the individual printer, whose characteristics may vary considerably from those of other printers, for example, write-black vs. write-white laser printers.
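The one-dimensional least-squares problem mentioned above admits an exact dynamic-programming solution. Below is a minimal sketch, under simplifying assumptions (an ideal printer and a toy 3-tap visual filter), of Viterbi-style halftoning of a 1D gray ramp; it illustrates the technique, not the authors' implementation:

```python
import numpy as np
from itertools import product

h = np.array([0.25, 0.5, 0.25])                  # toy visual low-pass filter
L = len(h)

def viterbi_halftone(gray):
    target = np.convolve(gray, h)[: len(gray)]   # eye response to the gray image
    states = list(product((0, 1), repeat=L - 1)) # last L-1 binary pixels
    cost = {s: (0.0 if s == (0,) * (L - 1) else np.inf) for s in states}
    back = []
    for n in range(len(gray)):
        new_cost, choices = {}, {}
        for s, bn in product(states, (0, 1)):
            window = (*s, bn)                    # (b[n-2], b[n-1], b[n])
            y = sum(h[k] * window[L - 1 - k] for k in range(L))
            c = cost[s] + (y - target[n]) ** 2
            ns = (*s[1:], bn)
            if c < new_cost.get(ns, np.inf):
                new_cost[ns], choices[ns] = c, s
        cost = new_cost
        back.append(choices)
    s = min(cost, key=cost.get)                  # trace back the optimal sequence
    bits = []
    for choices in reversed(back):
        bits.append(s[-1])
        s = choices[s]
    return np.array(bits[::-1])

print(viterbi_halftone(np.linspace(0.0, 1.0, 24)))
```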
Phase-space finite elements in a least-squares solution of the transport equation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Drumm, C.; Fan, W.; Pautz, S.
2013-07-01
The linear Boltzmann transport equation is solved using a least-squares finite element approximation in the space, angular and energy phase-space variables. The method is applied to both neutral particle transport and also to charged particle transport in the presence of an electric field, where the angular and energy derivative terms are handled with the energy/angular finite elements approximation, in a manner analogous to the way the spatial streaming term is handled. For multi-dimensional problems, a novel approach is used for the angular finite elements: mapping the surface of a unit sphere to a two-dimensional planar region and using a meshing tool to generate a mesh. In this manner, much of the spatial finite-elements machinery can be easily adapted to handle the angular variable. The energy variable and the angular variable for one-dimensional problems make use of edge/beam elements, also building upon the spatial finite elements capabilities. The methods described here can make use of either continuous or discontinuous finite elements in space, angle and/or energy, with the use of continuous finite elements resulting in a smaller problem size and the use of discontinuous finite elements resulting in more accurate solutions for certain types of problems. The work described in this paper makes use of continuous finite elements, so that the resulting linear system is symmetric positive definite and can be solved with a highly efficient parallel preconditioned conjugate gradients algorithm. The phase-space finite elements capability has been built into the Sceptre code and applied to several test problems, including a simple one-dimensional problem with an analytic solution available, a two-dimensional problem with an isolated source term, showing how the method essentially eliminates ray effects encountered with discrete ordinates, and a simple one-dimensional charged-particle transport problem in the presence of an electric field. (authors)
The Goertler vortex instability mechanism in three-dimensional boundary layers
NASA Technical Reports Server (NTRS)
Hall, P.
1984-01-01
The two-dimensional boundary layer on a concave wall is centrifugally unstable with respect to vortices aligned with the basic flow for sufficiently high values of the Goertler number. However, in most situations of practical interest the basic flow is three-dimensional and previous theoretical investigations do not apply. The linear stability of the flow over an infinitely long swept wall of variable curvature is considered. If there is no pressure gradient in the boundary layer, the instability problem can always be related to an equivalent two-dimensional calculation. However, in general this is not the case, and even for small values of the crossflow velocity field dramatic differences between the two- and three-dimensional problems emerge. When the size of the crossflow is further increased, the vortices in the neutral location have their axes locally perpendicular to the vortex lines of the basic flow.
Liu, Guangfeng; Liu, Jie; Nie, Lina; Ban, Rui; Armatas, Gerasimos S; Tao, Xutang; Zhang, Qichun
2017-05-15
A zero-dimensional N,N'-dibutyl-4,4'-dipyridinium bromoplumbate, [BV]6[Pb9Br30], with unusual discrete [Pb9Br30]12- anionic clusters was prepared via a facile surfactant-mediated solvothermal process. This bromoplumbate exhibits a narrower optical band gap relative to the congeneric one-dimensional viologen bromoplumbates.
Paterson, Gillian; Power, Kevin; Yellowlees, Alex; Park, Katy; Taylor, Louise
2007-01-01
Research examining cognitive and behavioural determinants of anorexia is currently lacking. This has implications for the success of treatment programmes for anorexics, particularly given the high reported dropout rates. This study examines two-dimensional self-esteem (comprising self-competence and self-liking) and social problem-solving in an anorexic population, and predicts that self-esteem will mediate the relationship between problem-solving and eating pathology by facilitating/inhibiting the use of faulty/effective strategies. Twenty-seven anorexic inpatients and 62 controls completed measures of social problem-solving and two-dimensional self-esteem. Anorexics scored significantly higher than the non-clinical group on measures of eating pathology, negative problem orientation, impulsivity/carelessness and avoidance, and significantly lower on positive problem orientation and both self-esteem components. In the clinical sample, disordered eating correlated significantly with self-competence, negative problem orientation and avoidance. Associations between disordered eating and problem solving lost significance when self-esteem was controlled in the clinical group only. Self-competence was found to be the main predictor of eating pathology in the clinical sample, while self-liking, impulsivity and negative and positive problem orientation were the main predictors in the non-clinical sample. Findings support the two-dimensional self-esteem theory, with self-competence only being relevant to the anorexic population, and support the hypothesis that self-esteem mediates the relationship between disordered eating and problem-solving ability in an anorexic sample. Treatment implications include support for programmes emphasising increasing self-appraisal and self-efficacy.
Berne, Rosalyn W; Raviv, Daniel
2004-04-01
This paper introduces the Eight Dimensional Methodology for Innovative Thinking (the Eight Dimensional Methodology) for innovative problem solving, as a unified approach to case analysis that builds on comprehensive problem-solving knowledge from industry, business, marketing, math, science, engineering, technology, the arts, and daily life. It is designed to stimulate innovation by quickly generating unique, 'out of the box', unexpected and high-quality solutions. It gives new insights and thinking strategies for solving everyday problems faced in the workplace, by helping decision makers to see otherwise obscure alternatives and solutions. Daniel Raviv, the engineer who developed the Eight Dimensional Methodology, and paper co-author, technology ethicist Rosalyn Berne, suggest that this tool can be especially useful in identifying solutions and alternatives for particular problems of engineering, and for the ethical challenges that arise with them. First, the Eight Dimensional Methodology helps to elucidate how what may appear to be a basic engineering problem also has ethical dimensions. In addition, it offers the engineer a methodology for penetrating and seeing new dimensions of those problems. To demonstrate the effectiveness of the Eight Dimensional Methodology as an analytical tool for thinking about ethical challenges in engineering, the paper presents the case of the construction of the Large Binocular Telescope (LBT) on Mount Graham in Arizona. Analysis of the case offers decision makers the use of the Eight Dimensional Methodology in considering alternative solutions for how they can proceed in their goals of exploring space, and then follows that same process through a second stage of exploring the ethics of each of those solutions. The LBT project pools resources from an international partnership of universities and research institutes for the construction and maintenance of a highly sophisticated, powerful new telescope. It will soon culminate in the erection of the world's largest and most powerful optical telescope, designed to see fine detail otherwise visible only from space. It also represents a controversial engineering project that is being undertaken on land considered sacred by the local, native Apache people. As presented, the case features the University of Virginia and its challenges in considering whether and how to join the LBT project consortium.
Li, Jinyan; Fong, Simon; Wong, Raymond K; Millham, Richard; Wong, Kelvin K L
2017-06-28
Due to the high-dimensional characteristics of datasets, we propose a new method based on the Wolf Search Algorithm (WSA) for optimising the feature selection problem. The proposed approach uses the natural strategy established by Charles Darwin; that is, 'It is not the strongest of the species that survives, but the most adaptable'. This means that in the evolution of a swarm, the elitists are motivated to quickly obtain more and better resources. The memory function helps the proposed method avoid repeated searches of the worst positions in order to enhance the effectiveness of the search, while the binary strategy simplifies the feature selection problem into a similar problem of function optimisation. Furthermore, the wrapper strategy couples these strengthened wolves with an extreme learning machine classifier to find a sub-dataset with a reasonable number of features that offers the maximum correctness of global classification models. The experimental results from the six public high-dimensional bioinformatics datasets tested demonstrate that the proposed method can outperform some conventional feature selection methods by up to 29% in classification accuracy, and previous WSAs by up to 99.81% in computational time.
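A heavily simplified sketch of a binary wrapper feature-selection loop in the spirit described above: the wolf-movement rules are reduced to random bit flips around the best position, and k-nearest-neighbors stands in for the extreme learning machine, purely to keep the example self-contained:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(1)

def fitness(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

best = rng.random(X.shape[1]) < 0.5              # random initial feature subset
best_fit = fitness(best)
for _ in range(30):                              # simplified search iterations
    cand = best.copy()
    flips = rng.choice(X.shape[1], size=3, replace=False)
    cand[flips] = ~cand[flips]                   # binary "move" of the wolf
    f = fitness(cand)
    if f > best_fit:                             # keep the better position
        best, best_fit = cand, f
print(round(best_fit, 3), int(best.sum()), "features")
```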
3D variational brain tumor segmentation on a clustered feature set
NASA Astrophysics Data System (ADS)
Popuri, Karteek; Cobzas, Dana; Jagersand, Martin; Shah, Sirish L.; Murtha, Albert
2009-02-01
Tumor segmentation from MRI data is a particularly challenging and time consuming task. Tumors have a large diversity in shape and appearance with intensities overlapping the normal brain tissues. In addition, an expanding tumor can also deflect and deform nearby tissue. Our work addresses these last two difficult problems. We use the available MRI modalities (T1, T1c, T2) and their texture characteristics to construct a multi-dimensional feature set. Further, we extract clusters which provide a compact representation of the essential information in these features. The main idea in this paper is to incorporate these clustered features into the 3D variational segmentation framework. In contrast to the previous variational approaches, we propose a segmentation method that evolves the contour in a supervised fashion. The segmentation boundary is driven by the learned inside and outside region voxel probabilities in the cluster space. We incorporate prior knowledge about the normal brain tissue appearance, during the estimation of these region statistics. In particular, we use a Dirichlet prior that discourages the clusters in the ventricles to be in the tumor and hence better disambiguate the tumor from brain tissue. We show the performance of our method on real MRI scans. The experimental dataset includes MRI scans, from patients with difficult instances, with tumors that are inhomogeneous in appearance, small in size and in proximity to the major structures in the brain. Our method shows good results on these test cases.
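A minimal sketch of the supervised region statistics described above, assuming voxel features have been quantized into clusters: inside/outside probabilities over cluster labels, smoothed with Dirichlet pseudo-counts, yield a region-competition term of the kind that drives the contour. The level-set evolution itself is omitted, and all names and sizes are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
feats = rng.normal(size=(5000, 6))               # stand-in multi-modal voxel features
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(feats)
inside = rng.random(5000) < 0.3                  # stand-in current segmentation

alpha = np.ones(8)                               # Dirichlet pseudo-counts (prior)
p_in = np.bincount(labels[inside], minlength=8) + alpha
p_in /= p_in.sum()
p_out = np.bincount(labels[~inside], minlength=8) + alpha
p_out /= p_out.sum()
speed = np.log(p_in[labels]) - np.log(p_out[labels])  # region-competition term
print(speed[:5])
```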
NASA Astrophysics Data System (ADS)
Sun, Y.; Luo, G.
2017-12-01
Seismicity in a region is usually characterized by earthquake clusters and earthquake migration along its major fault zones. However, we do not fully understand why and how earthquake clusters and the spatio-temporal migration of earthquakes occur. The northeastern Tibetan Plateau is a good example for investigating these problems. In this study, we construct and use a three-dimensional viscoelastoplastic finite-element model to simulate earthquake cycles and the spatio-temporal migration of earthquakes along major fault zones in the northeastern Tibetan Plateau. We calculate stress evolution and fault interactions, and explore the effects of topographic loading and of the viscosity of the middle-lower crust and upper mantle on the model results. Model results show that earthquakes and fault interactions increase Coulomb stress on neighboring faults or segments, accelerating future earthquakes in the region. Thus, earthquakes occur sequentially in a short time, leading to regional earthquake clusters. Through long-term evolution, stresses on some seismogenic faults that are far apart may almost simultaneously reach the critical state of fault failure, probably also leading to regional earthquake clusters and earthquake migration. Based on our model's synthetic seismic catalog and paleoseismic data, we analyze the probability of earthquake migration between major faults in the northeastern Tibetan Plateau. We find that following the 1920 M 8.5 Haiyuan earthquake and the 1927 M 8.0 Gulang earthquake, the next big event (M≥7) in the northeastern Tibetan Plateau would be most likely to occur on the Haiyuan fault.
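The fault-interaction mechanism invoked above is commonly quantified with the Coulomb failure stress change, ΔCFS = Δτ + μ′Δσn. A minimal sketch follows; the values are illustrative and the paper's finite-element stress transfer is not reproduced:

```python
def coulomb_stress_change(d_shear, d_normal, mu_eff=0.4):
    """d_shear: shear stress change (MPa, positive in the slip direction);
    d_normal: normal stress change (MPa, positive = unclamping);
    mu_eff: effective friction coefficient (illustrative value)."""
    return d_shear + mu_eff * d_normal

# positive values bring the receiver fault closer to failure
print(coulomb_stress_change(0.15, -0.05), "MPa")
```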
Data-Driven Packet Loss Estimation for Node Healthy Sensing in Decentralized Cluster
Fan, Hangyu; Wang, Huandong; Li, Yong
2018-01-01
Decentralized clustering in modern information technology has been widely adopted in various fields in recent years. One of the main reasons is its high availability and failure tolerance, which prevent the entire system from breaking down due to the failure of a single point. Toolkits such as Akka are now commonly used to build such clusters. However, clusters of this kind, which use Gossip as their membership-management protocol and rely on a link-failure-detection mechanism, cannot deal with the scenario in which a node stochastically drops packets and corrupts the member status of the cluster. In this paper, we formulate the problem as evaluating link quality and finding a maximum clique (an NP-complete problem) in the connectivity graph. We then propose an algorithm consisting of two models, driven by data from the application layer, to solve these two problems respectively. Through simulations with statistical data and a real-world product, we demonstrate that our algorithm performs well. PMID:29360792
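The maximum-clique step mentioned above can be illustrated with the standard Bron-Kerbosch enumeration with pivoting; the paper's data-driven link-quality model is not reproduced, and the adjacency dictionary below is an illustrative connectivity graph:

```python
def bron_kerbosch(R, P, X, adj, best):
    """Enumerate maximal cliques; track the largest in best[0]."""
    if not P and not X:
        if len(R) > len(best[0]):
            best[0] = set(R)
        return
    pivot = max(P | X, key=lambda v: len(adj[v] & P))
    for v in list(P - adj[pivot]):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, best)
        P.remove(v)
        X.add(v)

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}  # node 3's links are lossy
best = [set()]
bron_kerbosch(set(), set(adj), set(), adj, best)
print(best[0])                                      # -> {0, 1, 2}: the healthy core
```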
Electrodynamic tailoring of self-assembled three-dimensional electrospun constructs
NASA Astrophysics Data System (ADS)
Reis, Tiago C.; Correia, Ilídio J.; Aguiar-Ricardo, Ana
2013-07-01
The rational design of three-dimensional electrospun constructs (3DECs) can lead to striking topographies and tailored shapes of electrospun materials. This new generation of materials is suppressing some of the current limitations of the usual 2D non-woven electrospun fiber mats, such as small pore sizes or only flat shaped constructs. Herein, we pursued an explanation for the self-assembly of 3DECs based on electrodynamic simulations and experimental validation. We concluded that the self-assembly process is driven by the establishment of attractive electrostatic forces between the positively charged aerial fibers and the already collected ones, which tend to acquire a negatively charged network oriented towards the nozzle. The in situ polarization degree is strengthened by higher amounts of clustered fibers, and therefore the initial high density fibrous regions are the preliminary motifs for the self-assembly mechanism. As such regions increase their in situ polarization, electrostatic repulsive forces will appear, favoring a competitive growth of these self-assembled fibrous clusters. Highly polarized regions will evidence higher distances between consecutive micro-assembled fibers (MAFs). Different processing parameters (deposition time, electric field intensity, concentration of polymer solution, environmental temperature and relative humidity) were evaluated in an attempt to control the material's design.
Pairing phase diagram of three holes in the generalized Hubbard model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Navarro, O.; Espinosa, J.E.
Investigations of high-Tc superconductors suggest that the electronic correlation may play a significant role in the formation of pairs. Although the main interest is in the physics of two-dimensional highly correlated electron systems, one-dimensional models related to high-temperature superconductivity are very popular due to the conjecture that properties of the 1D and 2D variants of certain models have common aspects. Within the models for correlated electron systems that attempt to capture the essential physics of high-temperature superconductors and parent compounds, the Hubbard model is one of the simplest. Here, the pairing problem of a three-electron system has been studied by using a real-space method and the generalized Hubbard Hamiltonian. This method includes the correlated hopping interactions as an extension of the previously proposed mapping method, and is based on mapping the correlated many-body problem onto an equivalent site- and bond-impurity tight-binding one in a higher-dimensional space, where the problem was solved in a non-perturbative way. In a linear chain, the authors analyzed the pairing phase diagram of three correlated holes for different values of the Hamiltonian parameters. For some values of the hopping parameters they obtain an analytical solution for all kinds of interactions.
Using Grey Wolf Algorithm to Solve the Capacitated Vehicle Routing Problem
NASA Astrophysics Data System (ADS)
Korayem, L.; Khorsid, M.; Kassem, S. S.
2015-05-01
The capacitated vehicle routing problem (CVRP) is a class of vehicle routing problems (VRPs). In the CVRP, a set of identical vehicles with fixed capacities must fulfill customers' demands for a single commodity. The main objective is to minimize the total cost or distance traveled by the vehicles while satisfying a number of constraints, such as the capacity constraint of each vehicle, logical flow constraints, etc. One method employed in solving the CVRP is the cluster-first route-second method: customers are grouped into a number of clusters, each served by one vehicle, and a route determining the best sequence in which to visit customers is then established within each cluster. The bio-inspired grey wolf optimizer (GWO), introduced in 2014, has proven to be efficient in solving unconstrained, as well as constrained, optimization problems. In the current research, our main contributions are: combining GWO with the traditional K-means clustering algorithm to generate the 'K-GWO' algorithm, deriving a capacitated version of the K-GWO algorithm by incorporating a capacity constraint into the aforementioned algorithm, and finally, developing two new clustering heuristics. The resulting algorithm is used in the clustering phase of the cluster-first route-second method to solve the CVRP. The algorithm is tested on a number of benchmark problems with encouraging results.
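A minimal sketch of the cluster-first route-second pipeline described above, with a capacity-constrained greedy assignment around given centers and a nearest-neighbor route inside each cluster; the GWO refinement of the cluster centers is omitted, and all data are illustrative:

```python
import numpy as np

def capacitated_clusters(xy, demand, centers, cap):
    """Assign customers to the nearest center with spare capacity (-1 = none)."""
    label, load = -np.ones(len(xy), dtype=int), np.zeros(len(centers))
    order = np.argsort([min(np.linalg.norm(p - c) for c in centers) for p in xy])
    for i in order:
        for k in np.argsort([np.linalg.norm(xy[i] - c) for c in centers]):
            if load[k] + demand[i] <= cap:
                label[i], load[k] = k, load[k] + demand[i]
                break
    return label

def nn_route(pts):
    """Route-second step: nearest-neighbor sequencing within one cluster."""
    if len(pts) == 0:
        return []
    left, tour = list(range(1, len(pts))), [0]
    while left:
        nxt = min(left, key=lambda j: np.linalg.norm(pts[j] - pts[tour[-1]]))
        left.remove(nxt)
        tour.append(nxt)
    return tour

rng = np.random.default_rng(0)
xy, demand = rng.random((30, 2)), rng.uniform(1, 3, size=30)
centers = xy[rng.choice(30, 4, replace=False)]
label = capacitated_clusters(xy, demand, centers, cap=25.0)
routes = [nn_route(xy[label == k]) for k in range(4)]
print(routes)
```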
Kimura, Shuhei; Sato, Masanao; Okada-Hatakeyama, Mariko
2013-01-01
The inference of a genetic network is a problem in which mutual interactions among genes are inferred from time series of gene expression levels. While a number of models have been proposed to describe genetic networks, this study focuses on a mathematical model proposed by Vohradský. Because of its advantageous features, several researchers have proposed inference methods based on Vohradský's model. When trying to analyze large-scale networks consisting of dozens of genes, however, these methods must solve high-dimensional non-linear function optimization problems. In order to resolve the difficulty of estimating the parameters of Vohradský's model, this study proposes a new method that reformulates the problem as several two-dimensional function optimization problems. Through numerical experiments on artificial genetic network inference problems, we showed that, although the computation time of the proposed method is not the shortest, the method can estimate the parameters of Vohradský's models more effectively within sufficiently short computation times. This study then applied the proposed method to an actual inference problem of the bacterial SOS DNA repair system, and succeeded in finding several reasonable regulations. PMID:24386175
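The decomposition idea described above (optimizing a few parameters at a time instead of the full high-dimensional vector) can be illustrated generically; the quadratic objective below is a stand-in for the model-fitting error of Vohradský-type models, not the authors' actual formulation:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
target = rng.normal(size=10)

def objective(theta):
    return np.sum((theta - target) ** 2)         # stand-in fitting error

theta = np.zeros(10)
for _ in range(5):                               # a few sweeps over parameter pairs
    for i in range(0, 10, 2):
        def sub(p, i=i):                         # two-dimensional subproblem
            t = theta.copy()
            t[i:i + 2] = p
            return objective(t)
        theta[i:i + 2] = minimize(sub, theta[i:i + 2], method="Nelder-Mead").x
print(np.round(theta - target, 3))               # near zero after the sweeps
```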