Constructing Dense Graphs with Unique Hamiltonian Cycles
ERIC Educational Resources Information Center
Lynch, Mark A. M.
2012-01-01
It is not difficult to construct dense graphs containing Hamiltonian cycles, but it is difficult to generate dense graphs that are guaranteed to contain a unique Hamiltonian cycle. This article presents an algorithm for generating arbitrarily large simple graphs containing "unique" Hamiltonian cycles. These graphs can be turned into dense graphs…
Mining connected global and local dense subgraphs for bigdata
NASA Astrophysics Data System (ADS)
Wu, Bo; Shen, Haiying
2016-01-01
The problem of discovering connected dense subgraphs of natural graphs is important in data analysis. Discovering dense subgraphs that do not contain denser subgraphs or are not contained in denser subgraphs (called significant dense subgraphs) is also critical for wide-ranging applications. In spite of many works on discovering dense subgraphs, there are no algorithms that can guarantee the connectivity of the returned subgraphs or discover significant dense subgraphs. Hence, in this paper, we define two subgraph discovery problems to discover connected and significant dense subgraphs, propose polynomial-time algorithms and theoretically prove their validity. We also propose an algorithm to further improve the time and space efficiency of our basic algorithm for discovering significant dense subgraphs in big data by taking advantage of the unique features of large natural graphs. In the experiments, we use massive natural graphs to evaluate our algorithms in comparison with previous algorithms. The experimental results show the effectiveness of our algorithms for the two problems and their efficiency. This work is also the first that reveals the physical significance of significant dense subgraphs in natural graphs from different domains.
Real-Time Large-Scale Dense Mapping with Surfels
Fu, Xingyin; Zhu, Feng; Wu, Qingxiao; Sun, Yunlei; Lu, Rongrong; Yang, Ruigang
2018-01-01
Real-time dense mapping systems have been developed since the birth of consumer RGB-D cameras. Currently, there are two commonly used models in dense mapping systems: truncated signed distance function (TSDF) and surfel. The state-of-the-art dense mapping systems usually work fine with small-sized regions. The generated dense surface may be unsatisfactory around the loop closures when the system tracking drift grows large. In addition, the efficiency of the system with surfel model slows down when the number of the model points in the map becomes large. In this paper, we propose to use two maps in the dense mapping system. The RGB-D images are integrated into a local surfel map. The old surfels that reconstructed in former times and far away from the camera frustum are moved from the local map to the global map. The updated surfels in the local map when every frame arrives are kept bounded. Therefore, in our system, the scene that can be reconstructed is very large, and the frame rate of our system remains high. We detect loop closures and optimize the pose graph to distribute system tracking drift. The positions and normals of the surfels in the map are also corrected using an embedded deformation graph so that they are consistent with the updated poses. In order to deal with large surface deformations, we propose a new method for constructing constraints with system trajectories and loop closure keyframes. The proposed new method stabilizes large-scale surface deformation. Experimental results show that our novel system behaves better than the prior state-of-the-art dense mapping systems. PMID:29747450
A Multilevel Gamma-Clustering Layout Algorithm for Visualization of Biological Networks
Hruz, Tomas; Lucas, Christoph; Laule, Oliver; Zimmermann, Philip
2013-01-01
Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. PMID:23864855
Finding Hierarchical and Overlapping Dense Subgraphs using Nucleus Decompositions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seshadhri, Comandur; Pinar, Ali; Sariyuce, Ahmet Erdem
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasiclique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to nd the \\true optimum", but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine a hierarchical structure among them. Current dense subgraph nding algorithms usually optimize some objective, and only nd a few such subgraphs without providing any hierarchy. It is also not clear how to account formore » overlaps in dense substructures. We de ne the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which allows for discovery of overlapping dense subgraphs. With the right parameters, the nuclear decomposition generalizes the classic notions of k-cores and k-trusses. We give provable e cient algorithms for nuclear decompositions, and empirically evaluate their behavior in a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures, and outputs dense subgraphs of higher quality than other state-of-theart solutions. Our algorithm can process graphs with tens of millions of edges in less than an hour.« less
Overlapping communities from dense disjoint and high total degree clusters
NASA Astrophysics Data System (ADS)
Zhang, Hongli; Gao, Yang; Zhang, Yue
2018-04-01
Community plays an important role in the field of sociology, biology and especially in domains of computer science, where systems are often represented as networks. And community detection is of great importance in the domains. A community is a dense subgraph of the whole graph with more links between its members than between its members to the outside nodes, and nodes in the same community probably share common properties or play similar roles in the graph. Communities overlap when nodes in a graph belong to multiple communities. A vast variety of overlapping community detection methods have been proposed in the literature, and the local expansion method is one of the most successful techniques dealing with large networks. The paper presents a density-based seeding method, in which dense disjoint local clusters are searched and selected as seeds. The proposed method selects a seed by the total degree and density of local clusters utilizing merely local structures of the network. Furthermore, this paper proposes a novel community refining phase via minimizing the conductance of each community, through which the quality of identified communities is largely improved in linear time. Experimental results in synthetic networks show that the proposed seeding method outperforms other seeding methods in the state of the art and the proposed refining method largely enhances the quality of the identified communities. Experimental results in real graphs with ground-truth communities show that the proposed approach outperforms other state of the art overlapping community detection algorithms, in particular, it is more than two orders of magnitude faster than the existing global algorithms with higher quality, and it obtains much more accurate community structure than the current local algorithms without any priori information.
On Edge Exchangeable Random Graphs
NASA Astrophysics Data System (ADS)
Janson, Svante
2017-06-01
We study a recent model for edge exchangeable random graphs introduced by Crane and Dempsey; in particular we study asymptotic properties of the random simple graph obtained by merging multiple edges. We study a number of examples, and show that the model can produce dense, sparse and extremely sparse random graphs. One example yields a power-law degree distribution. We give some examples where the random graph is dense and converges a.s. in the sense of graph limit theory, but also an example where a.s. every graph limit is the limit of some subsequence. Another example is sparse and yields convergence to a non-integrable generalized graphon defined on (0,∞).
Empirical Bayes conditional independence graphs for regulatory network recovery.
Mahdi, Rami; Madduri, Abishek S; Wang, Guoqing; Strulovici-Barel, Yael; Salit, Jacqueline; Hackett, Neil R; Crystal, Ronald G; Mezey, Jason G
2012-08-01
Computational inference methods that make use of graphical models to extract regulatory networks from gene expression data can have difficulty reconstructing dense regions of a network, a consequence of both computational complexity and unreliable parameter estimation when sample size is small. As a result, identification of hub genes is of special difficulty for these methods. We present a new algorithm, Empirical Light Mutual Min (ELMM), for large network reconstruction that has properties well suited for recovery of graphs with high-degree nodes. ELMM reconstructs the undirected graph of a regulatory network using empirical Bayes conditional independence testing with a heuristic relaxation of independence constraints in dense areas of the graph. This relaxation allows only one gene of a pair with a putative relation to be aware of the network connection, an approach that is aimed at easing multiple testing problems associated with recovering densely connected structures. Using in silico data, we show that ELMM has better performance than commonly used network inference algorithms including GeneNet, ARACNE, FOCI, GENIE3 and GLASSO. We also apply ELMM to reconstruct a network among 5492 genes expressed in human lung airway epithelium of healthy non-smokers, healthy smokers and individuals with chronic obstructive pulmonary disease assayed using microarrays. The analysis identifies dense sub-networks that are consistent with known regulatory relationships in the lung airway and also suggests novel hub regulatory relationships among a number of genes that play roles in oxidative stress and secretion. Software for running ELMM is made available at http://mezeylab.cb.bscb.cornell.edu/Software.aspx. ramimahdi@yahoo.com or jgm45@cornell.edu Supplementary data are available at Bioinformatics online.
Locating sources within a dense sensor array using graph clustering
NASA Astrophysics Data System (ADS)
Gerstoft, P.; Riahi, N.
2017-12-01
We develop a model-free technique to identify weak sources within dense sensor arrays using graph clustering. No knowledge about the propagation medium is needed except that signal strengths decay to insignificant levels within a scale that is shorter than the aperture. We then reinterpret the spatial coherence matrix of a wave field as a matrix whose support is a connectivity matrix of a graph with sensors as vertices. In a dense network, well-separated sources induce clusters in this graph. The geographic spread of these clusters can serve to localize the sources. The support of the covariance matrix is estimated from limited-time data using a hypothesis test with a robust phase-only coherence test statistic combined with a physical distance criterion. The latter criterion ensures graph sparsity and thus prevents clusters from forming by chance. We verify the approach and quantify its reliability on a simulated dataset. The method is then applied to data from a dense 5200 element geophone array that blanketed of the city of Long Beach (CA). The analysis exposes a helicopter traversing the array and oil production facilities.
An automated method for finding molecular complexes in large protein interaction networks
Bader, Gary D; Hogue, Christopher WV
2003-01-01
Background Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. Results This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation. Conclusion Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from . PMID:12525261
Information jet: Handling noisy big data from weakly disconnected network
NASA Astrophysics Data System (ADS)
Aurongzeb, Deeder
Sudden aggregation (information jet) of large amount of data is ubiquitous around connected social networks, driven by sudden interacting and non-interacting events, network security threat attacks, online sales channel etc. Clustering of information jet based on time series analysis and graph theory is not new but little work is done to connect them with particle jet statistics. We show pre-clustering based on context can element soft network or network of information which is critical to minimize time to calculate results from noisy big data. We show difference between, stochastic gradient boosting and time series-graph clustering. For disconnected higher dimensional information jet, we use Kallenberg representation theorem (Kallenberg, 2005, arXiv:1401.1137) to identify and eliminate jet similarities from dense or sparse graph.
Li, Xiaojin; Hu, Xintao; Jin, Changfeng; Han, Junwei; Liu, Tianming; Guo, Lei; Hao, Wei; Li, Lingjiang
2013-01-01
Previous studies have investigated both structural and functional brain networks via graph-theoretical methods. However, there is an important issue that has not been adequately discussed before: what is the optimal theoretical graph model for describing the structural networks of human brain? In this paper, we perform a comparative study to address this problem. Firstly, large-scale cortical regions of interest (ROIs) are localized by recently developed and validated brain reference system named Dense Individualized Common Connectivity-based Cortical Landmarks (DICCCOL) to address the limitations in the identification of the brain network ROIs in previous studies. Then, we construct structural brain networks based on diffusion tensor imaging (DTI) data. Afterwards, the global and local graph properties of the constructed structural brain networks are measured using the state-of-the-art graph analysis algorithms and tools and are further compared with seven popular theoretical graph models. In addition, we compare the topological properties between two graph models, namely, stickiness-index-based model (STICKY) and scale-free gene duplication model (SF-GD), that have higher similarity with the real structural brain networks in terms of global and local graph properties. Our experimental results suggest that among the seven theoretical graph models compared in this study, STICKY and SF-GD models have better performances in characterizing the structural human brain network.
Integration of prior knowledge into dense image matching for video surveillance
NASA Astrophysics Data System (ADS)
Menze, M.; Heipke, C.
2014-08-01
Three-dimensional information from dense image matching is a valuable input for a broad range of vision applications. While reliable approaches exist for dedicated stereo setups they do not easily generalize to more challenging camera configurations. In the context of video surveillance the typically large spatial extent of the region of interest and repetitive structures in the scene render the application of dense image matching a challenging task. In this paper we present an approach that derives strong prior knowledge from a planar approximation of the scene. This information is integrated into a graph-cut based image matching framework that treats the assignment of optimal disparity values as a labelling task. Introducing the planar prior heavily reduces ambiguities together with the search space and increases computational efficiency. The results provide a proof of concept of the proposed approach. It allows the reconstruction of dense point clouds in more general surveillance camera setups with wider stereo baselines.
Saund, Eric
2013-10-01
Effective object and scene classification and indexing depend on extraction of informative image features. This paper shows how large families of complex image features in the form of subgraphs can be built out of simpler ones through construction of a graph lattice—a hierarchy of related subgraphs linked in a lattice. Robustness is achieved by matching many overlapping and redundant subgraphs, which allows the use of inexpensive exact graph matching, instead of relying on expensive error-tolerant graph matching to a minimal set of ideal model graphs. Efficiency in exact matching is gained by exploitation of the graph lattice data structure. Additionally, the graph lattice enables methods for adaptively growing a feature space of subgraphs tailored to observed data. We develop the approach in the domain of rectilinear line art, specifically for the practical problem of document forms recognition. We are especially interested in methods that require only one or very few labeled training examples per category. We demonstrate two approaches to using the subgraph features for this purpose. Using a bag-of-words feature vector we achieve essentially single-instance learning on a benchmark forms database, following an unsupervised clustering stage. Further performance gains are achieved on a more difficult dataset using a feature voting method and feature selection procedure.
TreePlus: interactive exploration of networks with enhanced tree layouts.
Lee, Bongshin; Parr, Cynthia S; Plaisant, Catherine; Bederson, Benjamin B; Veksler, Vladislav D; Gray, Wayne D; Kotfila, Christopher
2006-01-01
Despite extensive research, it is still difficult to produce effective interactive layouts for large graphs. Dense layout and occlusion make food webs, ontologies, and social networks difficult to understand and interact with. We propose a new interactive Visual Analytics component called TreePlus that is based on a tree-style layout. TreePlus reveals the missing graph structure with visualization and interaction while maintaining good readability. To support exploration of the local structure of the graph and gathering of information from the extensive reading of labels, we use a guiding metaphor of "Plant a seed and watch it grow." It allows users to start with a node and expand the graph as needed, which complements the classic overview techniques that can be effective at (but often limited to) revealing clusters. We describe our design goals, describe the interface, and report on a controlled user study with 28 participants comparing TreePlus with a traditional graph interface for six tasks. In general, the advantage of TreePlus over the traditional interface increased as the density of the displayed data increased. Participants also reported higher levels of confidence in their answers with TreePlus and most of them preferred TreePlus.
Diaconis, Persi; Holmes, Susan; Janson, Svante
2015-01-01
We work out a graph limit theory for dense interval graphs. The theory developed departs from the usual description of a graph limit as a symmetric function W (x, y) on the unit square, with x and y uniform on the interval (0, 1). Instead, we fix a W and change the underlying distribution of the coordinates x and y. We find choices such that our limits are continuous. Connections to random interval graphs are given, including some examples. We also show a continuity result for the chromatic number and clique number of interval graphs. Some results on uniqueness of the limit description are given for general graph limits. PMID:26405368
Scalable Static and Dynamic Community Detection Using Grappolo
DOE Office of Scientific and Technical Information (OSTI.GOV)
Halappanavar, Mahantesh; Lu, Hao; Kalyanaraman, Anantharaman
Graph clustering, popularly known as community detection, is a fundamental kernel for several applications of relevance to the Defense Advanced Research Projects Agency’s (DARPA) Hierarchical Identify Verify Exploit (HIVE) Pro- gram. Clusters or communities represent natural divisions within a network that are densely connected within a cluster and sparsely connected to the rest of the network. The need to compute clustering on large scale data necessitates the development of efficient algorithms that can exploit modern architectures that are fundamentally parallel in nature. How- ever, due to their irregular and inherently sequential nature, many of the current algorithms for community detectionmore » are challenging to parallelize. In response to the HIVE Graph Challenge, we present several parallelization heuristics for fast community detection using the Louvain method as the serial template. We implement all the heuristics in a software library called Grappolo. Using the inputs from the HIVE Challenge, we demonstrate superior performance and high quality solutions based on four parallelization heuristics. We use Grappolo on static graphs as the first step towards community detection on streaming graphs.« less
Edge compression techniques for visualization of dense directed graphs.
Dwyer, Tim; Henry Riche, Nathalie; Marriott, Kim; Mears, Christopher
2013-12-01
We explore the effectiveness of visualizing dense directed graphs by replacing individual edges with edges connected to 'modules'-or groups of nodes-such that the new edges imply aggregate connectivity. We only consider techniques that offer a lossless compression: that is, where the entire graph can still be read from the compressed version. The techniques considered are: a simple grouping of nodes with identical neighbor sets; Modular Decomposition which permits internal structure in modules and allows them to be nested; and Power Graph Analysis which further allows edges to cross module boundaries. These techniques all have the same goal--to compress the set of edges that need to be rendered to fully convey connectivity--but each successive relaxation of the module definition permits fewer edges to be drawn in the rendered graph. Each successive technique also, we hypothesize, requires a higher degree of mental effort to interpret. We test this hypothetical trade-off with two studies involving human participants. For Power Graph Analysis we propose a novel optimal technique based on constraint programming. This enables us to explore the parameter space for the technique more precisely than could be achieved with a heuristic. Although applicable to many domains, we are motivated by--and discuss in particular--the application to software dependency analysis.
Community structure and scale-free collections of Erdős-Rényi graphs.
Seshadhri, C; Kolda, Tamara G; Pinar, Ali
2012-05-01
Community structure plays a significant role in the analysis of social networks and similar graphs, yet this structure is little understood and not well captured by most models. We formally define a community to be a subgraph that is internally highly connected and has no deeper substructure. We use tools of combinatorics to show that any such community must contain a dense Erdős-Rényi (ER) subgraph. Based on mathematical arguments, we hypothesize that any graph with a heavy-tailed degree distribution and community structure must contain a scale-free collection of dense ER subgraphs. These theoretical observations corroborate well with empirical evidence. From this, we propose the Block Two-Level Erdős-Rényi (BTER) model, and demonstrate that it accurately captures the observable properties of many real-world social networks.
Threshold-based epidemic dynamics in systems with memory
NASA Astrophysics Data System (ADS)
Bodych, Marcin; Ganguly, Niloy; Krueger, Tyll; Mukherjee, Animesh; Siegmund-Schultze, Rainer; Sikdar, Sandipan
2016-11-01
In this article we analyze an epidemic dynamics model (SI) where we assume that there are k susceptible states, that is a node would require multiple (k) contacts before it gets infected. In specific, we provide a theoretical framework for studying diffusion rate in complete graphs and d-regular trees with extensions to dense random graphs. We observe that irrespective of the topology, the diffusion process could be divided into two distinct phases: i) the initial phase, where the diffusion process is slow, followed by ii) the residual phase where the diffusion rate increases manifold. In fact, the initial phase acts as an indicator for the total diffusion time in dense graphs. The most remarkable lesson from this investigation is that such a diffusion process could be controlled and even contained if acted upon within its initial phase.
Zhou, Jian; Wang, Lusheng; Wang, Weidong; Zhou, Qingfeng
2017-01-01
In future scenarios of heterogeneous and dense networks, randomly-deployed small star networks (SSNs) become a key paradigm, whose system performance is restricted to inter-SSN interference and requires an efficient resource allocation scheme for interference coordination. Traditional resource allocation schemes do not specifically focus on this paradigm and are usually too time consuming in dense networks. In this article, a very efficient graph-based scheme is proposed, which applies the maximal independent set (MIS) concept in graph theory to help divide SSNs into almost interference-free groups. We first construct an interference graph for the system based on a derived distance threshold indicating for any pair of SSNs whether there is intolerable inter-SSN interference or not. Then, SSNs are divided into MISs, and the same resource can be repetitively used by all the SSNs in each MIS. Empirical parameters and equations are set in the scheme to guarantee high performance. Finally, extensive scenarios both dense and nondense are randomly generated and simulated to demonstrate the performance of our scheme, indicating that it outperforms the classical max K-cut-based scheme in terms of system capacity, utility and especially time cost. Its achieved system capacity, utility and fairness can be close to the near-optimal strategy obtained by a time-consuming simulated annealing search. PMID:29113109
HodDB: Design and Analysis of a Query Processor for Brick.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fierro, Gabriel; Culler, David
Brick is a recently proposed metadata schema and ontology for describing building components and the relationships between them. It represents buildings as directed labeled graphs using the RDF data model. Using the SPARQL query language, building-agnostic applications query a Brick graph to discover the set of resources and relationships they require to operate. Latency-sensitive applications, such as user interfaces, demand response and modelpredictive control, require fast queries — conventionally less than 100ms. We benchmark a set of popular open-source and commercial SPARQL databases against three real Brick models using seven application queries and find that none of them meet thismore » performance target. This lack of performance can be attributed to design decisions that optimize for queries over large graphs consisting of billions of triples, but give poor spatial locality and join performance on the small dense graphs typical of Brick. We present the design and evaluation of HodDB, a RDF/SPARQL database for Brick built over a node-based index structure. HodDB performs Brick queries 3-700x faster than leading SPARQL databases and consistently meets the 100ms threshold, enabling the portability of important latency-sensitive building applications.« less
Identifying the minor set cover of dense connected bipartite graphs via random matching edge sets
NASA Astrophysics Data System (ADS)
Hamilton, Kathleen E.; Humble, Travis S.
2017-04-01
Using quantum annealing to solve an optimization problem requires minor embedding a logic graph into a known hardware graph. In an effort to reduce the complexity of the minor embedding problem, we introduce the minor set cover (MSC) of a known graph G: a subset of graph minors which contain any remaining minor of the graph as a subgraph. Any graph that can be embedded into G will be embeddable into a member of the MSC. Focusing on embedding into the hardware graph of commercially available quantum annealers, we establish the MSC for a particular known virtual hardware, which is a complete bipartite graph. We show that the complete bipartite graph K_{N,N} has a MSC of N minors, from which K_{N+1} is identified as the largest clique minor of K_{N,N}. The case of determining the largest clique minor of hardware with faults is briefly discussed but remains an open question.
Identifying the minor set cover of dense connected bipartite graphs via random matching edge sets
Hamilton, Kathleen E.; Humble, Travis S.
2017-02-23
Using quantum annealing to solve an optimization problem requires minor embedding a logic graph into a known hardware graph. We introduce the minor set cover (MSC) of a known graph GG : a subset of graph minors which contain any remaining minor of the graph as a subgraph, in an effort to reduce the complexity of the minor embedding problem. Any graph that can be embedded into GG will be embeddable into a member of the MSC. Focusing on embedding into the hardware graph of commercially available quantum annealers, we establish the MSC for a particular known virtual hardware, whichmore » is a complete bipartite graph. Furthermore, we show that the complete bipartite graph K N,N has a MSC of N minors, from which K N+1 is identified as the largest clique minor of K N,N. In the case of determining the largest clique minor of hardware with faults we briefly discussed this open question.« less
Model validation of simple-graph representations of metabolism
Holme, Petter
2009-01-01
The large-scale properties of chemical reaction systems, such as metabolism, can be studied with graph-based methods. To do this, one needs to reduce the information, lists of chemical reactions, available in databases. Even for the simplest type of graph representation, this reduction can be done in several ways. We investigate different simple network representations by testing how well they encode information about one biologically important network structure—network modularity (the propensity for edges to be clustered into dense groups that are sparsely connected between each other). To achieve this goal, we design a model of reaction systems where network modularity can be controlled and measure how well the reduction to simple graphs captures the modular structure of the model reaction system. We find that the network types that best capture the modular structure of the reaction system are substrate–product networks (where substrates are linked to products of a reaction) and substance networks (with edges between all substances participating in a reaction). Furthermore, we argue that the proposed model for reaction systems with tunable clustering is a general framework for studies of how reaction systems are affected by modularity. To this end, we investigate statistical properties of the model and find, among other things, that it recreates correlations between degree and mass of the molecules. PMID:19158012
Improved Estimation and Interpretation of Correlations in Neural Circuits
Yatsenko, Dimitri; Josić, Krešimir; Ecker, Alexander S.; Froudarakis, Emmanouil; Cotton, R. James; Tolias, Andreas S.
2015-01-01
Ambitious projects aim to record the activity of ever larger and denser neuronal populations in vivo. Correlations in neural activity measured in such recordings can reveal important aspects of neural circuit organization. However, estimating and interpreting large correlation matrices is statistically challenging. Estimation can be improved by regularization, i.e. by imposing a structure on the estimate. The amount of improvement depends on how closely the assumed structure represents dependencies in the data. Therefore, the selection of the most efficient correlation matrix estimator for a given neural circuit must be determined empirically. Importantly, the identity and structure of the most efficient estimator informs about the types of dominant dependencies governing the system. We sought statistically efficient estimators of neural correlation matrices in recordings from large, dense groups of cortical neurons. Using fast 3D random-access laser scanning microscopy of calcium signals, we recorded the activity of nearly every neuron in volumes 200 μm wide and 100 μm deep (150–350 cells) in mouse visual cortex. We hypothesized that in these densely sampled recordings, the correlation matrix should be best modeled as the combination of a sparse graph of pairwise partial correlations representing local interactions and a low-rank component representing common fluctuations and external inputs. Indeed, in cross-validation tests, the covariance matrix estimator with this structure consistently outperformed other regularized estimators. The sparse component of the estimate defined a graph of interactions. These interactions reflected the physical distances and orientation tuning properties of cells: The density of positive ‘excitatory’ interactions decreased rapidly with geometric distances and with differences in orientation preference whereas negative ‘inhibitory’ interactions were less selective. Because of its superior performance, this ‘sparse+latent’ estimator likely provides a more physiologically relevant representation of the functional connectivity in densely sampled recordings than the sample correlation matrix. PMID:25826696
GraphCrunch 2: Software tool for network modeling, alignment and clustering.
Kuchaiev, Oleksii; Stevanović, Aleksandar; Hayes, Wayne; Pržulj, Nataša
2011-01-19
Recent advancements in experimental biotechnology have produced large amounts of protein-protein interaction (PPI) data. The topology of PPI networks is believed to have a strong link to their function. Hence, the abundance of PPI data for many organisms stimulates the development of computational techniques for the modeling, comparison, alignment, and clustering of networks. In addition, finding representative models for PPI networks will improve our understanding of the cell just as a model of gravity has helped us understand planetary motion. To decide if a model is representative, we need quantitative comparisons of model networks to real ones. However, exact network comparison is computationally intractable and therefore several heuristics have been used instead. Some of these heuristics are easily computable "network properties," such as the degree distribution, or the clustering coefficient. An important special case of network comparison is the network alignment problem. Analogous to sequence alignment, this problem asks to find the "best" mapping between regions in two networks. It is expected that network alignment might have as strong an impact on our understanding of biology as sequence alignment has had. Topology-based clustering of nodes in PPI networks is another example of an important network analysis problem that can uncover relationships between interaction patterns and phenotype. We introduce the GraphCrunch 2 software tool, which addresses these problems. It is a significant extension of GraphCrunch which implements the most popular random network models and compares them with the data networks with respect to many network properties. Also, GraphCrunch 2 implements the GRAph ALigner algorithm ("GRAAL") for purely topological network alignment. GRAAL can align any pair of networks and exposes large, dense, contiguous regions of topological and functional similarities far larger than any other existing tool. Finally, GraphCruch 2 implements an algorithm for clustering nodes within a network based solely on their topological similarities. Using GraphCrunch 2, we demonstrate that eukaryotic and viral PPI networks may belong to different graph model families and show that topology-based clustering can reveal important functional similarities between proteins within yeast and human PPI networks. GraphCrunch 2 is a software tool that implements the latest research on biological network analysis. It parallelizes computationally intensive tasks to fully utilize the potential of modern multi-core CPUs. It is open-source and freely available for research use. It runs under the Windows and Linux platforms.
Enhancing Community Detection By Affinity-based Edge Weighting Scheme
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, Andy; Sanders, Geoffrey; Henson, Van
Community detection refers to an important graph analytics problem of finding a set of densely-connected subgraphs in a graph and has gained a great deal of interest recently. The performance of current community detection algorithms is limited by an inherent constraint of unweighted graphs that offer very little information on their internal community structures. In this paper, we propose a new scheme to address this issue that weights the edges in a given graph based on recently proposed vertex affinity. The vertex affinity quantifies the proximity between two vertices in terms of their clustering strength, and therefore, it is idealmore » for graph analytics applications such as community detection. We also demonstrate that the affinity-based edge weighting scheme can improve the performance of community detection algorithms significantly.« less
Nim, Hieu T; Furtado, Milena B; Costa, Mauro W; Rosenthal, Nadia A; Kitano, Hiroaki; Boyd, Sarah E
2015-05-01
Existing de novo software platforms have largely overlooked a valuable resource, the expertise of the intended biologist users. Typical data representations such as long gene lists, or highly dense and overlapping transcription factor networks often hinder biologists from relating these results to their expertise. VISIONET, a streamlined visualisation tool built from experimental needs, enables biologists to transform large and dense overlapping transcription factor networks into sparse human-readable graphs via numerically filtering. The VISIONET interface allows users without a computing background to interactively explore and filter their data, and empowers them to apply their specialist knowledge on far more complex and substantial data sets than is currently possible. Applying VISIONET to the Tbx20-Gata4 transcription factor network led to the discovery and validation of Aldh1a2, an essential developmental gene associated with various important cardiac disorders, as a healthy adult cardiac fibroblast gene co-regulated by cardiogenic transcription factors Gata4 and Tbx20. We demonstrate with experimental validations the utility of VISIONET for expertise-driven gene discovery that opens new experimental directions that would not otherwise have been identified.
A Framework for Real-Time Collection, Analysis, and Classification of Ubiquitous Infrasound Data
NASA Astrophysics Data System (ADS)
Christe, A.; Garces, M. A.; Magana-Zook, S. A.; Schnurr, J. M.
2015-12-01
Traditional infrasound arrays are generally expensive to install and maintain. There are ~10^3 infrasound channels on Earth today. The amount of data currently provided by legacy architectures can be processed on a modest server. However, the growing availability of low-cost, ubiquitous, and dense infrasonic sensor networks presents a substantial increase in the volume, velocity, and variety of data flow. Initial data from a prototype ubiquitous global infrasound network is already pushing the boundaries of traditional research server and communication systems, in particular when serving data products over heterogeneous, international network topologies. We present a scalable, cloud-based approach for capturing and analyzing large amounts of dense infrasonic data (>10^6 channels). We utilize Akka actors with WebSockets to maintain data connections with infrasound sensors. Apache Spark provides streaming, batch, machine learning, and graph processing libraries which will permit signature classification, cross-correlation, and other analytics in near real time. This new framework and approach provide significant advantages in scalability and cost.
NASA Astrophysics Data System (ADS)
Gilani, S. A. N.; Awrangjeb, M.; Lu, G.
2015-03-01
Building detection in complex scenes is a non-trivial exercise due to building shape variability, irregular terrain, shadows, and occlusion by highly dense vegetation. In this research, we present a graph based algorithm, which combines multispectral imagery and airborne LiDAR information to completely delineate the building boundaries in urban and densely vegetated area. In the first phase, LiDAR data is divided into two groups: ground and non-ground data, using ground height from a bare-earth DEM. A mask, known as the primary building mask, is generated from the non-ground LiDAR points where the black region represents the elevated area (buildings and trees), while the white region describes the ground (earth). The second phase begins with the process of Connected Component Analysis (CCA) where the number of objects present in the test scene are identified followed by initial boundary detection and labelling. Additionally, a graph from the connected components is generated, where each black pixel corresponds to a node. An edge of a unit distance is defined between a black pixel and a neighbouring black pixel, if any. An edge does not exist from a black pixel to a neighbouring white pixel, if any. This phenomenon produces a disconnected components graph, where each component represents a prospective building or a dense vegetation (a contiguous block of black pixels from the primary mask). In the third phase, a clustering process clusters the segmented lines, extracted from multispectral imagery, around the graph components, if possible. In the fourth step, NDVI, image entropy, and LiDAR data are utilised to discriminate between vegetation, buildings, and isolated building's occluded parts. Finally, the initially extracted building boundary is extended pixel-wise using NDVI, entropy, and LiDAR data to completely delineate the building and to maximise the boundary reach towards building edges. The proposed technique is evaluated using two Australian data sets: Aitkenvale and Hervey Bay, for object-based and pixel-based completeness, correctness, and quality. The proposed technique detects buildings larger than 50 m2 and 10 m2 in the Aitkenvale site with 100% and 91% accuracy, respectively, while in the Hervey Bay site it performs better with 100% accuracy for buildings larger than 10 m2 in area.
Effect of magnetic quantization on ion acoustic waves ultra-relativistic dense plasma
NASA Astrophysics Data System (ADS)
Javed, Asif; Rasheed, A.; Jamil, M.; Siddique, M.; Tsintsadze, N. L.
2017-11-01
In this paper, we have studied the influence of magnetic quantization of orbital motion of the electrons on the profile of linear and nonlinear ion-acoustic waves, which are propagating in the ultra-relativistic dense magneto quantum plasmas. We have employed both Thomas Fermi and Quantum Magneto Hydrodynamic models (along with the Poisson equation) of quantum plasmas. To investigate the large amplitude nonlinear structure of the acoustic wave, Sagdeev-Pseudo-Potential approach has been adopted. The numerical analysis of the linear dispersion relation and the nonlinear acoustic waves has been presented by drawing their graphs that highlight the effects of plasma parameters on these waves in both the linear and the nonlinear regimes. It has been noticed that only supersonic ion acoustic solitary waves can be excited in the above mentioned quantum plasma even when the value of the critical Mach number is less than unity. Both width and depth of Sagdeev potential reduces on increasing the magnetic quantization parameter η. Whereas the amplitude of the ion acoustic soliton reduces on increasing η, its width appears to be directly proportional to η. The present work would be helpful to understand the excitation of nonlinear ion-acoustic waves in the dense astrophysical environments such as magnetars and in intense-laser plasma interactions.
SING: Subgraph search In Non-homogeneous Graphs
2010-01-01
Background Finding the subgraphs of a graph database that are isomorphic to a given query graph has practical applications in several fields, from cheminformatics to image understanding. Since subgraph isomorphism is a computationally hard problem, indexing techniques have been intensively exploited to speed up the process. Such systems filter out those graphs which cannot contain the query, and apply a subgraph isomorphism algorithm to each residual candidate graph. The applicability of such systems is limited to databases of small graphs, because their filtering power degrades on large graphs. Results In this paper, SING (Subgraph search In Non-homogeneous Graphs), a novel indexing system able to cope with large graphs, is presented. The method uses the notion of feature, which can be a small subgraph, subtree or path. Each graph in the database is annotated with the set of all its features. The key point is to make use of feature locality information. This idea is used to both improve the filtering performance and speed up the subgraph isomorphism task. Conclusions Extensive tests on chemical compounds, biological networks and synthetic graphs show that the proposed system outperforms the most popular systems in query time over databases of medium and large graphs. Other specific tests show that the proposed system is effective for single large graphs. PMID:20170516
Spatial evolutionary public goods game on complete graph and dense complex networks
NASA Astrophysics Data System (ADS)
Kim, Jinho; Chae, Huiseung; Yook, Soon-Hyung; Kim, Yup
2015-03-01
We study the spatial evolutionary public goods game (SEPGG) with voluntary or optional participation on a complete graph (CG) and on dense networks. Based on analyses of the SEPGG rate equation on finite CG, we find that SEPGG has two stable states depending on the value of multiplication factor r, illustrating how the ``tragedy of the commons'' and ``an anomalous state without any active participants'' occurs in real-life situations. When r is low (), the state with only loners is stable, and the state with only defectors is stable when r is high (). We also derive the exact scaling relation for r*. All of the results are confirmed by numerical simulation. Furthermore, we find that a cooperator-dominant state emerges when the number of participants or the mean degree,
Focus-based filtering + clustering technique for power-law networks with small world phenomenon
NASA Astrophysics Data System (ADS)
Boutin, François; Thièvre, Jérôme; Hascoët, Mountaz
2006-01-01
Realistic interaction networks usually present two main properties: a power-law degree distribution and a small world behavior. Few nodes are linked to many nodes and adjacent nodes are likely to share common neighbors. Moreover, graph structure usually presents a dense core that is difficult to explore with classical filtering and clustering techniques. In this paper, we propose a new filtering technique accounting for a user-focus. This technique extracts a tree-like graph with also power-law degree distribution and small world behavior. Resulting structure is easily drawn with classical force-directed drawing algorithms. It is also quickly clustered and displayed into a multi-level silhouette tree (MuSi-Tree) from any user-focus. We built a new graph filtering + clustering + drawing API and report a case study.
Large-scale DCMs for resting-state fMRI.
Razi, Adeel; Seghier, Mohamed L; Zhou, Yuan; McColgan, Peter; Zeidman, Peter; Park, Hae-Jeong; Sporns, Olaf; Rees, Geraint; Friston, Karl J
2017-01-01
This paper considers the identification of large directed graphs for resting-state brain networks based on biophysical models of distributed neuronal activity, that is, effective connectivity . This identification can be contrasted with functional connectivity methods based on symmetric correlations that are ubiquitous in resting-state functional MRI (fMRI). We use spectral dynamic causal modeling (DCM) to invert large graphs comprising dozens of nodes or regions. The ensuing graphs are directed and weighted, hence providing a neurobiologically plausible characterization of connectivity in terms of excitatory and inhibitory coupling. Furthermore, we show that the use of to discover the most likely sparse graph (or model) from a parent (e.g., fully connected) graph eschews the arbitrary thresholding often applied to large symmetric (functional connectivity) graphs. Using empirical fMRI data, we show that spectral DCM furnishes connectivity estimates on large graphs that correlate strongly with the estimates provided by stochastic DCM. Furthermore, we increase the efficiency of model inversion using functional connectivity modes to place prior constraints on effective connectivity. In other words, we use a small number of modes to finesse the potentially redundant parameterization of large DCMs. We show that spectral DCM-with functional connectivity priors-is ideally suited for directed graph theoretic analyses of resting-state fMRI. We envision that directed graphs will prove useful in understanding the psychopathology and pathophysiology of neurodegenerative and neurodevelopmental disorders. We will demonstrate the utility of large directed graphs in clinical populations in subsequent reports, using the procedures described in this paper.
Joint tumor segmentation and dense deformable registration of brain MR images.
Parisot, Sarah; Duffau, Hugues; Chemouny, Stéphane; Paragios, Nikos
2012-01-01
In this paper we propose a novel graph-based concurrent registration and segmentation framework. Registration is modeled with a pairwise graphical model formulation that is modular with respect to the data and regularization term. Segmentation is addressed by adopting a similar graphical model, using image-based classification techniques while producing a smooth solution. The two problems are coupled via a relaxation of the registration criterion in the presence of tumors as well as a segmentation through a registration term aiming the separation between healthy and diseased tissues. Efficient linear programming is used to solve both problems simultaneously. State of the art results demonstrate the potential of our method on a large and challenging low-grade glioma data set.
Loops in hierarchical channel networks
NASA Astrophysics Data System (ADS)
Katifori, Eleni; Magnasco, Marcelo
2012-02-01
Nature provides us with many examples of planar distribution and structural networks having dense sets of closed loops. An archetype of this form of network organization is the vasculature of dicotyledonous leaves, which showcases a hierarchically-nested architecture. Although a number of methods have been proposed to measure aspects of the structure of such networks, a robust metric to quantify their hierarchical organization is still lacking. We present an algorithmic framework that allows mapping loopy networks to binary trees, preserving in the connectivity of the trees the architecture of the original graph. We apply this framework to investigate computer generated and natural graphs extracted from digitized images of dicotyledonous leaves and animal vasculature. We calculate various metrics on the corresponding trees and discuss the relationship of these quantities to the architectural organization of the original graphs. This algorithmic framework decouples the geometric information from the metric topology (connectivity and edge weight) and it ultimately allows us to perform a quantitative statistical comparison between predictions of theoretical models and naturally occurring loopy graphs.
A Ranking Approach on Large-Scale Graph With Multidimensional Heterogeneous Information.
Wei, Wei; Gao, Bin; Liu, Tie-Yan; Wang, Taifeng; Li, Guohui; Li, Hang
2016-04-01
Graph-based ranking has been extensively studied and frequently applied in many applications, such as webpage ranking. It aims at mining potentially valuable information from the raw graph-structured data. Recently, with the proliferation of rich heterogeneous information (e.g., node/edge features and prior knowledge) available in many real-world graphs, how to effectively and efficiently leverage all information to improve the ranking performance becomes a new challenging problem. Previous methods only utilize part of such information and attempt to rank graph nodes according to link-based methods, of which the ranking performances are severely affected by several well-known issues, e.g., over-fitting or high computational complexity, especially when the scale of graph is very large. In this paper, we address the large-scale graph-based ranking problem and focus on how to effectively exploit rich heterogeneous information of the graph to improve the ranking performance. Specifically, we propose an innovative and effective semi-supervised PageRank (SSP) approach to parameterize the derived information within a unified semi-supervised learning framework (SSLF-GR), then simultaneously optimize the parameters and the ranking scores of graph nodes. Experiments on the real-world large-scale graphs demonstrate that our method significantly outperforms the algorithms that consider such graph information only partially.
Dimitriadis, Stavros I.; Salis, Christos; Tarnanas, Ioannis; Linden, David E.
2017-01-01
The human brain is a large-scale system of functionally connected brain regions. This system can be modeled as a network, or graph, by dividing the brain into a set of regions, or “nodes,” and quantifying the strength of the connections between nodes, or “edges,” as the temporal correlation in their patterns of activity. Network analysis, a part of graph theory, provides a set of summary statistics that can be used to describe complex brain networks in a meaningful way. The large-scale organization of the brain has features of complex networks that can be quantified using network measures from graph theory. The adaptation of both bivariate (mutual information) and multivariate (Granger causality) connectivity estimators to quantify the synchronization between multichannel recordings yields a fully connected, weighted, (a)symmetric functional connectivity graph (FCG), representing the associations among all brain areas. The aforementioned procedure leads to an extremely dense network of tens up to a few hundreds of weights. Therefore, this FCG must be filtered out so that the “true” connectivity pattern can emerge. Here, we compared a large number of well-known topological thresholding techniques with the novel proposed data-driven scheme based on orthogonal minimal spanning trees (OMSTs). OMSTs filter brain connectivity networks based on the optimization between the global efficiency of the network and the cost preserving its wiring. We demonstrated the proposed method in a large EEG database (N = 101 subjects) with eyes-open (EO) and eyes-closed (EC) tasks by adopting a time-varying approach with the main goal to extract features that can totally distinguish each subject from the rest of the set. Additionally, the reliability of the proposed scheme was estimated in a second case study of fMRI resting-state activity with multiple scans. Our results demonstrated clearly that the proposed thresholding scheme outperformed a large list of thresholding schemes based on the recognition accuracy of each subject compared to the rest of the cohort (EEG). Additionally, the reliability of the network metrics based on the fMRI static networks was improved based on the proposed topological filtering scheme. Overall, the proposed algorithm could be used across neuroimaging and multimodal studies as a common computationally efficient standardized tool for a great number of neuroscientists and physicists working on numerous of projects. PMID:28491032
GraphReduce: Processing Large-Scale Graphs on Accelerator-Based Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sengupta, Dipanjan; Song, Shuaiwen; Agarwal, Kapil
2015-11-15
Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device’s internal memory capacity. GraphReduce adopts a combination of edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degrees of parallelism in GPUs with efficient graph data movement between the host andmore » device.« less
GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sengupta, Dipanjan; Agarwal, Kapil; Song, Shuaiwen
2015-09-30
Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device’s internal memory capacity. GraphReduce adopts a combination of both edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degrees of parallelism in GPUs with efficient graph data movement between the hostmore » and the device.« less
Enabling Graph Appliance for Genome Assembly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Rina; Graves, Jeffrey A; Lee, Sangkeun
2015-01-01
In recent years, there has been a huge growth in the amount of genomic data available as reads generated from various genome sequencers. The number of reads generated can be huge, ranging from hundreds to billions of nucleotide, each varying in size. Assembling such large amounts of data is one of the challenging computational problems for both biomedical and data scientists. Most of the genome assemblers developed have used de Bruijn graph techniques. A de Bruijn graph represents a collection of read sequences by billions of vertices and edges, which require large amounts of memory and computational power to storemore » and process. This is the major drawback to de Bruijn graph assembly. Massively parallel, multi-threaded, shared memory systems can be leveraged to overcome some of these issues. The objective of our research is to investigate the feasibility and scalability issues of de Bruijn graph assembly on Cray s Urika-GD system; Urika-GD is a high performance graph appliance with a large shared memory and massively multithreaded custom processor designed for executing SPARQL queries over large-scale RDF data sets. However, to the best of our knowledge, there is no research on representing a de Bruijn graph as an RDF graph or finding Eulerian paths in RDF graphs using SPARQL for potential genome discovery. In this paper, we address the issues involved in representing a de Bruin graphs as RDF graphs and propose an iterative querying approach for finding Eulerian paths in large RDF graphs. We evaluate the performance of our implementation on real world ebola genome datasets and illustrate how genome assembly can be accomplished with Urika-GD using iterative SPARQL queries.« less
NASA Astrophysics Data System (ADS)
Palla, Gergely; Farkas, Illés J.; Pollner, Péter; Derényi, Imre; Vicsek, Tamás
2007-06-01
A search technique locating network modules, i.e. internally densely connected groups of nodes in directed networks is introduced by extending the clique percolation method originally proposed for undirected networks. After giving a suitable definition for directed modules we investigate their percolation transition in the Erdos-Rényi graph both analytically and numerically. We also analyse four real-world directed networks, including Google's own web-pages, an email network, a word association graph and the transcriptional regulatory network of the yeast Saccharomyces cerevisiae. The obtained directed modules are validated by additional information available for the nodes. We find that directed modules of real-world graphs inherently overlap and the investigated networks can be classified into two major groups in terms of the overlaps between the modules. Accordingly, in the word-association network and Google's web-pages, overlaps are likely to contain in-hubs, whereas the modules in the email and transcriptional regulatory network tend to overlap via out-hubs.
Gambler's ruin problem on Erdős-Rényi graphs
NASA Astrophysics Data System (ADS)
Néda, Zoltán; Davidova, Larissa; Újvári, Szeréna; Istrate, Gabriel
2017-02-01
A multiagent ruin-game is studied on Erdős-Rényi type graphs. Initially the players have the same wealth. At each time step a monopolist game is played on all active links (links that connect nodes with nonzero wealth). In such a game each player puts a unit wealth in the pot and the pot is won with equal probability by one of the players. The game ends when there are no connected players such that both of them have non-zero wealth. In order to characterize the final state for dense graphs a compact formula is given for the expected number of the remaining players with non-zero wealth and the wealth distribution among these players. Theoretical predictions are given for the expected duration of the ruin game. The dynamics of the number of active players is also investigated. Validity of the theoretical predictions is investigated by Monte Carlo experiments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grossman, Max; Pritchard Jr., Howard Porter; Budimlic, Zoran
2016-12-22
Graph500 [14] is an effort to offer a standardized benchmark across large-scale distributed platforms which captures the behavior of common communicationbound graph algorithms. Graph500 differs from other large-scale benchmarking efforts (such as HPL [6] or HPGMG [7]) primarily in the irregularity of its computation and data access patterns. The core computational kernel of Graph500 is a breadth-first search (BFS) implemented on an undirected graph. The output of Graph500 is a spanning tree of the input graph, usually represented by a predecessor mapping for every node in the graph. The Graph500 benchmark defines several pre-defined input sizes for implementers to testmore » against. This report summarizes investigation into implementing the Graph500 benchmark on OpenSHMEM, and focuses on first building a strong and practical understanding of the strengths and limitations of past work before proposing and developing novel extensions.« less
NASA Astrophysics Data System (ADS)
Debnath, Lokenath
2010-09-01
This article is essentially devoted to a brief historical introduction to Euler's formula for polyhedra, topology, theory of graphs and networks with many examples from the real-world. Celebrated Königsberg seven-bridge problem and some of the basic properties of graphs and networks for some understanding of the macroscopic behaviour of real physical systems are included. We also mention some important and modern applications of graph theory or network problems from transportation to telecommunications. Graphs or networks are effectively used as powerful tools in industrial, electrical and civil engineering, communication networks in the planning of business and industry. Graph theory and combinatorics can be used to understand the changes that occur in many large and complex scientific, technical and medical systems. With the advent of fast large computers and the ubiquitous Internet consisting of a very large network of computers, large-scale complex optimization problems can be modelled in terms of graphs or networks and then solved by algorithms available in graph theory. Many large and more complex combinatorial problems dealing with the possible arrangements of situations of various kinds, and computing the number and properties of such arrangements can be formulated in terms of networks. The Knight's tour problem, Hamilton's tour problem, problem of magic squares, the Euler Graeco-Latin squares problem and their modern developments in the twentieth century are also included.
Building Change Detection from Bi-Temporal Dense-Matching Point Clouds and Aerial Images.
Pang, Shiyan; Hu, Xiangyun; Cai, Zhongliang; Gong, Jinqi; Zhang, Mi
2018-03-24
In this work, a novel building change detection method from bi-temporal dense-matching point clouds and aerial images is proposed to address two major problems, namely, the robust acquisition of the changed objects above ground and the automatic classification of changed objects into buildings or non-buildings. For the acquisition of changed objects above ground, the change detection problem is converted into a binary classification, in which the changed area above ground is regarded as the foreground and the other area as the background. For the gridded points of each period, the graph cuts algorithm is adopted to classify the points into foreground and background, followed by the region-growing algorithm to form candidate changed building objects. A novel structural feature that was extracted from aerial images is constructed to classify the candidate changed building objects into buildings and non-buildings. The changed building objects are further classified as "newly built", "taller", "demolished", and "lower" by combining the classification and the digital surface models of two periods. Finally, three typical areas from a large dataset are used to validate the proposed method. Numerous experiments demonstrate the effectiveness of the proposed algorithm.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Sterck, H
2011-10-18
The following work has been performed by PI Hans De Sterck and graduate student Manda Winlaw for the required tasks 1-5 (as listed in the Statement of Work). Graduate student Manda Winlaw has visited LLNL January 31-March 11, 2011 and May 23-August 19, 2010, working with Van Henson and Mike O'Hara on non-negative matrix factorizations (NMF). She has investigated the dense subgraph clustering algorithm from 'Finding Dense Subgraphs for Sparse Undirected, Directed, and Bipartite Graphs' by Chen and Saad, testing this method on several term-document matrices and adapting it to cluster based on the rank of the subgraphs instead ofmore » the density. Manda Winlaw was awarded a first prize in the annual LLNL summer student poster competition for a poster on her NMF research. PI Hans De Sterck has developed a new adaptive algebraic multigrid algorithm for computing a few dominant or minimal singular triplets of sparse rectangular matrices. This work builds on adaptive algebraic multigrid methods that were further developed by the PI and collaborators (including Sanders and Henson) for Markov chains. The method also combines and extends existing multigrid algorithms for the symmetric eigenproblem. The PI has visited LLNL February 22-25, 2011, and has given a CASC seminar 'Algebraic Multigrid for the Singular Value Problem' on this work on February 23, 2011. During his visit, he has discussed this work and related topics with Van Henson, Geoffrey Sanders, Panayot Vassilevski, and others. He has tested the algorithm on PDE matrices and on a term-document matrix, with promising initial results. Manda Winlaw has also started to work, with O'Hara, on estimating probability distributions over undirected graph edges. The goal is to estimate probabilistic models from sets of undirected graph edges for the purpose of prediction, anomaly detection and support to supervised learning. Graduate student Manda Winlaw is writing a paper on the results obtained with O'Hara which will be submitted some time later in 2011 to a data mining conference. PI Hans De Sterck has developed a new optimization algorithm for canonical tensor approximation, formulating an extension of the nonlinear GMRES method to optimization problems. Numerical results for tensors with up to 8 modes show that this new method is efficient for sparse and dense tensors. He has written a paper on this which has been submitted to the SIAM Journal on Scientific Computing. PI Hans De Sterck has further developed his new optimization algorithm for canonical tensor approximation, formulating an extension in terms of steepest-descent preconditioning, which makes the approach generally applicable for nonlinear optimization. He has written a paper on this extension which has been submitted to Numerical Linear Algebra with Applications.« less
Quantifying loopy network architectures.
Katifori, Eleni; Magnasco, Marcelo O
2012-01-01
Biology presents many examples of planar distribution and structural networks having dense sets of closed loops. An archetype of this form of network organization is the vasculature of dicotyledonous leaves, which showcases a hierarchically-nested architecture containing closed loops at many different levels. Although a number of approaches have been proposed to measure aspects of the structure of such networks, a robust metric to quantify their hierarchical organization is still lacking. We present an algorithmic framework, the hierarchical loop decomposition, that allows mapping loopy networks to binary trees, preserving in the connectivity of the trees the architecture of the original graph. We apply this framework to investigate computer generated graphs, such as artificial models and optimal distribution networks, as well as natural graphs extracted from digitized images of dicotyledonous leaves and vasculature of rat cerebral neocortex. We calculate various metrics based on the asymmetry, the cumulative size distribution and the Strahler bifurcation ratios of the corresponding trees and discuss the relationship of these quantities to the architectural organization of the original graphs. This algorithmic framework decouples the geometric information (exact location of edges and nodes) from the metric topology (connectivity and edge weight) and it ultimately allows us to perform a quantitative statistical comparison between predictions of theoretical models and naturally occurring loopy graphs.
Local Table Condensation in Rough Set Approach for Jumping Emerging Pattern Induction
NASA Astrophysics Data System (ADS)
Terlecki, Pawel; Walczak, Krzysztof
This paper extends the rough set approach for JEP induction based on the notion of a condensed decision table. The original transaction database is transformed to a relational form and patterns are induced by means of local reducts. The transformation employs an item aggregation obtained by coloring a graph that re0ects con0icts among items. For e±ciency reasons we propose to perform this preprocessing locally, i.e. at the transaction level, to achieve a higher dimensionality gain. Special maintenance strategy is also used to avoid graph rebuilds. Both global and local approach have been tested and discussed for dense and synthetically generated sparse datasets.
Aćimović, Jugoslava; Mäki-Marttunen, Tuomo; Linne, Marja-Leena
2015-01-01
We developed a two-level statistical model that addresses the question of how properties of neurite morphology shape the large-scale network connectivity. We adopted a low-dimensional statistical description of neurites. From the neurite model description we derived the expected number of synapses, node degree, and the effective radius, the maximal distance between two neurons expected to form at least one synapse. We related these quantities to the network connectivity described using standard measures from graph theory, such as motif counts, clustering coefficient, minimal path length, and small-world coefficient. These measures are used in a neuroscience context to study phenomena from synaptic connectivity in the small neuronal networks to large scale functional connectivity in the cortex. For these measures we provide analytical solutions that clearly relate different model properties. Neurites that sparsely cover space lead to a small effective radius. If the effective radius is small compared to the overall neuron size the obtained networks share similarities with the uniform random networks as each neuron connects to a small number of distant neurons. Large neurites with densely packed branches lead to a large effective radius. If this effective radius is large compared to the neuron size, the obtained networks have many local connections. In between these extremes, the networks maximize the variability of connection repertoires. The presented approach connects the properties of neuron morphology with large scale network properties without requiring heavy simulations with many model parameters. The two-steps procedure provides an easier interpretation of the role of each modeled parameter. The model is flexible and each of its components can be further expanded. We identified a range of model parameters that maximizes variability in network connectivity, the property that might affect network capacity to exhibit different dynamical regimes.
A Semantic Graph Query Language
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kaplan, I L
2006-10-16
Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.
An Automated Method to Identify Mesoscale Convective Complexes (MCCs) Implementing Graph Theory
NASA Astrophysics Data System (ADS)
Whitehall, K. D.; Mattmann, C. A.; Jenkins, G. S.; Waliser, D. E.; Rwebangira, R.; Demoz, B.; Kim, J.; Goodale, C. E.; Hart, A. F.; Ramirez, P.; Joyce, M. J.; Loikith, P.; Lee, H.; Khudikyan, S.; Boustani, M.; Goodman, A.; Zimdars, P. A.; Whittell, J.
2013-12-01
Mesoscale convective complexes (MCCs) are convectively-driven weather systems with a duration of ~10 - 12 hours and contributions of large amounts to the rainfall daily and monthly totals. More than 400 MCCs occur annually over various locations on the globe. In West Africa, ~170 MCCs occur annually during the 180 days representing the summer months (June - November), and contribute ~75% of the annual wet season rainfall. The main objective of this study is to improve automatic identification of MCC over West Africa. The spatial expanse of MCCs and the spatio-temporal variability in their convective characteristics make them difficult to characterize even in dense networks of radars and/or surface gauges. As such there exist criteria for identifying MCCs with satellite images - mostly using infrared (IR) data. Automated MCC identification methods are based on forward and/or backward in time spatial-temporal analysis of the IR satellite data and characteristically incorporate a manual component as these algorithms routinely falter with merging and splitting cloud systems between satellite images. However, these algorithms are not readily transferable to voluminous data or other satellite-derived datasets (e.g. TRMM), thus hindering comprehensive studies of these features both at weather and climate timescales. Recognizing the existing limitations of automated methods, this study explores the applicability of graph theory to creating a fully automated method for deriving a West African MCC dataset from hourly infrared satellite images between 2001- 2012. Graph theory, though not heavily implemented in the atmospheric sciences, has been used for the predicting (nowcasting) of thunderstorms from radar and satellite data by considering the relationship between atmospheric variables at a given time, or for the spatial-temporal analysis of cloud volumes. From these few studies, graph theory appears to be innately applicable to the complexity, non-linearity and inherent chaos of the atmospheric system. Our preliminary results show that the use of graph theory improves data management thus allowing for longer periods to be studied, and creates a transferable method that allows for other data to be utilized.
Chaotic Traversal (CHAT): Very Large Graphs Traversal Using Chaotic Dynamics
NASA Astrophysics Data System (ADS)
Changaival, Boonyarit; Rosalie, Martin; Danoy, Grégoire; Lavangnananda, Kittichai; Bouvry, Pascal
2017-12-01
Graph Traversal algorithms can find their applications in various fields such as routing problems, natural language processing or even database querying. The exploration can be considered as a first stepping stone into knowledge extraction from the graph which is now a popular topic. Classical solutions such as Breadth First Search (BFS) and Depth First Search (DFS) require huge amounts of memory for exploring very large graphs. In this research, we present a novel memoryless graph traversal algorithm, Chaotic Traversal (CHAT) which integrates chaotic dynamics to traverse large unknown graphs via the Lozi map and the Rössler system. To compare various dynamics effects on our algorithm, we present an original way to perform the exploration of a parameter space using a bifurcation diagram with respect to the topological structure of attractors. The resulting algorithm is an efficient and nonresource demanding algorithm, and is therefore very suitable for partial traversal of very large and/or unknown environment graphs. CHAT performance using Lozi map is proven superior than the, commonly known, Random Walk, in terms of number of nodes visited (coverage percentage) and computation time where the environment is unknown and memory usage is restricted.
G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases.
Wang, Xiaohong; Smalter, Aaron; Huan, Jun; Lushington, Gerald H
2009-01-01
Structured data including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML documents, among others.Most of the current graph indexing methods focus on subgraph query processing, i.e. determining the set of database graphs that contains the query graph and hence do not directly support similarity search. In data mining and machine learning, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models for supervised learning, graph kernel functions have (i) high computational complexity and (ii) non-trivial difficulty to be indexed in a graph database.Our objective is to bridge graph kernel function and similarity search in graph databases by proposing (i) a novel kernel-based similarity measurement and (ii) an efficient indexing structure for graph data management. Our method of similarity measurement builds upon local features extracted from each node and their neighboring nodes in graphs. A hash table is utilized to support efficient storage and fast search of the extracted local features. Using the hash table, a graph kernel function is defined to capture the intrinsic similarity of graphs and for fast similarity query processing. We have implemented our method, which we have named G-hash, and have demonstrated its utility on large chemical graph databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Most importantly, the new similarity measurement and the index structure is scalable to large database with smaller indexing size, faster indexing construction time, and faster query processing time as compared to state-of-the-art indexing methods such as C-tree, gIndex, and GraphGrep.
Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho
2014-01-01
The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting. PMID:27081299
Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho
2014-11-01
The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting.
Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF.
Cong, Yingnan; Chan, Yao-Ban; Phillips, Charles A; Langston, Michael A; Ragan, Mark A
2017-01-01
Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k ) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k . Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k .
Fazeli Dehkordy, Soudabeh; Carlos, Ruth C; Hall, Kelli S; Dalton, Vanessa K
2014-09-01
Millions of people use online search engines everyday to find health-related information and voluntarily share their personal health status and behaviors in various Web sites. Thus, data from tracking of online information seeker's behavior offer potential opportunities for use in public health surveillance and research. Google Trends is a feature of Google which allows Internet users to graph the frequency of searches for a single term or phrase over time or by geographic region. We used Google Trends to describe patterns of information-seeking behavior in the subject of dense breasts and to examine their correlation with the passage or introduction of dense breast notification legislation. To capture the temporal variations of information seeking about dense breasts, the Web search query "dense breast" was entered in the Google Trends tool. We then mapped the dates of legislative actions regarding dense breasts that received widespread coverage in the lay media to information-seeking trends about dense breasts over time. Newsworthy events and legislative actions appear to correlate well with peaks in search volume of "dense breast". Geographic regions with the highest search volumes have passed, denied, or are currently considering the dense breast legislation. Our study demonstrated that any legislative action and respective news coverage correlate with increase in information seeking for "dense breast" on Google, suggesting that Google Trends has the potential to serve as a data source for policy-relevant research. Copyright © 2014 AUR. Published by Elsevier Inc. All rights reserved.
GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simmhan, Yogesh; Kumbhare, Alok; Wickramaarachchi, Charith
2014-08-25
Large scale graph processing is a major research area for Big Data exploration. Vertex centric programming models like Pregel are gaining traction due to their simple abstraction that allows for scalable execution on distributed systems naturally. However, there are limitations to this approach which cause vertex centric algorithms to under-perform due to poor compute to communication overhead ratio and slow convergence of iterative superstep. In this paper we introduce GoFFish a scalable sub-graph centric framework co-designed with a distributed persistent graph storage for large scale graph analytics on commodity clusters. We introduce a sub-graph centric programming abstraction that combines themore » scalability of a vertex centric approach with the flexibility of shared memory sub-graph computation. We map Connected Components, SSSP and PageRank algorithms to this model to illustrate its flexibility. Further, we empirically analyze GoFFish using several real world graphs and demonstrate its significant performance improvement, orders of magnitude in some cases, compared to Apache Giraph, the leading open source vertex centric implementation. We map Connected Components, SSSP and PageRank algorithms to this model to illustrate its flexibility. Further, we empirically analyze GoFFish using several real world graphs and demonstrate its significant performance improvement, orders of magnitude in some cases, compared to Apache Giraph, the leading open source vertex centric implementation.« less
On size-constrained minimum s–t cut problems and size-constrained dense subgraph problems
Chen, Wenbin; Samatova, Nagiza F.; Stallmann, Matthias F.; ...
2015-10-30
In some application cases, the solutions of combinatorial optimization problems on graphs should satisfy an additional vertex size constraint. In this paper, we consider size-constrained minimum s–t cut problems and size-constrained dense subgraph problems. We introduce the minimum s–t cut with at-least-k vertices problem, the minimum s–t cut with at-most-k vertices problem, and the minimum s–t cut with exactly k vertices problem. We prove that they are NP-complete. Thus, they are not polynomially solvable unless P = NP. On the other hand, we also study the densest at-least-k-subgraph problem (DalkS) and the densest at-most-k-subgraph problem (DamkS) introduced by Andersen andmore » Chellapilla [1]. We present a polynomial time algorithm for DalkS when k is bounded by some constant c. We also present two approximation algorithms for DamkS. In conclusion, the first approximation algorithm for DamkS has an approximation ratio of n-1/k-1, where n is the number of vertices in the input graph. The second approximation algorithm for DamkS has an approximation ratio of O (n δ), for some δ < 1/3.« less
Comparing Internet Probing Methodologies Through an Analysis of Large Dynamic Graphs
2014-06-01
comparable Internet topologies in less time. We compare these by modeling union of traceroute outputs as graphs, and using standard graph theoretical...topologies in less time. We compare these by modeling union of traceroute outputs as graphs, and using standard graph theoretical measurements as well...We compare these by modeling union of traceroute outputs as graphs, and study the graphs by using vertex and edge count, average vertex degree
Gămănuţ, Răzvan; Kennedy, Henry; Toroczkai, Zoltán; Ercsey-Ravasz, Mária; Van Essen, David C; Knoblauch, Kenneth; Burkhalter, Andreas
2018-02-07
The inter-areal wiring pattern of the mouse cerebral cortex was analyzed in relation to a refined parcellation of cortical areas. Twenty-seven retrograde tracer injections were made in 19 areas of a 47-area parcellation of the mouse neocortex. Flat mounts of the cortex and multiple histological markers enabled detailed counts of labeled neurons in individual areas. The observed log-normal distribution of connection weights to each cortical area spans 5 orders of magnitude and reveals a distinct connectivity profile for each area, analogous to that observed in macaques. The cortical network has a density of 97%, considerably higher than the 66% density reported in macaques. A weighted graph analysis reveals a similar global efficiency but weaker spatial clustering compared with that reported in macaques. The consistency, precision of the connectivity profile, density, and weighted graph analysis of the present data differ significantly from those obtained in earlier studies in the mouse. Copyright © 2017 Elsevier Inc. All rights reserved.
Ringo: Interactive Graph Analytics on Big-Memory Machines
Perez, Yonathan; Sosič, Rok; Banerjee, Arijit; Puttagunta, Rohan; Raison, Martin; Shah, Pararth; Leskovec, Jure
2016-01-01
We present Ringo, a system for analysis of large graphs. Graphs provide a way to represent and analyze systems of interacting objects (people, proteins, webpages) with edges between the objects denoting interactions (friendships, physical interactions, links). Mining graphs provides valuable insights about individual objects as well as the relationships among them. In building Ringo, we take advantage of the fact that machines with large memory and many cores are widely available and also relatively affordable. This allows us to build an easy-to-use interactive high-performance graph analytics system. Graphs also need to be built from input data, which often resides in the form of relational tables. Thus, Ringo provides rich functionality for manipulating raw input data tables into various kinds of graphs. Furthermore, Ringo also provides over 200 graph analytics functions that can then be applied to constructed graphs. We show that a single big-memory machine provides a very attractive platform for performing analytics on all but the largest graphs as it offers excellent performance and ease of use as compared to alternative approaches. With Ringo, we also demonstrate how to integrate graph analytics with an iterative process of trial-and-error data exploration and rapid experimentation, common in data mining workloads. PMID:27081215
Ringo: Interactive Graph Analytics on Big-Memory Machines.
Perez, Yonathan; Sosič, Rok; Banerjee, Arijit; Puttagunta, Rohan; Raison, Martin; Shah, Pararth; Leskovec, Jure
2015-01-01
We present Ringo, a system for analysis of large graphs. Graphs provide a way to represent and analyze systems of interacting objects (people, proteins, webpages) with edges between the objects denoting interactions (friendships, physical interactions, links). Mining graphs provides valuable insights about individual objects as well as the relationships among them. In building Ringo, we take advantage of the fact that machines with large memory and many cores are widely available and also relatively affordable. This allows us to build an easy-to-use interactive high-performance graph analytics system. Graphs also need to be built from input data, which often resides in the form of relational tables. Thus, Ringo provides rich functionality for manipulating raw input data tables into various kinds of graphs. Furthermore, Ringo also provides over 200 graph analytics functions that can then be applied to constructed graphs. We show that a single big-memory machine provides a very attractive platform for performing analytics on all but the largest graphs as it offers excellent performance and ease of use as compared to alternative approaches. With Ringo, we also demonstrate how to integrate graph analytics with an iterative process of trial-and-error data exploration and rapid experimentation, common in data mining workloads.
Intuitive color-based visualization of multimedia content as large graphs
NASA Astrophysics Data System (ADS)
Delest, Maylis; Don, Anthony; Benois-Pineau, Jenny
2004-06-01
Data visualization techniques are penetrating in various technological areas. In the field of multimedia such as information search and retrieval in multimedia archives, or digital media production and post-production, data visualization methodologies based on large graphs give an exciting alternative to conventional storyboard visualization. In this paper we develop a new approach to visualization of multimedia (video) documents based both on large graph clustering and preliminary video segmenting and indexing.
Kwon, Oh-Hyun; Crnovrsanin, Tarik; Ma, Kwan-Liu
2018-01-01
Using different methods for laying out a graph can lead to very different visual appearances, with which the viewer perceives different information. Selecting a "good" layout method is thus important for visualizing a graph. The selection can be highly subjective and dependent on the given task. A common approach to selecting a good layout is to use aesthetic criteria and visual inspection. However, fully calculating various layouts and their associated aesthetic metrics is computationally expensive. In this paper, we present a machine learning approach to large graph visualization based on computing the topological similarity of graphs using graph kernels. For a given graph, our approach can show what the graph would look like in different layouts and estimate their corresponding aesthetic metrics. An important contribution of our work is the development of a new framework to design graph kernels. Our experimental study shows that our estimation calculation is considerably faster than computing the actual layouts and their aesthetic metrics. Also, our graph kernels outperform the state-of-the-art ones in both time and accuracy. In addition, we conducted a user study to demonstrate that the topological similarity computed with our graph kernel matches perceptual similarity assessed by human users.
A graph signal filtering-based approach for detection of different edge types on airborne lidar data
NASA Astrophysics Data System (ADS)
Bayram, Eda; Vural, Elif; Alatan, Aydin
2017-10-01
Airborne Laser Scanning is a well-known remote sensing technology, which provides a dense and highly accurate, yet unorganized point cloud of earth surface. During the last decade, extracting information from the data generated by airborne LiDAR systems has been addressed by many studies in geo-spatial analysis and urban monitoring applications. However, the processing of LiDAR point clouds is challenging due to their irregular structure and 3D geometry. In this study, we propose a novel framework for the detection of the boundaries of an object or scene captured by LiDAR. Our approach is motivated by edge detection techniques in vision research and it is established on graph signal filtering which is an exciting and promising field of signal processing for irregular data types. Due to the convenient applicability of graph signal processing tools on unstructured point clouds, we achieve the detection of the edge points directly on 3D data by using a graph representation that is constructed exclusively to answer the requirements of the application. Moreover, considering the elevation data as the (graph) signal, we leverage aerial characteristic of the airborne LiDAR data. The proposed method can be employed both for discovering the jump edges on a segmentation problem and for exploring the crease edges on a LiDAR object on a reconstruction/modeling problem, by only adjusting the filter characteristics.
2015-09-21
this framework, MIT LL carried out a one-year proof- of-concept study to determine the capabilities and challenges in the detection of anomalies in...extremely large graphs [5]. Under this effort, two real datasets were considered, and algorithms for data modeling and anomaly detection were developed...is required in a well-defined experimental framework for the detection of anomalies in very large graphs. This study is intended to inform future
Composing Data Parallel Code for a SPARQL Graph Engine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste
Big data analytics process large amount of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph based representation provides several benefits, such as the possibility to perform in memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which requires more than basicmore » graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 Shared-memory Multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kupavskii, A B; Raigorodskii, A M
2013-10-31
We investigate in detail some properties of distance graphs constructed on the integer lattice. Such graphs find wide applications in problems of combinatorial geometry, in particular, such graphs were employed to answer Borsuk's question in the negative and to obtain exponential estimates for the chromatic number of the space. This work is devoted to the study of the number of cliques and the chromatic number of such graphs under certain conditions. Constructions of sequences of distance graphs are given, in which the graphs have unit length edges and contain a large number of triangles that lie on a sphere of radius 1/√3more » (which is the minimum possible). At the same time, the chromatic numbers of the graphs depend exponentially on their dimension. The results of this work strengthen and generalize some of the results obtained in a series of papers devoted to related issues. Bibliography: 29 titles.« less
An Efficient Scheme for Updating Sparse Cholesky Factors
NASA Technical Reports Server (NTRS)
Raghavan, Padma
2002-01-01
Raghavan had earlier developed the software package DCSPACK which can be used for solving sparse linear systems where the coefficient matrix is symmetric and positive definite (this project was not funded by NASA but by agencies such as NSF). DSCPACK-S is the serial code and DSCPACK-P is a parallel implementation suitable for multiprocessors or networks-of-workstations with message passing using MCI. The main algorithm used is the Cholesky factorization of a sparse symmetric positive positive definite matrix A = LL(T). The code can also compute the factorization A = LDL(T). The complexity of the software arises from several factors relating to the sparsity of the matrix A. A sparse N x N matrix A has typically less that cN nonzeroes where c is a small constant. If the matrix were dense, it would have O(N2) nonzeroes. The most complicated part of such sparse Cholesky factorization relates to fill-in, i.e., zeroes in the original matrix that become nonzeroes in the factor L. An efficient implementation depends to a large extent on complex data structures and on techniques from graph theory to reduce, identify, and manage fill. DSCPACK is based on an efficient multifrontal implementation with fill-managing algorithms and implementation arising from earlier research by Raghavan and others. Sparse Cholesky factorization is typically a four step process: (1) ordering to compute a fill-reducing numbering, (2) symbolic factorization to determine the nonzero structure of L, (3) numeric factorization to compute L, and, (4) triangular solution to solve L(T)x = y and Ly = b. The first two steps are symbolic and are performed using the graph of the matrix. The numeric factorization step is of dominant cost and there are several schemes for improving performance by exploiting the nested and dense structure of groups of columns in the factor. The latter are aimed at better utilization of the cache-memory hierarchy on modem processors to prevent cache-misses and provide execution rates (operations/second) that are close to the peak rates for dense matrix computations. Currently, EPISCOPACY is being used in an application at NASA directed by J. Newman and M. James. We propose the implementation of efficient schemes for updating the LL(T) or LDL(T) factors computed in DSCPACK-S to meet the computational requirements of their project. A brief description is provided in the next section.
NASA Astrophysics Data System (ADS)
Zhou, Zongzheng; Tordesillas, Antoinette
2017-06-01
The underlying microstructure and dynamics of a dense granular material as it evolves towards the "critical state", a limit state in which the system deforms with an essentially constant volume and stress ratio, remains widely debated in the micromechanics of granular media community. Strain localization, a common mechanism in the large strain regime, further complicates the characterization of this limit state. Here we revisit the evolution to this limit state within the framework of modern percolation theory. Attention is paid to motion transfer: in this context, percolation translates to the emergence of a large-scale connectivity in graphs that embody information on individual grain displacements. We construct each graph G(r) by connecting nodes, representing the grains, within a distance r in the displacement-state-space. As r increases, we observe a percolation transition on G(r). The size of the jump discontinuity increases in the lead up to failure, indicating that the nature of percolation transition changes from continuous to explosive. We attribute this to the emergence of collective motion, which manifests in increasingly isolated communities in G(r). At the limit state, where the jump discontinuity is highest and invariant across the different unjamming cycles (drops in stress ratio), G(r) encapsulates multiple kinematically distinct communities that are mediated by nodes corresponding to those grains in the shear band. This finding casts light on the dual and opposing roles of the shear band: a mechanism that creates powder keg divisions in the sample, while simultaneously acting as a mechanical link that transfers motion through such subdivisions moving in relative rigid-body motion.
Tuikkala, Johannes; Vähämaa, Heidi; Salmela, Pekka; Nevalainen, Olli S; Aittokallio, Tero
2012-03-26
Graph drawing is an integral part of many systems biology studies, enabling visual exploration and mining of large-scale biological networks. While a number of layout algorithms are available in popular network analysis platforms, such as Cytoscape, it remains poorly understood how well their solutions reflect the underlying biological processes that give rise to the network connectivity structure. Moreover, visualizations obtained using conventional layout algorithms, such as those based on the force-directed drawing approach, may become uninformative when applied to larger networks with dense or clustered connectivity structure. We implemented a modified layout plug-in, named Multilevel Layout, which applies the conventional layout algorithms within a multilevel optimization framework to better capture the hierarchical modularity of many biological networks. Using a wide variety of real life biological networks, we carried out a systematic evaluation of the method in comparison with other layout algorithms in Cytoscape. The multilevel approach provided both biologically relevant and visually pleasant layout solutions in most network types, hence complementing the layout options available in Cytoscape. In particular, it could improve drawing of large-scale networks of yeast genetic interactions and human physical interactions. In more general terms, the biological evaluation framework developed here enables one to assess the layout solutions from any existing or future graph drawing algorithm as well as to optimize their performance for a given network type or structure. By making use of the multilevel modular organization when visualizing biological networks, together with the biological evaluation of the layout solutions, one can generate convenient visualizations for many network biology applications.
Dehkordy, Soudabeh Fazeli; Carlos, Ruth C.; Hall, Kelli S.; Dalton, Vanessa K.
2015-01-01
Rationale and Objectives Millions of people use online search engines every day to find health-related information and voluntarily share their personal health status and behaviors in various Web sites. Thus, data from tracking of online information seeker’s behavior offer potential opportunities for use in public health surveillance and research. Google Trends is a feature of Google which allows internet users to graph the frequency of searches for a single term or phrase over time or by geographic region. We used Google Trends to describe patterns of information seeking behavior in the subject of dense breasts and to examine their correlation with the passage or introduction of dense breast notification legislation. Materials and Methods In order to capture the temporal variations of information seeking about dense breasts, the web search query “dense breast” was entered in the Google Trends tool. We then mapped the dates of legislative actions regarding dense breasts that received widespread coverage in the lay media to information seeking trends about dense breasts over time. Results Newsworthy events and legislative actions appear to correlate well with peaks in search volume of “dense breast”. Geographic regions with the highest search volumes have either passed, denied, or are currently considering the dense breast legislation. Conclusions Our study demonstrated that any legislative action and respective news coverage correlate with increase in information seeking for “dense breast” on Google, suggesting that Google Trends has the potential to serve as a data source for policy-relevant research. PMID:24998689
SD-SEM: sparse-dense correspondence for 3D reconstruction of microscopic samples.
Baghaie, Ahmadreza; Tafti, Ahmad P; Owen, Heather A; D'Souza, Roshan M; Yu, Zeyun
2017-06-01
Scanning electron microscopy (SEM) imaging has been a principal component of many studies in biomedical, mechanical, and materials sciences since its emergence. Despite the high resolution of captured images, they remain two-dimensional (2D). In this work, a novel framework using sparse-dense correspondence is introduced and investigated for 3D reconstruction of stereo SEM images. SEM micrographs from microscopic samples are captured by tilting the specimen stage by a known angle. The pair of SEM micrographs is then rectified using sparse scale invariant feature transform (SIFT) features/descriptors and a contrario RANSAC for matching outlier removal to ensure a gross horizontal displacement between corresponding points. This is followed by dense correspondence estimation using dense SIFT descriptors and employing a factor graph representation of the energy minimization functional and loopy belief propagation (LBP) as means of optimization. Given the pixel-by-pixel correspondence and the tilt angle of the specimen stage during the acquisition of micrographs, depth can be recovered. Extensive tests reveal the strength of the proposed method for high-quality reconstruction of microscopic samples. Copyright © 2017 Elsevier Ltd. All rights reserved.
Top-k similar graph matching using TraM in biological networks.
Amin, Mohammad Shafkat; Finley, Russell L; Jamil, Hasan M
2012-01-01
Many emerging database applications entail sophisticated graph-based query manipulation, predominantly evident in large-scale scientific applications. To access the information embedded in graphs, efficient graph matching tools and algorithms have become of prime importance. Although the prohibitively expensive time complexity associated with exact subgraph isomorphism techniques has limited its efficacy in the application domain, approximate yet efficient graph matching techniques have received much attention due to their pragmatic applicability. Since public domain databases are noisy and incomplete in nature, inexact graph matching techniques have proven to be more promising in terms of inferring knowledge from numerous structural data repositories. In this paper, we propose a novel technique called TraM for approximate graph matching that off-loads a significant amount of its processing on to the database making the approach viable for large graphs. Moreover, the vector space embedding of the graphs and efficient filtration of the search space enables computation of approximate graph similarity at a throw-away cost. We annotate nodes of the query graphs by means of their global topological properties and compare them with neighborhood biased segments of the datagraph for proper matches. We have conducted experiments on several real data sets, and have demonstrated the effectiveness and efficiency of the proposed method
Locating influential nodes in complex networks
Malliaros, Fragkiskos D.; Rossi, Maria-Evgenia G.; Vazirgiannis, Michalis
2016-01-01
Understanding and controlling spreading processes in networks is an important topic with many diverse applications, including information dissemination, disease propagation and viral marketing. It is of crucial importance to identify which entities act as influential spreaders that can propagate information to a large portion of the network, in order to ensure efficient information diffusion, optimize available resources or even control the spreading. In this work, we capitalize on the properties of the K-truss decomposition, a triangle-based extension of the core decomposition of graphs, to locate individual influential nodes. Our analysis on real networks indicates that the nodes belonging to the maximal K-truss subgraph show better spreading behavior compared to previously used importance criteria, including node degree and k-core index, leading to faster and wider epidemic spreading. We further show that nodes belonging to such dense subgraphs, dominate the small set of nodes that achieve the optimal spreading in the network. PMID:26776455
Application of kernel functions for accurate similarity search in large chemical databases.
Wang, Xiaohong; Huan, Jun; Smalter, Aaron; Lushington, Gerald H
2010-04-29
Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening among others. It is widely believed that structure based methods provide an efficient way to do the query. Recently various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions can not be applied to large chemical compound database due to the high computational complexity and the difficulties in indexing similarity search for large databases. To bridge graph kernel function and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed in our team, to measure similarity of graph represented chemicals. In our method, we utilize a hash table to support new graph kernel function definition, efficient storage and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure is scalable to large chemical databases with smaller indexing size, and faster query processing time as compared to state-of-the-art indexing methods such as Daylight fingerprints, C-tree and GraphGrep. Efficient similarity query processing method for large chemical databases is challenging since we need to balance running time efficiency and similarity search accuracy. Our previous similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. Experimental study validates the utility of G-hash in chemical databases.
Yu, Qingbao; Du, Yuhui; Chen, Jiayu; He, Hao; Sui, Jing; Pearlson, Godfrey; Calhoun, Vince D
2017-11-01
A key challenge in building a brain graph using fMRI data is how to define the nodes. Spatial brain components estimated by independent components analysis (ICA) and regions of interest (ROIs) determined by brain atlas are two popular methods to define nodes in brain graphs. It is difficult to evaluate which method is better in real fMRI data. Here we perform a simulation study and evaluate the accuracies of a few graph metrics in graphs with nodes of ICA components, ROIs, or modified ROIs in four simulation scenarios. Graph measures with ICA nodes are more accurate than graphs with ROI nodes in all cases. Graph measures with modified ROI nodes are modulated by artifacts. The correlations of graph metrics across subjects between graphs with ICA nodes and ground truth are higher than the correlations between graphs with ROI nodes and ground truth in scenarios with large overlapped spatial sources. Moreover, moving the location of ROIs would largely decrease the correlations in all scenarios. Evaluating graphs with different nodes is promising in simulated data rather than real data because different scenarios can be simulated and measures of different graphs can be compared with a known ground truth. Since ROIs defined using brain atlas may not correspond well to real functional boundaries, overall findings of this work suggest that it is more appropriate to define nodes using data-driven ICA than ROI approaches in real fMRI data. Copyright © 2017 Elsevier B.V. All rights reserved.
Towards Scalable Graph Computation on Mobile Devices.
Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng
2014-10-01
Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets, without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scale up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ Macbook Pro. Through creating a real world iOS app with this technique, we demonstrate the strong potential application for scalable graph computation on a single mobile device using our approach.
Towards Scalable Graph Computation on Mobile Devices
Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng
2015-01-01
Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets, without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scale up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ Macbook Pro. Through creating a real world iOS app with this technique, we demonstrate the strong potential application for scalable graph computation on a single mobile device using our approach. PMID:25859564
Distributed Sensing and Processing: A Graphical Model Approach
2005-11-30
that Ramanujan graph toplogies maximize the convergence rate of distributed detection consensus algorithms, improving over three orders of...small world type network designs. 14. SUBJECT TERMS Ramanujan graphs, sensor network topology, sensor network...that Ramanujan graphs, for which there are explicit algebraic constructions, have large eigenratios, converging much faster than structured graphs
Efficient Extraction of High Centrality Vertices in Distributed Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumbhare, Alok; Frincu, Marc; Raghavendra, Cauligi S.
2014-09-09
Betweenness centrality (BC) is an important measure for identifying high value or critical vertices in graphs, in variety of domains such as communication networks, road networks, and social graphs. However, calculating betweenness values is prohibitively expensive and, more often, domain experts are interested only in the vertices with the highest centrality values. In this paper, we first propose a partition-centric algorithm (MS-BC) to calculate BC for a large distributed graph that optimizes resource utilization and improves overall performance. Further, we extend the notion of approximate BC by pruning the graph and removing a subset of edges and vertices that contributemore » the least to the betweenness values of other vertices (MSL-BC), which further improves the runtime performance. We evaluate the proposed algorithms using a mix of real-world and synthetic graphs on an HPC cluster and analyze its strengths and weaknesses. The experimental results show an improvement in performance of upto 12x for large sparse graphs as compared to the state-of-the-art, and at the same time highlights the need for better partitioning methods to enable a balanced workload across partitions for unbalanced graphs such as small-world or power-law graphs.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Fangyan; Zhang, Song; Chung Wong, Pak
Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, themore » size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.« less
Bipartite graphs as models of population structures in evolutionary multiplayer games.
Peña, Jorge; Rochat, Yannick
2012-01-01
By combining evolutionary game theory and graph theory, "games on graphs" study the evolutionary dynamics of frequency-dependent selection in population structures modeled as geographical or social networks. Networks are usually represented by means of unipartite graphs, and social interactions by two-person games such as the famous prisoner's dilemma. Unipartite graphs have also been used for modeling interactions going beyond pairwise interactions. In this paper, we argue that bipartite graphs are a better alternative to unipartite graphs for describing population structures in evolutionary multiplayer games. To illustrate this point, we make use of bipartite graphs to investigate, by means of computer simulations, the evolution of cooperation under the conventional and the distributed N-person prisoner's dilemma. We show that several implicit assumptions arising from the standard approach based on unipartite graphs (such as the definition of replacement neighborhoods, the intertwining of individual and group diversity, and the large overlap of interaction neighborhoods) can have a large impact on the resulting evolutionary dynamics. Our work provides a clear example of the importance of construction procedures in games on graphs, of the suitability of bigraphs and hypergraphs for computational modeling, and of the importance of concepts from social network analysis such as centrality, centralization and bipartite clustering for the understanding of dynamical processes occurring on networked population structures.
Mathematics of Web science: structure, dynamics and incentives.
Chayes, Jennifer
2013-03-28
Dr Chayes' talk described how, to a discrete mathematician, 'all the world's a graph, and all the people and domains merely vertices'. A graph is represented as a set of vertices V and a set of edges E, so that, for instance, in the World Wide Web, V is the set of pages and E the directed hyperlinks; in a social network, V is the people and E the set of relationships; and in the autonomous system Internet, V is the set of autonomous systems (such as AOL, Yahoo! and MSN) and E the set of connections. This means that mathematics can be used to study the Web (and other large graphs in the online world) in the following way: first, we can model online networks as large finite graphs; second, we can sample pieces of these graphs; third, we can understand and then control processes on these graphs; and fourth, we can develop algorithms for these graphs and apply them to improve the online experience.
Model-based multiple patterning layout decomposition
NASA Astrophysics Data System (ADS)
Guo, Daifeng; Tian, Haitong; Du, Yuelin; Wong, Martin D. F.
2015-10-01
As one of the most promising next generation lithography technologies, multiple patterning lithography (MPL) plays an important role in the attempts to keep in pace with 10 nm technology node and beyond. With feature size keeps shrinking, it has become impossible to print dense layouts within one single exposure. As a result, MPL such as double patterning lithography (DPL) and triple patterning lithography (TPL) has been widely adopted. There is a large volume of literature on DPL/TPL layout decomposition, and the current approach is to formulate the problem as a classical graph-coloring problem: Layout features (polygons) are represented by vertices in a graph G and there is an edge between two vertices if and only if the distance between the two corresponding features are less than a minimum distance threshold value dmin. The problem is to color the vertices of G using k colors (k = 2 for DPL, k = 3 for TPL) such that no two vertices connected by an edge are given the same color. This is a rule-based approach, which impose a geometric distance as a minimum constraint to simply decompose polygons within the distance into different masks. It is not desired in practice because this criteria cannot completely capture the behavior of the optics. For example, it lacks of sufficient information such as the optical source characteristics and the effects between the polygons outside the minimum distance. To remedy the deficiency, a model-based layout decomposition approach to make the decomposition criteria base on simulation results was first introduced at SPIE 2013.1 However, the algorithm1 is based on simplified assumption on the optical simulation model and therefore its usage on real layouts is limited. Recently AMSL2 also proposed a model-based approach to layout decomposition by iteratively simulating the layout, which requires excessive computational resource and may lead to sub-optimal solutions. The approach2 also potentially generates too many stiches. In this paper, we propose a model-based MPL layout decomposition method using a pre-simulated library of frequent layout patterns. Instead of using the graph G in the standard graph-coloring formulation, we build an expanded graph H where each vertex represents a group of adjacent features together with a coloring solution. By utilizing the library and running sophisticated graph algorithms on H, our approach can obtain optimal decomposition results efficiently. Our model-based solution can achieve a practical mask design which significantly improves the lithography quality on the wafer compared to the rule based decomposition.
Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF
Cong, Yingnan; Chan, Yao-ban; Phillips, Charles A.; Langston, Michael A.; Ragan, Mark A.
2017-01-01
Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k. PMID:28154557
Approximate Computing Techniques for Iterative Graph Algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Panyala, Ajay R.; Subasi, Omer; Halappanavar, Mahantesh
Approximate computing enables processing of large-scale graphs by trading off quality for performance. Approximate computing techniques have become critical not only due to the emergence of parallel architectures but also the availability of large scale datasets enabling data-driven discovery. Using two prototypical graph algorithms, PageRank and community detection, we present several approximate computing heuristics to scale the performance with minimal loss of accuracy. We present several heuristics including loop perforation, data caching, incomplete graph coloring and synchronization, and evaluate their efficiency. We demonstrate performance improvements of up to 83% for PageRank and up to 450x for community detection, with lowmore » impact of accuracy for both the algorithms. We expect the proposed approximate techniques will enable scalable graph analytics on data of importance to several applications in science and their subsequent adoption to scale similar graph algorithms.« less
Distributed Computation of the knn Graph for Large High-Dimensional Point Sets
Plaku, Erion; Kavraki, Lydia E.
2009-01-01
High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to a single machine. In this work we efficiently distribute the computation of knn graphs for clusters of processors with message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over one hundred processors and indicate that similar speedup can be obtained with several hundred processors. PMID:19847318
Learning Financial Reports From Mixed Symbolic-Spatial Graphs
ERIC Educational Resources Information Center
Tanlamai, Uthai; Soongswang, Oranuj
2011-01-01
Mixed visuals of numbers and graphs are available in various financial reports that demonstrate the financial status and risks of a firm. GWN (graphs with numbers) and TWG (table of numbers with graphs) were used as two alternative visuals derived from the actual data of two large public companies, one from food manufacturing industry (food) and…
Planning Assembly Of Large Truss Structures In Outer Space
NASA Technical Reports Server (NTRS)
De Mello, Luiz S. Homem; Desai, Rajiv S.
1992-01-01
Report dicusses developmental algorithm used in systematic planning of sequences of operations in which large truss structures assembled in outer space. Assembly sequence represented by directed graph called "assembly graph", in which each arc represents joining of two parts or subassemblies. Algorithm generates assembly graph, working backward from state of complete assembly to initial state, in which all parts disassembled. Working backward more efficient than working forward because it avoids intermediate dead ends.
NASA Astrophysics Data System (ADS)
Nex, F.; Gerke, M.
2014-08-01
Image matching techniques can nowadays provide very dense point clouds and they are often considered a valid alternative to LiDAR point cloud. However, photogrammetric point clouds are often characterized by a higher level of random noise compared to LiDAR data and by the presence of large outliers. These problems constitute a limitation in the practical use of photogrammetric data for many applications but an effective way to enhance the generated point cloud has still to be found. In this paper we concentrate on the restoration of Digital Surface Models (DSM), computed from dense image matching point clouds. A photogrammetric DSM, i.e. a 2.5D representation of the surface is still one of the major products derived from point clouds. Four different algorithms devoted to DSM denoising are presented: a standard median filter approach, a bilateral filter, a variational approach (TGV: Total Generalized Variation), as well as a newly developed algorithm, which is embedded into a Markov Random Field (MRF) framework and optimized through graph-cuts. The ability of each algorithm to recover the original DSM has been quantitatively evaluated. To do that, a synthetic DSM has been generated and different typologies of noise have been added to mimic the typical errors of photogrammetric DSMs. The evaluation reveals that standard filters like median and edge preserving smoothing through a bilateral filter approach cannot sufficiently remove typical errors occurring in a photogrammetric DSM. The TGV-based approach much better removes random noise, but large areas with outliers still remain. Our own method which explicitly models the degradation properties of those DSM outperforms the others in all aspects.
Efficient large-scale graph data optimization for intelligent video surveillance
NASA Astrophysics Data System (ADS)
Shang, Quanhong; Zhang, Shujun; Wang, Yanbo; Sun, Chen; Wang, Zepeng; Zhang, Luming
2017-08-01
Society is rapidly accepting the use of a wide variety of cameras Location and applications: site traffic monitoring, parking Lot surveillance, car and smart space. These ones here the camera provides data every day in an analysis Effective way. Recent advances in sensor technology Manufacturing, communications and computing are stimulating.The development of new applications that can change the traditional Vision system incorporating universal smart camera network. This Analysis of visual cues in multi camera networks makes wide Applications ranging from smart home and office automation to large area surveillance and traffic surveillance. In addition, dense Camera networks, most of which have large overlapping areas of cameras. In the view of good research, we focus on sparse camera networks. One Sparse camera network using large area surveillance. As few cameras as possible, most cameras do not overlap Each other’s field of vision. This task is challenging Lack of knowledge of topology Network, the specific changes in appearance and movement Track different opinions of the target, as well as difficulties Understanding complex events in a network. In this review in this paper, we present a comprehensive survey of recent studies Results to solve the problem of topology learning, Object appearance modeling and global activity understanding sparse camera network. In addition, some of the current open Research issues are discussed.
Graph Visualization for RDF Graphs with SPARQL-EndPoints
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sukumar, Sreenivas R; Bond, Nathaniel
2014-07-11
RDF graphs are hard to visualize as triples. This software module is a web interface that connects to a SPARQL endpoint and retrieves graph data that the user can explore interactively and seamlessly. The software written in python and JavaScript has been tested to work on screens as little as the smart phones to large screens such as EVEREST.
Dimitrakopoulos, Christos; Theofilatos, Konstantinos; Pegkas, Andreas; Likothanassis, Spiros; Mavroudi, Seferina
2016-07-01
Proteins are vital biological molecules driving many fundamental cellular processes. They rarely act alone, but form interacting groups called protein complexes. The study of protein complexes is a key goal in systems biology. Recently, large protein-protein interaction (PPI) datasets have been published and a plethora of computational methods that provide new ideas for the prediction of protein complexes have been implemented. However, most of the methods suffer from two major limitations: First, they do not account for proteins participating in multiple functions and second, they are unable to handle weighted PPI graphs. Moreover, the problem remains open as existing algorithms and tools are insufficient in terms of predictive metrics. In the present paper, we propose gradually expanding neighborhoods with adjustment (GENA), a new algorithm that gradually expands neighborhoods in a graph starting from highly informative "seed" nodes. GENA considers proteins as multifunctional molecules allowing them to participate in more than one protein complex. In addition, GENA accepts weighted PPI graphs by using a weighted evaluation function for each cluster. In experiments with datasets from Saccharomyces cerevisiae and human, GENA outperformed Markov clustering, restricted neighborhood search and clustering with overlapping neighborhood expansion, three state-of-the-art methods for computationally predicting protein complexes. Seven PPI networks and seven evaluation datasets were used in total. GENA outperformed existing methods in 16 out of 18 experiments achieving an average improvement of 5.5% when the maximum matching ratio metric was used. Our method was able to discover functionally homogeneous protein clusters and uncover important network modules in a Parkinson expression dataset. When used on the human networks, around 47% of the detected clusters were enriched in gene ontology (GO) terms with depth higher than five in the GO hierarchy. In the present manuscript, we introduce a new method for the computational prediction of protein complexes by making the realistic assumption that proteins participate in multiple protein complexes and cellular functions. Our method can detect accurate and functionally homogeneous clusters. Copyright © 2016 Elsevier B.V. All rights reserved.
2012-01-01
Background Graph drawing is an integral part of many systems biology studies, enabling visual exploration and mining of large-scale biological networks. While a number of layout algorithms are available in popular network analysis platforms, such as Cytoscape, it remains poorly understood how well their solutions reflect the underlying biological processes that give rise to the network connectivity structure. Moreover, visualizations obtained using conventional layout algorithms, such as those based on the force-directed drawing approach, may become uninformative when applied to larger networks with dense or clustered connectivity structure. Methods We implemented a modified layout plug-in, named Multilevel Layout, which applies the conventional layout algorithms within a multilevel optimization framework to better capture the hierarchical modularity of many biological networks. Using a wide variety of real life biological networks, we carried out a systematic evaluation of the method in comparison with other layout algorithms in Cytoscape. Results The multilevel approach provided both biologically relevant and visually pleasant layout solutions in most network types, hence complementing the layout options available in Cytoscape. In particular, it could improve drawing of large-scale networks of yeast genetic interactions and human physical interactions. In more general terms, the biological evaluation framework developed here enables one to assess the layout solutions from any existing or future graph drawing algorithm as well as to optimize their performance for a given network type or structure. Conclusions By making use of the multilevel modular organization when visualizing biological networks, together with the biological evaluation of the layout solutions, one can generate convenient visualizations for many network biology applications. PMID:22448851
Exploring and Making Sense of Large Graphs
2015-08-01
and bold) are n × n ; vectors (lower-case bold) are n × 1 column vectors, and scalars (in lower-case plain font) typically correspond to strength of...graph is often denoted as |V| or n . Edges or Links: A finite set E of lines between objects in a graph. The edges represent relationships between the...Adjacency matrix of a simple, unweighted and undirected graph. Adjacency matrix: The adjacency matrix of a graph G is an n × n matrix A, whose element aij
Measuring Two-Event Structural Correlations on Graphs
2012-08-01
2012 to 00-00-2012 4. TITLE AND SUBTITLE Measuring Two-Event Structural Correlations on Graphs 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ...by event simulation on the DBLP graph. Then we examine the efficiency and scala - bility of the framework with a Twitter network. The third part of...correlation pattern mining for large graphs. In Proc. of the 8th Workshop on Mining and Learning with Graphs, pages 119–126, 2010. [23] T. Smith. A
GrouseFlocks: steerable exploration of graph hierarchy space.
Archambault, Daniel; Munzner, Tamara; Auber, David
2008-01-01
Several previous systems allow users to interactively explore a large input graph through cuts of a superimposed hierarchy. This hierarchy is often created using clustering algorithms or topological features present in the graph. However, many graphs have domain-specific attributes associated with the nodes and edges, which could be used to create many possible hierarchies providing unique views of the input graph. GrouseFlocks is a system for the exploration of this graph hierarchy space. By allowing users to see several different possible hierarchies on the same graph, the system helps users investigate graph hierarchy space instead of a single fixed hierarchy. GrouseFlocks provides a simple set of operations so that users can create and modify their graph hierarchies based on selections. These selections can be made manually or based on patterns in the attribute data provided with the graph. It provides feedback to the user within seconds, allowing interactive exploration of this space.
Structural graph-based morphometry: A multiscale searchlight framework based on sulcal pits.
Takerkart, Sylvain; Auzias, Guillaume; Brun, Lucile; Coulon, Olivier
2017-01-01
Studying the topography of the cortex has proved valuable in order to characterize populations of subjects. In particular, the recent interest towards the deepest parts of the cortical sulci - the so-called sulcal pits - has opened new avenues in that regard. In this paper, we introduce the first fully automatic brain morphometry method based on the study of the spatial organization of sulcal pits - Structural Graph-Based Morphometry (SGBM). Our framework uses attributed graphs to model local patterns of sulcal pits, and further relies on three original contributions. First, a graph kernel is defined to provide a new similarity measure between pit-graphs, with few parameters that can be efficiently estimated from the data. Secondly, we present the first searchlight scheme dedicated to brain morphometry, yielding dense information maps covering the full cortical surface. Finally, a multi-scale inference strategy is designed to jointly analyze the searchlight information maps obtained at different spatial scales. We demonstrate the effectiveness of our framework by studying gender differences and cortical asymmetries: we show that SGBM can both localize informative regions and estimate their spatial scales, while providing results which are consistent with the literature. Thanks to the modular design of our kernel and the vast array of available kernel methods, SGBM can easily be extended to include a more detailed description of the sulcal patterns and solve different statistical problems. Therefore, we suggest that our SGBM framework should be useful for both reaching a better understanding of the normal brain and defining imaging biomarkers in clinical settings. Copyright © 2016 Elsevier B.V. All rights reserved.
Bipartite Graphs as Models of Population Structures in Evolutionary Multiplayer Games
Peña, Jorge; Rochat, Yannick
2012-01-01
By combining evolutionary game theory and graph theory, “games on graphs” study the evolutionary dynamics of frequency-dependent selection in population structures modeled as geographical or social networks. Networks are usually represented by means of unipartite graphs, and social interactions by two-person games such as the famous prisoner’s dilemma. Unipartite graphs have also been used for modeling interactions going beyond pairwise interactions. In this paper, we argue that bipartite graphs are a better alternative to unipartite graphs for describing population structures in evolutionary multiplayer games. To illustrate this point, we make use of bipartite graphs to investigate, by means of computer simulations, the evolution of cooperation under the conventional and the distributed N-person prisoner’s dilemma. We show that several implicit assumptions arising from the standard approach based on unipartite graphs (such as the definition of replacement neighborhoods, the intertwining of individual and group diversity, and the large overlap of interaction neighborhoods) can have a large impact on the resulting evolutionary dynamics. Our work provides a clear example of the importance of construction procedures in games on graphs, of the suitability of bigraphs and hypergraphs for computational modeling, and of the importance of concepts from social network analysis such as centrality, centralization and bipartite clustering for the understanding of dynamical processes occurring on networked population structures. PMID:22970237
Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs
NASA Astrophysics Data System (ADS)
Saha, Barna; Hoch, Allison; Khuller, Samir; Raschid, Louiqa; Zhang, Xiao-Ning
In this paper, we focus on finding complex annotation patterns representing novel and interesting hypotheses from gene annotation data. We define a generalization of the densest subgraph problem by adding an additional distance restriction (defined by a separate metric) to the nodes of the subgraph. We show that while this generalization makes the problem NP-hard for arbitrary metrics, when the metric comes from the distance metric of a tree, or an interval graph, the problem can be solved optimally in polynomial time. We also show that the densest subgraph problem with a specified subset of vertices that have to be included in the solution can be solved optimally in polynomial time. In addition, we consider other extensions when not just one solution needs to be found, but we wish to list all subgraphs of almost maximum density as well. We apply this method to a dataset of genes and their annotations obtained from The Arabidopsis Information Resource (TAIR). A user evaluation confirms that the patterns found in the distance restricted densest subgraph for a dataset of photomorphogenesis genes are indeed validated in the literature; a control dataset validates that these are not random patterns. Interestingly, the complex annotation patterns potentially lead to new and as yet unknown hypotheses. We perform experiments to determine the properties of the dense subgraphs, as we vary parameters, including the number of genes and the distance.
Return probabilities and hitting times of random walks on sparse Erdös-Rényi graphs.
Martin, O C; Sulc, P
2010-03-01
We consider random walks on random graphs, focusing on return probabilities and hitting times for sparse Erdös-Rényi graphs. Using the tree approach, which is expected to be exact in the large graph limit, we show how to solve for the distribution of these quantities and we find that these distributions exhibit a form of self-similarity.
Statistically significant relational data mining :
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann
This report summarizes the work performed under the project (3z(BStatitically significant relational data mining.(3y (BThe goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concetrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publicationsmore » that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second are statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
John Homer; Ashok Varikuti; Xinming Ou
Various tools exist to analyze enterprise network systems and to produce attack graphs detailing how attackers might penetrate into the system. These attack graphs, however, are often complex and difficult to comprehend fully, and a human user may find it problematic to reach appropriate configuration decisions. This paper presents methodologies that can 1) automatically identify portions of an attack graph that do not help a user to understand the core security problems and so can be trimmed, and 2) automatically group similar attack steps as virtual nodes in a model of the network topology, to immediately increase the understandability ofmore » the data. We believe both methods are important steps toward improving visualization of attack graphs to make them more useful in configuration management for large enterprise networks. We implemented our methods using one of the existing attack-graph toolkits. Initial experimentation shows that the proposed approaches can 1) significantly reduce the complexity of attack graphs by trimming a large portion of the graph that is not needed for a user to understand the security problem, and 2) significantly increase the accessibility and understandability of the data presented in the attack graph by clearly showing, within a generated visualization of the network topology, the number and type of potential attacks to which each host is exposed.« less
Time series analysis of the developed financial markets' integration using visibility graphs
NASA Astrophysics Data System (ADS)
Zhuang, Enyu; Small, Michael; Feng, Gang
2014-09-01
A time series representing the developed financial markets' segmentation from 1973 to 2012 is studied. The time series reveals an obvious market integration trend. To further uncover the features of this time series, we divide it into seven windows and generate seven visibility graphs. The measuring capabilities of the visibility graphs provide means to quantitatively analyze the original time series. It is found that the important historical incidents that influenced market integration coincide with variations in the measured graphical node degree. Through the measure of neighborhood span, the frequencies of the historical incidents are disclosed. Moreover, it is also found that large "cycles" and significant noise in the time series are linked to large and small communities in the generated visibility graphs. For large cycles, how historical incidents significantly affected market integration is distinguished by density and compactness of the corresponding communities.
Large-scale quantum networks based on graphs
NASA Astrophysics Data System (ADS)
Epping, Michael; Kampermann, Hermann; Bruß, Dagmar
2016-05-01
Society relies and depends increasingly on information exchange and communication. In the quantum world, security and privacy is a built-in feature for information processing. The essential ingredient for exploiting these quantum advantages is the resource of entanglement, which can be shared between two or more parties. The distribution of entanglement over large distances constitutes a key challenge for current research and development. Due to losses of the transmitted quantum particles, which typically scale exponentially with the distance, intermediate quantum repeater stations are needed. Here we show how to generalise the quantum repeater concept to the multipartite case, by describing large-scale quantum networks, i.e. network nodes and their long-distance links, consistently in the language of graphs and graph states. This unifying approach comprises both the distribution of multipartite entanglement across the network, and the protection against errors via encoding. The correspondence to graph states also provides a tool for optimising the architecture of quantum networks.
The Use of Weighted Graphs for Large-Scale Genome Analysis
Zhou, Fang; Toivonen, Hannu; King, Ross D.
2014-01-01
There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution. PMID:24619061
Islands and Bridges: Making Sense of Marked Nodes in Large Graphs
2013-01-01
our methods to heterogeneous and time-evolving graphs. References [1] Nouf M. Kh. Alsudairy, Vijay V. Raghavan, Alaaeldin M. Hafez, and Hassan I...multi-relational graphs. SIGKDD Explor., 7(2):56–63, 2005. [24] Jason Riedy, David A. Bader, Karl Jiang, Pushkar Pande, , and Richa Sharma . Detecting
Weighted graph cuts without eigenvectors a multilevel approach.
Dhillon, Inderjit S; Guan, Yuqiang; Kulis, Brian
2007-11-01
A variety of clustering algorithms have recently been proposed to handle data that is not linearly separable; spectral clustering and kernel k-means are two of the main methods. In this paper, we discuss an equivalence between the objective functions used in these seemingly different methods--in particular, a general weighted kernel k-means objective is mathematically equivalent to a weighted graph clustering objective. We exploit this equivalence to develop a fast, high-quality multilevel algorithm that directly optimizes various weighted graph clustering objectives, such as the popular ratio cut, normalized cut, and ratio association criteria. This eliminates the need for any eigenvector computation for graph clustering problems, which can be prohibitive for very large graphs. Previous multilevel graph partitioning methods, such as Metis, have suffered from the restriction of equal-sized clusters; our multilevel algorithm removes this restriction by using kernel k-means to optimize weighted graph cuts. Experimental results show that our multilevel algorithm outperforms a state-of-the-art spectral clustering algorithm in terms of speed, memory usage, and quality. We demonstrate that our algorithm is applicable to large-scale clustering tasks such as image segmentation, social network analysis and gene network analysis.
A simple method for finding the scattering coefficients of quantum graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cottrell, Seth S.
2015-09-15
Quantum walks are roughly analogous to classical random walks, and similar to classical walks they have been used to find new (quantum) algorithms. When studying the behavior of large graphs or combinations of graphs, it is useful to find the response of a subgraph to signals of different frequencies. In doing so, we can replace an entire subgraph with a single vertex with variable scattering coefficients. In this paper, a simple technique for quickly finding the scattering coefficients of any discrete-time quantum graph will be presented. These scattering coefficients can be expressed entirely in terms of the characteristic polynomial ofmore » the graph’s time step operator. This is a marked improvement over previous techniques which have traditionally required finding eigenstates for a given eigenvalue, which is far more computationally costly. With the scattering coefficients we can easily derive the “impulse response” which is the key to predicting the response of a graph to any signal. This gives us a powerful set of tools for rapidly understanding the behavior of graphs or for reducing a large graph into its constituent subgraphs regardless of how they are connected.« less
An asynchronous traversal engine for graph-based rich metadata management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dai, Dong; Carns, Philip; Ross, Robert B.
Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenancemore » data mining). Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a "level synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This performs well in many problem domains; but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. Such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed, however. In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format. We outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. We further explore the effect of different graph partitioning strategies on the traversal performance for both synchronous and asynchronous traversal engines. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed. Furthermore, the asynchronous traversal engine is more adaptive to different graph partitioning strategies.« less
An asynchronous traversal engine for graph-based rich metadata management
Dai, Dong; Carns, Philip; Ross, Robert B.; ...
2016-06-23
Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenancemore » data mining). Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a "level synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This performs well in many problem domains; but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. Such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed, however. In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format. We outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. We further explore the effect of different graph partitioning strategies on the traversal performance for both synchronous and asynchronous traversal engines. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed. Furthermore, the asynchronous traversal engine is more adaptive to different graph partitioning strategies.« less
Graph processing platforms at scale: practices and experiences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lim, Seung-Hwan; Lee, Sangkeun; Brown, Tyler C
2015-01-01
Graph analysis unveils hidden associations of data in many phenomena and artifacts, such as road network, social networks, genomic information, and scientific collaboration. Unfortunately, a wide diversity in the characteristics of graphs and graph operations make it challenging to find a right combination of tools and implementation of algorithms to discover desired knowledge from the target data set. This study presents an extensive empirical study of three representative graph processing platforms: Pegasus, GraphX, and Urika. Each system represents a combination of options in data model, processing paradigm, and infrastructure. We benchmarked each platform using three popular graph operations, degree distribution,more » connected components, and PageRank over a variety of real-world graphs. Our experiments show that each graph processing platform shows different strength, depending the type of graph operations. While Urika performs the best in non-iterative operations like degree distribution, GraphX outputforms iterative operations like connected components and PageRank. In addition, we discuss challenges to optimize the performance of each platform over large scale real world graphs.« less
Large-scale Graph Computation on Just a PC
2014-05-01
edges for several vertices simultaneously). We compared the performance of GraphChi-DB to Neo4j using their Java API (we discuss MySQL comparison in the...75 4.7.6 Comparison to RDBMS ( MySQL ) . . . . . . . . . . . . . . . . . . . . . 75 4.7.7 Summary of the...Windows method, GraphChi. The C++ implementation has circa 8,000 lines of code. We have also de- veloped a Java -version of GraphChi, but it does not
NASA Astrophysics Data System (ADS)
Russell, Scott; Walker, David M.; Tordesillas, Antoinette
2016-03-01
A framework for the multiscale characterization of the coupled evolution of the solid grain fabric and its associated pore space in dense granular media is developed. In this framework, a pseudo-dual graph transformation of the grain contact network produces a graph of pores which can be readily interpreted as a pore space network. Survivability, a new metric succinctly summarizing the connectivity of the solid grain and pore space networks, measures material robustness. The size distribution and the connectivity of pores can be characterized quantitatively through various network properties. Assortativity characterizes the pore space with respect to the parity of the number of particles enclosing the pore. Multiscale clusters of odd parity versus even parity contact cycles alternate spatially along the shear band: these represent, respectively, local jamming and unjamming regions that continually switch positions in time throughout the failure regime. Optimal paths, established using network shortest paths in favor of large pores, provide clues on preferential paths for interstitial matter transport. In systems with higher rolling resistance at contacts, less tortuous shortest paths thread through larger pores in shear bands. Notably the structural patterns uncovered in the pore space suggest that more robust models of interstitial pore flow through deforming granular systems require a proper consideration of the evolution of in situ shear band and fracture patterns - not just globally, but also inside these localized failure zones.
Interference graph-based dynamic frequency reuse in optical attocell networks
NASA Astrophysics Data System (ADS)
Liu, Huanlin; Xia, Peijie; Chen, Yong; Wu, Lan
2017-11-01
Indoor optical attocell network may achieve higher capacity than radio frequency (RF) or Infrared (IR)-based wireless systems. It is proposed as a special type of visible light communication (VLC) system using Light Emitting Diodes (LEDs). However, the system spectral efficiency may be severely degraded owing to the inter-cell interference (ICI), particularly for dense deployment scenarios. To address these issues, we construct the spectral interference graph for indoor optical attocell network, and propose the Dynamic Frequency Reuse (DFR) and Weighted Dynamic Frequency Reuse (W-DFR) algorithms to decrease ICI and improve the spectral efficiency performance. The interference graph makes LEDs can transmit data without interference and select the minimum sub-bands needed for frequency reuse. Then, DFR algorithm reuses the system frequency equally across service-providing cells to mitigate spectrum interference. While W-DFR algorithm can reuse the system frequency by using the bandwidth weight (BW), which is defined based on the number of service users. Numerical results show that both of the proposed schemes can effectively improve the average spectral efficiency (ASE) of the system. Additionally, improvement of the user data rate is also obtained by analyzing its cumulative distribution function (CDF).
Graph Theory-Based Analysis of the Lymph Node Fibroblastic Reticular Cell Network.
Novkovic, Mario; Onder, Lucas; Bocharov, Gennady; Ludewig, Burkhard
2017-01-01
Secondary lymphoid organs have developed segregated niches that are able to initiate and maintain effective immune responses. Such global organization requires tight control of diverse cellular components, specifically those that regulate lymphocyte trafficking. Fibroblastic reticular cells (FRCs) form a densely interconnected network in lymph nodes and provide key factors necessary for T cell migration and retention, and foster subsequent interactions between T cells and dendritic cells. Development of integrative systems biology approaches has made it possible to elucidate this multilevel complexity of the immune system. Here, we present a graph theory-based analysis of the FRC network in murine lymph nodes, where generation of the network topology is performed using high-resolution confocal microscopy and 3D reconstruction. This approach facilitates the analysis of physical cell-to-cell connectivity, and estimation of topological robustness and global behavior of the network when it is subjected to perturbation in silico.
Helping Young Children Understand Graphs: A Demonstration Study.
ERIC Educational Resources Information Center
Freeland, Kent; Madden, Wendy
1990-01-01
Outlines a demonstration lesson showing third graders how to make and interpret graphs. Includes descriptions of purpose, vocabulary, and learning activities in which students graph numbers of students with dogs at home and analyze the contents of M&M candy packages by color. Argues process helps students understand large amounts of abstract…
Young Children Communicate with Graphs
ERIC Educational Resources Information Center
Cathcart, W. George
1978-01-01
Graphing is an integrative skill because you can use it whether you are teaching measurement or geometry or number theory or most any other topic. It is also important as a mode of communication which can simplify a large amount of information. Here are five steps for effective presentation of graphing to young students. (Author/RK)
Copying Helps Novice Learners Build Orthographic Knowledge: Methods for Teaching Devanagari Akshara
ERIC Educational Resources Information Center
Bhide, Adeetee
2018-01-01
Hindi graphs, called akshara, are difficult to learn because of their visual complexity and large set of graphs. Akshara containing multiple consonants (complex akshara) are particularly difficult. In Hindi, complex akshara are formed by fusing individual consonantal graphs. Some complex akshara look similar to their component parts (transparent),…
Jing, X; Cimino, J J
2014-01-01
Graphical displays can make data more understandable; however, large graphs can challenge human comprehension. We have previously described a filtering method to provide high-level summary views of large data sets. In this paper we demonstrate our method for setting and selecting thresholds to limit graph size while retaining important information by applying it to large single and paired data sets, taken from patient and bibliographic databases. Four case studies are used to illustrate our method. The data are either patient discharge diagnoses (coded using the International Classification of Diseases, Clinical Modifications [ICD9-CM]) or Medline citations (coded using the Medical Subject Headings [MeSH]). We use combinations of different thresholds to obtain filtered graphs for detailed analysis. The thresholds setting and selection, such as thresholds for node counts, class counts, ratio values, p values (for diff data sets), and percentiles of selected class count thresholds, are demonstrated with details in case studies. The main steps include: data preparation, data manipulation, computation, and threshold selection and visualization. We also describe the data models for different types of thresholds and the considerations for thresholds selection. The filtered graphs are 1%-3% of the size of the original graphs. For our case studies, the graphs provide 1) the most heavily used ICD9-CM codes, 2) the codes with most patients in a research hospital in 2011, 3) a profile of publications on "heavily represented topics" in MEDLINE in 2011, and 4) validated knowledge about adverse effects of the medication of rosiglitazone and new interesting areas in the ICD9-CM hierarchy associated with patients taking the medication of pioglitazone. Our filtering method reduces large graphs to a manageable size by removing relatively unimportant nodes. The graphical method provides summary views based on computation of usage frequency and semantic context of hierarchical terminology. The method is applicable to large data sets (such as a hundred thousand records or more) and can be used to generate new hypotheses from data sets coded with hierarchical terminologies.
Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, Xuanhua; Luo, Xuan; Liang, Junling
GPUs have been increasingly used to accelerate graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate the iterative convergence. Unfortunately, the consistent asynchronous computing requires locking or atomic operations, leading to significant penalties/overheads when implemented on GPUs. As such, coloring algorithm is adopted to separate the vertices with potential updating conflicts, guaranteeing the consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because of a large number of colors generally required for processing a large-scale graph with billions of vertices. We propose a light-weightmore » asynchronous processing framework called Frog with a preprocessing/hybrid coloring model. The fundamental idea is based on Pareto principle (or 80-20 rule) about coloring algorithms as we observed through masses of realworld graph coloring cases. We find that a majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated in a very high degree of parallelism without violating the sequential consistency. Accordingly, our solution separates the processing of the vertices based on the distribution of colors. In this work, we mainly answer three questions: (1) how to partition the vertices in a sparse graph with maximized parallelism, (2) how to process large-scale graphs that cannot fit into GPU memory, and (3) how to reduce the overhead of data transfers on PCIe while processing each partition. We conduct experiments on real-world data (Amazon, DBLP, YouTube, RoadNet-CA, WikiTalk and Twitter) to evaluate our approach and make comparisons with well-known non-preprocessed (such as Totem, Medusa, MapGraph and Gunrock) and preprocessed (Cusha) approaches, by testing four classical algorithms (BFS, PageRank, SSSP and CC). On all the tested applications and datasets, Frog is able to significantly outperform existing GPU-based graph processing systems except Gunrock and MapGraph. MapGraph gets better performance than Frog when running BFS on RoadNet-CA. The comparison between Gunrock and Frog is inconclusive. Frog can outperform Gunrock more than 1.04X when running PageRank and SSSP, while the advantage of Frog is not obvious when running BFS and CC on some datasets especially for RoadNet-CA.« less
Vertex centrality as a measure of information flow in Italian Corporate Board Networks
NASA Astrophysics Data System (ADS)
Grassi, Rosanna
2010-06-01
The aim of this article is to investigate the governance models of companies listed on the Italian Stock Exchange by using a network approach, which describes the interlinks between boards of directors. Following mainstream literature, I construct a weighted graph representing the listed companies (vertices) and their relationships (weighted edges), the Corporate Board Network; I then apply three different vertex centrality measures: degree, betweenness and flow betweenness. What emerges from the network construction and by applying the degree centrality is a structure with a large number of connections but not particularly dense, where the presence of a small number of highly connected nodes (hubs) is evident. Then I focus on betweenness and flow betweenness; indeed I expect that these centrality measures may give a representation of the intensity of the relationship between companies, capturing the volume of information flowing from one vertex to another. Finally, I investigate the possible scale-free structure of the network.
Minimizing Communication in All-Pairs Shortest Paths
2013-02-13
on a 16,384 vertex, 5% dense graph, is slightly faster using our approach (18.6 vs . 22.6 seconds) than using the replicated Johnson’s algorithm...Oracle and Samsung , as well as MathWorks. Research is also supported by DOE grants DE-SC0004938, DE-SC0005136, DE-SC0003959, DE-SC0008700, and AC02...Brickell, I. S. Dhillon, S. Sra, and J. A. Tropp. The metric nearness problem. SIAM J. Matrix Anal. Appl ., 30:375–396, 2008. [11] A. Buluç, J. R. Gilbert
Tri-track: free software for large-scale particle tracking.
Vallotton, Pascal; Olivier, Sandra
2013-04-01
The ability to correctly track objects in time-lapse sequences is important in many applications of microscopy. Individual object motions typically display a level of dynamic regularity reflecting the existence of an underlying physics or biology. Best results are obtained when this local information is exploited. Additionally, if the particle number is known to be approximately constant, a large number of tracking scenarios may be rejected on the basis that they are not compatible with a known maximum particle velocity. This represents information of a global nature, which should ideally be exploited too. Some time ago, we devised an efficient algorithm that exploited both types of information. The tracking task was reduced to a max-flow min-cost problem instance through a novel graph structure that comprised vertices representing objects from three consecutive image frames. The algorithm is explained here for the first time. A user-friendly implementation is provided, and the specific relaxation mechanism responsible for the method's effectiveness is uncovered. The software is particularly competitive for complex dynamics such as dense antiparallel flows, or in situations where object displacements are considerable. As an application, we characterize a remarkable vortex structure formed by bacteria engaged in interstitial motility.
Weights and topology: a study of the effects of graph construction on 3D image segmentation.
Grady, Leo; Jolly, Marie-Pierre
2008-01-01
Graph-based algorithms have become increasingly popular for medical image segmentation. The fundamental process for each of these algorithms is to use the image content to generate a set of weights for the graph and then set conditions for an optimal partition of the graph with respect to these weights. To date, the heuristics used for generating the weighted graphs from image intensities have largely been ignored, while the primary focus of attention has been on the details of providing the partitioning conditions. In this paper we empirically study the effects of graph connectivity and weighting function on the quality of the segmentation results. To control for algorithm-specific effects, we employ both the Graph Cuts and Random Walker algorithms in our experiments.
Fast generation of sparse random kernel graphs
Hagberg, Aric; Lemons, Nathan; Du, Wen -Bo
2015-09-10
The development of kernel-based inhomogeneous random graphs has provided models that are flexible enough to capture many observed characteristics of real networks, and that are also mathematically tractable. We specify a class of inhomogeneous random graph models, called random kernel graphs, that produces sparse graphs with tunable graph properties, and we develop an efficient generation algorithm to sample random instances from this model. As real-world networks are usually large, it is essential that the run-time of generation algorithms scales better than quadratically in the number of vertices n. We show that for many practical kernels our algorithm runs in timemore » at most ο(n(logn)²). As an example, we show how to generate samples of power-law degree distribution graphs with tunable assortativity.« less
Building occupancy simulation and data assimilation using a graph-based agent-oriented model
NASA Astrophysics Data System (ADS)
Rai, Sanish; Hu, Xiaolin
2018-07-01
Building occupancy simulation and estimation simulates the dynamics of occupants and estimates their real-time spatial distribution in a building. It requires a simulation model and an algorithm for data assimilation that assimilates real-time sensor data into the simulation model. Existing building occupancy simulation models include agent-based models and graph-based models. The agent-based models suffer high computation cost for simulating large numbers of occupants, and graph-based models overlook the heterogeneity and detailed behaviors of individuals. Recognizing the limitations of existing models, this paper presents a new graph-based agent-oriented model which can efficiently simulate large numbers of occupants in various kinds of building structures. To support real-time occupancy dynamics estimation, a data assimilation framework based on Sequential Monte Carlo Methods is also developed and applied to the graph-based agent-oriented model to assimilate real-time sensor data. Experimental results show the effectiveness of the developed model and the data assimilation framework. The major contributions of this work are to provide an efficient model for building occupancy simulation that can accommodate large numbers of occupants and an effective data assimilation framework that can provide real-time estimations of building occupancy from sensor data.
Prototype Vector Machine for Large Scale Semi-Supervised Learning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Kai; Kwok, James T.; Parvin, Bahram
2009-04-29
Practicaldataminingrarelyfalls exactlyinto the supervisedlearning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computationalintensivenessofgraph-based SSLarises largely from the manifold or graph regularization, which in turn lead to large models that are dificult to handle. To alleviate this, we proposed the prototype vector machine (PVM), a highlyscalable,graph-based algorithm for large-scale SSL. Our key innovation is the use of"prototypes vectors" for effcient approximation on both the graph-based regularizer and model representation. The choice of prototypes are grounded upon two important criteria: they not only perform effective low-rank approximation of themore » kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.« less
Graph Analytics for Signature Discovery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogan, Emilie A.; Johnson, John R.; Halappanavar, Mahantesh
2013-06-01
Within large amounts of seemingly unstructured data it can be diffcult to find signatures of events. In our work we transform unstructured data into a graph representation. By doing this we expose underlying structure in the data and can take advantage of existing graph analytics capabilities, as well as develop new capabilities. Currently we focus on applications in cybersecurity and communication domains. Within cybersecurity we aim to find signatures for perpetrators using the pass-the-hash attack, and in communications we look for emails or phone calls going up or down a chain of command. In both of these areas, and inmore » many others, the signature we look for is a path with certain temporal properties. In this paper we discuss our methodology for finding these temporal paths within large graphs.« less
A global/local affinity graph for image segmentation.
Xiaofang Wang; Yuxing Tang; Masnou, Simon; Liming Chen
2015-04-01
Construction of a reliable graph capturing perceptual grouping cues of an image is fundamental for graph-cut based image segmentation methods. In this paper, we propose a novel sparse global/local affinity graph over superpixels of an input image to capture both short- and long-range grouping cues, and thereby enabling perceptual grouping laws, including proximity, similarity, continuity, and to enter in action through a suitable graph-cut algorithm. Moreover, we also evaluate three major visual features, namely, color, texture, and shape, for their effectiveness in perceptual segmentation and propose a simple graph fusion scheme to implement some recent findings from psychophysics, which suggest combining these visual features with different emphases for perceptual grouping. In particular, an input image is first oversegmented into superpixels at different scales. We postulate a gravitation law based on empirical observations and divide superpixels adaptively into small-, medium-, and large-sized sets. Global grouping is achieved using medium-sized superpixels through a sparse representation of superpixels' features by solving a ℓ0-minimization problem, and thereby enabling continuity or propagation of local smoothness over long-range connections. Small- and large-sized superpixels are then used to achieve local smoothness through an adjacent graph in a given feature space, and thus implementing perceptual laws, for example, similarity and proximity. Finally, a bipartite graph is also introduced to enable propagation of grouping cues between superpixels of different scales. Extensive experiments are carried out on the Berkeley segmentation database in comparison with several state-of-the-art graph constructions. The results show the effectiveness of the proposed approach, which outperforms state-of-the-art graphs using four different objective criteria, namely, the probabilistic rand index, the variation of information, the global consistency error, and the boundary displacement error.
DOE Office of Scientific and Technical Information (OSTI.GOV)
al-Saffar, Sinan; Joslyn, Cliff A.; Chappell, Alan R.
As semantic datasets grow to be very large and divergent, there is a need to identify and exploit their inherent semantic structure for discovery and optimization. Towards that end, we present here a novel methodology to identify the semantic structures inherent in an arbitrary semantic graph dataset. We first present the concept of an extant ontology as a statistical description of the semantic relations present amongst the typed entities modeled in the graph. This serves as a model of the underlying semantic structure to aid in discovery and visualization. We then describe a method of ontological scaling in which themore » ontology is employed as a hierarchical scaling filter to infer different resolution levels at which the graph structures are to be viewed or analyzed. We illustrate these methods on three large and publicly available semantic datasets containing more than one billion edges each. Keywords-Semantic Web; Visualization; Ontology; Multi-resolution Data Mining;« less
High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff A.; Adolf, Robert D.; Al-Saffar, Sinan
2010-10-04
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of our deployingmore » that for the analysis of the Billion Triple dataset with respect to its semantic factors.« less
Graph reconstruction using covariance-based methods.
Sulaimanov, Nurgazy; Koeppl, Heinz
2016-12-01
Methods based on correlation and partial correlation are today employed in the reconstruction of a statistical interaction graph from high-throughput omics data. These dedicated methods work well even for the case when the number of variables exceeds the number of samples. In this study, we investigate how the graphs extracted from covariance and concentration matrix estimates are related by using Neumann series and transitive closure and through discussing concrete small examples. Considering the ideal case where the true graph is available, we also compare correlation and partial correlation methods for large realistic graphs. In particular, we perform the comparisons with optimally selected parameters based on the true underlying graph and with data-driven approaches where the parameters are directly estimated from the data.
Pan, Yongke; Niu, Wenjia
2017-01-01
Semisupervised Discriminant Analysis (SDA) is a semisupervised dimensionality reduction algorithm, which can easily resolve the out-of-sample problem. Relative works usually focus on the geometric relationships of data points, which are not obvious, to enhance the performance of SDA. Different from these relative works, the regularized graph construction is researched here, which is important in the graph-based semisupervised learning methods. In this paper, we propose a novel graph for Semisupervised Discriminant Analysis, which is called combined low-rank and k-nearest neighbor (LRKNN) graph. In our LRKNN graph, we map the data to the LR feature space and then the kNN is adopted to satisfy the algorithmic requirements of SDA. Since the low-rank representation can capture the global structure and the k-nearest neighbor algorithm can maximally preserve the local geometrical structure of the data, the LRKNN graph can significantly improve the performance of SDA. Extensive experiments on several real-world databases show that the proposed LRKNN graph is an efficient graph constructor, which can largely outperform other commonly used baselines. PMID:28316616
Decision net, directed graph, and neural net processing of imaging spectrometer data
NASA Technical Reports Server (NTRS)
Casasent, David; Liu, Shiaw-Dong; Yoneyama, Hideyuki; Barnard, Etienne
1989-01-01
A decision-net solution involving a novel hierarchical classifier and a set of multiple directed graphs, as well as a neural-net solution, are respectively presented for large-class problem and mixture problem treatments of imaging spectrometer data. The clustering method for hierarchical classifier design, when used with multiple directed graphs, yields an efficient decision net. New directed-graph rules for reducing local maxima as well as the number of perturbations required, and the new starting-node rules for extending the reachability and reducing the search time of the graphs, are noted to yield superior results, as indicated by an illustrative 500-class imaging spectrometer problem.
Distribution of diameters for Erdős-Rényi random graphs.
Hartmann, A K; Mézard, M
2018-03-01
We study the distribution of diameters d of Erdős-Rényi random graphs with average connectivity c. The diameter d is the maximum among all the shortest distances between pairs of nodes in a graph and an important quantity for all dynamic processes taking place on graphs. Here we study the distribution P(d) numerically for various values of c, in the nonpercolating and percolating regimes. Using large-deviation techniques, we are able to reach small probabilities like 10^{-100} which allow us to obtain the distribution over basically the full range of the support, for graphs up to N=1000 nodes. For values c<1, our results are in good agreement with analytical results, proving the reliability of our numerical approach. For c>1 the distribution is more complex and no complete analytical results are available. For this parameter range, P(d) exhibits an inflection point, which we found to be related to a structural change of the graphs. For all values of c, we determined the finite-size rate function Φ(d/N) and were able to extrapolate numerically to N→∞, indicating that the large-deviation principle holds.
Distribution of diameters for Erdős-Rényi random graphs
NASA Astrophysics Data System (ADS)
Hartmann, A. K.; Mézard, M.
2018-03-01
We study the distribution of diameters d of Erdős-Rényi random graphs with average connectivity c . The diameter d is the maximum among all the shortest distances between pairs of nodes in a graph and an important quantity for all dynamic processes taking place on graphs. Here we study the distribution P (d ) numerically for various values of c , in the nonpercolating and percolating regimes. Using large-deviation techniques, we are able to reach small probabilities like 10-100 which allow us to obtain the distribution over basically the full range of the support, for graphs up to N =1000 nodes. For values c <1 , our results are in good agreement with analytical results, proving the reliability of our numerical approach. For c >1 the distribution is more complex and no complete analytical results are available. For this parameter range, P (d ) exhibits an inflection point, which we found to be related to a structural change of the graphs. For all values of c , we determined the finite-size rate function Φ (d /N ) and were able to extrapolate numerically to N →∞ , indicating that the large-deviation principle holds.
Evolutionary graph theory: breaking the symmetry between interaction and replacement
Ohtsuki, Hisashi; Pacheco, Jorge M.; Nowak, Martin A.
2008-01-01
We study evolutionary dynamics in a population whose structure is given by two graphs: the interaction graph determines who plays with whom in an evolutionary game; the replacement graph specifies the geometry of evolutionary competition and updating. First, we calculate the fixation probabilities of frequency dependent selection between two strategies or phenotypes. We consider three different update mechanisms: birth-death, death-birth and imitation. Then, as a particular example, we explore the evolution of cooperation. Suppose the interaction graph is a regular graph of degree h, the replacement graph is a regular graph of degree g and the overlap between the two graphs is a regular graph of degree l. We show that cooperation is favored by natural selection if b/c > hg/l. Here, b and c denote the benefit and cost of the altruistic act. This result holds for death-birth updating, weak selection and large population size. Note that the optimum population structure for cooperators is given by maximum overlap between the interaction and the replacement graph (g = h = l), which means that the two graphs are identical. We also prove that a modified replicator equation can describe how the expected values of the frequencies of an arbitrary number of strategies change on replacement and interaction graphs: the two graphs induce a transformation of the payoff matrix. PMID:17350049
Evolving network simulation study. From regular lattice to scale free network
NASA Astrophysics Data System (ADS)
Makowiec, D.
2005-12-01
The Watts-Strogatz algorithm of transferring the square lattice to a small world network is modified by introducing preferential rewiring constrained by connectivity demand. The evolution of the network is two-step: sequential preferential rewiring of edges controlled by p and updating the information about changes done. The evolving system self-organizes into stationary states. The topological transition in the graph structure is noticed with respect to p. Leafy phase a graph formed by multiple connected vertices (graph skeleton) with plenty of leaves attached to each skeleton vertex emerges when p is small enough to pretend asynchronous evolution. Tangling phase where edges of a graph circulate frequently among low degree vertices occurs when p is large. There exist conditions at which the resulting stationary network ensemble provides networks which degree distribution exhibit power-law decay in large interval of degrees.
Graph Theory and Ion and Molecular Aggregation in Aqueous Solutions.
Choi, Jun-Ho; Lee, Hochan; Choi, Hyung Ran; Cho, Minhaeng
2018-04-20
In molecular and cellular biology, dissolved ions and molecules have decisive effects on chemical and biological reactions, conformational stabilities, and functions of small to large biomolecules. Despite major efforts, the current state of understanding of the effects of specific ions, osmolytes, and bioprotecting sugars on the structure and dynamics of water H-bonding networks and proteins is not yet satisfactory. Recently, to gain deeper insight into this subject, we studied various aggregation processes of ions and molecules in high-concentration salt, osmolyte, and sugar solutions with time-resolved vibrational spectroscopy and molecular dynamics simulation methods. It turns out that ions (or solute molecules) have a strong propensity to self-assemble into large and polydisperse aggregates that affect both local and long-range water H-bonding structures. In particular, we have shown that graph-theoretical approaches can be used to elucidate morphological characteristics of large aggregates in various aqueous salt, osmolyte, and sugar solutions. When ion and molecular aggregates in such aqueous solutions are treated as graphs, a variety of graph-theoretical properties, such as graph spectrum, degree distribution, clustering coefficient, minimum path length, and graph entropy, can be directly calculated by considering an ensemble of configurations taken from molecular dynamics trajectories. Here we show percolating behavior exhibited by ion and molecular aggregates upon increase in solute concentration in high solute concentrations and discuss compelling evidence of the isomorphic relation between percolation transitions of ion and molecular aggregates and water H-bonding networks. We anticipate that the combination of graph theory and molecular dynamics simulation methods will be of exceptional use in achieving a deeper understanding of the fundamental physical chemistry of dissolution and in describing the interplay between the self-aggregation of solute molecules and the structure and dynamics of water.
Graph Theory and Ion and Molecular Aggregation in Aqueous Solutions
NASA Astrophysics Data System (ADS)
Choi, Jun-Ho; Lee, Hochan; Choi, Hyung Ran; Cho, Minhaeng
2018-04-01
In molecular and cellular biology, dissolved ions and molecules have decisive effects on chemical and biological reactions, conformational stabilities, and functions of small to large biomolecules. Despite major efforts, the current state of understanding of the effects of specific ions, osmolytes, and bioprotecting sugars on the structure and dynamics of water H-bonding networks and proteins is not yet satisfactory. Recently, to gain deeper insight into this subject, we studied various aggregation processes of ions and molecules in high-concentration salt, osmolyte, and sugar solutions with time-resolved vibrational spectroscopy and molecular dynamics simulation methods. It turns out that ions (or solute molecules) have a strong propensity to self-assemble into large and polydisperse aggregates that affect both local and long-range water H-bonding structures. In particular, we have shown that graph-theoretical approaches can be used to elucidate morphological characteristics of large aggregates in various aqueous salt, osmolyte, and sugar solutions. When ion and molecular aggregates in such aqueous solutions are treated as graphs, a variety of graph-theoretical properties, such as graph spectrum, degree distribution, clustering coefficient, minimum path length, and graph entropy, can be directly calculated by considering an ensemble of configurations taken from molecular dynamics trajectories. Here we show percolating behavior exhibited by ion and molecular aggregates upon increase in solute concentration in high solute concentrations and discuss compelling evidence of the isomorphic relation between percolation transitions of ion and molecular aggregates and water H-bonding networks. We anticipate that the combination of graph theory and molecular dynamics simulation methods will be of exceptional use in achieving a deeper understanding of the fundamental physical chemistry of dissolution and in describing the interplay between the self-aggregation of solute molecules and the structure and dynamics of water.
Colson, B.E.; Ming, C.O.; Arcement, George J.
1979-01-01
Floodflow data that will provide a base for evaluating digital models relating to open-channel flow were obtained at 22 sites on streams in Alabama, Louisiana, and Mississippi. Thirty-five floods were measured. Analysis of the data indicated methods currently in use would be inaccurate where densely vegetated flood plains are crossed by highway embankments and single-opening bridges. This atlas presents flood information at the site on West Fork Amite River near Liberty, MS. Water depths , velocities, and discharges through bridge openings on West Fork Amite River near Liberty, MS for floods of December 6, 1971 , and March 25, 1973, are shown, together with peak water-surface elevations along embankments and along cross sections. Manning 's roughness coefficient values in different parts of the flood plain are shown on maps, and flood-frequency relations are shown on a graph. (USGS).
Structural Transitions in Densifying Networks
NASA Astrophysics Data System (ADS)
Lambiotte, R.; Krapivsky, P. L.; Bhat, U.; Redner, S.
2016-11-01
We introduce a minimal generative model for densifying networks in which a new node attaches to a randomly selected target node and also to each of its neighbors with probability p . The networks that emerge from this copying mechanism are sparse for p <1/2 and dense (average degree increasing with number of nodes N ) for p ≥1/2 . The behavior in the dense regime is especially rich; for example, individual network realizations that are built by copying are disparate and not self-averaging. Further, there is an infinite sequence of structural anomalies at p =2/3 , 3/4 , 4/5 , etc., where the N dependences of the number of triangles (3-cliques), 4-cliques, undergo phase transitions. When linking to second neighbors of the target can occur, the probability that the resulting graph is complete—all nodes are connected—is nonzero as N →∞ .
Application-Specific Graph Sampling for Frequent Subgraph Mining and Community Detection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Purohit, Sumit; Choudhury, Sutanay; Holder, Lawrence B.
Graph mining is an important data analysis methodology, but struggles as the input graph size increases. The scalability and usability challenges posed by such large graphs make it imperative to sample the input graph and reduce its size. The critical challenge in sampling is to identify the appropriate algorithm to insure the resulting analysis does not suffer heavily from the data reduction. Predicting the expected performance degradation for a given graph and sampling algorithm is also useful. In this paper, we present different sampling approaches for graph mining applications such as Frequent Subgrpah Mining (FSM), and Community Detection (CD). Wemore » explore graph metrics such as PageRank, Triangles, and Diversity to sample a graph and conclude that for heterogeneous graphs Triangles and Diversity perform better than degree based metrics. We also present two new sampling variations for targeted graph mining applications. We present empirical results to show that knowledge of the target application, along with input graph properties can be used to select the best sampling algorithm. We also conclude that performance degradation is an abrupt, rather than gradual phenomena, as the sample size decreases. We present the empirical results to show that the performance degradation follows a logistic function.« less
Zhang, Qin
2015-07-01
Probabilistic graphical models (PGMs) such as Bayesian network (BN) have been widely applied in uncertain causality representation and probabilistic reasoning. Dynamic uncertain causality graph (DUCG) is a newly presented model of PGMs, which can be applied to fault diagnosis of large and complex industrial systems, disease diagnosis, and so on. The basic methodology of DUCG has been previously presented, in which only the directed acyclic graph (DAG) was addressed. However, the mathematical meaning of DUCG was not discussed. In this paper, the DUCG with directed cyclic graphs (DCGs) is addressed. In contrast, BN does not allow DCGs, as otherwise the conditional independence will not be satisfied. The inference algorithm for the DUCG with DCGs is presented, which not only extends the capabilities of DUCG from DAGs to DCGs but also enables users to decompose a large and complex DUCG into a set of small, simple sub-DUCGs, so that a large and complex knowledge base can be easily constructed, understood, and maintained. The basic mathematical definition of a complete DUCG with or without DCGs is proved to be a joint probability distribution (JPD) over a set of random variables. The incomplete DUCG as a part of a complete DUCG may represent a part of JPD. Examples are provided to illustrate the methodology.
Laplacian Estrada and normalized Laplacian Estrada indices of evolving graphs.
Shang, Yilun
2015-01-01
Large-scale time-evolving networks have been generated by many natural and technological applications, posing challenges for computation and modeling. Thus, it is of theoretical and practical significance to probe mathematical tools tailored for evolving networks. In this paper, on top of the dynamic Estrada index, we study the dynamic Laplacian Estrada index and the dynamic normalized Laplacian Estrada index of evolving graphs. Using linear algebra techniques, we established general upper and lower bounds for these graph-spectrum-based invariants through a couple of intuitive graph-theoretic measures, including the number of vertices or edges. Synthetic random evolving small-world networks are employed to show the relevance of the proposed dynamic Estrada indices. It is found that neither the static snapshot graphs nor the aggregated graph can approximate the evolving graph itself, indicating the fundamental difference between the static and dynamic Estrada indices.
Analysis Tools for Interconnected Boolean Networks With Biological Applications.
Chaves, Madalena; Tournier, Laurent
2018-01-01
Boolean networks with asynchronous updates are a class of logical models particularly well adapted to describe the dynamics of biological networks with uncertain measures. The state space of these models can be described by an asynchronous state transition graph, which represents all the possible exits from every single state, and gives a global image of all the possible trajectories of the system. In addition, the asynchronous state transition graph can be associated with an absorbing Markov chain, further providing a semi-quantitative framework where it becomes possible to compute probabilities for the different trajectories. For large networks, however, such direct analyses become computationally untractable, given the exponential dimension of the graph. Exploiting the general modularity of biological systems, we have introduced the novel concept of asymptotic graph , computed as an interconnection of several asynchronous transition graphs and recovering all asymptotic behaviors of a large interconnected system from the behavior of its smaller modules. From a modeling point of view, the interconnection of networks is very useful to address for instance the interplay between known biological modules and to test different hypotheses on the nature of their mutual regulatory links. This paper develops two new features of this general methodology: a quantitative dimension is added to the asymptotic graph, through the computation of relative probabilities for each final attractor and a companion cross-graph is introduced to complement the method on a theoretical point of view.
Developing and evaluating Quilts for the depiction of large layered graphs.
Bae, Juhee; Watson, Ben
2011-12-01
Traditional layered graph depictions such as flow charts are in wide use. Yet as graphs grow more complex, these depictions can become difficult to understand. Quilts are matrix-based depictions for layered graphs designed to address this problem. In this research, we first improve Quilts by developing three design alternatives, and then compare the best of these alternatives to better-known node-link and matrix depictions. A primary weakness in Quilts is their depiction of skip links, links that do not simply connect to a succeeding layer. Therefore in our first study, we compare Quilts using color-only, text-only, and mixed (color and text) skip link depictions, finding that path finding with the color-only depiction is significantly slower and less accurate, and that in certain cases, the mixed depiction offers an advantage over the text-only depiction. In our second study, we compare Quilts using the mixed depiction to node-link diagrams and centered matrices. Overall results show that users can find paths through graphs significantly faster with Quilts (46.6 secs) than with node-link (58.3 secs) or matrix (71.2 secs) diagrams. This speed advantage is still greater in large graphs (e.g. in 200 node graphs, 55.4 secs vs. 71.1 secs for node-link and 84.2 secs for matrix depictions). © 2011 IEEE
Probabilistic generation of random networks taking into account information on motifs occurrence.
Bois, Frederic Y; Gayraud, Ghislaine
2015-01-01
Because of the huge number of graphs possible even with a small number of nodes, inference on network structure is known to be a challenging problem. Generating large random directed graphs with prescribed probabilities of occurrences of some meaningful patterns (motifs) is also difficult. We show how to generate such random graphs according to a formal probabilistic representation, using fast Markov chain Monte Carlo methods to sample them. As an illustration, we generate realistic graphs with several hundred nodes mimicking a gene transcription interaction network in Escherichia coli.
Probabilistic Generation of Random Networks Taking into Account Information on Motifs Occurrence
Bois, Frederic Y.
2015-01-01
Abstract Because of the huge number of graphs possible even with a small number of nodes, inference on network structure is known to be a challenging problem. Generating large random directed graphs with prescribed probabilities of occurrences of some meaningful patterns (motifs) is also difficult. We show how to generate such random graphs according to a formal probabilistic representation, using fast Markov chain Monte Carlo methods to sample them. As an illustration, we generate realistic graphs with several hundred nodes mimicking a gene transcription interaction network in Escherichia coli. PMID:25493547
Sketch Matching on Topology Product Graph.
Liang, Shuang; Luo, Jun; Liu, Wenyin; Wei, Yichen
2015-08-01
Sketch matching is the fundamental problem in sketch based interfaces. After years of study, it remains challenging when there exists large irregularity and variations in the hand drawn sketch shapes. While most existing works exploit topology relations and graph representations for this problem, they are usually limited by the coarse topology exploration and heuristic (thus suboptimal) similarity metrics between graphs. We present a new sketch matching method with two novel contributions. We introduce a comprehensive definition of topology relations, which results in a rich and informative graph representation of sketches. For graph matching, we propose topology product graph that retains the full correspondence for matching two graphs. Based on it, we derive an intuitive sketch similarity metric whose exact solution is easy to compute. In addition, the graph representation and new metric naturally support partial matching, an important practical problem that received less attention in the literature. Extensive experimental results on a real challenging dataset and the superior performance of our method show that it outperforms the state-of-the-art.
BioJS DAGViewer: A reusable JavaScript component for displaying directed graphs
Micklem, Gos
2014-01-01
Summary: The DAGViewer BioJS component is a reusable JavaScript component made available as part of the BioJS project and intended to be used to display graphs of structured data, with a particular emphasis on Directed Acyclic Graphs (DAGs). It enables users to embed representations of graphs of data, such as ontologies or phylogenetic trees, in hyper-text documents (HTML). This component is generic, since it is capable (given the appropriate configuration) of displaying any kind of data that is organised as a graph. The features of this component which are useful for examining and filtering large and complex graphs are described. Availability: http://github.com/alexkalderimis/dag-viewer-biojs; http://github.com/biojs/biojs; http://dx.doi.org/10.5281/zenodo.8303. PMID:24627804
GRADIENT: Graph Analytic Approach for Discovering Irregular Events, Nascent and Temporal
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogan, Emilie
2015-03-31
Finding a time-ordered signature within large graphs is a computationally complex problem due to the combinatorial explosion of potential patterns. GRADIENT is designed to search and understand that problem space.
GRADIENT: Graph Analytic Approach for Discovering Irregular Events, Nascent and Temporal
Hogan, Emilie
2018-01-16
Finding a time-ordered signature within large graphs is a computationally complex problem due to the combinatorial explosion of potential patterns. GRADIENT is designed to search and understand that problem space.
High performance semantic factoring of giga-scale semantic graph databases.
DOE Office of Scientific and Technical Information (OSTI.GOV)
al-Saffar, Sinan; Adolf, Bob; Haglin, David
2010-10-01
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of our deployingmore » that for the analysis of the Billion Triple dataset with respect to its semantic factors, including basic properties, connected components, namespace interaction, and typed paths.« less
Nagoor Gani, A; Latha, S R
2016-01-01
A Hamiltonian cycle in a graph is a cycle that visits each node/vertex exactly once. A graph containing a Hamiltonian cycle is called a Hamiltonian graph. There have been several researches to find the number of Hamiltonian cycles of a Hamilton graph. As the number of vertices and edges grow, it becomes very difficult to keep track of all the different ways through which the vertices are connected. Hence, analysis of large graphs can be efficiently done with the assistance of a computer system that interprets graphs as matrices. And, of course, a good and well written algorithm will expedite the analysis even faster. The most convenient way to quickly test whether there is an edge between two vertices is to represent graphs using adjacent matrices. In this paper, a new algorithm is proposed to find fuzzy Hamiltonian cycle using adjacency matrix and the degree of the vertices of a fuzzy graph. A fuzzy graph structure is also modeled to illustrate the proposed algorithms with the selected air network of Indigo airlines.
Lombaert, Herve; Grady, Leo; Polimeni, Jonathan R.; Cheriet, Farida
2013-01-01
Existing methods for surface matching are limited by the trade-off between precision and computational efficiency. Here we present an improved algorithm for dense vertex-to-vertex correspondence that uses direct matching of features defined on a surface and improves it by using spectral correspondence as a regularization. This algorithm has the speed of both feature matching and spectral matching while exhibiting greatly improved precision (distance errors of 1.4%). The method, FOCUSR, incorporates implicitly such additional features to calculate the correspondence and relies on the smoothness of the lowest-frequency harmonics of a graph Laplacian to spatially regularize the features. In its simplest form, FOCUSR is an improved spectral correspondence method that nonrigidly deforms spectral embeddings. We provide here a full realization of spectral correspondence where virtually any feature can be used as additional information using weights on graph edges, but also on graph nodes and as extra embedded coordinates. As an example, the full power of FOCUSR is demonstrated in a real case scenario with the challenging task of brain surface matching across several individuals. Our results show that combining features and regularizing them in a spectral embedding greatly improves the matching precision (to a sub-millimeter level) while performing at much greater speed than existing methods. PMID:23868776
A Novel Coarsening Method for Scalable and Efficient Mesh Generation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, A; Hysom, D; Gunney, B
2010-12-02
In this paper, we propose a novel mesh coarsening method called brick coarsening method. The proposed method can be used in conjunction with any graph partitioners and scales to very large meshes. This method reduces problem space by decomposing the original mesh into fixed-size blocks of nodes called bricks, layered in a similar way to conventional brick laying, and then assigning each node of the original mesh to appropriate brick. Our experiments indicate that the proposed method scales to very large meshes while allowing simple RCB partitioner to produce higher-quality partitions with significantly less edge cuts. Our results further indicatemore » that the proposed brick-coarsening method allows more complicated partitioners like PT-Scotch to scale to very large problem size while still maintaining good partitioning performance with relatively good edge-cut metric. Graph partitioning is an important problem that has many scientific and engineering applications in such areas as VLSI design, scientific computing, and resource management. Given a graph G = (V,E), where V is the set of vertices and E is the set of edges, (k-way) graph partitioning problem is to partition the vertices of the graph (V) into k disjoint groups such that each group contains roughly equal number of vertices and the number of edges connecting vertices in different groups is minimized. Graph partitioning plays a key role in large scientific computing, especially in mesh-based computations, as it is used as a tool to minimize the volume of communication and to ensure well-balanced load across computing nodes. The impact of graph partitioning on the reduction of communication can be easily seen, for example, in different iterative methods to solve a sparse system of linear equation. Here, a graph partitioning technique is applied to the matrix, which is basically a graph in which each edge is a non-zero entry in the matrix, to allocate groups of vertices to processors in such a way that many of matrix-vector multiplication can be performed locally on each processor and hence to minimize communication. Furthermore, a good graph partitioning scheme ensures the equal amount of computation performed on each processor. Graph partitioning is a well known NP-complete problem, and thus the most commonly used graph partitioning algorithms employ some forms of heuristics. These algorithms vary in terms of their complexity, partition generation time, and the quality of partitions, and they tend to trade off these factors. A significant challenge we are currently facing at the Lawrence Livermore National Laboratory is how to partition very large meshes on massive-size distributed memory machines like IBM BlueGene/P, where scalability becomes a big issue. For example, we have found that the ParMetis, a very popular graph partitioning tool, can only scale to 16K processors. An ideal graph partitioning method on such an environment should be fast and scale to very large meshes, while producing high quality partitions. This is an extremely challenging task, as to scale to that level, the partitioning algorithm should be simple and be able to produce partitions that minimize inter-processor communications and balance the load imposed on the processors. Our goals in this work are two-fold: (1) To develop a new scalable graph partitioning method with good load balancing and communication reduction capability. (2) To study the performance of the proposed partitioning method on very large parallel machines using actual data sets and compare the performance to that of existing methods. The proposed method achieves the desired scalability by reducing the mesh size. For this, it coarsens an input mesh into a smaller size mesh by coalescing the vertices and edges of the original mesh into a set of mega-vertices and mega-edges. A new coarsening method called brick algorithm is developed in this research. In the brick algorithm, the zones in a given mesh are first grouped into fixed size blocks called bricks. These brick are then laid in a way similar to conventional brick laying technique, which reduces the number of neighboring blocks each block needs to communicate. Contributions of this research are as follows: (1) We have developed a novel method that scales to a really large problem size while producing high quality mesh partitions; (2) We measured the performance and scalability of the proposed method on a machine of massive size using a set of actual large complex data sets, where we have scaled to a mesh with 110 million zones using our method. To the best of our knowledge, this is the largest complex mesh that a partitioning method is successfully applied to; and (3) We have shown that proposed method can reduce the number of edge cuts by as much as 65%.« less
Charlesworth, Paul; Kitzbichler, Manfred G.; Paulsen, Ole
2015-01-01
Recent studies demonstrated that the anatomical network of the human brain shows a “rich-club” organization. This complex topological feature implies that highly connected regions, hubs of the large-scale brain network, are more densely interconnected with each other than expected by chance. Rich-club nodes were traversed by a majority of short paths between peripheral regions, underlining their potential importance for efficient global exchange of information between functionally specialized areas of the brain. Network hubs have also been described at the microscale of brain connectivity (so-called “hub neurons”). Their role in shaping synchronous dynamics and forming microcircuit wiring during development, however, is not yet fully understood. The present study aimed to investigate the role of hubs during network development, using multi-electrode arrays and functional connectivity analysis during spontaneous multi-unit activity (MUA) of dissociated primary mouse hippocampal neurons. Over the first 4 weeks in vitro, functional connectivity significantly increased in strength, density, and size, with mature networks demonstrating a robust modular and small-world topology. As expected by a “rich-get-richer” growth rule of network evolution, MUA graphs were found to form rich-clubs at an early stage in development (14 DIV). Later on, rich-club nodes were a consistent topological feature of MUA graphs, demonstrating high nodal strength, efficiency, and centrality. Rich-club nodes were also found to be crucial for MUA dynamics. They often served as broker of spontaneous activity flow, confirming that hub nodes and rich-clubs may play an important role in coordinating functional dynamics at the microcircuit level. PMID:25855164
NEFI: Network Extraction From Images
Dirnberger, M.; Kehl, T.; Neumann, A.
2015-01-01
Networks are amongst the central building blocks of many systems. Given a graph of a network, methods from graph theory enable a precise investigation of its properties. Software for the analysis of graphs is widely available and has been applied to study various types of networks. In some applications, graph acquisition is relatively simple. However, for many networks data collection relies on images where graph extraction requires domain-specific solutions. Here we introduce NEFI, a tool that extracts graphs from images of networks originating in various domains. Regarding previous work on graph extraction, theoretical results are fully accessible only to an expert audience and ready-to-use implementations for non-experts are rarely available or insufficiently documented. NEFI provides a novel platform allowing practitioners to easily extract graphs from images by combining basic tools from image processing, computer vision and graph theory. Thus, NEFI constitutes an alternative to tedious manual graph extraction and special purpose tools. We anticipate NEFI to enable time-efficient collection of large datasets. The analysis of these novel datasets may open up the possibility to gain new insights into the structure and function of various networks. NEFI is open source and available at http://nefi.mpi-inf.mpg.de. PMID:26521675
NASA Astrophysics Data System (ADS)
Kase, Sue E.; Vanni, Michelle; Knight, Joanne A.; Su, Yu; Yan, Xifeng
2016-05-01
Within operational environments decisions must be made quickly based on the information available. Identifying an appropriate knowledge base and accurately formulating a search query are critical tasks for decision-making effectiveness in dynamic situations. The spreading of graph data management tools to access large graph databases is a rapidly emerging research area of potential benefit to the intelligence community. A graph representation provides a natural way of modeling data in a wide variety of domains. Graph structures use nodes, edges, and properties to represent and store data. This research investigates the advantages of information search by graph query initiated by the analyst and interactively refined within the contextual dimensions of the answer space toward a solution. The paper introduces SLQ, a user-friendly graph querying system enabling the visual formulation of schemaless and structureless graph queries. SLQ is demonstrated with an intelligence analyst information search scenario focused on identifying individuals responsible for manufacturing a mosquito-hosted deadly virus. The scenario highlights the interactive construction of graph queries without prior training in complex query languages or graph databases, intuitive navigation through the problem space, and visualization of results in graphical format.
JavaGenes: Evolving Graphs with Crossover
NASA Technical Reports Server (NTRS)
Globus, Al; Atsatt, Sean; Lawton, John; Wipke, Todd
2000-01-01
Genetic algorithms usually use string or tree representations. We have developed a novel crossover operator for a directed and undirected graph representation, and used this operator to evolve molecules and circuits. Unlike strings or trees, a single point in the representation cannot divide every possible graph into two parts, because graphs may contain cycles. Thus, the crossover operator is non-trivial. A steady-state, tournament selection genetic algorithm code (JavaGenes) was written to implement and test the graph crossover operator. All runs were executed by cycle-scavagging on networked workstations using the Condor batch processing system. The JavaGenes code has evolved pharmaceutical drug molecules and simple digital circuits. Results to date suggest that JavaGenes can evolve moderate sized drug molecules and very small circuits in reasonable time. The algorithm has greater difficulty with somewhat larger circuits, suggesting that directed graphs (circuits) are more difficult to evolve than undirected graphs (molecules), although necessary differences in the crossover operator may also explain the results. In principle, JavaGenes should be able to evolve other graph-representable systems, such as transportation networks, metabolic pathways, and computer networks. However, large graphs evolve significantly slower than smaller graphs, presumably because the space-of-all-graphs explodes combinatorially with graph size. Since the representation strongly affects genetic algorithm performance, adding graphs to the evolutionary programmer's bag-of-tricks should be beneficial. Also, since graph evolution operates directly on the phenotype, the genotype-phenotype translation step, common in genetic algorithm work, is eliminated.
A statistical analysis of UK financial networks
NASA Astrophysics Data System (ADS)
Chu, J.; Nadarajah, S.
2017-04-01
In recent years, with a growing interest in big or large datasets, there has been a rise in the application of large graphs and networks to financial big data. Much of this research has focused on the construction and analysis of the network structure of stock markets, based on the relationships between stock prices. Motivated by Boginski et al. (2005), who studied the characteristics of a network structure of the US stock market, we construct network graphs of the UK stock market using same method. We fit four distributions to the degree density of the vertices from these graphs, the Pareto I, Fréchet, lognormal, and generalised Pareto distributions, and assess the goodness of fit. Our results show that the degree density of the complements of the market graphs, constructed using a negative threshold value close to zero, can be fitted well with the Fréchet and lognormal distributions.
Browsing schematics: Query-filtered graphs with context nodes
NASA Technical Reports Server (NTRS)
Ciccarelli, Eugene C.; Nardi, Bonnie A.
1988-01-01
The early results of a research project to create tools for building interfaces to intelligent systems on the NASA Space Station are reported. One such tool is the Schematic Browser which helps users engaged in engineering problem solving find and select schematics from among a large set. Users query for schematics with certain components, and the Schematic Browser presents a graph whose nodes represent the schematics with those components. The query greatly reduces the number of choices presented to the user, filtering the graph to a manageable size. Users can reformulate and refine the query serially until they locate the schematics of interest. To help users maintain orientation as they navigate a large body of data, the graph also includes nodes that are not matches but provide global and local context for the matching nodes. Context nodes include landmarks, ancestors, siblings, children and previous matches.
Sampling Large Graphs for Anticipatory Analytics
2015-05-15
low. C. Random Area Sampling Random area sampling [8] is a “ snowball ” sampling method in which a set of random seed vertices are selected and areas... Sampling Large Graphs for Anticipatory Analytics Lauren Edwards, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller Lincoln...systems, greater human-in-the-loop involvement, or through complex algorithms. We are investigating the use of sampling to mitigate these challenges
Tractable Algorithms for Proximity Search on Large Graphs
2010-07-01
development in information retrieval, 2005. 5.1 164 A. K. Chandra, P. Raghavan, W. L. Ruzzo, and R. Smolensky. The electrical resistance of a graph captures...2007] show how to use hitting times for designing provably manipulation resistant reputation systems. Harmonic func- tions have been used for...commute times with electrical net- works (Doyle and Snell [1984]). Consider an undirected graph. Now think of each edge as a resistor with conductance
Graph State-Based Quantum Group Authentication Scheme
NASA Astrophysics Data System (ADS)
Liao, Longxia; Peng, Xiaoqi; Shi, Jinjing; Guo, Ying
2017-02-01
Motivated by the elegant structure of the graph state, we design an ingenious quantum group authentication scheme, which is implemented by operating appropriate operations on the graph state and can solve the problem of multi-user authentication. Three entities, the group authentication server (GAS) as a verifier, multiple users as provers and the trusted third party Trent are included. GAS and Trent assist the multiple users in completing the authentication process, i.e., GAS is responsible for registering all the users while Trent prepares graph states. All the users, who request for authentication, encode their authentication keys on to the graph state by performing Pauli operators. It demonstrates that a novel authentication scheme can be achieved with the flexible use of graph state, which can synchronously authenticate a large number of users, meanwhile the provable security can be guaranteed definitely.
A model of language inflection graphs
NASA Astrophysics Data System (ADS)
Fukś, Henryk; Farzad, Babak; Cao, Yi
2014-01-01
Inflection graphs are highly complex networks representing relationships between inflectional forms of words in human languages. For so-called synthetic languages, such as Latin or Polish, they have particularly interesting structure due to the abundance of inflectional forms. We construct the simplest form of inflection graphs, namely a bipartite graph in which one group of vertices corresponds to dictionary headwords and the other group to inflected forms encountered in a given text. We, then, study projection of this graph on the set of headwords. The projection decomposes into a large number of connected components, to be called word groups. Distribution of sizes of word group exhibits some remarkable properties, resembling cluster distribution in a lattice percolation near the critical point. We propose a simple model which produces graphs of this type, reproducing the desired component distribution and other topological features.
Dynamical modeling and analysis of large cellular regulatory networks
NASA Astrophysics Data System (ADS)
Bérenguier, D.; Chaouiya, C.; Monteiro, P. T.; Naldi, A.; Remy, E.; Thieffry, D.; Tichit, L.
2013-06-01
The dynamical analysis of large biological regulatory networks requires the development of scalable methods for mathematical modeling. Following the approach initially introduced by Thomas, we formalize the interactions between the components of a network in terms of discrete variables, functions, and parameters. Model simulations result in directed graphs, called state transition graphs. We are particularly interested in reachability properties and asymptotic behaviors, which correspond to terminal strongly connected components (or "attractors") in the state transition graph. A well-known problem is the exponential increase of the size of state transition graphs with the number of network components, in particular when using the biologically realistic asynchronous updating assumption. To address this problem, we have developed several complementary methods enabling the analysis of the behavior of large and complex logical models: (i) the definition of transition priority classes to simplify the dynamics; (ii) a model reduction method preserving essential dynamical properties, (iii) a novel algorithm to compact state transition graphs and directly generate compressed representations, emphasizing relevant transient and asymptotic dynamical properties. The power of an approach combining these different methods is demonstrated by applying them to a recent multilevel logical model for the network controlling CD4+ T helper cell response to antigen presentation and to a dozen cytokines. This model accounts for the differentiation of canonical Th1 and Th2 lymphocytes, as well as of inflammatory Th17 and regulatory T cells, along with many hybrid subtypes. All these methods have been implemented into the software GINsim, which enables the definition, the analysis, and the simulation of logical regulatory graphs.
NASA Technical Reports Server (NTRS)
Collins, Oliver (Inventor); Dolinar, Jr., Samuel J. (Inventor); Hus, In-Shek (Inventor); Bozzola, Fabrizio P. (Inventor); Olson, Erlend M. (Inventor); Statman, Joseph I. (Inventor); Zimmerman, George A. (Inventor)
1991-01-01
A method of formulating and packaging decision-making elements into a long constraint length Viterbi decoder which involves formulating the decision-making processors as individual Viterbi butterfly processors that are interconnected in a deBruijn graph configuration. A fully distributed architecture, which achieves high decoding speeds, is made feasible by novel wiring and partitioning of the state diagram. This partitioning defines universal modules, which can be used to build any size decoder, such that a large number of wires is contained inside each module, and a small number of wires is needed to connect modules. The total system is modular and hierarchical, and it implements a large proportion of the required wiring internally within modules and may include some external wiring to fully complete the deBruijn graph. pg,14.
Extended Graph-Based Models for Enhanced Similarity Search in Cavbase.
Krotzky, Timo; Fober, Thomas; Hüllermeier, Eyke; Klebe, Gerhard
2014-01-01
To calculate similarities between molecular structures, measures based on the maximum common subgraph are frequently applied. For the comparison of protein binding sites, these measures are not fully appropriate since graphs representing binding sites on a detailed atomic level tend to get very large. In combination with an NP-hard problem, a large graph leads to a computationally demanding task. Therefore, for the comparison of binding sites, a less detailed coarse graph model is used building upon so-called pseudocenters. Consistently, a loss of structural data is caused since many atoms are discarded and no information about the shape of the binding site is considered. This is usually resolved by performing subsequent calculations based on additional information. These steps are usually quite expensive, making the whole approach very slow. The main drawback of a graph-based model solely based on pseudocenters, however, is the loss of information about the shape of the protein surface. In this study, we propose a novel and efficient modeling formalism that does not increase the size of the graph model compared to the original approach, but leads to graphs containing considerably more information assigned to the nodes. More specifically, additional descriptors considering surface characteristics are extracted from the local surface and attributed to the pseudocenters stored in Cavbase. These properties are evaluated as additional node labels, which lead to a gain of information and allow for much faster but still very accurate comparisons between different structures.
Graph theory and the Virasoro master equation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Obers, N.A.J.
1991-04-01
A brief history of affine Lie algebra, the Virasoro algebra and its culmination in the Virasoro master equations is given. By studying ansaetze of the master equation, we obtain exact solutions and gain insight in the structure of large slices of affine-Virasoro space. We find an isomorphism between the constructions in the ansatz SO(n){sub diag}, which is a set of unitary, generically irrational affine-Virasoro constructions on SO(n), and the unlabelled graphs, while, conversely, a group-theoretic and conformal field-theoretic identification is obtained for every graph of graph theory. We also define a class of magic'' Lie group bases in which themore » Virasoro master equation admits a simple metric ansatz (gmetric), whose structure is visible in the high-level expansion. When a magic basis is real on compact g, the corresponding g{sub metric} is a large system of unitary, generically irrational conformal field theories. Examples in this class include the graph-theory ansatz SO(n){sub diag} in the Cartesian basis of SO(n), and the ansatz SU(n){sub metric} in the Pauli-like basis of SU(n). Finally, we define the sine-area graphs'' of SU(n), which label the conformal field theories of SU(n){sub metric}, and we note that, in similar fashion, each magic basis of g defines a generalized graph theory on g which labels the conformal field theories of g{sub metric}. 24 figs., 4 tabs.« less
Searching social networks for subgraph patterns
NASA Astrophysics Data System (ADS)
Ogaard, Kirk; Kase, Sue; Roy, Heather; Nagi, Rakesh; Sambhoos, Kedar; Sudit, Moises
2013-06-01
Software tools for Social Network Analysis (SNA) are being developed which support various types of analysis of social networks extracted from social media websites (e.g., Twitter). Once extracted and stored in a database such social networks are amenable to analysis by SNA software. This data analysis often involves searching for occurrences of various subgraph patterns (i.e., graphical representations of entities and relationships). The authors have developed the Graph Matching Toolkit (GMT) which provides an intuitive Graphical User Interface (GUI) for a heuristic graph matching algorithm called the Truncated Search Tree (TruST) algorithm. GMT is a visual interface for graph matching algorithms processing large social networks. GMT enables an analyst to draw a subgraph pattern by using a mouse to select categories and labels for nodes and links from drop-down menus. GMT then executes the TruST algorithm to find the top five occurrences of the subgraph pattern within the social network stored in the database. GMT was tested using a simulated counter-insurgency dataset consisting of cellular phone communications within a populated area of operations in Iraq. The results indicated GMT (when executing the TruST graph matching algorithm) is a time-efficient approach to searching large social networks. GMT's visual interface to a graph matching algorithm enables intelligence analysts to quickly analyze and summarize the large amounts of data necessary to produce actionable intelligence.
Robust Algorithms for on Minor-Free Graphs Based on the Sherali-Adams Hierarchy
NASA Astrophysics Data System (ADS)
Magen, Avner; Moharrami, Mohammad
This work provides a Linear Programming-based Polynomial Time Approximation Scheme (PTAS) for two classical NP-hard problems on graphs when the input graph is guaranteed to be planar, or more generally Minor Free. The algorithm applies a sufficiently large number (some function of when approximation is required) of rounds of the so-called Sherali-Adams Lift-and-Project system. needed to obtain a -approximation, where f is some function that depends only on the graph that should be avoided as a minor. The problem we discuss are the well-studied problems, the and problems. An curious fact we expose is that in the world of minor-free graph, the is harder in some sense than the.
A unified framework for building high performance DVEs
NASA Astrophysics Data System (ADS)
Lei, Kaibin; Ma, Zhixia; Xiong, Hua
2011-10-01
A unified framework for integrating PC cluster based parallel rendering with distributed virtual environments (DVEs) is presented in this paper. While various scene graphs have been proposed in DVEs, it is difficult to enable collaboration of different scene graphs. This paper proposes a technique for non-distributed scene graphs with the capability of object and event distribution. With the increase of graphics data, DVEs require more powerful rendering ability. But general scene graphs are inefficient in parallel rendering. The paper also proposes a technique to connect a DVE and a PC cluster based parallel rendering environment. A distributed multi-player video game is developed to show the interaction of different scene graphs and the parallel rendering performance on a large tiled display wall.
Scarselli, Franco; Tsoi, Ah Chung; Hagenbuchner, Markus; Noi, Lucia Di
2013-12-01
This paper proposes the combination of two state-of-the-art algorithms for processing graph input data, viz., the probabilistic mapping graph self organizing map, an unsupervised learning approach, and the graph neural network, a supervised learning approach. We organize these two algorithms in a cascade architecture containing a probabilistic mapping graph self organizing map, and a graph neural network. We show that this combined approach helps us to limit the long-term dependency problem that exists when training the graph neural network resulting in an overall improvement in performance. This is demonstrated in an application to a benchmark problem requiring the detection of spam in a relatively large set of web sites. It is found that the proposed method produces results which reach the state of the art when compared with some of the best results obtained by others using quite different approaches. A particular strength of our method is its applicability towards any input domain which can be represented as a graph. Copyright © 2013 Elsevier Ltd. All rights reserved.
Key-Node-Separated Graph Clustering and Layouts for Human Relationship Graph Visualization.
Itoh, Takayuki; Klein, Karsten
2015-01-01
Many graph-drawing methods apply node-clustering techniques based on the density of edges to find tightly connected subgraphs and then hierarchically visualize the clustered graphs. However, users may want to focus on important nodes and their connections to groups of other nodes for some applications. For this purpose, it is effective to separately visualize the key nodes detected based on adjacency and attributes of the nodes. This article presents a graph visualization technique for attribute-embedded graphs that applies a graph-clustering algorithm that accounts for the combination of connections and attributes. The graph clustering step divides the nodes according to the commonality of connected nodes and similarity of feature value vectors. It then calculates the distances between arbitrary pairs of clusters according to the number of connecting edges and the similarity of feature value vectors and finally places the clusters based on the distances. Consequently, the technique separates important nodes that have connections to multiple large clusters and improves the visibility of such nodes' connections. To test this technique, this article presents examples with human relationship graph datasets, including a coauthorship and Twitter communication network dataset.
Scaling Semantic Graph Databases in Size and Performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Castellana, Vito G.; Villa, Oreste
In this paper we present SGEM, a full software system for accelerating large-scale semantic graph databases on commodity clusters. Unlike current approaches, SGEM addresses semantic graph databases by only employing graph methods at all the levels of the stack. On one hand, this allows exploiting the space efficiency of graph data structures and the inherent parallelism of graph algorithms. These features adapt well to the increasing system memory and core counts of modern commodity clusters. On the other hand, however, these systems are optimized for regular computation and batched data transfers, while graph methods usually are irregular and generate fine-grainedmore » data accesses with poor spatial and temporal locality. Our framework comprises a SPARQL to data parallel C compiler, a library of parallel graph methods and a custom, multithreaded runtime system. We introduce our stack, motivate its advantages with respect to other solutions and show how we solved the challenges posed by irregular behaviors. We present the result of our software stack on the Berlin SPARQL benchmarks with datasets up to 10 billion triples (a triple corresponds to a graph edge), demonstrating scaling in dataset size and in performance as more nodes are added to the cluster.« less
MISAGA: An Algorithm for Mining Interesting Subgraphs in Attributed Graphs.
He, Tiantian; Chan, Keith C C
2018-05-01
An attributed graph contains vertices that are associated with a set of attribute values. Mining clusters or communities, which are interesting subgraphs in the attributed graph is one of the most important tasks of graph analytics. Many problems can be defined as the mining of interesting subgraphs in attributed graphs. Algorithms that discover subgraphs based on predefined topologies cannot be used to tackle these problems. To discover interesting subgraphs in the attributed graph, we propose an algorithm called mining interesting subgraphs in attributed graph algorithm (MISAGA). MISAGA performs its tasks by first using a probabilistic measure to determine whether the strength of association between a pair of attribute values is strong enough to be interesting. Given the interesting pairs of attribute values, then the degree of association is computed for each pair of vertices using an information theoretic measure. Based on the edge structure and degree of association between each pair of vertices, MISAGA identifies interesting subgraphs by formulating it as a constrained optimization problem and solves it by identifying the optimal affiliation of subgraphs for the vertices in the attributed graph. MISAGA has been tested with several large-sized real graphs and is found to be potentially very useful for various applications.
Are randomly grown graphs really random?
Callaway, D S; Hopcroft, J E; Kleinberg, J M; Newman, M E; Strogatz, S H
2001-10-01
We analyze a minimal model of a growing network. At each time step, a new vertex is added; then, with probability delta, two vertices are chosen uniformly at random and joined by an undirected edge. This process is repeated for t time steps. In the limit of large t, the resulting graph displays surprisingly rich characteristics. In particular, a giant component emerges in an infinite-order phase transition at delta=1/8. At the transition, the average component size jumps discontinuously but remains finite. In contrast, a static random graph with the same degree distribution exhibits a second-order phase transition at delta=1/4, and the average component size diverges there. These dramatic differences between grown and static random graphs stem from a positive correlation between the degrees of connected vertices in the grown graph-older vertices tend to have higher degree, and to link with other high-degree vertices, merely by virtue of their age. We conclude that grown graphs, however randomly they are constructed, are fundamentally different from their static random graph counterparts.
The BioMedical Evidence Graph (BMEG) | Informatics Technology for Cancer Research (ITCR)
The BMEG is a Cancer Data integration Platform that utilizes methods collected from DREAM challenges and applied to large datasets, such as the TCGA, and makes them avalible for analysis using a high performance graph database
Distributed-Memory Breadth-First Search on Massive Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buluc, Aydin; Beamer, Scott; Madduri, Kamesh
This chapter studies the problem of traversing large graphs using the breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm as well as the recently discovered direction optimizing algorithm. We analyze the performance and scalability trade-offs in using different local data structures such as CSR and DCSC, enabling in-node multithreading, and graph decompositions such as 1D and 2D decomposition.
CrosstalkNet: A Visualization Tool for Differential Co-expression Networks and Communities.
Manem, Venkata; Adam, George Alexandru; Gruosso, Tina; Gigoux, Mathieu; Bertos, Nicholas; Park, Morag; Haibe-Kains, Benjamin
2018-04-15
Variations in physiological conditions can rewire molecular interactions between biological compartments, which can yield novel insights into gain or loss of interactions specific to perturbations of interest. Networks are a promising tool to elucidate intercellular interactions, yet exploration of these large-scale networks remains a challenge due to their high dimensionality. To retrieve and mine interactions, we developed CrosstalkNet, a user friendly, web-based network visualization tool that provides a statistical framework to infer condition-specific interactions coupled with a community detection algorithm for bipartite graphs to identify significantly dense subnetworks. As a case study, we used CrosstalkNet to mine a set of 54 and 22 gene-expression profiles from breast tumor and normal samples, respectively, with epithelial and stromal compartments extracted via laser microdissection. We show how CrosstalkNet can be used to explore large-scale co-expression networks and to obtain insights into the biological processes that govern cross-talk between different tumor compartments. Significance: This web application enables researchers to mine complex networks and to decipher novel biological processes in tumor epithelial-stroma cross-talk as well as in other studies of intercompartmental interactions. Cancer Res; 78(8); 2140-3. ©2018 AACR . ©2018 American Association for Cancer Research.
A Unified Framework for Street-View Panorama Stitching
Li, Li; Yao, Jian; Xie, Renping; Xia, Menghan; Zhang, Wei
2016-01-01
In this paper, we propose a unified framework to generate a pleasant and high-quality street-view panorama by stitching multiple panoramic images captured from the cameras mounted on the mobile platform. Our proposed framework is comprised of four major steps: image warping, color correction, optimal seam line detection and image blending. Since the input images are captured without a precisely common projection center from the scenes with the depth differences with respect to the cameras to different extents, such images cannot be precisely aligned in geometry. Therefore, an efficient image warping method based on the dense optical flow field is proposed to greatly suppress the influence of large geometric misalignment at first. Then, to lessen the influence of photometric inconsistencies caused by the illumination variations and different exposure settings, we propose an efficient color correction algorithm via matching extreme points of histograms to greatly decrease color differences between warped images. After that, the optimal seam lines between adjacent input images are detected via the graph cut energy minimization framework. At last, the Laplacian pyramid blending algorithm is applied to further eliminate the stitching artifacts along the optimal seam lines. Experimental results on a large set of challenging street-view panoramic images captured form the real world illustrate that the proposed system is capable of creating high-quality panoramas. PMID:28025481
2013-01-01
Background Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. Results Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. Conclusions Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. PMID:24564333
An experimental study of graph connectivity for unsupervised word sense disambiguation.
Navigli, Roberto; Lapata, Mirella
2010-04-01
Word sense disambiguation (WSD), the task of identifying the intended meanings (senses) of words in context, has been a long-standing research objective for natural language processing. In this paper, we are concerned with graph-based algorithms for large-scale WSD. Under this framework, finding the right sense for a given word amounts to identifying the most "important" node among the set of graph nodes representing its senses. We introduce a graph-based WSD algorithm which has few parameters and does not require sense-annotated data for training. Using this algorithm, we investigate several measures of graph connectivity with the aim of identifying those best suited for WSD. We also examine how the chosen lexicon and its connectivity influences WSD performance. We report results on standard data sets and show that our graph-based approach performs comparably to the state of the art.
A system for routing arbitrary directed graphs on SIMD architectures
NASA Technical Reports Server (NTRS)
Tomboulian, Sherryl
1987-01-01
There are many problems which can be described in terms of directed graphs that contain a large number of vertices where simple computations occur using data from connecting vertices. A method is given for parallelizing such problems on an SIMD machine model that is bit-serial and uses only nearest neighbor connections for communication. Each vertex of the graph will be assigned to a processor in the machine. Algorithms are given that will be used to implement movement of data along the arcs of the graph. This architecture and algorithms define a system that is relatively simple to build and can do graph processing. All arcs can be transversed in parallel in time O(T), where T is empirically proportional to the diameter of the interconnection network times the average degree of the graph. Modifying or adding a new arc takes the same time as parallel traversal.
What energy functions can be minimized via graph cuts?
Kolmogorov, Vladimir; Zabih, Ramin
2004-02-01
In the last few years, several new algorithms based on graph cuts have been developed to solve energy minimization problems in computer vision. Each of these techniques constructs a graph such that the minimum cut on the graph also minimizes the energy. Yet, because these graph constructions are complex and highly specific to a particular energy function, graph cuts have seen limited application to date. In this paper, we give a characterization of the energy functions that can be minimized by graph cuts. Our results are restricted to functions of binary variables. However, our work generalizes many previous constructions and is easily applicable to vision problems that involve large numbers of labels, such as stereo, motion, image restoration, and scene reconstruction. We give a precise characterization of what energy functions can be minimized using graph cuts, among the energy functions that can be written as a sum of terms containing three or fewer binary variables. We also provide a general-purpose construction to minimize such an energy function. Finally, we give a necessary condition for any energy function of binary variables to be minimized by graph cuts. Researchers who are considering the use of graph cuts to optimize a particular energy function can use our results to determine if this is possible and then follow our construction to create the appropriate graph. A software implementation is freely available.
Durand, Patrick; Labarre, Laurent; Meil, Alain; Divo, Jean-Louis; Vandenbrouck, Yves; Viari, Alain; Wojcik, Jérôme
2006-01-17
A large variety of biological data can be represented by graphs. These graphs can be constructed from heterogeneous data coming from genomic and post-genomic technologies, but there is still need for tools aiming at exploring and analysing such graphs. This paper describes GenoLink, a software platform for the graphical querying and exploration of graphs. GenoLink provides a generic framework for representing and querying data graphs. This framework provides a graph data structure, a graph query engine, allowing to retrieve sub-graphs from the entire data graph, and several graphical interfaces to express such queries and to further explore their results. A query consists in a graph pattern with constraints attached to the vertices and edges. A query result is the set of all sub-graphs of the entire data graph that are isomorphic to the pattern and satisfy the constraints. The graph data structure does not rely upon any particular data model but can dynamically accommodate for any user-supplied data model. However, for genomic and post-genomic applications, we provide a default data model and several parsers for the most popular data sources. GenoLink does not require any programming skill since all operations on graphs and the analysis of the results can be carried out graphically through several dedicated graphical interfaces. GenoLink is a generic and interactive tool allowing biologists to graphically explore various sources of information. GenoLink is distributed either as a standalone application or as a component of the Genostar/Iogma platform. Both distributions are free for academic research and teaching purposes and can be requested at academy@genostar.com. A commercial licence form can be obtained for profit company at info@genostar.com. See also http://www.genostar.org.
Durand, Patrick; Labarre, Laurent; Meil, Alain; Divo1, Jean-Louis; Vandenbrouck, Yves; Viari, Alain; Wojcik, Jérôme
2006-01-01
Background A large variety of biological data can be represented by graphs. These graphs can be constructed from heterogeneous data coming from genomic and post-genomic technologies, but there is still need for tools aiming at exploring and analysing such graphs. This paper describes GenoLink, a software platform for the graphical querying and exploration of graphs. Results GenoLink provides a generic framework for representing and querying data graphs. This framework provides a graph data structure, a graph query engine, allowing to retrieve sub-graphs from the entire data graph, and several graphical interfaces to express such queries and to further explore their results. A query consists in a graph pattern with constraints attached to the vertices and edges. A query result is the set of all sub-graphs of the entire data graph that are isomorphic to the pattern and satisfy the constraints. The graph data structure does not rely upon any particular data model but can dynamically accommodate for any user-supplied data model. However, for genomic and post-genomic applications, we provide a default data model and several parsers for the most popular data sources. GenoLink does not require any programming skill since all operations on graphs and the analysis of the results can be carried out graphically through several dedicated graphical interfaces. Conclusion GenoLink is a generic and interactive tool allowing biologists to graphically explore various sources of information. GenoLink is distributed either as a standalone application or as a component of the Genostar/Iogma platform. Both distributions are free for academic research and teaching purposes and can be requested at academy@genostar.com. A commercial licence form can be obtained for profit company at info@genostar.com. See also . PMID:16417636
Dynamical influence processes on networks: general theory and applications to social contagion.
Harris, Kameron Decker; Danforth, Christopher M; Dodds, Peter Sheridan
2013-08-01
We study binary state dynamics on a network where each node acts in response to the average state of its neighborhood. By allowing varying amounts of stochasticity in both the network and node responses, we find different outcomes in random and deterministic versions of the model. In the limit of a large, dense network, however, we show that these dynamics coincide. We construct a general mean-field theory for random networks and show this predicts that the dynamics on the network is a smoothed version of the average response function dynamics. Thus, the behavior of the system can range from steady state to chaotic depending on the response functions, network connectivity, and update synchronicity. As a specific example, we model the competing tendencies of imitation and nonconformity by incorporating an off-threshold into standard threshold models of social contagion. In this way, we attempt to capture important aspects of fashions and societal trends. We compare our theory to extensive simulations of this "limited imitation contagion" model on Poisson random graphs, finding agreement between the mean-field theory and stochastic simulations.
Indurkhya, Sagar; Beal, Jacob
2010-01-06
ODE simulations of chemical systems perform poorly when some of the species have extremely low concentrations. Stochastic simulation methods, which can handle this case, have been impractical for large systems due to computational complexity. We observe, however, that when modeling complex biological systems: (1) a small number of reactions tend to occur a disproportionately large percentage of the time, and (2) a small number of species tend to participate in a disproportionately large percentage of reactions. We exploit these properties in LOLCAT Method, a new implementation of the Gillespie Algorithm. First, factoring reaction propensities allows many propensities dependent on a single species to be updated in a single operation. Second, representing dependencies between reactions with a bipartite graph of reactions and species requires only storage for reactions, rather than the required for a graph that includes only reactions. Together, these improvements allow our implementation of LOLCAT Method to execute orders of magnitude faster than currently existing Gillespie Algorithm variants when simulating several yeast MAPK cascade models.
Indurkhya, Sagar; Beal, Jacob
2010-01-01
ODE simulations of chemical systems perform poorly when some of the species have extremely low concentrations. Stochastic simulation methods, which can handle this case, have been impractical for large systems due to computational complexity. We observe, however, that when modeling complex biological systems: (1) a small number of reactions tend to occur a disproportionately large percentage of the time, and (2) a small number of species tend to participate in a disproportionately large percentage of reactions. We exploit these properties in LOLCAT Method, a new implementation of the Gillespie Algorithm. First, factoring reaction propensities allows many propensities dependent on a single species to be updated in a single operation. Second, representing dependencies between reactions with a bipartite graph of reactions and species requires only storage for reactions, rather than the required for a graph that includes only reactions. Together, these improvements allow our implementation of LOLCAT Method to execute orders of magnitude faster than currently existing Gillespie Algorithm variants when simulating several yeast MAPK cascade models. PMID:20066048
Many-core graph analytics using accelerated sparse linear algebra routines
NASA Astrophysics Data System (ADS)
Kozacik, Stephen; Paolini, Aaron L.; Fox, Paul; Kelmelis, Eric
2016-05-01
Graph analytics is a key component in identifying emerging trends and threats in many real-world applications. Largescale graph analytics frameworks provide a convenient and highly-scalable platform for developing algorithms to analyze large datasets. Although conceptually scalable, these techniques exhibit poor performance on modern computational hardware. Another model of graph computation has emerged that promises improved performance and scalability by using abstract linear algebra operations as the basis for graph analysis as laid out by the GraphBLAS standard. By using sparse linear algebra as the basis, existing highly efficient algorithms can be adapted to perform computations on the graph. This approach, however, is often less intuitive to graph analytics experts, who are accustomed to vertex-centric APIs such as Giraph, GraphX, and Tinkerpop. We are developing an implementation of the high-level operations supported by these APIs in terms of linear algebra operations. This implementation is be backed by many-core implementations of the fundamental GraphBLAS operations required, and offers the advantages of both the intuitive programming model of a vertex-centric API and the performance of a sparse linear algebra implementation. This technology can reduce the number of nodes required, as well as the run-time for a graph analysis problem, enabling customers to perform more complex analysis with less hardware at lower cost. All of this can be accomplished without the requirement for the customer to make any changes to their analytics code, thanks to the compatibility with existing graph APIs.
Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis
2015-01-01
ApprovedOMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for...algorithms we proposed improve the time e ciency signi cantly for large scale datasets. In the last chapter, we also propose an incremental reseeding...plume detection in hyper-spectral video data. These graph based clustering algorithms we proposed improve the time efficiency significantly for large
Scale-free Graphs for General Aviation Flight Schedules
NASA Technical Reports Server (NTRS)
Alexandov, Natalia M. (Technical Monitor); Kincaid, Rex K.
2003-01-01
In the late 1990s a number of researchers noticed that networks in biology, sociology, and telecommunications exhibited similar characteristics unlike standard random networks. In particular, they found that the cummulative degree distributions of these graphs followed a power law rather than a binomial distribution and that their clustering coefficients tended to a nonzero constant as the number of nodes, n, became large rather than O(1/n). Moreover, these networks shared an important property with traditional random graphs as n becomes large the average shortest path length scales with log n. This latter property has been coined the small-world property. When taken together these three properties small-world, power law, and constant clustering coefficient describe what are now most commonly referred to as scale-free networks. Since 1997 at least six books and over 400 articles have been written about scale-free networks. In this manuscript an overview of the salient characteristics of scale-free networks. Computational experience will be provided for two mechanisms that grow (dynamic) scale-free graphs. Additional computational experience will be given for constructing (static) scale-free graphs via a tabu search optimization approach. Finally, a discussion of potential applications to general aviation networks is given.
NASA Astrophysics Data System (ADS)
Zhou, Lifan; Chai, Dengfeng; Xia, Yu; Ma, Peifeng; Lin, Hui
2018-01-01
Phase unwrapping (PU) is one of the key processes in reconstructing the digital elevation model of a scene from its interferometric synthetic aperture radar (InSAR) data. It is known that two-dimensional (2-D) PU problems can be formulated as maximum a posteriori estimation of Markov random fields (MRFs). However, considering that the traditional MRF algorithm is usually defined on a rectangular grid, it fails easily if large parts of the wrapped data are dominated by noise caused by large low-coherence area or rapid-topography variation. A PU solution based on sparse MRF is presented to extend the traditional MRF algorithm to deal with sparse data, which allows the unwrapping of InSAR data dominated by high phase noise. To speed up the graph cuts algorithm for sparse MRF, we designed dual elementary graphs and merged them to obtain the Delaunay triangle graph, which is used to minimize the energy function efficiently. The experiments on simulated and real data, compared with other existing algorithms, both confirm the effectiveness of the proposed MRF approach, which suffers less from decorrelation effects caused by large low-coherence area or rapid-topography variation.
Graph-based linear scaling electronic structure theory.
Niklasson, Anders M N; Mniszewski, Susan M; Negre, Christian F A; Cawkwell, Marc J; Swart, Pieter J; Mohd-Yusof, Jamal; Germann, Timothy C; Wall, Michael E; Bock, Nicolas; Rubensson, Emanuel H; Djidjev, Hristo
2016-06-21
We show how graph theory can be combined with quantum theory to calculate the electronic structure of large complex systems. The graph formalism is general and applicable to a broad range of electronic structure methods and materials, including challenging systems such as biomolecules. The methodology combines well-controlled accuracy, low computational cost, and natural low-communication parallelism. This combination addresses substantial shortcomings of linear scaling electronic structure theory, in particular with respect to quantum-based molecular dynamics simulations.
Graph-based linear scaling electronic structure theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Niklasson, Anders M. N., E-mail: amn@lanl.gov; Negre, Christian F. A.; Cawkwell, Marc J.
2016-06-21
We show how graph theory can be combined with quantum theory to calculate the electronic structure of large complex systems. The graph formalism is general and applicable to a broad range of electronic structure methods and materials, including challenging systems such as biomolecules. The methodology combines well-controlled accuracy, low computational cost, and natural low-communication parallelism. This combination addresses substantial shortcomings of linear scaling electronic structure theory, in particular with respect to quantum-based molecular dynamics simulations.
Communication: Analysing kinetic transition networks for rare events.
Stevenson, Jacob D; Wales, David J
2014-07-28
The graph transformation approach is a recently proposed method for computing mean first passage times, rates, and committor probabilities for kinetic transition networks. Here we compare the performance to existing linear algebra methods, focusing on large, sparse networks. We show that graph transformation provides a much more robust framework, succeeding when numerical precision issues cause the other methods to fail completely. These are precisely the situations that correspond to rare event dynamics for which the graph transformation was introduced.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Marquez, Andres; Choudhury, Sutanay
2012-09-01
Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis ofmore » large-scale graphs, we developed and optimized a triad census algorithm to efficiently execute on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code’s data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performances of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.« less
Visualization and recommendation of large image collections toward effective sensemaking
NASA Astrophysics Data System (ADS)
Gu, Yi; Wang, Chaoli; Nemiroff, Robert; Kao, David; Parra, Denis
2016-03-01
In our daily lives, images are among the most commonly found data which we need to handle. We present iGraph, a graph-based approach for visual analytics of large image collections and their associated text information. Given such a collection, we compute the similarity between images, the distance between texts, and the connection between image and text to construct iGraph, a compound graph representation which encodes the underlying relationships among these images and texts. To enable effective visual navigation and comprehension of iGraph with tens of thousands of nodes and hundreds of millions of edges, we present a progressive solution that offers collection overview, node comparison, and visual recommendation. Our solution not only allows users to explore the entire collection with representative images and keywords but also supports detailed comparison for understanding and intuitive guidance for navigation. The visual exploration of iGraph is further enhanced with the implementation of bubble sets to highlight group memberships of nodes, suggestion of abnormal keywords or time periods based on text outlier detection, and comparison of four different recommendation solutions. For performance speedup, multiple graphics processing units and central processing units are utilized for processing and visualization in parallel. We experiment with two image collections and leverage a cluster driving a display wall of nearly 50 million pixels. We show the effectiveness of our approach by demonstrating experimental results and conducting a user study.
Bond graph modelling of multibody dynamics and its symbolic scheme
NASA Astrophysics Data System (ADS)
Kawase, Takehiko; Yoshimura, Hiroaki
A bond graph method of modeling multibody dynamics is demonstrated. Specifically, a symbolic generation scheme which fully utilizes the bond graph information is presented. It is also demonstrated that structural understanding and representation in bond graph theory is quite powerful for the modeling of such large scale systems, and that the nonenergic multiport of junction structure, which is a multiport expression of the system structure, plays an important role, as first suggested by Paynter. The principal part of the proposed symbolic scheme, that is, the elimination of excess variables, is done through tearing and interconnection in the sense of Kron using newly defined causal and causal coefficient arrays.
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.
Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal
2010-11-15
Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(nlog(n/B)Blog(M/B)) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster--both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. The bi-directed de Bruijn graph is a fundamental data structure for any sequence assembly program based on Eulerian approach. Our algorithms for constructing Bi-directed de Bruijn graphs are efficient in parallel and out of core settings. These algorithms can be used in building large scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms. Finally our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.
Cluster Tails for Critical Power-Law Inhomogeneous Random Graphs
NASA Astrophysics Data System (ADS)
van der Hofstad, Remco; Kliem, Sandra; van Leeuwaarden, Johan S. H.
2018-04-01
Recently, the scaling limit of cluster sizes for critical inhomogeneous random graphs of rank-1 type having finite variance but infinite third moment degrees was obtained in Bhamidi et al. (Ann Probab 40:2299-2361, 2012). It was proved that when the degrees obey a power law with exponent τ \\in (3,4), the sequence of clusters ordered in decreasing size and multiplied through by n^{-(τ -2)/(τ -1)} converges as n→ ∞ to a sequence of decreasing non-degenerate random variables. Here, we study the tails of the limit of the rescaled largest cluster, i.e., the probability that the scaling limit of the largest cluster takes a large value u, as a function of u. This extends a related result of Pittel (J Combin Theory Ser B 82(2):237-269, 2001) for the Erdős-Rényi random graph to the setting of rank-1 inhomogeneous random graphs with infinite third moment degrees. We make use of delicate large deviations and weak convergence arguments.
Modelling disease outbreaks in realistic urban social networks
NASA Astrophysics Data System (ADS)
Eubank, Stephen; Guclu, Hasan; Anil Kumar, V. S.; Marathe, Madhav V.; Srinivasan, Aravind; Toroczkai, Zoltán; Wang, Nan
2004-05-01
Most mathematical models for the spread of disease use differential equations based on uniform mixing assumptions or ad hoc models for the contact process. Here we explore the use of dynamic bipartite graphs to model the physical contact patterns that result from movements of individuals between specific locations. The graphs are generated by large-scale individual-based urban traffic simulations built on actual census, land-use and population-mobility data. We find that the contact network among people is a strongly connected small-world-like graph with a well-defined scale for the degree distribution. However, the locations graph is scale-free, which allows highly efficient outbreak detection by placing sensors in the hubs of the locations network. Within this large-scale simulation framework, we then analyse the relative merits of several proposed mitigation strategies for smallpox spread. Our results suggest that outbreaks can be contained by a strategy of targeted vaccination combined with early detection without resorting to mass vaccination of a population.
Doherty, Irene A; Serre, Marc L; Gesink, Dionne; Adimora, Adaora A; Muth, Stephen Q; Leone, Peter A; Miller, William C
2012-11-01
Sexually transmitted infections (STIs) spread along sexual networks whose structural characteristics promote transmission that routine surveillance may not capture. Cases who have partners from multiple localities may operate as spatial network bridges, thereby facilitating geographical dissemination. We investigated how surveillance, sexual networks, and spatial bridges relate to each other for syphilis outbreaks in rural counties of North Carolina. We selected from the state health department's surveillance database cases diagnosed with primary, secondary, or early latent syphilis during October 1998 to December 2002 and who resided in central and southeastern North Carolina, along with their sex partners and their social contacts irrespective of infection status. We applied matching algorithms to eliminate duplicate names and create a unique roster of partnerships from which networks were compiled and graphed. Network members were differentiated by disease status and county of residence. In the county most affected by the outbreak, densely connected networks indicative of STI outbreaks were consistent with increased incidence and a large case load. In other counties, the case loads were low with fluctuating incidence, but network structures suggested the presence of outbreaks. In a county with stable, low incidence and a high number of cases, the networks were sparse and dendritic, indicative of endemic spread. Outbreak counties exhibited densely connected networks within well-defined geographic boundaries and low connectivity between counties; spatial bridges did not seem to facilitate transmission. Simple visualization of sexual networks can provide key information to identify communities most in need of resources for outbreak investigation and disease control.
Graph theory and the Virasoro master equation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Obers, N.A.J.
1991-01-01
A brief history of affine Lie algebra, the Virasoro algebra and its culmination in the Virasoro master equation is given. By studying ansaetze of the master equation, the author obtains exact solutions and gains insight in the structure of large slices of affine-Virasoro space. He finds an isomorphism between the constructions in the ansatz SO(n){sub diag}, which is a set of unitary, generically irrational affine-Virasoro constructions on SO(n), and the unlabeled graphs of order n. On the one hand, the conformal constructions, are classified by the graphs, while, conversely, a group-theoretic and conformal field-theoretic identification is obtained for every graphmore » of graph theory. He also defines a class of magic Lie group bases in which the Virasoro master equation admits a simple metric ansatz {l brace}g{sub metric}{r brace}, whose structure is visible in the high-level expansion. When a magic basis is real on compact g, the corresponding g{sub metric} is a large system of unitary, generically irrational conformal field theories. Examples in this class include the graph-theory ansatz SO(n){sub diag} in the Cartesian basis of SO(n), and the ansatz SU(n){sub metric} in the Pauli-like basis of SU(n). Finally, he defines the sine-area graphs' of SU(n), which label the conformal field theories of SU(n){sub metric}, and he notes that, in similar fashion, each magic basis of g defines a generalized graph theory on g which labels the conformal field theories of g{sub metric}.« less
Network reconstruction via graph blending
NASA Astrophysics Data System (ADS)
Estrada, Rolando
2016-05-01
Graphs estimated from empirical data are often noisy and incomplete due to the difficulty of faithfully observing all the components (nodes and edges) of the true graph. This problem is particularly acute for large networks where the number of components may far exceed available surveillance capabilities. Errors in the observed graph can render subsequent analyses invalid, so it is vital to develop robust methods that can minimize these observational errors. Errors in the observed graph may include missing and spurious components, as well fused (multiple nodes are merged into one) and split (a single node is misinterpreted as many) nodes. Traditional graph reconstruction methods are only able to identify missing or spurious components (primarily edges, and to a lesser degree nodes), so we developed a novel graph blending framework that allows us to cast the full estimation problem as a simple edge addition/deletion problem. Armed with this framework, we systematically investigate the viability of various topological graph features, such as the degree distribution or the clustering coefficients, and existing graph reconstruction methods for tackling the full estimation problem. Our experimental results suggest that incorporating any topological feature as a source of information actually hinders reconstruction accuracy. We provide a theoretical analysis of this phenomenon and suggest several avenues for improving this estimation problem.
NASA Astrophysics Data System (ADS)
Ziemann, Amanda K.; Messinger, David W.; Albano, James A.; Basener, William F.
2012-06-01
Anomaly detection algorithms have historically been applied to hyperspectral imagery in order to identify pixels whose material content is incongruous with the background material in the scene. Typically, the application involves extracting man-made objects from natural and agricultural surroundings. A large challenge in designing these algorithms is determining which pixels initially constitute the background material within an image. The topological anomaly detection (TAD) algorithm constructs a graph theory-based, fully non-parametric topological model of the background in the image scene, and uses codensity to measure deviation from this background. In TAD, the initial graph theory structure of the image data is created by connecting an edge between any two pixel vertices x and y if the Euclidean distance between them is less than some resolution r. While this type of proximity graph is among the most well-known approaches to building a geometric graph based on a given set of data, there is a wide variety of dierent geometrically-based techniques. In this paper, we present a comparative test of the performance of TAD across four dierent constructs of the initial graph: mutual k-nearest neighbor graph, sigma-local graph for two different values of σ > 1, and the proximity graph originally implemented in TAD.
Evolutionary Games of Multiplayer Cooperation on Graphs
Arranz, Jordi; Traulsen, Arne
2016-01-01
There has been much interest in studying evolutionary games in structured populations, often modeled as graphs. However, most analytical results so far have only been obtained for two-player or linear games, while the study of more complex multiplayer games has been usually tackled by computer simulations. Here we investigate evolutionary multiplayer games on graphs updated with a Moran death-Birth process. For cycles, we obtain an exact analytical condition for cooperation to be favored by natural selection, given in terms of the payoffs of the game and a set of structure coefficients. For regular graphs of degree three and larger, we estimate this condition using a combination of pair approximation and diffusion approximation. For a large class of cooperation games, our approximations suggest that graph-structured populations are stronger promoters of cooperation than populations lacking spatial structure. Computer simulations validate our analytical approximations for random regular graphs and cycles, but show systematic differences for graphs with many loops such as lattices. In particular, our simulation results show that these kinds of graphs can even lead to more stringent conditions for the evolution of cooperation than well-mixed populations. Overall, we provide evidence suggesting that the complexity arising from many-player interactions and spatial structure can be captured by pair approximation in the case of random graphs, but that it need to be handled with care for graphs with high clustering. PMID:27513946
Communication requirements of sparse Cholesky factorization with nested dissection ordering
NASA Technical Reports Server (NTRS)
Naik, Vijay K.; Patrick, Merrell L.
1989-01-01
Load distribution schemes for minimizing the communication requirements of the Cholesky factorization of dense and sparse, symmetric, positive definite matrices on multiprocessor systems are presented. The total data traffic in factoring an n x n sparse symmetric positive definite matrix representing an n-vertex regular two-dimensional grid graph using n exp alpha, alpha not greater than 1, processors are shown to be O(n exp 1 + alpha/2). It is O(n), when n exp alpha, alpha not smaller than 1, processors are used. Under the conditions of uniform load distribution, these results are shown to be asymptotically optimal.
Discriminative graph embedding for label propagation.
Nguyen, Canh Hao; Mamitsuka, Hiroshi
2011-09-01
In many applications, the available information is encoded in graph structures. This is a common problem in biological networks, social networks, web communities and document citations. We investigate the problem of classifying nodes' labels on a similarity graph given only a graph structure on the nodes. Conventional machine learning methods usually require data to reside in some Euclidean spaces or to have a kernel representation. Applying these methods to nodes on graphs would require embedding the graphs into these spaces. By embedding and then learning the nodes on graphs, most methods are either flexible with different learning objectives or efficient enough for large scale applications. We propose a method to embed a graph into a feature space for a discriminative purpose. Our idea is to include label information into the embedding process, making the space representation tailored to the task. We design embedding objective functions that the following learning formulations become spectral transforms. We then reformulate these spectral transforms into multiple kernel learning problems. Our method, while being tailored to the discriminative tasks, is efficient and can scale to massive data sets. We show the need of discriminative embedding on some simulations. Applying to biological network problems, our method is shown to outperform baselines.
Elmetwaly, Shereef; Schlick, Tamar
2014-01-01
Graph representations have been widely used to analyze and design various economic, social, military, political, and biological networks. In systems biology, networks of cells and organs are useful for understanding disease and medical treatments and, in structural biology, structures of molecules can be described, including RNA structures. In our RNA-As-Graphs (RAG) framework, we represent RNA structures as tree graphs by translating unpaired regions into vertices and helices into edges. Here we explore the modularity of RNA structures by applying graph partitioning known in graph theory to divide an RNA graph into subgraphs. To our knowledge, this is the first application of graph partitioning to biology, and the results suggest a systematic approach for modular design in general. The graph partitioning algorithms utilize mathematical properties of the Laplacian eigenvector (µ2) corresponding to the second eigenvalues (λ2) associated with the topology matrix defining the graph: λ2 describes the overall topology, and the sum of µ2′s components is zero. The three types of algorithms, termed median, sign, and gap cuts, divide a graph by determining nodes of cut by median, zero, and largest gap of µ2′s components, respectively. We apply these algorithms to 45 graphs corresponding to all solved RNA structures up through 11 vertices (∼220 nucleotides). While we observe that the median cut divides a graph into two similar-sized subgraphs, the sign and gap cuts partition a graph into two topologically-distinct subgraphs. We find that the gap cut produces the best biologically-relevant partitioning for RNA because it divides RNAs at less stable connections while maintaining junctions intact. The iterative gap cuts suggest basic modules and assembly protocols to design large RNA structures. Our graph substructuring thus suggests a systematic approach to explore the modularity of biological networks. In our applications to RNA structures, subgraphs also suggest design strategies for novel RNA motifs. PMID:25188578
Digital Museum of Retinal Ganglion Cells with Dense Anatomy and Physiology.
Bae, J Alexander; Mu, Shang; Kim, Jinseop S; Turner, Nicholas L; Tartavull, Ignacio; Kemnitz, Nico; Jordan, Chris S; Norton, Alex D; Silversmith, William M; Prentki, Rachel; Sorek, Marissa; David, Celia; Jones, Devon L; Bland, Doug; Sterling, Amy L R; Park, Jungman; Briggman, Kevin L; Seung, H Sebastian
2018-05-17
When 3D electron microscopy and calcium imaging are used to investigate the structure and function of neural circuits, the resulting datasets pose new challenges of visualization and interpretation. Here, we present a new kind of digital resource that encompasses almost 400 ganglion cells from a single patch of mouse retina. An online "museum" provides a 3D interactive view of each cell's anatomy, as well as graphs of its visual responses. The resource reveals two aspects of the retina's inner plexiform layer: an arbor segregation principle governing structure along the light axis and a density conservation principle governing structure in the tangential plane. Structure is related to visual function; ganglion cells with arbors near the layer of ganglion cell somas are more sustained in their visual responses on average. Our methods are potentially applicable to dense maps of neuronal anatomy and physiology in other parts of the nervous system. Copyright © 2018 Elsevier Inc. All rights reserved.
Schroeter, Manuel S; Charlesworth, Paul; Kitzbichler, Manfred G; Paulsen, Ole; Bullmore, Edward T
2015-04-08
Recent studies demonstrated that the anatomical network of the human brain shows a "rich-club" organization. This complex topological feature implies that highly connected regions, hubs of the large-scale brain network, are more densely interconnected with each other than expected by chance. Rich-club nodes were traversed by a majority of short paths between peripheral regions, underlining their potential importance for efficient global exchange of information between functionally specialized areas of the brain. Network hubs have also been described at the microscale of brain connectivity (so-called "hub neurons"). Their role in shaping synchronous dynamics and forming microcircuit wiring during development, however, is not yet fully understood. The present study aimed to investigate the role of hubs during network development, using multi-electrode arrays and functional connectivity analysis during spontaneous multi-unit activity (MUA) of dissociated primary mouse hippocampal neurons. Over the first 4 weeks in vitro, functional connectivity significantly increased in strength, density, and size, with mature networks demonstrating a robust modular and small-world topology. As expected by a "rich-get-richer" growth rule of network evolution, MUA graphs were found to form rich-clubs at an early stage in development (14 DIV). Later on, rich-club nodes were a consistent topological feature of MUA graphs, demonstrating high nodal strength, efficiency, and centrality. Rich-club nodes were also found to be crucial for MUA dynamics. They often served as broker of spontaneous activity flow, confirming that hub nodes and rich-clubs may play an important role in coordinating functional dynamics at the microcircuit level. Copyright © 2015 the authors 0270-6474/15/355459-12$15.00/0.
Memory-Efficient Analysis of Dense Functional Connectomes.
Loewe, Kristian; Donohue, Sarah E; Schoenfeld, Mircea A; Kruse, Rudolf; Borgelt, Christian
2016-01-01
The functioning of the human brain relies on the interplay and integration of numerous individual units within a complex network. To identify network configurations characteristic of specific cognitive tasks or mental illnesses, functional connectomes can be constructed based on the assessment of synchronous fMRI activity at separate brain sites, and then analyzed using graph-theoretical concepts. In most previous studies, relatively coarse parcellations of the brain were used to define regions as graphical nodes. Such parcellated connectomes are highly dependent on parcellation quality because regional and functional boundaries need to be relatively consistent for the results to be interpretable. In contrast, dense connectomes are not subject to this limitation, since the parcellation inherent to the data is used to define graphical nodes, also allowing for a more detailed spatial mapping of connectivity patterns. However, dense connectomes are associated with considerable computational demands in terms of both time and memory requirements. The memory required to explicitly store dense connectomes in main memory can render their analysis infeasible, especially when considering high-resolution data or analyses across multiple subjects or conditions. Here, we present an object-based matrix representation that achieves a very low memory footprint by computing matrix elements on demand instead of explicitly storing them. In doing so, memory required for a dense connectome is reduced to the amount needed to store the underlying time series data. Based on theoretical considerations and benchmarks, different matrix object implementations and additional programs (based on available Matlab functions and Matlab-based third-party software) are compared with regard to their computational efficiency. The matrix implementation based on on-demand computations has very low memory requirements, thus enabling analyses that would be otherwise infeasible to conduct due to insufficient memory. An open source software package containing the created programs is available for download.
Memory-Efficient Analysis of Dense Functional Connectomes
Loewe, Kristian; Donohue, Sarah E.; Schoenfeld, Mircea A.; Kruse, Rudolf; Borgelt, Christian
2016-01-01
The functioning of the human brain relies on the interplay and integration of numerous individual units within a complex network. To identify network configurations characteristic of specific cognitive tasks or mental illnesses, functional connectomes can be constructed based on the assessment of synchronous fMRI activity at separate brain sites, and then analyzed using graph-theoretical concepts. In most previous studies, relatively coarse parcellations of the brain were used to define regions as graphical nodes. Such parcellated connectomes are highly dependent on parcellation quality because regional and functional boundaries need to be relatively consistent for the results to be interpretable. In contrast, dense connectomes are not subject to this limitation, since the parcellation inherent to the data is used to define graphical nodes, also allowing for a more detailed spatial mapping of connectivity patterns. However, dense connectomes are associated with considerable computational demands in terms of both time and memory requirements. The memory required to explicitly store dense connectomes in main memory can render their analysis infeasible, especially when considering high-resolution data or analyses across multiple subjects or conditions. Here, we present an object-based matrix representation that achieves a very low memory footprint by computing matrix elements on demand instead of explicitly storing them. In doing so, memory required for a dense connectome is reduced to the amount needed to store the underlying time series data. Based on theoretical considerations and benchmarks, different matrix object implementations and additional programs (based on available Matlab functions and Matlab-based third-party software) are compared with regard to their computational efficiency. The matrix implementation based on on-demand computations has very low memory requirements, thus enabling analyses that would be otherwise infeasible to conduct due to insufficient memory. An open source software package containing the created programs is available for download. PMID:27965565
2013-01-01
5, pp. 75–174, 2010. [2] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney , “Statistical properties of community structure in large social and...2011. [14] R. R. Nadakuditi and M. Newman , “Graph spectra and the detectability of community structure in networks,” Phys. Rev. Lett., vol. 108, no
Islands and Bridges: Making Sense of Marked Nodes in Large Graphs
2012-08-01
time-evolving graphs. References [1] Nouf M. Kh. Alsudairy, Vijay V. Raghavan, Alaaeldin M. Hafez, and Hassan I Mathkour. Connection subgraphs: A survey...Riedy, David A. Bader, Karl Jiang, Pushkar Pande, , and Richa Sharma . Detecting communities from given seeds in social networks. Technical Report GT-CSE
Perkins, David Nikolaus; Brost, Randolph; Ray, Lawrence P.
2017-08-08
Various technologies for facilitating analysis of large remote sensing and geolocation datasets to identify features of interest are described herein. A search query can be submitted to a computing system that executes searches over a geospatial temporal semantic (GTS) graph to identify features of interest. The GTS graph comprises nodes corresponding to objects described in the remote sensing and geolocation datasets, and edges that indicate geospatial or temporal relationships between pairs of nodes in the nodes. Trajectory information is encoded in the GTS graph by the inclusion of movable nodes to facilitate searches for features of interest in the datasets relative to moving objects such as vehicles.
VISAGE: Interactive Visual Graph Querying.
Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng
2016-06-01
Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete , an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with "wildcard" nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE's ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries.
VISAGE: Interactive Visual Graph Querying
Pienta, Robert; Navathe, Shamkant; Tamersoy, Acar; Tong, Hanghang; Endert, Alex; Chau, Duen Horng
2017-01-01
Extracting useful patterns from large network datasets has become a fundamental challenge in many domains. We present VISAGE, an interactive visual graph querying approach that empowers users to construct expressive queries, without writing complex code (e.g., finding money laundering rings of bankers and business owners). Our contributions are as follows: (1) we introduce graph autocomplete, an interactive approach that guides users to construct and refine queries, preventing over-specification; (2) VISAGE guides the construction of graph queries using a data-driven approach, enabling users to specify queries with varying levels of specificity, from concrete and detailed (e.g., query by example), to abstract (e.g., with “wildcard” nodes of any types), to purely structural matching; (3) a twelve-participant, within-subject user study demonstrates VISAGE’s ease of use and the ability to construct graph queries significantly faster than using a conventional query language; (4) VISAGE works on real graphs with over 468K edges, achieving sub-second response times for common queries. PMID:28553670
Sudden emergence of q-regular subgraphs in random graphs
NASA Astrophysics Data System (ADS)
Pretti, M.; Weigt, M.
2006-07-01
We investigate the computationally hard problem whether a random graph of finite average vertex degree has an extensively large q-regular subgraph, i.e., a subgraph with all vertices having degree equal to q. We reformulate this problem as a constraint-satisfaction problem, and solve it using the cavity method of statistical physics at zero temperature. For q = 3, we find that the first large q-regular subgraphs appear discontinuously at an average vertex degree c3 - reg simeq 3.3546 and contain immediately about 24% of all vertices in the graph. This transition is extremely close to (but different from) the well-known 3-core percolation point c3 - core simeq 3.3509. For q > 3, the q-regular subgraph percolation threshold is found to coincide with that of the q-core.
BFL: a node and edge betweenness based fast layout algorithm for large scale networks
Hashimoto, Tatsunori B; Nagasaki, Masao; Kojima, Kaname; Miyano, Satoru
2009-01-01
Background Network visualization would serve as a useful first step for analysis. However, current graph layout algorithms for biological pathways are insensitive to biologically important information, e.g. subcellular localization, biological node and graph attributes, or/and not available for large scale networks, e.g. more than 10000 elements. Results To overcome these problems, we propose the use of a biologically important graph metric, betweenness, a measure of network flow. This metric is highly correlated with many biological phenomena such as lethality and clusters. We devise a new fast parallel algorithm calculating betweenness to minimize the preprocessing cost. Using this metric, we also invent a node and edge betweenness based fast layout algorithm (BFL). BFL places the high-betweenness nodes to optimal positions and allows the low-betweenness nodes to reach suboptimal positions. Furthermore, BFL reduces the runtime by combining a sequential insertion algorim with betweenness. For a graph with n nodes, this approach reduces the expected runtime of the algorithm to O(n2) when considering edge crossings, and to O(n log n) when considering only density and edge lengths. Conclusion Our BFL algorithm is compared against fast graph layout algorithms and approaches requiring intensive optimizations. For gene networks, we show that our algorithm is faster than all layout algorithms tested while providing readability on par with intensive optimization algorithms. We achieve a 1.4 second runtime for a graph with 4000 nodes and 12000 edges on a standard desktop computer. PMID:19146673
Building Scalable Knowledge Graphs for Earth Science
NASA Astrophysics Data System (ADS)
Ramachandran, R.; Maskey, M.; Gatlin, P. N.; Zhang, J.; Duan, X.; Bugbee, K.; Christopher, S. A.; Miller, J. J.
2017-12-01
Estimates indicate that the world's information will grow by 800% in the next five years. In any given field, a single researcher or a team of researchers cannot keep up with this rate of knowledge expansion without the help of cognitive systems. Cognitive computing, defined as the use of information technology to augment human cognition, can help tackle large systemic problems. Knowledge graphs, one of the foundational components of cognitive systems, link key entities in a specific domain with other entities via relationships. Researchers could mine these graphs to make probabilistic recommendations and to infer new knowledge. At this point, however, there is a dearth of tools to generate scalable Knowledge graphs using existing corpus of scientific literature for Earth science research. Our project is currently developing an end-to-end automated methodology for incrementally constructing Knowledge graphs for Earth Science. Semantic Entity Recognition (SER) is one of the key steps in this methodology. SER for Earth Science uses external resources (including metadata catalogs and controlled vocabulary) as references to guide entity extraction and recognition (i.e., labeling) from unstructured text, in order to build a large training set to seed the subsequent auto-learning component in our algorithm. Results from several SER experiments will be presented as well as lessons learned.
Jing, Xia; Cimino, James J.
2011-01-01
Objective: To explore new graphical methods for reducing and analyzing large data sets in which the data are coded with a hierarchical terminology. Methods: We use a hierarchical terminology to organize a data set and display it in a graph. We reduce the size and complexity of the data set by considering the terminological structure and the data set itself (using a variety of thresholds) as well as contributions of child level nodes to parent level nodes. Results: We found that our methods can reduce large data sets to manageable size and highlight the differences among graphs. The thresholds used as filters to reduce the data set can be used alone or in combination. We applied our methods to two data sets containing information about how nurses and physicians query online knowledge resources. The reduced graphs make the differences between the two groups readily apparent. Conclusions: This is a new approach to reduce size and complexity of large data sets and to simplify visualization. This approach can be applied to any data sets that are coded with hierarchical terminologies. PMID:22195119
Inferring ontology graph structures using OWL reasoning.
Rodríguez-García, Miguel Ángel; Hoehndorf, Robert
2018-01-05
Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.
2011-01-01
Background Mapping protein primary sequences to their three dimensional folds referred to as the 'second genetic code' remains an unsolved scientific problem. A crucial part of the problem concerns the geometrical specificity in side chain association leading to densely packed protein cores, a hallmark of correctly folded native structures. Thus, any model of packing within proteins should constitute an indispensable component of protein folding and design. Results In this study an attempt has been made to find, characterize and classify recurring patterns in the packing of side chain atoms within a protein which sustains its native fold. The interaction of side chain atoms within the protein core has been represented as a contact network based on the surface complementarity and overlap between associating side chain surfaces. Some network topologies definitely appear to be preferred and they have been termed 'packing motifs', analogous to super secondary structures in proteins. Study of the distribution of these motifs reveals the ubiquitous presence of typical smaller graphs, which appear to get linked or coalesce to give larger graphs, reminiscent of the nucleation-condensation model in protein folding. One such frequently occurring motif, also envisaged as the unit of clustering, the three residue clique was invariably found in regions of dense packing. Finally, topological measures based on surface contact networks appeared to be effective in discriminating sequences native to a specific fold amongst a set of decoys. Conclusions Out of innumerable topological possibilities, only a finite number of specific packing motifs are actually realized in proteins. This small number of motifs could serve as a basis set in the construction of larger networks. Of these, the triplet clique exhibits distinct preference both in terms of composition and geometry. PMID:21605466
Graph embedding and extensions: a general framework for dimensionality reduction.
Yan, Shuicheng; Xu, Dong; Zhang, Benyu; Zhang, Hong-Jiang; Yang, Qiang; Lin, Stephen
2007-01-01
Over the past few decades, a large family of algorithms - supervised or unsupervised; stemming from statistics or geometry theory - has been designed to provide different solutions to the problem of dimensionality reduction. Despite the different motivations of these algorithms, we present in this paper a general formulation known as graph embedding to unify them within a common framework. In graph embedding, each algorithm can be considered as the direct graph embedding or its linear/kernel/tensor extension of a specific intrinsic graph that describes certain desired statistical or geometric properties of a data set, with constraints from scale normalization or a penalty graph that characterizes a statistical or geometric property that should be avoided. Furthermore, the graph embedding framework can be used as a general platform for developing new dimensionality reduction algorithms. By utilizing this framework as a tool, we propose a new supervised dimensionality reduction algorithm called Marginal Fisher Analysis in which the intrinsic graph characterizes the intraclass compactness and connects each data point with its neighboring points of the same class, while the penalty graph connects the marginal points and characterizes the interclass separability. We show that MFA effectively overcomes the limitations of the traditional Linear Discriminant Analysis algorithm due to data distribution assumptions and available projection directions. Real face recognition experiments show the superiority of our proposed MFA in comparison to LDA, also for corresponding kernel and tensor extensions.
GOGrapher: A Python library for GO graph representation and analysis.
Muller, Brian; Richards, Adam J; Jin, Bo; Lu, Xinghua
2009-07-07
The Gene Ontology is the most commonly used controlled vocabulary for annotating proteins. The concepts in the ontology are organized as a directed acyclic graph, in which a node corresponds to a biological concept and a directed edge denotes the parent-child semantic relationship between a pair of terms. A large number of protein annotations further create links between proteins and their functional annotations, reflecting the contemporary knowledge about proteins and their functional relationships. This leads to a complex graph consisting of interleaved biological concepts and their associated proteins. What is needed is a simple, open source library that provides tools to not only create and view the Gene Ontology graph, but to analyze and manipulate it as well. Here we describe the development and use of GOGrapher, a Python library that can be used for the creation, analysis, manipulation, and visualization of Gene Ontology related graphs. An object-oriented approach was adopted to organize the hierarchy of the graphs types and associated classes. An Application Programming Interface is provided through which different types of graphs can be pragmatically created, manipulated, and visualized. GOGrapher has been successfully utilized in multiple research projects, e.g., a graph-based multi-label text classifier for protein annotation. The GOGrapher project provides a reusable programming library designed for the manipulation and analysis of Gene Ontology graphs. The library is freely available for the scientific community to use and improve.
Zheng, Qiang; Warner, Steven; Tasian, Gregory; Fan, Yong
2018-02-12
Automatic segmentation of kidneys in ultrasound (US) images remains a challenging task because of high speckle noise, low contrast, and large appearance variations of kidneys in US images. Because texture features may improve the US image segmentation performance, we propose a novel graph cuts method to segment kidney in US images by integrating image intensity information and texture feature maps. We develop a new graph cuts-based method to segment kidney US images by integrating original image intensity information and texture feature maps extracted using Gabor filters. To handle large appearance variation within kidney images and improve computational efficiency, we build a graph of image pixels close to kidney boundary instead of building a graph of the whole image. To make the kidney segmentation robust to weak boundaries, we adopt localized regional information to measure similarity between image pixels for computing edge weights to build the graph of image pixels. The localized graph is dynamically updated and the graph cuts-based segmentation iteratively progresses until convergence. Our method has been evaluated based on kidney US images of 85 subjects. The imaging data of 20 randomly selected subjects were used as training data to tune parameters of the image segmentation method, and the remaining data were used as testing data for validation. Experiment results demonstrated that the proposed method obtained promising segmentation results for bilateral kidneys (average Dice index = 0.9446, average mean distance = 2.2551, average specificity = 0.9971, average accuracy = 0.9919), better than other methods under comparison (P < .05, paired Wilcoxon rank sum tests). The proposed method achieved promising performance for segmenting kidneys in two-dimensional US images, better than segmentation methods built on any single channel of image information. This method will facilitate extraction of kidney characteristics that may predict important clinical outcomes such as progression of chronic kidney disease. Copyright © 2018 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Kearney, K.; Aydin, K.
2016-02-01
Oceanic food webs are often depicted as network graphs, with the major organisms or functional groups displayed as nodes and the fluxes of between them as the edges. However, the large number of nodes and edges and high connectance of many management-oriented food webs coupled with graph layout algorithms poorly-suited to certain desired characteristics of food web visualizations often lead to hopelessly tangled diagrams that convey little information other than, "It's complex." Here, I combine several new graph visualization techniques- including a new node layout alorithm based on a trophic similarity (quantification of shared predator and prey) and trophic level, divided edge bundling for edge routing, and intelligent automated placement of labels- to create a much clearer visualization of the important fluxes through a food web. The technique will be used to highlight the differences in energy flow within three Alaskan Large Marine Ecosystems (the Bering Sea, Gulf of Alaska, and Aleutian Islands) that include very similar functional groups but unique energy pathways.
High Performance Descriptive Semantic Analysis of Semantic Graph Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff A.; Adolf, Robert D.; al-Saffar, Sinan
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity makes it increasingly necessary to bring high performance computational resources to bear on this task. Our research group built a novel high performance hybrid system comprisingmore » computational capability for semantic graph database processing utilizing the large multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths such for the Billion Triple Challenge 2010 dataset.« less
Process and representation in graphical displays
NASA Technical Reports Server (NTRS)
Gillan, Douglas J.; Lewis, Robert; Rudisill, Marianne
1990-01-01
How people comprehend graphics is examined. Graphical comprehension involves the cognitive representation of information from a graphic display and the processing strategies that people apply to answer questions about graphics. Research on representation has examined both the features present in a graphic display and the cognitive representation of the graphic. The key features include the physical components of a graph, the relation between the figure and its axes, and the information in the graph. Tests of people's memory for graphs indicate that both the physical and informational aspect of a graph are important in the cognitive representation of a graph. However, the physical (or perceptual) features overshadow the information to a large degree. Processing strategies also involve a perception-information distinction. In order to answer simple questions (e.g., determining the value of a variable, comparing several variables, and determining the mean of a set of variables), people switch between two information processing strategies: (1) an arithmetic, look-up strategy in which they use a graph much like a table, looking up values and performing arithmetic calculations; and (2) a perceptual strategy in which they use the spatial characteristics of the graph to make comparisons and estimations. The user's choice of strategies depends on the task and the characteristics of the graph. A theory of graphic comprehension is presented.
Applied Graph-Mining Algorithms to Study Biomolecular Interaction Networks
2014-01-01
Protein-protein interaction (PPI) networks carry vital information on the organization of molecular interactions in cellular systems. The identification of functionally relevant modules in PPI networks is one of the most important applications of biological network analysis. Computational analysis is becoming an indispensable tool to understand large-scale biomolecular interaction networks. Several types of computational methods have been developed and employed for the analysis of PPI networks. Of these computational methods, graph comparison and module detection are the two most commonly used strategies. This review summarizes current literature on graph kernel and graph alignment methods for graph comparison strategies, as well as module detection approaches including seed-and-extend, hierarchical clustering, optimization-based, probabilistic, and frequent subgraph methods. Herein, we provide a comprehensive review of the major algorithms employed under each theme, including our recently published frequent subgraph method, for detecting functional modules commonly shared across multiple cancer PPI networks. PMID:24800226
Dynamic Load Balancing for Adaptive Computations on Distributed-Memory Machines
NASA Technical Reports Server (NTRS)
1999-01-01
Dynamic load balancing is central to adaptive mesh-based computations on large-scale parallel computers. The principal investigator has investigated various issues on the dynamic load balancing problem under NASA JOVE and JAG rants. The major accomplishments of the project are two graph partitioning algorithms and a load balancing framework. The S-HARP dynamic graph partitioner is known to be the fastest among the known dynamic graph partitioners to date. It can partition a graph of over 100,000 vertices in 0.25 seconds on a 64- processor Cray T3E distributed-memory multiprocessor while maintaining the scalability of over 16-fold speedup. Other known and widely used dynamic graph partitioners take over a second or two while giving low scalability of a few fold speedup on 64 processors. These results have been published in journals and peer-reviewed flagship conferences.
Mining and Modeling Real-World Networks: Patterns, Anomalies, and Tools
ERIC Educational Resources Information Center
Akoglu, Leman
2012-01-01
Large real-world graph (a.k.a network, relational) data are omnipresent, in online media, businesses, science, and the government. Analysis of these massive graphs is crucial, in order to extract descriptive and predictive knowledge with many commercial, medical, and environmental applications. In addition to its general structure, knowing what…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bromberger, Seth A.; Klymko, Christine F.; Henderson, Keith A.
Betweenness centrality is a graph statistic used to nd vertices that are participants in a large number of shortest paths in a graph. This centrality measure is commonly used in path and network interdiction problems and its complete form requires the calculation of all-pairs shortest paths for each vertex. This leads to a time complexity of O(jV jjEj), which is impractical for large graphs. Estimation of betweenness centrality has focused on performing shortest-path calculations on a subset of randomly- selected vertices. This reduces the complexity of the centrality estimation to O(jSjjEj); jSj < jV j, which can be scaled appropriatelymore » based on the computing resources available. An estimation strategy that uses random selection of vertices for seed selection is fast and simple to implement, but may not provide optimal estimation of betweenness centrality when the number of samples is constrained. Our experimentation has identi ed a number of alternate seed-selection strategies that provide lower error than random selection in common scale-free graphs. These strategies are discussed and experimental results are presented.« less
NASA Technical Reports Server (NTRS)
Kuwata, Yoshiaki; Blackmore, Lars; Wolf, Michael; Fathpour, Nanaz; Newman, Claire; Elfes, Alberto
2009-01-01
Hot air (Montgolfiere) balloons represent a promising vehicle system for possible future exploration of planets and moons with thick atmospheres such as Venus and Titan. To go to a desired location, this vehicle can primarily use the horizontal wind that varies with altitude, with a small help of its own actuation. A main challenge is how to plan such trajectory in a highly nonlinear and time-varying wind field. This paper poses this trajectory planning as a graph search on the space-time grid and addresses its computational aspects. When capturing various time scales involved in the wind field over the duration of long exploration mission, the size of the graph becomes excessively large. We show that the adjacency matrix of the graph is block-triangular, and by exploiting this structure, we decompose the large planning problem into several smaller subproblems, whose memory requirement stays almost constant as the problem size grows. The approach is demonstrated on a global reachability analysis of a possible Titan mission scenario.
Exploratory Item Classification Via Spectral Graph Clustering
Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang
2017-01-01
Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
Multiple directed graph large-class multi-spectral processor
NASA Technical Reports Server (NTRS)
Casasent, David; Liu, Shiaw-Dong; Yoneyama, Hideyuki
1988-01-01
Numerical analysis techniques for the interpretation of high-resolution imaging-spectrometer data are described and demonstrated. The method proposed involves the use of (1) a hierarchical classifier with a tree structure generated automatically by a Fisher linear-discriminant-function algorithm and (2) a novel multiple-directed-graph scheme which reduces the local maxima and the number of perturbations required. Results for a 500-class test problem involving simulated imaging-spectrometer data are presented in tables and graphs; 100-percent-correct classification is achieved with an improvement factor of 5.
Real-time community detection in full social networks on a laptop
Chamberlain, Benjamin Paul; Levy-Kramer, Josh; Humby, Clive
2018-01-01
For a broad range of research and practical applications it is important to understand the allegiances, communities and structure of key players in society. One promising direction towards extracting this information is to exploit the rich relational data in digital social networks (the social graph). As global social networks (e.g., Facebook and Twitter) are very large, most approaches make use of distributed computing systems for this purpose. Distributing graph processing requires solving many difficult engineering problems, which has lead some researchers to look at single-machine solutions that are faster and easier to maintain. In this article, we present an approach for analyzing full social networks on a standard laptop, allowing for interactive exploration of the communities in the locality of a set of user specified query vertices. The key idea is that the aggregate actions of large numbers of users can be compressed into a data structure that encapsulates the edge weights between vertices in a derived graph. Local communities can be constructed by selecting vertices that are connected to the query vertices with high edge weights in the derived graph. This compression is robust to noise and allows for interactive queries of local communities in real-time, which we define to be less than the average human reaction time of 0.25s. We achieve single-machine real-time performance by compressing the neighborhood of each vertex using minhash signatures and facilitate rapid queries through Locality Sensitive Hashing. These techniques reduce query times from hours using industrial desktop machines operating on the full graph to milliseconds on standard laptops. Our method allows exploration of strongly associated regions (i.e., communities) of large graphs in real-time on a laptop. It has been deployed in software that is actively used by social network analysts and offers another channel for media owners to monetize their data, helping them to continue to provide free services that are valued by billions of people globally. PMID:29342158
Approximate Locality for Quantum Systems on Graphs
NASA Astrophysics Data System (ADS)
Osborne, Tobias J.
2008-10-01
In this Letter we make progress on a long-standing open problem of Aaronson and Ambainis [Theory Comput. 1, 47 (2005)1557-2862]: we show that if U is a sparse unitary operator with a gap Δ in its spectrum, then there exists an approximate logarithm H of U which is also sparse. The sparsity pattern of H gets more dense as 1/Δ increases. This result can be interpreted as a way to convert between local continuous-time and local discrete-time quantum processes. As an example we show that the discrete-time coined quantum walk can be realized stroboscopically from an approximately local continuous-time quantum walk.
GOGrapher: A Python library for GO graph representation and analysis
Muller, Brian; Richards, Adam J; Jin, Bo; Lu, Xinghua
2009-01-01
Background The Gene Ontology is the most commonly used controlled vocabulary for annotating proteins. The concepts in the ontology are organized as a directed acyclic graph, in which a node corresponds to a biological concept and a directed edge denotes the parent-child semantic relationship between a pair of terms. A large number of protein annotations further create links between proteins and their functional annotations, reflecting the contemporary knowledge about proteins and their functional relationships. This leads to a complex graph consisting of interleaved biological concepts and their associated proteins. What is needed is a simple, open source library that provides tools to not only create and view the Gene Ontology graph, but to analyze and manipulate it as well. Here we describe the development and use of GOGrapher, a Python library that can be used for the creation, analysis, manipulation, and visualization of Gene Ontology related graphs. Findings An object-oriented approach was adopted to organize the hierarchy of the graphs types and associated classes. An Application Programming Interface is provided through which different types of graphs can be pragmatically created, manipulated, and visualized. GOGrapher has been successfully utilized in multiple research projects, e.g., a graph-based multi-label text classifier for protein annotation. Conclusion The GOGrapher project provides a reusable programming library designed for the manipulation and analysis of Gene Ontology graphs. The library is freely available for the scientific community to use and improve. PMID:19583843
Information Graph Flow: A Geometric Approximation of Quantum and Statistical Systems
NASA Astrophysics Data System (ADS)
Vanchurin, Vitaly
2018-05-01
Given a quantum (or statistical) system with a very large number of degrees of freedom and a preferred tensor product factorization of the Hilbert space (or of a space of distributions) we describe how it can be approximated with a very low-dimensional field theory with geometric degrees of freedom. The geometric approximation procedure consists of three steps. The first step is to construct weighted graphs (we call information graphs) with vertices representing subsystems (e.g., qubits or random variables) and edges representing mutual information (or the flow of information) between subsystems. The second step is to deform the adjacency matrices of the information graphs to that of a (locally) low-dimensional lattice using the graph flow equations introduced in the paper. (Note that the graph flow produces very sparse adjacency matrices and thus might also be used, for example, in machine learning or network science where the task of graph sparsification is of a central importance.) The third step is to define an emergent metric and to derive an effective description of the metric and possibly other degrees of freedom. To illustrate the procedure we analyze (numerically and analytically) two information graph flows with geometric attractors (towards locally one- and two-dimensional lattices) and metric perturbations obeying a geometric flow equation. Our analysis also suggests a possible approach to (a non-perturbative) quantum gravity in which the geometry (a secondary object) emerges directly from a quantum state (a primary object) due to the flow of the information graphs.
Information Graph Flow: A Geometric Approximation of Quantum and Statistical Systems
NASA Astrophysics Data System (ADS)
Vanchurin, Vitaly
2018-06-01
Given a quantum (or statistical) system with a very large number of degrees of freedom and a preferred tensor product factorization of the Hilbert space (or of a space of distributions) we describe how it can be approximated with a very low-dimensional field theory with geometric degrees of freedom. The geometric approximation procedure consists of three steps. The first step is to construct weighted graphs (we call information graphs) with vertices representing subsystems (e.g., qubits or random variables) and edges representing mutual information (or the flow of information) between subsystems. The second step is to deform the adjacency matrices of the information graphs to that of a (locally) low-dimensional lattice using the graph flow equations introduced in the paper. (Note that the graph flow produces very sparse adjacency matrices and thus might also be used, for example, in machine learning or network science where the task of graph sparsification is of a central importance.) The third step is to define an emergent metric and to derive an effective description of the metric and possibly other degrees of freedom. To illustrate the procedure we analyze (numerically and analytically) two information graph flows with geometric attractors (towards locally one- and two-dimensional lattices) and metric perturbations obeying a geometric flow equation. Our analysis also suggests a possible approach to (a non-perturbative) quantum gravity in which the geometry (a secondary object) emerges directly from a quantum state (a primary object) due to the flow of the information graphs.
Improving Student Knowledge of the Graphing Calculator's Capabilities.
ERIC Educational Resources Information Center
Hubbard, Donna
This paper describes an intervention in two Algebra II classes in which the graphing calculator was incorporated into the curriculum as often as possible. The targeted population consisted of high school students in a growing middle to upper class community located in a suburb of a large city. The problem of a lack of understanding of the…
Scale-space measures for graph topology link protein network architecture to function.
Hulsman, Marc; Dimitrakopoulos, Christos; de Ridder, Jeroen
2014-06-15
The network architecture of physical protein interactions is an important determinant for the molecular functions that are carried out within each cell. To study this relation, the network architecture can be characterized by graph topological characteristics such as shortest paths and network hubs. These characteristics have an important shortcoming: they do not take into account that interactions occur across different scales. This is important because some cellular functions may involve a single direct protein interaction (small scale), whereas others require more and/or indirect interactions, such as protein complexes (medium scale) and interactions between large modules of proteins (large scale). In this work, we derive generalized scale-aware versions of known graph topological measures based on diffusion kernels. We apply these to characterize the topology of networks across all scales simultaneously, generating a so-called graph topological scale-space. The comprehensive physical interaction network in yeast is used to show that scale-space based measures consistently give superior performance when distinguishing protein functional categories and three major types of functional interactions-genetic interaction, co-expression and perturbation interactions. Moreover, we demonstrate that graph topological scale spaces capture biologically meaningful features that provide new insights into the link between function and protein network architecture. Matlab(TM) code to calculate the scale-aware topological measures (STMs) is available at http://bioinformatics.tudelft.nl/TSSA © The Author 2014. Published by Oxford University Press.
Hosseini, S M Hadi; Hoeft, Fumiko; Kesler, Shelli R
2012-01-01
In recent years, graph theoretical analyses of neuroimaging data have increased our understanding of the organization of large-scale structural and functional brain networks. However, tools for pipeline application of graph theory for analyzing topology of brain networks is still lacking. In this report, we describe the development of a graph-analysis toolbox (GAT) that facilitates analysis and comparison of structural and functional network brain networks. GAT provides a graphical user interface (GUI) that facilitates construction and analysis of brain networks, comparison of regional and global topological properties between networks, analysis of network hub and modules, and analysis of resilience of the networks to random failure and targeted attacks. Area under a curve (AUC) and functional data analyses (FDA), in conjunction with permutation testing, is employed for testing the differences in network topologies; analyses that are less sensitive to the thresholding process. We demonstrated the capabilities of GAT by investigating the differences in the organization of regional gray-matter correlation networks in survivors of acute lymphoblastic leukemia (ALL) and healthy matched Controls (CON). The results revealed an alteration in small-world characteristics of the brain networks in the ALL survivors; an observation that confirm our hypothesis suggesting widespread neurobiological injury in ALL survivors. Along with demonstration of the capabilities of the GAT, this is the first report of altered large-scale structural brain networks in ALL survivors.
Detecting communities using asymptotical surprise
NASA Astrophysics Data System (ADS)
Traag, V. A.; Aldecoa, R.; Delvenne, J.-C.
2015-08-01
Nodes in real-world networks are repeatedly observed to form dense clusters, often referred to as communities. Methods to detect these groups of nodes usually maximize an objective function, which implicitly contains the definition of a community. We here analyze a recently proposed measure called surprise, which assesses the quality of the partition of a network into communities. In its current form, the formulation of surprise is rather difficult to analyze. We here therefore develop an accurate asymptotic approximation. This allows for the development of an efficient algorithm for optimizing surprise. Incidentally, this leads to a straightforward extension of surprise to weighted graphs. Additionally, the approximation makes it possible to analyze surprise more closely and compare it to other methods, especially modularity. We show that surprise is (nearly) unaffected by the well-known resolution limit, a particular problem for modularity. However, surprise may tend to overestimate the number of communities, whereas they may be underestimated by modularity. In short, surprise works well in the limit of many small communities, whereas modularity works better in the limit of few large communities. In this sense, surprise is more discriminative than modularity and may find communities where modularity fails to discern any structure.
NASA Astrophysics Data System (ADS)
McIntire, John P.; Osesina, O. Isaac; Bartley, Cecilia; Tudoreanu, M. Eduard; Havig, Paul R.; Geiselman, Eric E.
2012-06-01
Ensuring the proper and effective ways to visualize network data is important for many areas of academia, applied sciences, the military, and the public. Fields such as social network analysis, genetics, biochemistry, intelligence, cybersecurity, neural network modeling, transit systems, communications, etc. often deal with large, complex network datasets that can be difficult to interact with, study, and use. There have been surprisingly few human factors performance studies on the relative effectiveness of different graph drawings or network diagram techniques to convey information to a viewer. This is particularly true for weighted networks which include the strength of connections between nodes, not just information about which nodes are linked to other nodes. We describe a human factors study in which participants performed four separate network analysis tasks (finding a direct link between given nodes, finding an interconnected node between given nodes, estimating link strengths, and estimating the most densely interconnected nodes) on two different network visualizations: an adjacency matrix with a heat-map versus a node-link diagram. The results should help shed light on effective methods of visualizing network data for some representative analysis tasks, with the ultimate goal of improving usability and performance for viewers of network data displays.
Trust from the past: Bayesian Personalized Ranking based Link Prediction in Knowledge Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Baichuan; Choudhury, Sutanay; Al-Hasan, Mohammad
2016-02-01
Estimating the confidence for a link is a critical task for Knowledge Graph construction. Link prediction, or predicting the likelihood of a link in a knowledge graph based on prior state is a key research direction within this area. We propose a Latent Feature Embedding based link recommendation model for prediction task and utilize Bayesian Personalized Ranking based optimization technique for learning models for each predicate. Experimental results on large-scale knowledge bases such as YAGO2 show that our approach achieves substantially higher performance than several state-of-art approaches. Furthermore, we also study the performance of the link prediction algorithm in termsmore » of topological properties of the Knowledge Graph and present a linear regression model to reason about its expected level of accuracy.« less
Simulator for heterogeneous dataflow architectures
NASA Technical Reports Server (NTRS)
Malekpour, Mahyar R.
1993-01-01
A new simulator is developed to simulate the execution of an algorithm graph in accordance with the Algorithm to Architecture Mapping Model (ATAMM) rules. ATAMM is a Petri Net model which describes the periodic execution of large-grained, data-independent dataflow graphs and which provides predictable steady state time-optimized performance. This simulator extends the ATAMM simulation capability from a heterogenous set of resources, or functional units, to a more general heterogenous architecture. Simulation test cases show that the simulator accurately executes the ATAMM rules for both a heterogenous architecture and a homogenous architecture, which is the special case for only one processor type. The simulator forms one tool in an ATAMM Integrated Environment which contains other tools for graph entry, graph modification for performance optimization, and playback of simulations for analysis.
Truncated Long-Range Percolation on Oriented Graphs
NASA Astrophysics Data System (ADS)
van Enter, A. C. D.; de Lima, B. N. B.; Valesin, D.
2016-07-01
We consider different problems within the general theme of long-range percolation on oriented graphs. Our aim is to settle the so-called truncation question, described as follows. We are given probabilities that certain long-range oriented bonds are open; assuming that the sum of these probabilities is infinite, we ask if the probability of percolation is positive when we truncate the graph, disallowing bonds of range above a possibly large but finite threshold. We give some conditions in which the answer is affirmative. We also translate some of our results on oriented percolation to the context of a long-range contact process.
Graph-based real-time fault diagnostics
NASA Technical Reports Server (NTRS)
Padalkar, S.; Karsai, G.; Sztipanovits, J.
1988-01-01
A real-time fault detection and diagnosis capability is absolutely crucial in the design of large-scale space systems. Some of the existing AI-based fault diagnostic techniques like expert systems and qualitative modelling are frequently ill-suited for this purpose. Expert systems are often inadequately structured, difficult to validate and suffer from knowledge acquisition bottlenecks. Qualitative modelling techniques sometimes generate a large number of failure source alternatives, thus hampering speedy diagnosis. In this paper we present a graph-based technique which is well suited for real-time fault diagnosis, structured knowledge representation and acquisition and testing and validation. A Hierarchical Fault Model of the system to be diagnosed is developed. At each level of hierarchy, there exist fault propagation digraphs denoting causal relations between failure modes of subsystems. The edges of such a digraph are weighted with fault propagation time intervals. Efficient and restartable graph algorithms are used for on-line speedy identification of failure source components.
Experimental study of thin film sensor networks for wind turbine blade damage detection
NASA Astrophysics Data System (ADS)
Downey, A.; Laflamme, S.; Ubertini, F.; Sauder, H.; Sarkar, P.
2017-02-01
Damage detection of wind turbine blades is difficult due to their complex geometry and large size, for which large deployment of sensing systems is typically not economical. A solution is to develop and deploy dedicated sensor networks fabricated from inexpensive materials and electronics. The authors have recently developed a novel skin-type strain gauge for measuring strain over very large surfaces. The skin, a type of large-area electronics, is constituted from a network of soft elastomeric capacitors. The sensing system is analogous to a biological skin, where local strain can be monitored over a global area. In this paper, we propose the utilization of a dense network of soft elastomeric capacitors to detect, localize, and quantify damage on wind turbine blades. We also leverage mature off-the-shelf technologies, in particular resistive strain gauges, to augment such dense sensor network with high accuracy data at key locations, therefore constituting a hybrid dense sensor network. The proposed hybrid dense sensor network is installed inside a wind turbine blade model, and tested in a wind tunnel to simulate an operational environment. Results demonstrate the ability of the hybrid dense sensor network to detect, localize, and quantify damage.
eHUGS: Enhanced Hierarchical Unbiased Graph Shrinkage for Efficient Groupwise Registration
Wu, Guorong; Peng, Xuewei; Ying, Shihui; Wang, Qian; Yap, Pew-Thian; Shen, Dan; Shen, Dinggang
2016-01-01
Effective and efficient spatial normalization of a large population of brain images is critical for many clinical and research studies, but it is technically very challenging. A commonly used approach is to choose a certain image as the template and then align all other images in the population to this template by applying pairwise registration. To avoid the potential bias induced by the inappropriate template selection, groupwise registration methods have been proposed to simultaneously register all images to a latent common space. However, current groupwise registration methods do not make full use of image distribution information for more accurate registration. In this paper, we present a novel groupwise registration method that harnesses the image distribution information by capturing the image distribution manifold using a hierarchical graph with its nodes representing the individual images. More specifically, a low-level graph describes the image distribution in each subgroup, and a high-level graph encodes the relationship between representative images of subgroups. Given the graph representation, we can register all images to the common space by dynamically shrinking the graph on the image manifold. The topology of the entire image distribution is always maintained during graph shrinkage. Evaluations on two datasets, one for 80 elderly individuals and one for 285 infants, indicate that our method can yield promising results. PMID:26800361
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, Yubin; Shankar, Mallikarjun; Park, Byung H.
Designing a database system for both efficient data management and data services has been one of the enduring challenges in the healthcare domain. In many healthcare systems, data services and data management are often viewed as two orthogonal tasks; data services refer to retrieval and analytic queries such as search, joins, statistical data extraction, and simple data mining algorithms, while data management refers to building error-tolerant and non-redundant database systems. The gap between service and management has resulted in rigid database systems and schemas that do not support effective analytics. We compose a rich graph structure from an abstracted healthcaremore » RDBMS to illustrate how we can fill this gap in practice. We show how a healthcare graph can be automatically constructed from a normalized relational database using the proposed 3NF Equivalent Graph (3EG) transformation.We discuss a set of real world graph queries such as finding self-referrals, shared providers, and collaborative filtering, and evaluate their performance over a relational database and its 3EG-transformed graph. Experimental results show that the graph representation serves as multiple de-normalized tables, thus reducing complexity in a database and enhancing data accessibility of users. Based on this finding, we propose an ensemble framework of databases for healthcare applications.« less
An Analytical Framework for Fast Estimation of Capacity and Performance in Communication Networks
2012-01-25
standard random graph (due to Erdos- Renyi ) in the regime where the average degrees remain fixed (and above 1) and the number of nodes get large, is not...abs/1010.3305 (Oct 2010). [6] O. Narayan, I. Saniee, G. H. Tucci, “Lack of Spectral Gap and Hyperbolicity in Asymptotic Erdös- Renyi Random Graphs
Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold
2014-12-01
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
A Direct Mapping of Max k-SAT and High Order Parity Checks to a Chimera Graph
Chancellor, N.; Zohren, S.; Warburton, P. A.; Benjamin, S. C.; Roberts, S.
2016-01-01
We demonstrate a direct mapping of max k-SAT problems (and weighted max k-SAT) to a Chimera graph, which is the non-planar hardware graph of the devices built by D-Wave Systems Inc. We further show that this mapping can be used to map a similar class of maximum satisfiability problems where the clauses are replaced by parity checks over potentially large numbers of bits. The latter is of specific interest for applications in decoding for communication. We discuss an example in which the decoding of a turbo code, which has been demonstrated to perform near the Shannon limit, can be mapped to a Chimera graph. The weighted max k-SAT problem is the most general class of satisfiability problems, so our result effectively demonstrates how any satisfiability problem may be directly mapped to a Chimera graph. Our methods faithfully reproduce the low energy spectrum of the target problems, so therefore may also be used for maximum entropy inference. PMID:27857179
Graph-Based Semantic Web Service Composition for Healthcare Data Integration.
Arch-Int, Ngamnij; Arch-Int, Somjit; Sonsilphong, Suphachoke; Wanchai, Paweena
2017-01-01
Within the numerous and heterogeneous web services offered through different sources, automatic web services composition is the most convenient method for building complex business processes that permit invocation of multiple existing atomic services. The current solutions in functional web services composition lack autonomous queries of semantic matches within the parameters of web services, which are necessary in the composition of large-scale related services. In this paper, we propose a graph-based Semantic Web Services composition system consisting of two subsystems: management time and run time. The management-time subsystem is responsible for dependency graph preparation in which a dependency graph of related services is generated automatically according to the proposed semantic matchmaking rules. The run-time subsystem is responsible for discovering the potential web services and nonredundant web services composition of a user's query using a graph-based searching algorithm. The proposed approach was applied to healthcare data integration in different health organizations and was evaluated according to two aspects: execution time measurement and correctness measurement.
Graph-Based Semantic Web Service Composition for Healthcare Data Integration
2017-01-01
Within the numerous and heterogeneous web services offered through different sources, automatic web services composition is the most convenient method for building complex business processes that permit invocation of multiple existing atomic services. The current solutions in functional web services composition lack autonomous queries of semantic matches within the parameters of web services, which are necessary in the composition of large-scale related services. In this paper, we propose a graph-based Semantic Web Services composition system consisting of two subsystems: management time and run time. The management-time subsystem is responsible for dependency graph preparation in which a dependency graph of related services is generated automatically according to the proposed semantic matchmaking rules. The run-time subsystem is responsible for discovering the potential web services and nonredundant web services composition of a user's query using a graph-based searching algorithm. The proposed approach was applied to healthcare data integration in different health organizations and was evaluated according to two aspects: execution time measurement and correctness measurement. PMID:29065602
Temporal dynamics and impact of event interactions in cyber-social populations
NASA Astrophysics Data System (ADS)
Zhang, Yi-Qing; Li, Xiang
2013-03-01
The advance of information technologies provides powerful measures to digitize social interactions and facilitate quantitative investigations. To explore large-scale indoor interactions of a social population, we analyze 18 715 users' Wi-Fi access logs recorded in a Chinese university campus during 3 months, and define event interaction (EI) to characterize the concurrent interactions of multiple users inferred by their geographic coincidences—co-locating in the same small region at the same time. We propose three rules to construct a transmission graph, which depicts the topological and temporal features of event interactions. The vertex dynamics of transmission graph tells that the active durations of EIs fall into the truncated power-law distributions, which is independent on the number of involved individuals. The edge dynamics of transmission graph reports that the transmission durations present a truncated power-law pattern independent on the daily and weekly periodicities. Besides, in the aggregated transmission graph, low-degree vertices previously neglected in the aggregated static networks may participate in the large-degree EIs, which is verified by three data sets covering different sizes of social populations with various rendezvouses. This work highlights the temporal significance of event interactions in cyber-social populations.
Asada, Yukiko; Abel, Hannah; Skedgel, Chris; Warner, Grace
2017-12-01
Policy Points: Effective graphs can be a powerful tool in communicating health inequality. The choice of graphs is often based on preferences and familiarity rather than science. According to the literature on graph perception, effective graphs allow human brains to decode visual cues easily. Dot charts are easier to decode than bar charts, and thus they are more effective. Dot charts are a flexible and versatile way to display information about health inequality. Consistent with the health risk communication literature, the captions accompanying health inequality graphs should provide a numerical, explicitly calculated description of health inequality, expressed in absolute and relative terms, from carefully thought-out perspectives. Graphs are an essential tool for communicating health inequality, a key health policy concern. The choice of graphs is often driven by personal preferences and familiarity. Our article is aimed at health policy researchers developing health inequality graphs for policy and scientific audiences and seeks to (1) raise awareness of the effective use of graphs in communicating health inequality; (2) advocate for a particular type of graph (ie, dot charts) to depict health inequality; and (3) suggest key considerations for the captions accompanying health inequality graphs. Using composite review methods, we selected the prevailing recommendations for improving graphs in scientific reporting. To find the origins of these recommendations, we reviewed the literature on graph perception and then applied what we learned to the context of health inequality. In addition, drawing from the numeracy literature in health risk communication, we examined numeric and verbal formats to explain health inequality graphs. Many disciplines offer commonsense recommendations for visually presenting quantitative data. The literature on graph perception, which defines effective graphs as those allowing the easy decoding of visual cues in human brains, shows that with their more accurate and easier-to-decode visual cues, dot charts are more effective than bar charts. Dot charts can flexibly present a large amount of information in limited space. They also can easily accommodate typical health inequality information to describe a health variable (eg, life expectancy) by an inequality domain (eg, income) with domain groups (eg, poor and rich) in a population (eg, Canada) over time periods (eg, 2010 and 2017). The numeracy literature suggests that a health inequality graph's caption should provide a numerical, explicitly calculated description of health inequality expressed in absolute and relative terms, from carefully thought-out perspectives. Given the ubiquity of graphs, the health inequality field should learn from the vibrant multidisciplinary literature how to construct effective graphic communications, especially by considering to use dot charts. © 2017 Milbank Memorial Fund.
Graph Design via Convex Optimization: Online and Distributed Perspectives
NASA Astrophysics Data System (ADS)
Meng, De
Network and graph have long been natural abstraction of relations in a variety of applications, e.g. transportation, power system, social network, communication, electrical circuit, etc. As a large number of computation and optimization problems are naturally defined on graphs, graph structures not only enable important properties of these problems, but also leads to highly efficient distributed and online algorithms. For example, graph separability enables the parallelism for computation and operation as well as limits the size of local problems. More interestingly, graphs can be defined and constructed in order to take best advantage of those problem properties. This dissertation focuses on graph structure and design in newly proposed optimization problems, which establish a bridge between graph properties and optimization problem properties. We first study a new optimization problem called Geodesic Distance Maximization Problem (GDMP). Given a graph with fixed edge weights, finding the shortest path, also known as the geodesic, between two nodes is a well-studied network flow problem. We introduce the Geodesic Distance Maximization Problem (GDMP): the problem of finding the edge weights that maximize the length of the geodesic subject to convex constraints on the weights. We show that GDMP is a convex optimization problem for a wide class of flow costs, and provide a physical interpretation using the dual. We present applications of the GDMP in various fields, including optical lens design, network interdiction, and resource allocation in the control of forest fires. We develop an Alternating Direction Method of Multipliers (ADMM) by exploiting specific problem structures to solve large-scale GDMP, and demonstrate its effectiveness in numerical examples. We then turn our attention to distributed optimization on graph with only local communication. Distributed optimization arises in a variety of applications, e.g. distributed tracking and localization, estimation problems in sensor networks, multi-agent coordination. Distributed optimization aims to optimize a global objective function formed by summation of coupled local functions over a graph via only local communication and computation. We developed a weighted proximal ADMM for distributed optimization using graph structure. This fully distributed, single-loop algorithm allows simultaneous updates and can be viewed as a generalization of existing algorithms. More importantly, we achieve faster convergence by jointly designing graph weights and algorithm parameters. Finally, we propose a new problem on networks called Online Network Formation Problem: starting with a base graph and a set of candidate edges, at each round of the game, player one first chooses a candidate edge and reveals it to player two, then player two decides whether to accept it; player two can only accept limited number of edges and make online decisions with the goal to achieve the best properties of the synthesized network. The network properties considered include the number of spanning trees, algebraic connectivity and total effective resistance. These network formation games arise in a variety of cooperative multiagent systems. We propose a primal-dual algorithm framework for the general online network formation game, and analyze the algorithm performance by the competitive ratio and regret.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Winlaw, Manda; De Sterck, Hans; Sanders, Geoffrey
In very simple terms a network can be de ned as a collection of points joined together by lines. Thus, networks can be used to represent connections between entities in a wide variety of elds including engi- neering, science, medicine, and sociology. Many large real-world networks share a surprising number of properties, leading to a strong interest in model development research and techniques for building synthetic networks have been developed, that capture these similarities and replicate real-world graphs. Modeling these real-world networks serves two purposes. First, building models that mimic the patterns and prop- erties of real networks helps tomore » understand the implications of these patterns and helps determine which patterns are important. If we develop a generative process to synthesize real networks we can also examine which growth processes are plausible and which are not. Secondly, high-quality, large-scale network data is often not available, because of economic, legal, technological, or other obstacles [7]. Thus, there are many instances where the systems of interest cannot be represented by a single exemplar network. As one example, consider the eld of cybersecurity, where systems require testing across diverse threat scenarios and validation across diverse network structures. In these cases, where there is no single exemplar network, the systems must instead be modeled as a collection of networks in which the variation among them may be just as important as their common features. By developing processes to build synthetic models, so-called graph generators, we can build synthetic networks that capture both the essential features of a system and realistic variability. Then we can use such synthetic graphs to perform tasks such as simulations, analysis, and decision making. We can also use synthetic graphs to performance test graph analysis algorithms, including clustering algorithms and anomaly detection algorithms.« less
Using Behavior Over Time Graphs to Spur Systems Thinking Among Public Health Practitioners.
Calancie, Larissa; Anderson, Seri; Branscomb, Jane; Apostolico, Alexsandra A; Lich, Kristen Hassmiller
2018-02-01
Public health practitioners can use Behavior Over Time (BOT) graphs to spur discussion and systems thinking around complex challenges. Multiple large systems, such as health care, the economy, and education, affect chronic disease rates in the United States. System thinking tools can build public health practitioners' capacity to understand these systems and collaborate within and across sectors to improve population health. BOT graphs show a variable, or variables (y axis) over time (x axis). Although analyzing trends is not new to public health, drawing BOT graphs, annotating the events and systemic forces that are likely to influence the depicted trends, and then discussing the graphs in a diverse group provides an opportunity for public health practitioners to hear each other's perspectives and creates a more holistic understanding of the key factors that contribute to a trend. We describe how BOT graphs are used in public health, how they can be used to generate group discussion, and how this process can advance systems-level thinking. Then we describe how BOT graphs were used with groups of maternal and child health (MCH) practitioners and partners (N = 101) during a training session to advance their thinking about MCH challenges. Eighty-six percent of the 84 participants who completed an evaluation agreed or strongly agreed that they would use this BOT graph process to engage stakeholders in their home states and jurisdictions. The BOT graph process we describe can be applied to a variety of public health issues and used by practitioners, stakeholders, and researchers.
Graph theory approach to the eigenvalue problem of large space structures
NASA Technical Reports Server (NTRS)
Reddy, A. S. S. R.; Bainum, P. M.
1981-01-01
Graph theory is used to obtain numerical solutions to eigenvalue problems of large space structures (LSS) characterized by a state vector of large dimensions. The LSS are considered as large, flexible systems requiring both orientation and surface shape control. Graphic interpretation of the determinant of a matrix is employed to reduce a higher dimensional matrix into combinations of smaller dimensional sub-matrices. The reduction is implemented by means of a Boolean equivalent of the original matrices formulated to obtain smaller dimensional equivalents of the original numerical matrix. Computation time becomes less and more accurate solutions are possible. An example is provided in the form of a free-free square plate. Linearized system equations and numerical values of a stiffness matrix are presented, featuring a state vector with 16 components.
Arbitrary electron acoustic waves in degenerate dense plasmas
NASA Astrophysics Data System (ADS)
Rahman, Ata-ur; Mushtaq, A.; Qamar, A.; Neelam, S.
2017-05-01
A theoretical investigation is carried out of the nonlinear dynamics of electron-acoustic waves in a collisionless and unmagnetized plasma whose constituents are non-degenerate cold electrons, ultra-relativistic degenerate electrons, and stationary ions. A dispersion relation is derived for linear EAWs. An energy integral equation involving the Sagdeev potential is derived, and basic properties of the large amplitude solitary structures are investigated in such a degenerate dense plasma. It is shown that only negative large amplitude EA solitary waves can exist in such a plasma system. The present analysis may be important to understand the collective interactions in degenerate dense plasmas, occurring in dense astrophysical environments as well as in laser-solid density plasma interaction experiments.
Overview and extensions of a system for routing directed graphs on SIMD architectures
NASA Technical Reports Server (NTRS)
Tomboulian, Sherryl
1988-01-01
Many problems can be described in terms of directed graphs that contain a large number of vertices where simple computations occur using data from adjacent vertices. A method is given for parallelizing such problems on an SIMD machine model that uses only nearest neighbor connections for communication, and has no facility for local indirect addressing. Each vertex of the graph will be assigned to a processor in the machine. Rules for a labeling are introduced that support the use of a simple algorithm for movement of data along the edges of the graph. Additional algorithms are defined for addition and deletion of edges. Modifying or adding a new edge takes the same time as parallel traversal. This combination of architecture and algorithms defines a system that is relatively simple to build and can do fast graph processing. All edges can be traversed in parallel in time O(T), where T is empirically proportional to the average path length in the embedding times the average degree of the graph. Additionally, researchers present an extension to the above method which allows for enhanced performance by allowing some broadcasting capabilities.
Wedge sampling for computing clustering coefficients and triangle counts on large graphs
Seshadhri, C.; Pinar, Ali; Kolda, Tamara G.
2014-05-08
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Despite the importance of these triadic measures, algorithms to compute them can be extremely expensive. We discuss the method of wedge sampling. This versatile technique allows for the fast and accurate approximation of various types of clustering coefficients and triangle counts. Furthermore, these techniques are extensible to counting directed triangles in digraphs. Our methods come with provable andmore » practical time-approximation tradeoffs for all computations. We provide extensive results that show our methods are orders of magnitude faster than the state of the art, while providing nearly the accuracy of full enumeration.« less
Distributed MPC based consensus for single-integrator multi-agent systems.
Cheng, Zhaomeng; Fan, Ming-Can; Zhang, Hai-Tao
2015-09-01
This paper addresses model predictive control schemes for consensus in multi-agent systems (MASs) with discrete-time single-integrator dynamics under switching directed interaction graphs. The control horizon is extended to be greater than one which endows the closed-loop system with extra degree of freedom. We derive sufficient conditions on the sampling period and the interaction graph to achieve consensus by using the property of infinite products of stochastic matrices. Consensus can be achieved asymptotically if the sampling period is selected such that the interaction graph among agents has a directed spanning tree jointly. Significantly, if the interaction graph always has a spanning tree, one can select an arbitrary large sampling period to guarantee consensus. Finally, several simulations are conducted to illustrate the effectiveness of the theoretical results. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Reconstruction and simplification of urban scene models based on oblique images
NASA Astrophysics Data System (ADS)
Liu, J.; Guo, B.
2014-08-01
We describe a multi-view stereo reconstruction and simplification algorithms for urban scene models based on oblique images. The complexity, diversity, and density within the urban scene, it increases the difficulty to build the city models using the oblique images. But there are a lot of flat surfaces existing in the urban scene. One of our key contributions is that a dense matching algorithm based on Self-Adaptive Patch in view of the urban scene is proposed. The basic idea of matching propagating based on Self-Adaptive Patch is to build patches centred by seed points which are already matched. The extent and shape of the patches can adapt to the objects of urban scene automatically: when the surface is flat, the extent of the patch would become bigger; while the surface is very rough, the extent of the patch would become smaller. The other contribution is that the mesh generated by Graph Cuts is 2-manifold surface satisfied the half edge data structure. It is solved by clustering and re-marking tetrahedrons in s-t graph. The purpose of getting 2- manifold surface is to simply the mesh by edge collapse algorithm which can preserve and stand out the features of buildings.
Dimitriadis, Stavros I.; Zouridakis, George; Rezaie, Roozbeh; Babajani-Feremi, Abbas; Papanicolaou, Andrew C.
2015-01-01
Mild traumatic brain injury (mTBI) may affect normal cognition and behavior by disrupting the functional connectivity networks that mediate efficient communication among brain regions. In this study, we analyzed brain connectivity profiles from resting state Magnetoencephalographic (MEG) recordings obtained from 31 mTBI patients and 55 normal controls. We used phase-locking value estimates to compute functional connectivity graphs to quantify frequency-specific couplings between sensors at various frequency bands. Overall, normal controls showed a dense network of strong local connections and a limited number of long-range connections that accounted for approximately 20% of all connections, whereas mTBI patients showed networks characterized by weak local connections and strong long-range connections that accounted for more than 60% of all connections. Comparison of the two distinct general patterns at different frequencies using a tensor representation for the connectivity graphs and tensor subspace analysis for optimal feature extraction showed that mTBI patients could be separated from normal controls with 100% classification accuracy in the alpha band. These encouraging findings support the hypothesis that MEG-based functional connectivity patterns may be used as biomarkers that can provide more accurate diagnoses, help guide treatment, and monitor effectiveness of intervention in mTBI. PMID:26640764
Student reasoning about graphs in different contexts
NASA Astrophysics Data System (ADS)
Ivanjek, Lana; Susac, Ana; Planinic, Maja; Andrasevic, Aneta; Milin-Sipus, Zeljka
2016-06-01
This study investigates university students' graph interpretation strategies and difficulties in mathematics, physics (kinematics), and contexts other than physics. Eight sets of parallel (isomorphic) mathematics, physics, and other context questions about graphs, which were developed by us, were administered to 385 first-year students at the Faculty of Science, University of Zagreb. Students were asked to provide explanations and/or mathematical procedures with their answers. Students' main strategies and difficulties identified through the analysis of those explanations and procedures are described. Student strategies of graph interpretation were found to be largely context dependent and domain specific. A small fraction of students have used the same strategy in all three domains (mathematics, physics, and other contexts) on most sets of parallel questions. Some students have shown indications of transfer of knowledge in the sense that they used techniques and strategies developed in physics for solving (or attempting to solve) other context problems. In physics, the preferred strategy was the use of formulas, which sometimes seemed to block the use of other, more productive strategies which students displayed in other domains. Students' answers indicated the presence of slope-height confusion and interval-point confusion in all three domains. Students generally better interpreted graph slope than the area under a graph, although the concept of slope still seemed to be quite vague for many. The interpretation of the concept of area under a graph needs more attention in both physics and mathematics teaching.
Kairisto, V; Poola, A
1995-01-01
GraphROC for Windows is a program for clinical test evaluation. It was designed for the handling of large datasets obtained from clinical laboratory databases. In the user interface, graphical and numerical presentations are combined. For simplicity, numerical data is not shown unless requested. Relevant numbers can be "picked up" from the graph by simple mouse operations. Reference distributions can be displayed by using automatically optimized bin widths. Any percentile of the distribution with corresponding confidence limits can be chosen for display. In sensitivity-specificity analysis, both illness- and health-related distributions are shown in the same graph. The following data for any cutoff limit can be shown in a separate click window: clinical sensitivity and specificity with corresponding confidence limits, positive and negative likelihood ratios, positive and negative predictive values and efficiency. Predictive values and clinical efficiency of the cutoff limit can be updated for any prior probability of disease. Receiver Operating Characteristics (ROC) curves can be generated and combined into the same graph for comparison of several different tests. The area under the curve with corresponding confidence interval is calculated for each ROC curve. Numerical results of analyses and graphs can be printed or exported to other Microsoft Windows programs. GraphROC for Windows also employs a new method, developed by us, for the indirect estimation of health-related limits and change limits from mixed distributions of clinical laboratory data.
Comparison of university students' understanding of graphs in different contexts
NASA Astrophysics Data System (ADS)
Planinic, Maja; Ivanjek, Lana; Susac, Ana; Milin-Sipus, Zeljka
2013-12-01
This study investigates university students’ understanding of graphs in three different domains: mathematics, physics (kinematics), and contexts other than physics. Eight sets of parallel mathematics, physics, and other context questions about graphs were developed. A test consisting of these eight sets of questions (24 questions in all) was administered to 385 first year students at University of Zagreb who were either prospective physics or mathematics teachers or prospective physicists or mathematicians. Rasch analysis of data was conducted and linear measures for item difficulties were obtained. Average difficulties of items in three domains (mathematics, physics, and other contexts) and over two concepts (graph slope, area under the graph) were computed and compared. Analysis suggests that the variation of average difficulty among the three domains is much smaller for the concept of graph slope than for the concept of area under the graph. Most of the slope items are very close in difficulty, suggesting that students who have developed sufficient understanding of graph slope in mathematics are generally able to transfer it almost equally successfully to other contexts. A large difference was found between the difficulty of the concept of area under the graph in physics and other contexts on one side and mathematics on the other side. Comparison of average difficulty of the three domains suggests that mathematics without context is the easiest domain for students. Adding either physics or other context to mathematical items generally seems to increase item difficulty. No significant difference was found between the average item difficulty in physics and contexts other than physics, suggesting that physics (kinematics) remains a difficult context for most students despite the received instruction on kinematics in high school.
Kennedy, David M; Chatha, Kamaljit; Rayner, Hugh C
2013-09-01
Some patients with chronic kidney disease are still referred late for specialist care despite the evidence that earlier detection and intervention can halt or delay progression to end-stage kidney disease (ESKD). To develop a population surveillance system using existing laboratory data to enable early detection of patients at high risk of ESKD by reviewing cumulative graphs of estimated glomerular filtration rate (eGFR). A database was developed, updated daily with data from the laboratory computer. Cumulative eGFR graphs containing up to five years of data are reviewed by clinical scientists for all primary care patients or out-patients with a low eGFR for their age. For those with a declining trend, a report containing the eGFR graph is sent to the requesting doctor. A retrospective audit was performed using historical data to assess the predictive value of the graphs. In nine months, we reported 370,000 eGFR results, reviewing 12,000 eGFR graphs. On average 60 graphs per week were flagged as 'high' or 'intermediate' risk. Patients with graphs flagged as high risk had a significantly higher mortality after 3.5 years and a significantly greater chance of requiring renal replacement therapy after 4.5 years of follow-up. Five patients (7%) with graphs flagged as high risk had a sustained >25% fall in eGFR without evidence of secondary care referral. Feedback about the service from requesting clinicians was 73% positive. We have developed a system for laboratory staff to review cumulative eGFR graphs for a large population and identify patients at highest risk of developing ESKD. Further research is needed to measure the impact of this service on patient outcomes. © 2013 European Dialysis and Transplant Nurses Association/European Renal Care Association.
Danaci, Hasan Fehmi; Cetin-Atalay, Rengul; Atalay, Volkan
2018-03-26
Visualizing large-scale data produced by the high throughput experiments as a biological graph leads to better understanding and analysis. This study describes a customized force-directed layout algorithm, EClerize, for biological graphs that represent pathways in which the nodes are associated with Enzyme Commission (EC) attributes. The nodes with the same EC class numbers are treated as members of the same cluster. Positions of nodes are then determined based on both the biological similarity and the connection structure. EClerize minimizes the intra-cluster distance, that is the distance between the nodes of the same EC cluster and maximizes the inter-cluster distance, that is the distance between two distinct EC clusters. EClerize is tested on a number of biological pathways and the improvement brought in is presented with respect to the original algorithm. EClerize is available as a plug-in to cytoscape ( http://apps.cytoscape.org/apps/eclerize ).
Fingerprint recognition system by use of graph matching
NASA Astrophysics Data System (ADS)
Shen, Wei; Shen, Jun; Zheng, Huicheng
2001-09-01
Fingerprint recognition is an important subject in biometrics to identify or verify persons by physiological characteristics, and has found wide applications in different domains. In the present paper, we present a finger recognition system that combines singular points and structures. The principal steps of processing in our system are: preprocessing and ridge segmentation, singular point extraction and selection, graph representation, and finger recognition by graphs matching. Our fingerprint recognition system is implemented and tested for many fingerprint images and the experimental result are satisfactory. Different techniques are used in our system, such as fast calculation of orientation field, local fuzzy dynamical thresholding, algebraic analysis of connections and fingerprints representation and matching by graphs. Wed find that for fingerprint database that is not very large, the recognition rate is very high even without using a prior coarse category classification. This system works well for both one-to-few and one-to-many problems.
2005-06-01
regarding criminals among many police departments. The criminal graph links suspects, crimes , locations, previous case histories, etc. These linkages...rebroadcast rate is high enough that dying out is not a concern. With the rise of sensor and peer to peer networks characterized by high churn, theory that...document groups (say, science fiction novels and thrillers ), based on the word groups that occur most frequently in them. A user who prefers one
Scalable Matrix Algorithms for Interactive Analytics of Very Large Informatics Graphs
2017-06-14
information networks. Depending on the situation, these larger networks may not fit on a single machine. Although we considered traditional matrix and graph...collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources...gathering and maintaining the data needed, and completing and reviewing the collection of information . Send comments regarding this burden estimate or
Optimizing spread dynamics on graphs by message passing
NASA Astrophysics Data System (ADS)
Altarelli, F.; Braunstein, A.; Dall'Asta, L.; Zecchina, R.
2013-09-01
Cascade processes are responsible for many important phenomena in natural and social sciences. Simple models of irreversible dynamics on graphs, in which nodes activate depending on the state of their neighbors, have been successfully applied to describe cascades in a large variety of contexts. Over the past decades, much effort has been devoted to understanding the typical behavior of the cascades arising from initial conditions extracted at random from some given ensemble. However, the problem of optimizing the trajectory of the system, i.e. of identifying appropriate initial conditions to maximize (or minimize) the final number of active nodes, is still considered to be practically intractable, with the only exception being models that satisfy a sort of diminishing returns property called submodularity. Submodular models can be approximately solved by means of greedy strategies, but by definition they lack cooperative characteristics which are fundamental in many real systems. Here we introduce an efficient algorithm based on statistical physics for the optimization of trajectories in cascade processes on graphs. We show that for a wide class of irreversible dynamics, even in the absence of submodularity, the spread optimization problem can be solved efficiently on large networks. Analytic and algorithmic results on random graphs are complemented by the solution of the spread maximization problem on a real-world network (the Epinions consumer reviews network).
A simple rule for the evolution of cooperation on graphs and social networks.
Ohtsuki, Hisashi; Hauert, Christoph; Lieberman, Erez; Nowak, Martin A
2006-05-25
A fundamental aspect of all biological systems is cooperation. Cooperative interactions are required for many levels of biological organization ranging from single cells to groups of animals. Human society is based to a large extent on mechanisms that promote cooperation. It is well known that in unstructured populations, natural selection favours defectors over cooperators. There is much current interest, however, in studying evolutionary games in structured populations and on graphs. These efforts recognize the fact that who-meets-whom is not random, but determined by spatial relationships or social networks. Here we describe a surprisingly simple rule that is a good approximation for all graphs that we have analysed, including cycles, spatial lattices, random regular graphs, random graphs and scale-free networks: natural selection favours cooperation, if the benefit of the altruistic act, b, divided by the cost, c, exceeds the average number of neighbours, k, which means b/c > k. In this case, cooperation can evolve as a consequence of 'social viscosity' even in the absence of reputation effects or strategic complexity.
Breaking of Ensemble Equivalence in Networks
NASA Astrophysics Data System (ADS)
Squartini, Tiziano; de Mol, Joey; den Hollander, Frank; Garlaschelli, Diego
2015-12-01
It is generally believed that, in the thermodynamic limit, the microcanonical description as a function of energy coincides with the canonical description as a function of temperature. However, various examples of systems for which the microcanonical and canonical ensembles are not equivalent have been identified. A complete theory of this intriguing phenomenon is still missing. Here we show that ensemble nonequivalence can manifest itself also in random graphs with topological constraints. We find that, while graphs with a given number of links are ensemble equivalent, graphs with a given degree sequence are not. This result holds irrespective of whether the energy is nonadditive (as in unipartite graphs) or additive (as in bipartite graphs). In contrast with previous expectations, our results show that (1) physically, nonequivalence can be induced by an extensive number of local constraints, and not necessarily by long-range interactions or nonadditivity, (2) mathematically, nonequivalence is determined by a different large-deviation behavior of microcanonical and canonical probabilities for a single microstate, and not necessarily for almost all microstates. The latter criterion, which is entirely local, is not restricted to networks and holds in general.
NASA Astrophysics Data System (ADS)
Haritan, Idan; Moiseyev, Nimrod
2017-07-01
Resonances play a major role in a large variety of fields in physics and chemistry. Accordingly, there is a growing interest in methods designed to calculate them. Recently, Landau et al. proposed a new approach to analytically dilate a single eigenvalue from the stabilization graph into the complex plane. This approach, termed Resonances Via Padé (RVP), utilizes the Padé approximant and is based on a unique analysis of the stabilization graph. Yet, analytic continuation of eigenvalues from the stabilization graph into the complex plane is not a new idea. In 1975, Jordan suggested an analytic continuation method based on the branch point structure of the stabilization graph. The method was later modified by McCurdy and McNutt, and it is still being used today. We refer to this method as the Truncated Characteristic Polynomial (TCP) method. In this manuscript, we perform an in-depth comparison between the RVP and the TCP methods. We demonstrate that while both methods are important and complementary, the advantage of one method over the other is problem-dependent. Illustrative examples are provided in the manuscript.
Application of the PageRank Algorithm to Alarm Graphs
NASA Astrophysics Data System (ADS)
Treinen, James J.; Thurimella, Ramakrishna
The task of separating genuine attacks from false alarms in large intrusion detection infrastructures is extremely difficult. The number of alarms received in such environments can easily enter into the millions of alerts per day. The overwhelming noise created by these alarms can cause genuine attacks to go unnoticed. As means of highlighting these attacks, we introduce a host ranking technique utilizing Alarm Graphs. Rather than enumerate all potential attack paths as in Attack Graphs, we build and analyze graphs based on the alarms generated by the intrusion detection sensors installed on a network. Given that the alarms are predominantly false positives, the challenge is to identify, separate, and ideally predict future attacks. In this paper, we propose a novel approach to tackle this problem based on the PageRank algorithm. By elevating the rank of known attackers and victims we are able to observe the effect that these hosts have on the other nodes in the Alarm Graph. Using this information we are able to discover previously overlooked attacks, as well as defend against future intrusions.
Mosaic Graphs and Comparative Genomics in Phage Communities
Belcaid, Mahdi; Bergeron, Anne
2010-01-01
Abstract Comparing the genomes of two closely related viruses often produces mosaics where nearly identical sequences alternate with sequences that are unique to each genome. When several closely related genomes are compared, the unique sequences are likely to be shared with third genomes, leading to virus mosaic communities. Here we present comparative analysis of sets of Staphylococcus aureus phages that share large identical sequences with up to three other genomes, and with different partners along their genomes. We introduce mosaic graphs to represent these complex recombination events, and use them to illustrate the breath and depth of sequence sharing: some genomes are almost completely made up of shared sequences, while genomes that share very large identical sequences can adopt alternate functional modules. Mosaic graphs also allow us to identify breakpoints that could eventually be used for the construction of recombination networks. These findings have several implications on phage metagenomics assembly, on the horizontal gene transfer paradigm, and more generally on the understanding of the composition and evolutionary dynamics of virus communities. PMID:20874413
Efficient path-based computations on pedigree graphs with compact encodings
2012-01-01
A pedigree is a diagram of family relationships, and it is often used to determine the mode of inheritance (dominant, recessive, etc.) of genetic diseases. Along with rapidly growing knowledge of genetics and accumulation of genealogy information, pedigree data is becoming increasingly important. In large pedigree graphs, path-based methods for efficiently computing genealogical measurements, such as inbreeding and kinship coefficients of individuals, depend on efficient identification and processing of paths. In this paper, we propose a new compact path encoding scheme on large pedigrees, accompanied by an efficient algorithm for identifying paths. We demonstrate the utilization of our proposed method by applying it to the inbreeding coefficient computation. We present time and space complexity analysis, and also manifest the efficiency of our method for evaluating inbreeding coefficients as compared to previous methods by experimental results using pedigree graphs with real and synthetic data. Both theoretical and experimental results demonstrate that our method is more scalable and efficient than previous methods in terms of time and space requirements. PMID:22536898
Adaptive random walks on the class of Web graphs
NASA Astrophysics Data System (ADS)
Tadić, B.
2001-09-01
We study random walk with adaptive move strategies on a class of directed graphs with variable wiring diagram. The graphs are grown from the evolution rules compatible with the dynamics of the world-wide Web [B. Tadić, Physica A 293, 273 (2001)], and are characterized by a pair of power-law distributions of out- and in-degree for each value of the parameter β, which measures the degree of rewiring in the graph. The walker adapts its move strategy according to locally available information both on out-degree of the visited node and in-degree of target node. A standard random walk, on the other hand, uses the out-degree only. We compute the distribution of connected subgraphs visited by an ensemble of walkers, the average access time and survival probability of the walks. We discuss these properties of the walk dynamics relative to the changes in the global graph structure when the control parameter β is varied. For β≥ 3, corresponding to the world-wide Web, the access time of the walk to a given level of hierarchy on the graph is much shorter compared to the standard random walk on the same graph. By reducing the amount of rewiring towards rigidity limit β↦βc≲ 0.1, corresponding to the range of naturally occurring biochemical networks, the survival probability of adaptive and standard random walk become increasingly similar. The adaptive random walk can be used as an efficient message-passing algorithm on this class of graphs for large degree of rewiring.
On understanding nuclear reaction network flows with branchings on directed graphs
NASA Astrophysics Data System (ADS)
Meyer, Bradley S.
2018-04-01
Nuclear reaction network flow diagrams are useful for understanding which reactions are governing the abundance changes at a particular time during nucleosynthesis. This is especially true when the flows are largely unidirectional, such as during the s-process of nucleosynthesis. In explosive nucleosynthesis, when reaction flows are large, and when forward reactions are nearly balanced by their reverses, reaction flows no longer give a clear picture of the abundance evolution in the network. This paper presents a way of understanding network evolution in terms of sums of branchings on a directed graph, which extends the concept of reaction flows to allow for multiple reaction pathways.
Modeling flow and transport in fracture networks using graphs
NASA Astrophysics Data System (ADS)
Karra, S.; O'Malley, D.; Hyman, J. D.; Viswanathan, H. S.; Srinivasan, G.
2018-03-01
Fractures form the main pathways for flow in the subsurface within low-permeability rock. For this reason, accurately predicting flow and transport in fractured systems is vital for improving the performance of subsurface applications. Fracture sizes in these systems can range from millimeters to kilometers. Although modeling flow and transport using the discrete fracture network (DFN) approach is known to be more accurate due to incorporation of the detailed fracture network structure over continuum-based methods, capturing the flow and transport in such a wide range of scales is still computationally intractable. Furthermore, if one has to quantify uncertainty, hundreds of realizations of these DFN models have to be run. To reduce the computational burden, we solve flow and transport on a graph representation of a DFN. We study the accuracy of the graph approach by comparing breakthrough times and tracer particle statistical data between the graph-based and the high-fidelity DFN approaches, for fracture networks with varying number of fractures and degree of heterogeneity. Due to our recent developments in capabilities to perform DFN high-fidelity simulations on fracture networks with large number of fractures, we are in a unique position to perform such a comparison. We show that the graph approach shows a consistent bias with up to an order of magnitude slower breakthrough when compared to the DFN approach. We show that this is due to graph algorithm's underprediction of the pressure gradients across intersections on a given fracture, leading to slower tracer particle speeds between intersections and longer travel times. We present a bias correction methodology to the graph algorithm that reduces the discrepancy between the DFN and graph predictions. We show that with this bias correction, the graph algorithm predictions significantly improve and the results are very accurate. The good accuracy and the low computational cost, with O (104) times lower times than the DFN, makes the graph algorithm an ideal technique to incorporate in uncertainty quantification methods.
Modeling flow and transport in fracture networks using graphs.
Karra, S; O'Malley, D; Hyman, J D; Viswanathan, H S; Srinivasan, G
2018-03-01
Fractures form the main pathways for flow in the subsurface within low-permeability rock. For this reason, accurately predicting flow and transport in fractured systems is vital for improving the performance of subsurface applications. Fracture sizes in these systems can range from millimeters to kilometers. Although modeling flow and transport using the discrete fracture network (DFN) approach is known to be more accurate due to incorporation of the detailed fracture network structure over continuum-based methods, capturing the flow and transport in such a wide range of scales is still computationally intractable. Furthermore, if one has to quantify uncertainty, hundreds of realizations of these DFN models have to be run. To reduce the computational burden, we solve flow and transport on a graph representation of a DFN. We study the accuracy of the graph approach by comparing breakthrough times and tracer particle statistical data between the graph-based and the high-fidelity DFN approaches, for fracture networks with varying number of fractures and degree of heterogeneity. Due to our recent developments in capabilities to perform DFN high-fidelity simulations on fracture networks with large number of fractures, we are in a unique position to perform such a comparison. We show that the graph approach shows a consistent bias with up to an order of magnitude slower breakthrough when compared to the DFN approach. We show that this is due to graph algorithm's underprediction of the pressure gradients across intersections on a given fracture, leading to slower tracer particle speeds between intersections and longer travel times. We present a bias correction methodology to the graph algorithm that reduces the discrepancy between the DFN and graph predictions. We show that with this bias correction, the graph algorithm predictions significantly improve and the results are very accurate. The good accuracy and the low computational cost, with O(10^{4}) times lower times than the DFN, makes the graph algorithm an ideal technique to incorporate in uncertainty quantification methods.
Modeling flow and transport in fracture networks using graphs
Karra, S.; O'Malley, D.; Hyman, J. D.; ...
2018-03-09
Fractures form the main pathways for flow in the subsurface within low-permeability rock. For this reason, accurately predicting flow and transport in fractured systems is vital for improving the performance of subsurface applications. Fracture sizes in these systems can range from millimeters to kilometers. Although modeling flow and transport using the discrete fracture network (DFN) approach is known to be more accurate due to incorporation of the detailed fracture network structure over continuum-based methods, capturing the flow and transport in such a wide range of scales is still computationally intractable. Furthermore, if one has to quantify uncertainty, hundreds of realizationsmore » of these DFN models have to be run. To reduce the computational burden, we solve flow and transport on a graph representation of a DFN. We study the accuracy of the graph approach by comparing breakthrough times and tracer particle statistical data between the graph-based and the high-fidelity DFN approaches, for fracture networks with varying number of fractures and degree of heterogeneity. Due to our recent developments in capabilities to perform DFN high-fidelity simulations on fracture networks with large number of fractures, we are in a unique position to perform such a comparison. We show that the graph approach shows a consistent bias with up to an order of magnitude slower breakthrough when compared to the DFN approach. We show that this is due to graph algorithm's underprediction of the pressure gradients across intersections on a given fracture, leading to slower tracer particle speeds between intersections and longer travel times. We present a bias correction methodology to the graph algorithm that reduces the discrepancy between the DFN and graph predictions. We show that with this bias correction, the graph algorithm predictions significantly improve and the results are very accurate. In conclusion, the good accuracy and the low computational cost, with O(10 4) times lower times than the DFN, makes the graph algorithm an ideal technique to incorporate in uncertainty quantification methods.« less
Modeling flow and transport in fracture networks using graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karra, S.; O'Malley, D.; Hyman, J. D.
Fractures form the main pathways for flow in the subsurface within low-permeability rock. For this reason, accurately predicting flow and transport in fractured systems is vital for improving the performance of subsurface applications. Fracture sizes in these systems can range from millimeters to kilometers. Although modeling flow and transport using the discrete fracture network (DFN) approach is known to be more accurate due to incorporation of the detailed fracture network structure over continuum-based methods, capturing the flow and transport in such a wide range of scales is still computationally intractable. Furthermore, if one has to quantify uncertainty, hundreds of realizationsmore » of these DFN models have to be run. To reduce the computational burden, we solve flow and transport on a graph representation of a DFN. We study the accuracy of the graph approach by comparing breakthrough times and tracer particle statistical data between the graph-based and the high-fidelity DFN approaches, for fracture networks with varying number of fractures and degree of heterogeneity. Due to our recent developments in capabilities to perform DFN high-fidelity simulations on fracture networks with large number of fractures, we are in a unique position to perform such a comparison. We show that the graph approach shows a consistent bias with up to an order of magnitude slower breakthrough when compared to the DFN approach. We show that this is due to graph algorithm's underprediction of the pressure gradients across intersections on a given fracture, leading to slower tracer particle speeds between intersections and longer travel times. We present a bias correction methodology to the graph algorithm that reduces the discrepancy between the DFN and graph predictions. We show that with this bias correction, the graph algorithm predictions significantly improve and the results are very accurate. In conclusion, the good accuracy and the low computational cost, with O(10 4) times lower times than the DFN, makes the graph algorithm an ideal technique to incorporate in uncertainty quantification methods.« less
Acceleration of Binding Site Comparisons by Graph Partitioning.
Krotzky, Timo; Klebe, Gerhard
2015-08-01
The comparison of protein binding sites is a prominent task in computational chemistry and has been studied in many different ways. For the automatic detection and comparison of putative binding cavities the Cavbase system has been developed which uses a coarse-grained set of pseudocenters to represent the physicochemical properties of a binding site and employs a graph-based procedure to calculate similarities between two binding sites. However, the comparison of two graphs is computationally quite demanding which makes large-scale studies such as the rapid screening of entire databases hardly feasible. In a recent work, we proposed the method Local Cliques (LC) for the efficient comparison of Cavbase binding sites. It employs a clique heuristic to detect the maximum common subgraph of two binding sites and an extended graph model to additionally compare the shape of individual surface patches. In this study, we present an alternative to further accelerate the LC method by partitioning the binding-site graphs into disjoint components prior to their comparisons. The pseudocenter sets are split with regard to their assigned phyiscochemical type, which leads to seven much smaller graphs than the original one. Applying this approach on the same test scenarios as in the former comprehensive way results in a significant speed-up without sacrificing accuracy. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Visual Exploratory Search of Relationship Graphs on Smartphones
Ouyang, Jianquan; Zheng, Hao; Kong, Fanbin; Liu, Tianming
2013-01-01
This paper presents a novel framework for Visual Exploratory Search of Relationship Graphs on Smartphones (VESRGS) that is composed of three major components: inference and representation of semantic relationship graphs on the Web via meta-search, visual exploratory search of relationship graphs through both querying and browsing strategies, and human-computer interactions via the multi-touch interface and mobile Internet on smartphones. In comparison with traditional lookup search methodologies, the proposed VESRGS system is characterized with the following perceived advantages. 1) It infers rich semantic relationships between the querying keywords and other related concepts from large-scale meta-search results from Google, Yahoo! and Bing search engines, and represents semantic relationships via graphs; 2) the exploratory search approach empowers users to naturally and effectively explore, adventure and discover knowledge in a rich information world of interlinked relationship graphs in a personalized fashion; 3) it effectively takes the advantages of smartphones’ user-friendly interfaces and ubiquitous Internet connection and portability. Our extensive experimental results have demonstrated that the VESRGS framework can significantly improve the users’ capability of seeking the most relevant relationship information to their own specific needs. We envision that the VESRGS framework can be a starting point for future exploration of novel, effective search strategies in the mobile Internet era. PMID:24223936
Scaling Up Graph-Based Semisupervised Learning via Prototype Vector Machines
Zhang, Kai; Lan, Liang; Kwok, James T.; Vucetic, Slobodan; Parvin, Bahram
2014-01-01
When the amount of labeled data are limited, semi-supervised learning can improve the learner's performance by also using the often easily available unlabeled data. In particular, a popular approach requires the learned function to be smooth on the underlying data manifold. By approximating this manifold as a weighted graph, such graph-based techniques can often achieve state-of-the-art performance. However, their high time and space complexities make them less attractive on large data sets. In this paper, we propose to scale up graph-based semisupervised learning using a set of sparse prototypes derived from the data. These prototypes serve as a small set of data representatives, which can be used to approximate the graph-based regularizer and to control model complexity. Consequently, both training and testing become much more efficient. Moreover, when the Gaussian kernel is used to define the graph affinity, a simple and principled method to select the prototypes can be obtained. Experiments on a number of real-world data sets demonstrate encouraging performance and scaling properties of the proposed approach. It also compares favorably with models learned via ℓ1-regularization at the same level of model sparsity. These results demonstrate the efficacy of the proposed approach in producing highly parsimonious and accurate models for semisupervised learning. PMID:25720002
The Full Ward-Takahashi Identity for Colored Tensor Models
NASA Astrophysics Data System (ADS)
Pérez-Sánchez, Carlos I.
2018-03-01
Colored tensor models (CTM) is a random geometrical approach to quantum gravity. We scrutinize the structure of the connected correlation functions of general CTM-interactions and organize them by boundaries of Feynman graphs. For rank- D interactions including, but not restricted to, all melonic φ^4 -vertices—to wit, solely those quartic vertices that can lead to dominant spherical contributions in the large- N expansion—the aforementioned boundary graphs are shown to be precisely all (possibly disconnected) vertex-bipartite regularly edge- D-colored graphs. The concept of CTM-compatible boundary-graph automorphism is introduced and an auxiliary graph calculus is developed. With the aid of these constructs, certain U (∞)-invariance of the path integral measure is fully exploited in order to derive a strong Ward-Takahashi Identity for CTMs with a symmetry-breaking kinetic term. For the rank-3 φ^4 -theory, we get the exact integral-like equation for the 2-point function. Similarly, exact equations for higher multipoint functions can be readily obtained departing from this full Ward-Takahashi identity. Our results hold for some Group Field Theories as well. Altogether, our non-perturbative approach trades some graph theoretical methods for analytical ones. We believe that these tools can be extended to tensorial SYK-models.
Visualization of Morse connection graphs for topologically rich 2D vector fields.
Szymczak, Andrzej; Sipeki, Levente
2013-12-01
Recent advances in vector field topologymake it possible to compute its multi-scale graph representations for autonomous 2D vector fields in a robust and efficient manner. One of these representations is a Morse Connection Graph (MCG), a directed graph whose nodes correspond to Morse sets, generalizing stationary points and periodic trajectories, and arcs - to trajectories connecting them. While being useful for simple vector fields, the MCG can be hard to comprehend for topologically rich vector fields, containing a large number of features. This paper describes a visual representation of the MCG, inspired by previous work on graph visualization. Our approach aims to preserve the spatial relationships between the MCG arcs and nodes and highlight the coherent behavior of connecting trajectories. Using simulations of ocean flow, we show that it can provide useful information on the flow structure. This paper focuses specifically on MCGs computed for piecewise constant (PC) vector fields. In particular, we describe extensions of the PC framework that make it more flexible and better suited for analysis of data on complex shaped domains with a boundary. We also describe a topology simplification scheme that makes our MCG visualizations less ambiguous. Despite the focus on the PC framework, our approach could also be applied to graph representations or topological skeletons computed using different methods.
The asymptotics of large constrained graphs
NASA Astrophysics Data System (ADS)
Radin, Charles; Ren, Kui; Sadun, Lorenzo
2014-05-01
We show, through local estimates and simulation, that if one constrains simple graphs by their densities ɛ of edges and τ of triangles, then asymptotically (in the number of vertices) for over 95% of the possible range of those densities there is a well-defined typical graph, and it has a very simple structure: the vertices are decomposed into two subsets V1 and V2 of fixed relative size c and 1 - c, and there are well-defined probabilities of edges, gjk, between vj ∈ Vj, and vk ∈ Vk. Furthermore the four parameters c, g11, g22 and g12 are smooth functions of (ɛ, τ) except at two smooth ‘phase transition’ curves.
Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sczyrba, Alex; Pratap, Abhishek; Canon, Shane
2011-03-22
Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor, based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey?s de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86more » servers. Convey?s highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models.JGI is comparing the performance of Convey?s graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.« less
NASA Astrophysics Data System (ADS)
Mandrà, Salvatore; Giacomo Guerreschi, Gian; Aspuru-Guzik, Alán
2016-07-01
We present an exact quantum algorithm for solving the Exact Satisfiability problem, which belongs to the important NP-complete complexity class. The algorithm is based on an intuitive approach that can be divided into two parts: the first step consists in the identification and efficient characterization of a restricted subspace that contains all the valid assignments of the Exact Satisfiability; while the second part performs a quantum search in such restricted subspace. The quantum algorithm can be used either to find a valid assignment (or to certify that no solution exists) or to count the total number of valid assignments. The query complexities for the worst-case are respectively bounded by O(\\sqrt{{2}n-{M\\prime }}) and O({2}n-{M\\prime }), where n is the number of variables and {M}\\prime the number of linearly independent clauses. Remarkably, the proposed quantum algorithm results to be faster than any known exact classical algorithm to solve dense formulas of Exact Satisfiability. As a concrete application, we provide the worst-case complexity for the Hamiltonian cycle problem obtained after mapping it to a suitable Occupation problem. Specifically, we show that the time complexity for the proposed quantum algorithm is bounded by O({2}n/4) for 3-regular undirected graphs, where n is the number of nodes. The same worst-case complexity holds for (3,3)-regular bipartite graphs. As a reference, the current best classical algorithm has a (worst-case) running time bounded by O({2}31n/96). Finally, when compared to heuristic techniques for Exact Satisfiability problems, the proposed quantum algorithm is faster than the classical WalkSAT and Adiabatic Quantum Optimization for random instances with a density of constraints close to the satisfiability threshold, the regime in which instances are typically the hardest to solve. The proposed quantum algorithm can be straightforwardly extended to the generalized version of the Exact Satisfiability known as Occupation problem. The general version of the algorithm is presented and analyzed.
Data traffic reduction schemes for Cholesky factorization on asynchronous multiprocessor systems
NASA Technical Reports Server (NTRS)
Naik, Vijay K.; Patrick, Merrell L.
1989-01-01
Communication requirements of Cholesky factorization of dense and sparse symmetric, positive definite matrices are analyzed. The communication requirement is characterized by the data traffic generated on multiprocessor systems with local and shared memory. Lower bound proofs are given to show that when the load is uniformly distributed the data traffic associated with factoring an n x n dense matrix using n to the alpha power (alpha less than or equal 2) processors is omega(n to the 2 + alpha/2 power). For n x n sparse matrices representing a square root of n x square root of n regular grid graph the data traffic is shown to be omega(n to the 1 + alpha/2 power), alpha less than or equal 1. Partitioning schemes that are variations of block assignment scheme are described and it is shown that the data traffic generated by these schemes are asymptotically optimal. The schemes allow efficient use of up to O(n to the 2nd power) processors in the dense case and up to O(n) processors in the sparse case before the total data traffic reaches the maximum value of O(n to the 3rd power) and O(n to the 3/2 power), respectively. It is shown that the block based partitioning schemes allow a better utilization of the data accessed from shared memory and thus reduce the data traffic than those based on column-wise wrap around assignment schemes.
Unsupervised Metric Fusion Over Multiview Data by Graph Random Walk-Based Cross-View Diffusion.
Wang, Yang; Zhang, Wenjie; Wu, Lin; Lin, Xuemin; Zhao, Xiang
2017-01-01
Learning an ideal metric is crucial to many tasks in computer vision. Diverse feature representations may combat this problem from different aspects; as visual data objects described by multiple features can be decomposed into multiple views, thus often provide complementary information. In this paper, we propose a cross-view fusion algorithm that leads to a similarity metric for multiview data by systematically fusing multiple similarity measures. Unlike existing paradigms, we focus on learning distance measure by exploiting a graph structure of data samples, where an input similarity matrix can be improved through a propagation of graph random walk. In particular, we construct multiple graphs with each one corresponding to an individual view, and a cross-view fusion approach based on graph random walk is presented to derive an optimal distance measure by fusing multiple metrics. Our method is scalable to a large amount of data by enforcing sparsity through an anchor graph representation. To adaptively control the effects of different views, we dynamically learn view-specific coefficients, which are leveraged into graph random walk to balance multiviews. However, such a strategy may lead to an over-smooth similarity metric where affinities between dissimilar samples may be enlarged by excessively conducting cross-view fusion. Thus, we figure out a heuristic approach to controlling the iteration number in the fusion process in order to avoid over smoothness. Extensive experiments conducted on real-world data sets validate the effectiveness and efficiency of our approach.
Scale-free characteristics of random networks: the topology of the world-wide web
NASA Astrophysics Data System (ADS)
Barabási, Albert-László; Albert, Réka; Jeong, Hawoong
2000-06-01
The world-wide web forms a large directed graph, whose vertices are documents and edges are links pointing from one document to another. Here we demonstrate that despite its apparent random character, the topology of this graph has a number of universal scale-free characteristics. We introduce a model that leads to a scale-free network, capturing in a minimal fashion the self-organization processes governing the world-wide web.
Data Mining Meets HCI: Making Sense of Large Graphs
2012-07-01
graph algo- rithms, won the Open Source Software World Challenge, Silver Award. We have released Pegasus as free , open-source software, downloaded by...METIS [77], spectral clustering [108], and the parameter- free “Cross-associations” (CA) [26]. Belief Propagation can also be used for clus- tering, as...number of tools have been developed to support “ landscape ” views of information. These include WebBook and Web- Forager [23], which use a book metaphor
An efficient randomized algorithm for contact-based NMR backbone resonance assignment.
Kamisetty, Hetunandan; Bailey-Kellogg, Chris; Pandurangan, Gopal
2006-01-15
Backbone resonance assignment is a critical bottleneck in studies of protein structure, dynamics and interactions by nuclear magnetic resonance (NMR) spectroscopy. A minimalist approach to assignment, which we call 'contact-based', seeks to dramatically reduce experimental time and expense by replacing the standard suite of through-bond experiments with the through-space (nuclear Overhauser enhancement spectroscopy, NOESY) experiment. In the contact-based approach, spectral data are represented in a graph with vertices for putative residues (of unknown relation to the primary sequence) and edges for hypothesized NOESY interactions, such that observed spectral peaks could be explained if the residues were 'close enough'. Due to experimental ambiguity, several incorrect edges can be hypothesized for each spectral peak. An assignment is derived by identifying consistent patterns of edges (e.g. for alpha-helices and beta-sheets) within a graph and by mapping the vertices to the primary sequence. The key algorithmic challenge is to be able to uncover these patterns even when they are obscured by significant noise. This paper develops, analyzes and applies a novel algorithm for the identification of polytopes representing consistent patterns of edges in a corrupted NOESY graph. Our randomized algorithm aggregates simplices into polytopes and fixes inconsistencies with simple local modifications, called rotations, that maintain most of the structure already uncovered. In characterizing the effects of experimental noise, we employ an NMR-specific random graph model in proving that our algorithm gives optimal performance in expected polynomial time, even when the input graph is significantly corrupted. We confirm this analysis in simulation studies with graphs corrupted by up to 500% noise. Finally, we demonstrate the practical application of the algorithm on several experimental beta-sheet datasets. Our approach is able to eliminate a large majority of noise edges and to uncover large consistent sets of interactions. Our algorithm has been implemented in the platform-independent Python code. The software can be freely obtained for academic use by request from the authors.
NASA Astrophysics Data System (ADS)
Banesh, D.; Oskin, M. E.; Mu, A.; Vu, C.; Westerteiger, R.; Krishnan, A.; Hamann, B.; Glennie, C. L.; Hinojosa, A.; Borsa, A. A.
2013-12-01
Differential LiDAR provides unprecedented images of the near-field ground deformation and fault slip due to earthquakes. Here we examine the performance of the Iterative Closest Point (ICP) technique for data registration between pre- and post-earthquake LiDAR point clouds of varying density. We use the 2010 El Mayor-Cucapah data set as our region of interest since this earthquake produced different types of surface ruptures, yielding a variety of deformation styles for analysis. We also test a more simplistic, Chi-Squared minimization approach and find that it produces good results when compared to ICP. We present different techniques for visualizing large vector fields, and show how each method highlights a unique feature in the data set. Dense vector fields are useful when analyzing smaller deformations in the surface. A sparse, averaged vector field analyzes the bigger, overall shifts without interference caused by small details. Flow-based visualizations like Line Integral Convolution (LIC) graphs, provide insight into particular artifacts of data collection, such as distortions due to uncorrected pitch and yaw of the aircraft during the survey. Animations of the vector field establish the direction of movement in the landscape, quickly highlighting areas of interest.
Hummer, Blake H.; de Leeuw, Noah F.; Burns, Christian; Chen, Lan; Joens, Matthew S.; Hosford, Bethany; Fitzpatrick, James A. J.; Asensio, Cedric S.
2017-01-01
Large dense core vesicles (LDCVs) mediate the regulated release of neuropeptides and peptide hormones. They form at the trans-Golgi network (TGN), where their soluble content aggregates to form a dense core, but the mechanisms controlling biogenesis are still not completely understood. Recent studies have implicated the peripheral membrane protein HID-1 in neuropeptide sorting and insulin secretion. Using CRISPR/Cas9, we generated HID-1 KO rat neuroendocrine cells, and we show that the absence of HID-1 results in specific defects in peptide hormone and monoamine storage and regulated secretion. Loss of HID-1 causes a reduction in the number of LDCVs and affects their morphology and biochemical properties, due to impaired cargo sorting and dense core formation. HID-1 KO cells also exhibit defects in TGN acidification together with mislocalization of the Golgi-enriched vacuolar H+-ATPase subunit isoform a2. We propose that HID-1 influences early steps in LDCV formation by controlling dense core formation at the TGN. PMID:29074564
Sharma, Harshita; Alekseychuk, Alexander; Leskovsky, Peter; Hellwich, Olaf; Anand, R S; Zerbe, Norman; Hufnagl, Peter
2012-10-04
Computer-based analysis of digitalized histological images has been gaining increasing attention, due to their extensive use in research and routine practice. The article aims to contribute towards the description and retrieval of histological images by employing a structural method using graphs. Due to their expressive ability, graphs are considered as a powerful and versatile representation formalism and have obtained a growing consideration especially by the image processing and computer vision community. The article describes a novel method for determining similarity between histological images through graph-theoretic description and matching, for the purpose of content-based retrieval. A higher order (region-based) graph-based representation of breast biopsy images has been attained and a tree-search based inexact graph matching technique has been employed that facilitates the automatic retrieval of images structurally similar to a given image from large databases. The results obtained and evaluation performed demonstrate the effectiveness and superiority of graph-based image retrieval over a common histogram-based technique. The employed graph matching complexity has been reduced compared to the state-of-the-art optimal inexact matching methods by applying a pre-requisite criterion for matching of nodes and a sophisticated design of the estimation function, especially the prognosis function. The proposed method is suitable for the retrieval of similar histological images, as suggested by the experimental and evaluation results obtained in the study. It is intended for the use in Content Based Image Retrieval (CBIR)-requiring applications in the areas of medical diagnostics and research, and can also be generalized for retrieval of different types of complex images. The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/1224798882787923.
Local adjacency metric dimension of sun graph and stacked book graph
NASA Astrophysics Data System (ADS)
Yulisda Badri, Alifiah; Darmaji
2018-03-01
A graph is a mathematical system consisting of a non-empty set of nodes and a set of empty sides. One of the topics to be studied in graph theory is the metric dimension. Application in the metric dimension is the navigation robot system on a path. Robot moves from one vertex to another vertex in the field by minimizing the errors that occur in translating the instructions (code) obtained from the vertices of that location. To move the robot must give different instructions (code). In order for the robot to move efficiently, the robot must be fast to translate the code of the nodes of the location it passes. so that the location vertex has a minimum distance. However, if the robot must move with the vertex location on a very large field, so the robot can not detect because the distance is too far.[6] In this case, the robot can determine its position by utilizing location vertices based on adjacency. The problem is to find the minimum cardinality of the required location vertex, and where to put, so that the robot can determine its location. The solution to this problem is the dimension of adjacency metric and adjacency metric bases. Rodrguez-Velzquez and Fernau combine the adjacency metric dimensions with local metric dimensions, thus becoming the local adjacency metric dimension. In the local adjacency metric dimension each vertex in the graph may have the same adjacency representation as the terms of the vertices. To obtain the local metric dimension of values in the graph of the Sun and the stacked book graph is used the construction method by considering the representation of each adjacent vertex of the graph.
2012-01-01
Background Computer-based analysis of digitalized histological images has been gaining increasing attention, due to their extensive use in research and routine practice. The article aims to contribute towards the description and retrieval of histological images by employing a structural method using graphs. Due to their expressive ability, graphs are considered as a powerful and versatile representation formalism and have obtained a growing consideration especially by the image processing and computer vision community. Methods The article describes a novel method for determining similarity between histological images through graph-theoretic description and matching, for the purpose of content-based retrieval. A higher order (region-based) graph-based representation of breast biopsy images has been attained and a tree-search based inexact graph matching technique has been employed that facilitates the automatic retrieval of images structurally similar to a given image from large databases. Results The results obtained and evaluation performed demonstrate the effectiveness and superiority of graph-based image retrieval over a common histogram-based technique. The employed graph matching complexity has been reduced compared to the state-of-the-art optimal inexact matching methods by applying a pre-requisite criterion for matching of nodes and a sophisticated design of the estimation function, especially the prognosis function. Conclusion The proposed method is suitable for the retrieval of similar histological images, as suggested by the experimental and evaluation results obtained in the study. It is intended for the use in Content Based Image Retrieval (CBIR)-requiring applications in the areas of medical diagnostics and research, and can also be generalized for retrieval of different types of complex images. Virtual Slides The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/1224798882787923. PMID:23035717
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems
Song, Fengguang; Dongarra, Jack
2014-10-01
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasksmore » without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.« less
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Fengguang; Dongarra, Jack
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasksmore » without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.« less
A novel sub-shot segmentation method for user-generated video
NASA Astrophysics Data System (ADS)
Lei, Zhuo; Zhang, Qian; Zheng, Chi; Qiu, Guoping
2018-04-01
With the proliferation of the user-generated videos, temporal segmentation is becoming a challengeable problem. Traditional video temporal segmentation methods like shot detection are not able to work on unedited user-generated videos, since they often only contain one single long shot. We propose a novel temporal segmentation framework for user-generated video. It finds similar frames with a tree partitioning min-Hash technique, constructs sparse temporal constrained affinity sub-graphs, and finally divides the video into sub-shot-level segments with a dense-neighbor-based clustering method. Experimental results show that our approach outperforms all the other related works. Furthermore, it is indicated that the proposed approach is able to segment user-generated videos at an average human level.
NASA Astrophysics Data System (ADS)
Piccolo, M. D.; Martins, S. C.; Camargo, P. B.; Almeida, D. Q.; Correa, L. O.; Carmo, J. B.; Martinelli, L. A.
2009-12-01
The Brazilian Atlantic forest is a vast heterogeneous region with 1.5 million km2, encompassing a large variety of forest physiognomies and compositions, containing large number of species. These forests are distributed in different topographic and climatic conditions, with high levels of precipitation. The rate of deforestation is high, approaching 350 km2 per year, showing be highly fragmented with a large number of species in extinction. The aim of this study was to understanding of the basic biogeochemistry functioning of the coastal Atlantic Forest. The study was carried out in São Paulo State, Brazil (23° 24' S and 45° 11' W). The studied areas were: Restinga Forest at sea level; Lowland Ombrophylus Dense Forest at 100m of altitude asl; Submontana Ombrophylus Dense Forest at 400m of altitude asl and; Montane Ombrophylus Dense Forest at 1000m of altitude asl. A sampling area of 1 ha in each phytophysiognomies was subdivided in contiguous sub-parcels (10 x 10m). The forest floor litter accumulated (0.06m2) was collected monthly (n=15), during 12 months, in each phytophysiognomies. Soils samples (0-0.05m depth) were collected (n=32) from square regular grids, 30m away from each other. Techniques of multivariate like principal components analysis (PCA) were used to determine correlations between the variable. The ordination graphs make possible to observe frequent of standards, representing a significant ratio of the variability of the data. The two first PCA axes cumulatively explained 60% of the total variance of the litter variables. Litter C and δ13C values were strongly influenced by altitude at 1000m. The N and δ15N of litter were influenced by altitude at 100 and 400m. The C/N relation was influenced by altitude at 0m. The lignin was elevated (p<0.01) at sea level in comparison with the other phytophysiognomies. The cellulose values did not vary significantly along the altitudinal gradient. Soil C and N concentrations progressively increased along the altitudinal gradient (p<0.01). Soil C:N ratio was significantly higher at 0m. The two first PCA axes cumulatively explained 76% of the total variance of the soil variables. Soil C and N contents were strongly influenced by altitude at 1000m and the C/N relation and sand were influenced by altitude 0m. It is also possible to observe that C, N and clay contents are inversely proportional to the soil sand fraction. This study showed that analysis of the principal components was useful in the grouping areas. The factors that explain differences along this gradient are not yet well known, but topography and the microclimate are appears to be the two main candidate factors regulating this nutritional variation of litter in the gradient.
Graph Theoretical Analysis Reveals: Women's Brains Are Better Connected than Men's.
Szalkai, Balázs; Varga, Bálint; Grolmusz, Vince
2015-01-01
Deep graph-theoretic ideas in the context with the graph of the World Wide Web led to the definition of Google's PageRank and the subsequent rise of the most popular search engine to date. Brain graphs, or connectomes, are being widely explored today. We believe that non-trivial graph theoretic concepts, similarly as it happened in the case of the World Wide Web, will lead to discoveries enlightening the structural and also the functional details of the animal and human brains. When scientists examine large networks of tens or hundreds of millions of vertices, only fast algorithms can be applied because of the size constraints. In the case of diffusion MRI-based structural human brain imaging, the effective vertex number of the connectomes, or brain graphs derived from the data is on the scale of several hundred today. That size facilitates applying strict mathematical graph algorithms even for some hard-to-compute (or NP-hard) quantities like vertex cover or balanced minimum cut. In the present work we have examined brain graphs, computed from the data of the Human Connectome Project, recorded from male and female subjects between ages 22 and 35. Significant differences were found between the male and female structural brain graphs: we show that the average female connectome has more edges, is a better expander graph, has larger minimal bisection width, and has more spanning trees than the average male connectome. Since the average female brain weighs less than the brain of males, these properties show that the female brain has better graph theoretical properties, in a sense, than the brain of males. It is known that the female brain has a smaller gray matter/white matter ratio than males, that is, a larger white matter/gray matter ratio than the brain of males; this observation is in line with our findings concerning the number of edges, since the white matter consists of myelinated axons, which, in turn, roughly correspond to the connections in the brain graph. We have also found that the minimum bisection width, normalized with the edge number, is also significantly larger in the right and the left hemispheres in females: therefore, the differing bisection widths are independent from the difference in the number of edges.
Graph Theoretical Analysis Reveals: Women’s Brains Are Better Connected than Men’s
Szalkai, Balázs; Varga, Bálint; Grolmusz, Vince
2015-01-01
Deep graph-theoretic ideas in the context with the graph of the World Wide Web led to the definition of Google’s PageRank and the subsequent rise of the most popular search engine to date. Brain graphs, or connectomes, are being widely explored today. We believe that non-trivial graph theoretic concepts, similarly as it happened in the case of the World Wide Web, will lead to discoveries enlightening the structural and also the functional details of the animal and human brains. When scientists examine large networks of tens or hundreds of millions of vertices, only fast algorithms can be applied because of the size constraints. In the case of diffusion MRI-based structural human brain imaging, the effective vertex number of the connectomes, or brain graphs derived from the data is on the scale of several hundred today. That size facilitates applying strict mathematical graph algorithms even for some hard-to-compute (or NP-hard) quantities like vertex cover or balanced minimum cut. In the present work we have examined brain graphs, computed from the data of the Human Connectome Project, recorded from male and female subjects between ages 22 and 35. Significant differences were found between the male and female structural brain graphs: we show that the average female connectome has more edges, is a better expander graph, has larger minimal bisection width, and has more spanning trees than the average male connectome. Since the average female brain weighs less than the brain of males, these properties show that the female brain has better graph theoretical properties, in a sense, than the brain of males. It is known that the female brain has a smaller gray matter/white matter ratio than males, that is, a larger white matter/gray matter ratio than the brain of males; this observation is in line with our findings concerning the number of edges, since the white matter consists of myelinated axons, which, in turn, roughly correspond to the connections in the brain graph. We have also found that the minimum bisection width, normalized with the edge number, is also significantly larger in the right and the left hemispheres in females: therefore, the differing bisection widths are independent from the difference in the number of edges. PMID:26132764
Dense Gas, Dynamical Equilibrium Pressure, and Star Formation in Nearby Star-forming Galaxies
NASA Astrophysics Data System (ADS)
Gallagher, Molly J.; Leroy, Adam K.; Bigiel, Frank; Cormier, Diane; Jiménez-Donaire, María J.; Ostriker, Eve; Usero, Antonio; Bolatto, Alberto D.; García-Burillo, Santiago; Hughes, Annie; Kepley, Amanda A.; Krumholz, Mark; Meidt, Sharon E.; Meier, David S.; Murphy, Eric J.; Pety, Jérôme; Rosolowsky, Erik; Schinnerer, Eva; Schruba, Andreas; Walter, Fabian
2018-05-01
We use new ALMA observations to investigate the connection between dense gas fraction, star formation rate (SFR), and local environment across the inner region of four local galaxies showing a wide range of molecular gas depletion times. We map HCN (1–0), HCO+ (1–0), CS (2–1), 13CO (1–0), and C18O (1–0) across the inner few kiloparsecs of each target. We combine these data with short-spacing information from the IRAM large program EMPIRE, archival CO maps, tracers of stellar structure and recent star formation, and recent HCN surveys by Bigiel et al. and Usero et al. We test the degree to which changes in the dense gas fraction drive changes in the SFR. {I}HCN}/{I}CO} (tracing the dense gas fraction) correlates strongly with I CO (tracing molecular gas surface density), stellar surface density, and dynamical equilibrium pressure, P DE. Therefore, {I}HCN}/{I}CO} becomes very low and HCN becomes very faint at large galactocentric radii, where ratios as low as {I}HCN}/{I}CO}∼ 0.01 become common. The apparent ability of dense gas to form stars, {{{Σ }}}SFR}/{{{Σ }}}dense} (where Σdense is traced by the HCN intensity and the star formation rate is traced by a combination of Hα and 24 μm emission), also depends on environment. {{{Σ }}}SFR}/{{{Σ }}}dense} decreases in regions of high gas surface density, high stellar surface density, and high P DE. Statistically, these correlations between environment and both {{{Σ }}}SFR}/{{{Σ }}}dense} and {I}HCN}/{I}CO} are stronger than that between apparent dense gas fraction ({I}HCN}/{I}CO}) and the apparent molecular gas star formation efficiency {{{Σ }}}SFR}/{{{Σ }}}mol}. We show that these results are not specific to HCN.
Heuristic-driven graph wavelet modeling of complex terrain
NASA Astrophysics Data System (ADS)
Cioacǎ, Teodor; Dumitrescu, Bogdan; Stupariu, Mihai-Sorin; Pǎtru-Stupariu, Ileana; Nǎpǎrus, Magdalena; Stoicescu, Ioana; Peringer, Alexander; Buttler, Alexandre; Golay, François
2015-03-01
We present a novel method for building a multi-resolution representation of large digital surface models. The surface points coincide with the nodes of a planar graph which can be processed using a critically sampled, invertible lifting scheme. To drive the lazy wavelet node partitioning, we employ an attribute aware cost function based on the generalized quadric error metric. The resulting algorithm can be applied to multivariate data by storing additional attributes at the graph's nodes. We discuss how the cost computation mechanism can be coupled with the lifting scheme and examine the results by evaluating the root mean square error. The algorithm is experimentally tested using two multivariate LiDAR sets representing terrain surface and vegetation structure with different sampling densities.
Parasol: An Architecture for Cross-Cloud Federated Graph Querying
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lieberman, Michael; Choudhury, Sutanay; Hughes, Marisa
2014-06-22
Large scale data fusion of multiple datasets can often provide in- sights that examining datasets individually cannot. However, when these datasets reside in different data centers and cannot be collocated due to technical, administrative, or policy barriers, a unique set of problems arise that hamper querying and data fusion. To ad- dress these problems, a system and architecture named Parasol is presented that enables federated queries over graph databases residing in multiple clouds. Parasol’s design is flexible and requires only minimal assumptions for participant clouds. Query optimization techniques are also described that are compatible with Parasol’s lightweight architecture. Experiments onmore » a prototype implementation of Parasol indicate its suitability for cross-cloud federated graph queries.« less
Identifying Vulnerabilities and Hardening Attack Graphs for Networked Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saha, Sudip; Vullinati, Anil K.; Halappanavar, Mahantesh
We investigate efficient security control methods for protecting against vulnerabilities in networked systems. A large number of interdependent vulnerabilities typically exist in the computing nodes of a cyber-system; as vulnerabilities get exploited, starting from low level ones, they open up the doors to more critical vulnerabilities. These cannot be understood just by a topological analysis of the network, and we use the attack graph abstraction of Dewri et al. to study these problems. In contrast to earlier approaches based on heuristics and evolutionary algorithms, we study rigorous methods for quantifying the inherent vulnerability and hardening cost for the system. Wemore » develop algorithms with provable approximation guarantees, and evaluate them for real and synthetic attack graphs.« less
Exactly solvable random graph ensemble with extensively many short cycles
NASA Astrophysics Data System (ADS)
Aguirre López, Fabián; Barucca, Paolo; Fekom, Mathilde; Coolen, Anthony C. C.
2018-02-01
We introduce and analyse ensembles of 2-regular random graphs with a tuneable distribution of short cycles. The phenomenology of these graphs depends critically on the scaling of the ensembles’ control parameters relative to the number of nodes. A phase diagram is presented, showing a second order phase transition from a connected to a disconnected phase. We study both the canonical formulation, where the size is large but fixed, and the grand canonical formulation, where the size is sampled from a discrete distribution, and show their equivalence in the thermodynamical limit. We also compute analytically the spectral density, which consists of a discrete set of isolated eigenvalues, representing short cycles, and a continuous part, representing cycles of diverging size.
Sun, Peng; Guo, Jiong; Baumbach, Jan
2012-07-17
The explosion of biological data has largely influenced the focus of today’s biology research. Integrating and analysing large quantity of data to provide meaningful insights has become the main challenge to biologists and bioinformaticians. One major problem is the combined data analysis of data from different types, such as phenotypes and genotypes. This data is modelled as bi-partite graphs where nodes correspond to the different data points, mutations and diseases for instance, and weighted edges relate to associations between them. Bi-clustering is a special case of clustering designed for partitioning two different types of data simultaneously. We present a bi-clustering approach that solves the NP-hard weighted bi-cluster editing problem by transforming a given bi-partite graph into a disjoint union of bi-cliques. Here we contribute with an exact algorithm that is based on fixed-parameter tractability. We evaluated its performance on artificial graphs first. Afterwards we exemplarily applied our Java implementation to data of genome-wide association studies (GWAS) data aiming for discovering new, previously unobserved geno-to-pheno associations. We believe that our results will serve as guidelines for further wet lab investigations. Generally our software can be applied to any kind of data that can be modelled as bi-partite graphs. To our knowledge it is the fastest exact method for weighted bi-cluster editing problem.
Sun, Peng; Guo, Jiong; Baumbach, Jan
2012-06-01
The explosion of biological data has largely influenced the focus of today's biology research. Integrating and analysing large quantity of data to provide meaningful insights has become the main challenge to biologists and bioinformaticians. One major problem is the combined data analysis of data from different types, such as phenotypes and genotypes. This data is modelled as bi-partite graphs where nodes correspond to the different data points, mutations and diseases for instance, and weighted edges relate to associations between them. Bi-clustering is a special case of clustering designed for partitioning two different types of data simultaneously. We present a bi-clustering approach that solves the NP-hard weighted bi-cluster editing problem by transforming a given bi-partite graph into a disjoint union of bi-cliques. Here we contribute with an exact algorithm that is based on fixed-parameter tractability. We evaluated its performance on artificial graphs first. Afterwards we exemplarily applied our Java implementation to data of genome-wide association studies (GWAS) data aiming for discovering new, previously unobserved geno-to-pheno associations. We believe that our results will serve as guidelines for further wet lab investigations. Generally our software can be applied to any kind of data that can be modelled as bi-partite graphs. To our knowledge it is the fastest exact method for weighted bi-cluster editing problem.
Law of large numbers for the SIR model with random vertex weights on Erdős-Rényi graph
NASA Astrophysics Data System (ADS)
Xue, Xiaofeng
2017-11-01
In this paper we are concerned with the SIR model with random vertex weights on Erdős-Rényi graph G(n , p) . The Erdős-Rényi graph G(n , p) is generated from the complete graph Cn with n vertices through independently deleting each edge with probability (1 - p) . We assign i. i. d. copies of a positive r. v. ρ on each vertex as the vertex weights. For the SIR model, each vertex is in one of the three states 'susceptible', 'infective' and 'removed'. An infective vertex infects a given susceptible neighbor at rate proportional to the production of the weights of these two vertices. An infective vertex becomes removed at a constant rate. A removed vertex will never be infected again. We assume that at t = 0 there is no removed vertex and the number of infective vertices follows a Bernoulli distribution B(n , θ) . Our main result is a law of large numbers of the model. We give two deterministic functions HS(ψt) ,HV(ψt) for t ≥ 0 and show that for any t ≥ 0, HS(ψt) is the limit proportion of susceptible vertices and HV(ψt) is the limit of the mean capability of an infective vertex to infect a given susceptible neighbor at moment t as n grows to infinity.
Calculating massive 3-loop graphs for operator matrix elements by the method of hyperlogarithms
NASA Astrophysics Data System (ADS)
Ablinger, Jakob; Blümlein, Johannes; Raab, Clemens; Schneider, Carsten; Wißbrock, Fabian
2014-08-01
We calculate convergent 3-loop Feynman diagrams containing a single massive loop equipped with twist τ=2 local operator insertions corresponding to spin N. They contribute to the massive operator matrix elements in QCD describing the massive Wilson coefficients for deep-inelastic scattering at large virtualities. Diagrams of this kind can be computed using an extended version of the method of hyperlogarithms, originally being designed for massless Feynman diagrams without operators. The method is applied to Benz- and V-type graphs, belonging to the genuine 3-loop topologies. In case of the V-type graphs with five massive propagators, new types of nested sums and iterated integrals emerge. The sums are given in terms of finite binomially and inverse binomially weighted generalized cyclotomic sums, while the 1-dimensionally iterated integrals are based on a set of ∼30 square-root valued letters. We also derive the asymptotic representations of the nested sums and present the solution for N∈C. Integrals with a power-like divergence in N-space ∝aN,a∈R,a>1, for large values of N emerge. They still possess a representation in x-space, which is given in terms of root-valued iterated integrals in the present case. The method of hyperlogarithms is also used to calculate higher moments for crossed box graphs with different operator insertions.
A VLSI decomposition of the deBruijn graph
NASA Technical Reports Server (NTRS)
Collins, O.; Dolinar, S.; Mceliece, R.; Pollara, F.
1990-01-01
A new Viterbi decoder for convolutional codes with constraint lengths up to 15, called the Big Viterbi Decoder, is under development for the Deep Space Network. It will be demonstrated by decoding data from the Galileo spacecraft, which has a rate 1/4, constraint-length 15 convolutional encoder on board. Here, the mathematical theory underlying the design of the very-large-scale-integrated (VLSI) chips that are being used to build this decoder is explained. The deBruijn graph B sub n describes the topology of a fully parallel, rate 1/v, constraint length n+2 Viterbi decoder, and it is shown that B sub n can be built by appropriately wiring together (i.e., connecting together with extra edges) many isomorphic copies of a fixed graph called a B sub n building block. The efficiency of such a building block is defined as the fraction of the edges in B sub n that are present in the copies of the building block. It is shown, among other things, that for any alpha less than 1, there exists a graph G which is a B sub n building block of efficiency greater than alpha for all sufficiently large n. These results are illustrated by describing a special hierarchical family of deBruijn building blocks, which has led to the design of the gate-array chips being used in the Big Viterbi Decoder.
Klados, Manousos A.; Kanatsouli, Kassia; Antoniou, Ioannis; Babiloni, Fabio; Tsirka, Vassiliki; Bamidis, Panagiotis D.; Micheloyannis, Sifis
2013-01-01
The two core systems of mathematical processing (subitizing and retrieval) as well as their functionality are already known and published. In this study we have used graph theory to compare the brain network organization of these two core systems in the cortical layer during difficult calculations. We have examined separately all the EEG frequency bands in healthy young individuals and we found that the network organization at rest, as well as during mathematical tasks has the characteristics of Small World Networks for all the bands, which is the optimum organization required for efficient information processing. The different mathematical stimuli provoked changes in the graph parameters of different frequency bands, especially the low frequency bands. More specific, in Delta band the induced network increases it’s local and global efficiency during the transition from subitizing to retrieval system, while results suggest that difficult mathematics provoke networks with higher cliquish organization due to more specific demands. The network of the Theta band follows the same pattern as before, having high nodal and remote organization during difficult mathematics. Also the spatial distribution of the network’s weights revealed more prominent connections in frontoparietal regions, revealing the working memory load due to the engagement of the retrieval system. The cortical networks of the alpha brainwaves were also more efficient, both locally and globally, during difficult mathematics, while the fact that alpha’s network was more dense on the frontparietal regions as well, reveals the engagement of the retrieval system again. Concluding, this study gives more evidences regarding the interaction of the two core systems, exploiting the produced functional networks of the cerebral cortex, especially for the difficult mathematics. PMID:23990992
PAGANI Toolkit: Parallel graph-theoretical analysis package for brain network big data.
Du, Haixiao; Xia, Mingrui; Zhao, Kang; Liao, Xuhong; Yang, Huazhong; Wang, Yu; He, Yong
2018-05-01
The recent collection of unprecedented quantities of neuroimaging data with high spatial resolution has led to brain network big data. However, a toolkit for fast and scalable computational solutions is still lacking. Here, we developed the PArallel Graph-theoretical ANalysIs (PAGANI) Toolkit based on a hybrid central processing unit-graphics processing unit (CPU-GPU) framework with a graphical user interface to facilitate the mapping and characterization of high-resolution brain networks. Specifically, the toolkit provides flexible parameters for users to customize computations of graph metrics in brain network analyses. As an empirical example, the PAGANI Toolkit was applied to individual voxel-based brain networks with ∼200,000 nodes that were derived from a resting-state fMRI dataset of 624 healthy young adults from the Human Connectome Project. Using a personal computer, this toolbox completed all computations in ∼27 h for one subject, which is markedly less than the 118 h required with a single-thread implementation. The voxel-based functional brain networks exhibited prominent small-world characteristics and densely connected hubs, which were mainly located in the medial and lateral fronto-parietal cortices. Moreover, the female group had significantly higher modularity and nodal betweenness centrality mainly in the medial/lateral fronto-parietal and occipital cortices than the male group. Significant correlations between the intelligence quotient and nodal metrics were also observed in several frontal regions. Collectively, the PAGANI Toolkit shows high computational performance and good scalability for analyzing connectome big data and provides a friendly interface without the complicated configuration of computing environments, thereby facilitating high-resolution connectomics research in health and disease. © 2018 Wiley Periodicals, Inc.
Klados, Manousos A; Kanatsouli, Kassia; Antoniou, Ioannis; Babiloni, Fabio; Tsirka, Vassiliki; Bamidis, Panagiotis D; Micheloyannis, Sifis
2013-01-01
The two core systems of mathematical processing (subitizing and retrieval) as well as their functionality are already known and published. In this study we have used graph theory to compare the brain network organization of these two core systems in the cortical layer during difficult calculations. We have examined separately all the EEG frequency bands in healthy young individuals and we found that the network organization at rest, as well as during mathematical tasks has the characteristics of Small World Networks for all the bands, which is the optimum organization required for efficient information processing. The different mathematical stimuli provoked changes in the graph parameters of different frequency bands, especially the low frequency bands. More specific, in Delta band the induced network increases it's local and global efficiency during the transition from subitizing to retrieval system, while results suggest that difficult mathematics provoke networks with higher cliquish organization due to more specific demands. The network of the Theta band follows the same pattern as before, having high nodal and remote organization during difficult mathematics. Also the spatial distribution of the network's weights revealed more prominent connections in frontoparietal regions, revealing the working memory load due to the engagement of the retrieval system. The cortical networks of the alpha brainwaves were also more efficient, both locally and globally, during difficult mathematics, while the fact that alpha's network was more dense on the frontparietal regions as well, reveals the engagement of the retrieval system again. Concluding, this study gives more evidences regarding the interaction of the two core systems, exploiting the produced functional networks of the cerebral cortex, especially for the difficult mathematics.
Dynamics of Nearest-Neighbour Competitions on Graphs
NASA Astrophysics Data System (ADS)
Rador, Tonguç
2017-10-01
Considering a collection of agents representing the vertices of a graph endowed with integer points, we study the asymptotic dynamics of the rate of the increase of their points according to a very simple rule: we randomly pick an an edge from the graph which unambiguously defines two agents we give a point the the agent with larger point with probability p and to the lagger with probability q such that p+q=1. The model we present is the most general version of the nearest-neighbour competition model introduced by Ben-Naim, Vazquez and Redner. We show that the model combines aspects of hyperbolic partial differential equations—as that of a conservation law—graph colouring and hyperplane arrangements. We discuss the properties of the model for general graphs but we confine in depth study to d-dimensional tori. We present a detailed study for the ring graph, which includes a chemical potential approximation to calculate all its statistics that gives rather accurate results. The two-dimensional torus, not studied in depth as the ring, is shown to possess critical behaviour in that the asymptotic speeds arrange themselves in two-coloured islands separated by borders of three other colours and the size of the islands obey power law distribution. We also show that in the large d limit the d-dimensional torus shows inverse sine law for the distribution of asymptotic speeds.
Functional Brain Networks Develop from a “Local to Distributed” Organization
Power, Jonathan D.; Dosenbach, Nico U. F.; Church, Jessica A.; Miezin, Francis M.; Schlaggar, Bradley L.; Petersen, Steven E.
2009-01-01
The mature human brain is organized into a collection of specialized functional networks that flexibly interact to support various cognitive functions. Studies of development often attempt to identify the organizing principles that guide the maturation of these functional networks. In this report, we combine resting state functional connectivity MRI (rs-fcMRI), graph analysis, community detection, and spring-embedding visualization techniques to analyze four separate networks defined in earlier studies. As we have previously reported, we find, across development, a trend toward ‘segregation’ (a general decrease in correlation strength) between regions close in anatomical space and ‘integration’ (an increased correlation strength) between selected regions distant in space. The generalization of these earlier trends across multiple networks suggests that this is a general developmental principle for changes in functional connectivity that would extend to large-scale graph theoretic analyses of large-scale brain networks. Communities in children are predominantly arranged by anatomical proximity, while communities in adults predominantly reflect functional relationships, as defined from adult fMRI studies. In sum, over development, the organization of multiple functional networks shifts from a local anatomical emphasis in children to a more “distributed” architecture in young adults. We argue that this “local to distributed” developmental characterization has important implications for understanding the development of neural systems underlying cognition. Further, graph metrics (e.g., clustering coefficients and average path lengths) are similar in child and adult graphs, with both showing “small-world”-like properties, while community detection by modularity optimization reveals stable communities within the graphs that are clearly different between young children and young adults. These observations suggest that early school age children and adults both have relatively efficient systems that may solve similar information processing problems in divergent ways. PMID:19412534
3D segmentation of lung CT data with graph-cuts: analysis of parameter sensitivities
NASA Astrophysics Data System (ADS)
Cha, Jung won; Dunlap, Neal; Wang, Brian; Amini, Amir
2016-03-01
Lung boundary image segmentation is important for many tasks including for example in development of radiation treatment plans for subjects with thoracic malignancies. In this paper, we describe a method and parameter settings for accurate 3D lung boundary segmentation based on graph-cuts from X-ray CT data1. Even though previously several researchers have used graph-cuts for image segmentation, to date, no systematic studies have been performed regarding the range of parameter that give accurate results. The energy function in the graph-cuts algorithm requires 3 suitable parameter settings: K, a large constant for assigning seed points, c, the similarity coefficient for n-links, and λ, the terminal coefficient for t-links. We analyzed the parameter sensitivity with four lung data sets from subjects with lung cancer using error metrics. Large values of K created artifacts on segmented images, and relatively much larger value of c than the value of λ influenced the balance between the boundary term and the data term in the energy function, leading to unacceptable segmentation results. For a range of parameter settings, we performed 3D image segmentation, and in each case compared the results with the expert-delineated lung boundaries. We used simple 6-neighborhood systems for n-link in 3D. The 3D image segmentation took 10 minutes for a 512x512x118 ~ 512x512x190 lung CT image volume. Our results indicate that the graph-cuts algorithm was more sensitive to the K and λ parameter settings than to the C parameter and furthermore that amongst the range of parameters tested, K=5 and λ=0.5 yielded good results.
Functional brain networks develop from a "local to distributed" organization.
Fair, Damien A; Cohen, Alexander L; Power, Jonathan D; Dosenbach, Nico U F; Church, Jessica A; Miezin, Francis M; Schlaggar, Bradley L; Petersen, Steven E
2009-05-01
The mature human brain is organized into a collection of specialized functional networks that flexibly interact to support various cognitive functions. Studies of development often attempt to identify the organizing principles that guide the maturation of these functional networks. In this report, we combine resting state functional connectivity MRI (rs-fcMRI), graph analysis, community detection, and spring-embedding visualization techniques to analyze four separate networks defined in earlier studies. As we have previously reported, we find, across development, a trend toward 'segregation' (a general decrease in correlation strength) between regions close in anatomical space and 'integration' (an increased correlation strength) between selected regions distant in space. The generalization of these earlier trends across multiple networks suggests that this is a general developmental principle for changes in functional connectivity that would extend to large-scale graph theoretic analyses of large-scale brain networks. Communities in children are predominantly arranged by anatomical proximity, while communities in adults predominantly reflect functional relationships, as defined from adult fMRI studies. In sum, over development, the organization of multiple functional networks shifts from a local anatomical emphasis in children to a more "distributed" architecture in young adults. We argue that this "local to distributed" developmental characterization has important implications for understanding the development of neural systems underlying cognition. Further, graph metrics (e.g., clustering coefficients and average path lengths) are similar in child and adult graphs, with both showing "small-world"-like properties, while community detection by modularity optimization reveals stable communities within the graphs that are clearly different between young children and young adults. These observations suggest that early school age children and adults both have relatively efficient systems that may solve similar information processing problems in divergent ways.
A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification
Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong
2016-01-01
Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs). PMID:26985826
A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification.
Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong
2016-01-01
Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs).
Visualizing multiattribute Web transactions using a freeze technique
NASA Astrophysics Data System (ADS)
Hao, Ming C.; Cotting, Daniel; Dayal, Umeshwar; Machiraju, Vijay; Garg, Pankaj
2003-05-01
Web transactions are multidimensional and have a number of attributes: client, URL, response times, and numbers of messages. One of the key questions is how to simultaneously lay out in a graph the multiple relationships, such as the relationships between the web client response times and URLs in a web access application. In this paper, we describe a freeze technique to enhance a physics-based visualization system for web transactions. The idea is to freeze one set of objects before laying out the next set of objects during the construction of the graph. As a result, we substantially reduce the force computation time. This technique consists of three steps: automated classification, a freeze operation, and a graph layout. These three steps are iterated until the final graph is generated. This iterated-freeze technique has been prototyped in several e-service applications at Hewlett Packard Laboratories. It has been used to visually analyze large volumes of service and sales transactions at online web sites.
Approximate ground states of the random-field Potts model from graph cuts
NASA Astrophysics Data System (ADS)
Kumar, Manoj; Kumar, Ravinder; Weigel, Martin; Banerjee, Varsha; Janke, Wolfhard; Puri, Sanjay
2018-05-01
While the ground-state problem for the random-field Ising model is polynomial, and can be solved using a number of well-known algorithms for maximum flow or graph cut, the analog random-field Potts model corresponds to a multiterminal flow problem that is known to be NP-hard. Hence an efficient exact algorithm is very unlikely to exist. As we show here, it is nevertheless possible to use an embedding of binary degrees of freedom into the Potts spins in combination with graph-cut methods to solve the corresponding ground-state problem approximately in polynomial time. We benchmark this heuristic algorithm using a set of quasiexact ground states found for small systems from long parallel tempering runs. For a not-too-large number q of Potts states, the method based on graph cuts finds the same solutions in a fraction of the time. We employ the new technique to analyze the breakup length of the random-field Potts model in two dimensions.
The genealogy of samples in models with selection.
Neuhauser, C; Krone, S M
1997-02-01
We introduce the genealogy of a random sample of genes taken from a large haploid population that evolves according to random reproduction with selection and mutation. Without selection, the genealogy is described by Kingman's well-known coalescent process. In the selective case, the genealogy of the sample is embedded in a graph with a coalescing and branching structure. We describe this graph, called the ancestral selection graph, and point out differences and similarities with Kingman's coalescent. We present simulations for a two-allele model with symmetric mutation in which one of the alleles has a selective advantage over the other. We find that when the allele frequencies in the population are already in equilibrium, then the genealogy does not differ much from the neutral case. This is supported by rigorous results. Furthermore, we describe the ancestral selection graph for other selective models with finitely many selection classes, such as the K-allele models, infinitely-many-alleles models. DNA sequence models, and infinitely-many-sites models, and briefly discuss the diploid case.
The Genealogy of Samples in Models with Selection
Neuhauser, C.; Krone, S. M.
1997-01-01
We introduce the genealogy of a random sample of genes taken from a large haploid population that evolves according to random reproduction with selection and mutation. Without selection, the genealogy is described by Kingman's well-known coalescent process. In the selective case, the genealogy of the sample is embedded in a graph with a coalescing and branching structure. We describe this graph, called the ancestral selection graph, and point out differences and similarities with Kingman's coalescent. We present simulations for a two-allele model with symmetric mutation in which one of the alleles has a selective advantage over the other. We find that when the allele frequencies in the population are already in equilibrium, then the genealogy does not differ much from the neutral case. This is supported by rigorous results. Furthermore, we describe the ancestral selection graph for other selective models with finitely many selection classes, such as the K-allele models, infinitely-many-alleles models, DNA sequence models, and infinitely-many-sites models, and briefly discuss the diploid case. PMID:9071604
Graph Curvature for Differentiating Cancer Networks
Sandhu, Romeil; Georgiou, Tryphon; Reznik, Ed; Zhu, Liangjia; Kolesov, Ivan; Senbabaoglu, Yasin; Tannenbaum, Allen
2015-01-01
Cellular interactions can be modeled as complex dynamical systems represented by weighted graphs. The functionality of such networks, including measures of robustness, reliability, performance, and efficiency, are intrinsically tied to the topology and geometry of the underlying graph. Utilizing recently proposed geometric notions of curvature on weighted graphs, we investigate the features of gene co-expression networks derived from large-scale genomic studies of cancer. We find that the curvature of these networks reliably distinguishes between cancer and normal samples, with cancer networks exhibiting higher curvature than their normal counterparts. We establish a quantitative relationship between our findings and prior investigations of network entropy. Furthermore, we demonstrate how our approach yields additional, non-trivial pair-wise (i.e. gene-gene) interactions which may be disrupted in cancer samples. The mathematical formulation of our approach yields an exact solution to calculating pair-wise changes in curvature which was computationally infeasible using prior methods. As such, our findings lay the foundation for an analytical approach to studying complex biological networks. PMID:26169480
Melchert, O; Katzgraber, Helmut G; Novotny, M A
2016-04-01
We estimate the critical thresholds of bond and site percolation on nonplanar, effectively two-dimensional graphs with chimeralike topology. The building blocks of these graphs are complete and symmetric bipartite subgraphs of size 2n, referred to as K_{n,n} graphs. For the numerical simulations we use an efficient union-find-based algorithm and employ a finite-size scaling analysis to obtain the critical properties for both bond and site percolation. We report the respective percolation thresholds for different sizes of the bipartite subgraph and verify that the associated universality class is that of standard two-dimensional percolation. For the canonical chimera graph used in the D-Wave Systems Inc. quantum annealer (n=4), we discuss device failure in terms of network vulnerability, i.e., we determine the critical fraction of qubits and couplers that can be absent due to random failures prior to losing large-scale connectivity throughout the device.
Solving Partial Differential Equations in a data-driven multiprocessor environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaudiot, J.L.; Lin, C.M.; Hosseiniyar, M.
1988-12-31
Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as data-how architectures have been proposed for the exploitation of large scale parallelism. The implementation of some Partial Differential Equation solvers (such as the Jacobi method) on a tagged token data-flow graph is demonstrated here. Asynchronous methods (chaotic relaxation) are studied and new scheduling approaches (the Token No-Labeling scheme) are introduced in order to support the implementation of the asychronous methods in a data-driven environment.more » New high-level data-flow language program constructs are introduced in order to handle chaotic operations. Finally, the performance of the program graphs is demonstrated by a deterministic simulation of a message passing data-flow multiprocessor. An analysis of the overhead in the data-flow graphs is undertaken to demonstrate the limits of parallel operations in dataflow PDE program graphs.« less
Cascades in the Threshold Model for varying system sizes
NASA Astrophysics Data System (ADS)
Karampourniotis, Panagiotis; Sreenivasan, Sameet; Szymanski, Boleslaw; Korniss, Gyorgy
2015-03-01
A classical model in opinion dynamics is the Threshold Model (TM) aiming to model the spread of a new opinion based on the social drive of peer pressure. Under the TM a node adopts a new opinion only when the fraction of its first neighbors possessing that opinion exceeds a pre-assigned threshold. Cascades in the TM depend on multiple parameters, such as the number and selection strategy of the initially active nodes (initiators), and the threshold distribution of the nodes. For a uniform threshold in the network there is a critical fraction of initiators for which a transition from small to large cascades occurs, which for ER graphs is largerly independent of the system size. Here, we study the spread contribution of each newly assigned initiator under the TM for different initiator selection strategies for synthetic graphs of various sizes. We observe that for ER graphs when large cascades occur, the spread contribution of the added initiator on the transition point is independent of the system size, while the contribution of the rest of the initiators converges to zero at infinite system size. This property is used for the identification of large transitions for various threshold distributions. Supported in part by ARL NS-CTA, ARO, ONR, and DARPA.
Usability-driven pruning of large ontologies: the case of SNOMED CT.
López-García, Pablo; Boeker, Martin; Illarramendi, Arantza; Schulz, Stefan
2012-06-01
To study ontology modularization techniques when applied to SNOMED CT in a scenario in which no previous corpus of information exists and to examine if frequency-based filtering using MEDLINE can reduce subset size without discarding relevant concepts. Subsets were first extracted using four graph-traversal heuristics and one logic-based technique, and were subsequently filtered with frequency information from MEDLINE. Twenty manually coded discharge summaries from cardiology patients were used as signatures and test sets. The coverage, size, and precision of extracted subsets were measured. Graph-traversal heuristics provided high coverage (71-96% of terms in the test sets of discharge summaries) at the expense of subset size (17-51% of the size of SNOMED CT). Pre-computed subsets and logic-based techniques extracted small subsets (1%), but coverage was limited (24-55%). Filtering reduced the size of large subsets to 10% while still providing 80% coverage. Extracting subsets to annotate discharge summaries is challenging when no previous corpus exists. Ontology modularization provides valuable techniques, but the resulting modules grow as signatures spread across subhierarchies, yielding a very low precision. Graph-traversal strategies and frequency data from an authoritative source can prune large biomedical ontologies and produce useful subsets that still exhibit acceptable coverage. However, a clinical corpus closer to the specific use case is preferred when available.
Indexing Volumetric Shapes with Matching and Packing
Koes, David Ryan; Camacho, Carlos J.
2014-01-01
We describe a novel algorithm for bulk-loading an index with high-dimensional data and apply it to the problem of volumetric shape matching. Our matching and packing algorithm is a general approach for packing data according to a similarity metric. First an approximate k-nearest neighbor graph is constructed using vantage-point initialization, an improvement to previous work that decreases construction time while improving the quality of approximation. Then graph matching is iteratively performed to pack related items closely together. The end result is a dense index with good performance. We define a new query specification for shape matching that uses minimum and maximum shape constraints to explicitly specify the spatial requirements of the desired shape. This specification provides a natural language for performing volumetric shape matching and is readily supported by the geometry-based similarity search (GSS) tree, an indexing structure that maintains explicit representations of volumetric shape. We describe our implementation of a GSS tree for volumetric shape matching and provide a comprehensive evaluation of parameter sensitivity, performance, and scalability. Compared to previous bulk-loading algorithms, we find that matching and packing can construct a GSS-tree index in the same amount of time that is denser, flatter, and better performing, with an observed average performance improvement of 2X. PMID:26085707
Detection of protein complex from protein-protein interaction network using Markov clustering
NASA Astrophysics Data System (ADS)
Ochieng, P. J.; Kusuma, W. A.; Haryanto, T.
2017-05-01
Detection of complexes, or groups of functionally related proteins, is an important challenge while analysing biological networks. However, existing algorithms to identify protein complexes are insufficient when applied to dense networks of experimentally derived interaction data. Therefore, we introduced a graph clustering method based on Markov clustering algorithm to identify protein complex within highly interconnected protein-protein interaction networks. Protein-protein interaction network was first constructed to develop geometrical network, the network was then partitioned using Markov clustering to detect protein complexes. The interest of the proposed method was illustrated by its application to Human Proteins associated to type II diabetes mellitus. Flow simulation of MCL algorithm was initially performed and topological properties of the resultant network were analysed for detection of the protein complex. The results indicated the proposed method successfully detect an overall of 34 complexes with 11 complexes consisting of overlapping modules and 20 non-overlapping modules. The major complex consisted of 102 proteins and 521 interactions with cluster modularity and density of 0.745 and 0.101 respectively. The comparison analysis revealed MCL out perform AP, MCODE and SCPS algorithms with high clustering coefficient (0.751) network density and modularity index (0.630). This demonstrated MCL was the most reliable and efficient graph clustering algorithm for detection of protein complexes from PPI networks.
Clique-based data mining for related genes in a biomedical database.
Matsunaga, Tsutomu; Yonemori, Chikara; Tomita, Etsuji; Muramatsu, Masaaki
2009-07-01
Progress in the life sciences cannot be made without integrating biomedical knowledge on numerous genes in order to help formulate hypotheses on the genetic mechanisms behind various biological phenomena, including diseases. There is thus a strong need for a way to automatically and comprehensively search from biomedical databases for related genes, such as genes in the same families and genes encoding components of the same pathways. Here we address the extraction of related genes by searching for densely-connected subgraphs, which are modeled as cliques, in a biomedical relational graph. We constructed a graph whose nodes were gene or disease pages, and edges were the hyperlink connections between those pages in the Online Mendelian Inheritance in Man (OMIM) database. We obtained over 20,000 sets of related genes (called 'gene modules') by enumerating cliques computationally. The modules included genes in the same family, genes for proteins that form a complex, and genes for components of the same signaling pathway. The results of experiments using 'metabolic syndrome'-related gene modules show that the gene modules can be used to get a coherent holistic picture helpful for interpreting relations among genes. We presented a data mining approach extracting related genes by enumerating cliques. The extracted gene sets provide a holistic picture useful for comprehending complex disease mechanisms.
Intelligent Data Visualization for Cross-Checking Spacecraft System Diagnosis
NASA Technical Reports Server (NTRS)
Ong, James C.; Remolina, Emilio; Breeden, David; Stroozas, Brett A.; Mohammed, John L.
2012-01-01
Any reasoning system is fallible, so crew members and flight controllers must be able to cross-check automated diagnoses of spacecraft or habitat problems by considering alternate diagnoses and analyzing related evidence. Cross-checking improves diagnostic accuracy because people can apply information processing heuristics, pattern recognition techniques, and reasoning methods that the automated diagnostic system may not possess. Over time, cross-checking also enables crew members to become comfortable with how the diagnostic reasoning system performs, so the system can earn the crew s trust. We developed intelligent data visualization software that helps users cross-check automated diagnoses of system faults more effectively. The user interface displays scrollable arrays of timelines and time-series graphs, which are tightly integrated with an interactive, color-coded system schematic to show important spatial-temporal data patterns. Signal processing and rule-based diagnostic reasoning automatically identify alternate hypotheses and data patterns that support or rebut the original and alternate diagnoses. A color-coded matrix display summarizes the supporting or rebutting evidence for each diagnosis, and a drill-down capability enables crew members to quickly view graphs and timelines of the underlying data. This system demonstrates that modest amounts of diagnostic reasoning, combined with interactive, information-dense data visualizations, can accelerate system diagnosis and cross-checking.
Multigraph: Reusable Interactive Data Graphs
NASA Astrophysics Data System (ADS)
Phillips, M. B.
2010-12-01
There are surprisingly few good software tools available for presenting time series data on the internet. The most common practice is to use a desktop program such as Excel or Matlab to save a graph as an image which can be included in a web page like any other image. This disconnects the graph from the data in a way that makes updating a graph with new data a cumbersome manual process, and it limits the user to one particular view of the data. The Multigraph project defines an XML format for describing interactive data graphs, and software tools for creating and rendering those graphs in web pages and other internet connected applications. Viewing a Multigraph graph is extremely simple and intuitive, and requires no instructions; the user can pan and zoom by clicking and dragging, in a familiar "Google Maps" kind of way. Creating a new graph for inclusion in a web page involves writing a simple XML configuration file. Multigraph can read data in a variety of formats, and can display data from a web service, allowing users to "surf" through large data sets, downloading only those the parts of the data that are needed for display. The Multigraph XML format, or "MUGL" for short, provides a concise description of the visual properties of a graph, such as axes, plot styles, data sources, labels, etc, as well as interactivity properties such as how and whether the user can pan or zoom along each axis. Multigraph reads a file in this format, draws the described graph, and allows the user to interact with it. Multigraph software currently includes a Flash application for embedding graphs in web pages, a Flex component for embedding graphs in larger Flex/Flash applications, and a plugin for creating graphs in the WordPress content management system. Plans for the future include a Java version for desktop viewing and editing, a command line version for batch and server side rendering, and possibly Android and iPhone versions. Multigraph is currently in use on several web sites including the US Drought Portal (www.drought.gov), the NOAA Climate Services Portal (www.climate.gov), the Climate Reference Network (www.ncdc.noaa.gov/crn), NCDC's State of the Climate Report (www.ncdc.noaa.gov/sotc), and the US Forest Service's Forest Change Assessment Viewer (ews.forestthreats.org/NPDE/NPDE.html). More information about Multigraph is available from the web site www.multigraph.org. Interactive Multigraph Display of Real Time Weather Data
Predicting and Detecting Emerging Cyberattack Patterns Using StreamWorks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Choudhury, Sutanay; Feo, John T.
2014-06-30
The number and sophistication of cyberattacks on industries and governments have dramatically grown in recent years. To counter this movement, new advanced tools and techniques are needed to detect cyberattacks in their early stages such that defensive actions may be taken to avert or mitigate potential damage. From a cybersecurity analysis perspective, detecting cyberattacks may be cast as a problem of identifying patterns in computer network traffic. Logically and intuitively, these patterns may take on the form of a directed graph that conveys how an attack or intrusion propagates through the computers of a network. Such cyberattack graphs could providemore » cybersecurity analysts with powerful conceptual representations that are natural to express and analyze. We have been researching and developing graph-centric approaches and algorithms for dynamic cyberattack detection. The advanced dynamic graph algorithms we are developing will be packaged into a streaming network analysis framework known as StreamWorks. With StreamWorks, a scientist or analyst may detect and identify precursor events and patterns as they emerge in complex networks. This analysis framework is intended to be used in a dynamic environment where network data is streamed in and is appended to a large-scale dynamic graph. Specific graphical query patterns are decomposed and collected into a graph query library. The individual decomposed subpatterns in the library are continuously and efficiently matched against the dynamic graph as it evolves to identify and detect early, partial subgraph patterns. The scalable emerging subgraph pattern algorithms will match on both structural and semantic network properties.« less
Fast Inbound Top-K Query for Random Walk with Restart.
Zhang, Chao; Jiang, Shan; Chen, Yucheng; Sun, Yidan; Han, Jiawei
2015-09-01
Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k , the Ink query aims at retrieving k nodes in the graph that have the largest weighted RWR scores to q . Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising. Nevertheless, none of the existing RWR computation techniques can accurately and efficiently process the Ink query in large graphs. We propose two algorithms, namely Squeeze and Ripple, both of which can accurately answer the Ink query in a fast and incremental manner. To identify the top- k nodes, Squeeze iteratively performs matrix-vector multiplication and estimates the lower and upper bounds for all the nodes in the graph. Ripple employs a more aggressive strategy by only estimating the RWR scores for the nodes falling in the vicinity of q , the nodes outside the vicinity do not need to be evaluated because their RWR scores are propagated from the boundary of the vicinity and thus upper bounded. Ripple incrementally expands the vicinity until the top- k result set can be obtained. Our extensive experiments on real-life graph data sets show that Ink queries can retrieve interesting results, and the proposed algorithms are orders of magnitude faster than state-of-the-art method.
Dense module enumeration in biological networks
NASA Astrophysics Data System (ADS)
Tsuda, Koji; Georgii, Elisabeth
2009-12-01
Analysis of large networks is a central topic in various research fields including biology, sociology, and web mining. Detection of dense modules (a.k.a. clusters) is an important step to analyze the networks. Though numerous methods have been proposed to this aim, they often lack mathematical rigorousness. Namely, there is no guarantee that all dense modules are detected. Here, we present a novel reverse-search-based method for enumerating all dense modules. Furthermore, constraints from additional data sources such as gene expression profiles or customer profiles can be integrated, so that we can systematically detect dense modules with interesting profiles. We report successful applications in human protein interaction network analyses.
Understanding spatial connectivity of individuals with non-uniform population density.
Wang, Pu; González, Marta C
2009-08-28
We construct a two-dimensional geometric graph connecting individuals placed in space within a given contact distance. The individuals are distributed using a measured country's density of population. We observe that while large clusters (group of individuals connected) emerge within some regions, they are trapped in detached urban areas owing to the low population density of the regions bordering them. To understand the emergence of a giant cluster that connects the entire population, we compare the empirical geometric graph with the one generated by placing the same number of individuals randomly in space. We find that, for small contact distances, the empirical distribution of population dominates the growth of connected components, but no critical percolation transition is observed in contrast to the graph generated by a random distribution of population. Our results show that contact distances from real-world situations as for WIFI and Bluetooth connections drop in a zone where a fully connected cluster is not observed, hinting that human mobility must play a crucial role in contact-based diseases and wireless viruses' large-scale spreading.
A large-grain mapping approach for multiprocessor systems through data flow model. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Kim, Hwa-Soo
1991-01-01
A large-grain level mapping method is presented of numerical oriented applications onto multiprocessor systems. The method is based on the large-grain data flow representation of the input application and it assumes a general interconnection topology of the multiprocessor system. The large-grain data flow model was used because such representation best exhibits inherited parallelism in many important applications, e.g., CFD models based on partial differential equations can be presented in large-grain data flow format, very effectively. A generalized interconnection topology of the multiprocessor architecture is considered, including such architectural issues as interprocessor communication cost, with the aim to identify the 'best matching' between the application and the multiprocessor structure. The objective is to minimize the total execution time of the input algorithm running on the target system. The mapping strategy consists of the following: (1) large-grain data flow graph generation from the input application using compilation techniques; (2) data flow graph partitioning into basic computation blocks; and (3) physical mapping onto the target multiprocessor using a priority allocation scheme for the computation blocks.
RelFinder: Revealing Relationships in RDF Knowledge Bases
NASA Astrophysics Data System (ADS)
Heim, Philipp; Hellmann, Sebastian; Lehmann, Jens; Lohmann, Steffen; Stegemann, Timo
The Semantic Web has recently seen a rise of large knowledge bases (such as DBpedia) that are freely accessible via SPARQL endpoints. The structured representation of the contained information opens up new possibilities in the way it can be accessed and queried. In this paper, we present an approach that extracts a graph covering relationships between two objects of interest. We show an interactive visualization of this graph that supports the systematic analysis of the found relationships by providing highlighting, previewing, and filtering features.
Survey of gene splicing algorithms based on reads.
Si, Xiuhua; Wang, Qian; Zhang, Lei; Wu, Ruo; Ma, Jiquan
2017-11-02
Gene splicing is the process of assembling a large number of unordered short sequence fragments to the original genome sequence as accurately as possible. Several popular splicing algorithms based on reads are reviewed in this article, including reference genome algorithms and de novo splicing algorithms (Greedy-extension, Overlap-Layout-Consensus graph, De Bruijn graph). We also discuss a new splicing method based on the MapReduce strategy and Hadoop. By comparing these algorithms, some conclusions are drawn and some suggestions on gene splicing research are made.
Communication-Efficient Arbitration Models for Low-Resolution Data Flow Computing
1988-12-01
phase can be formally described as follows: Graph Partitioning Problem NP-complete: (Garey & Johnson) Given graph G = (V, E), weights w (v) for each v e V...Technical Report, MIT/LCS/TR-218, Cambridge, Mass. Agerwala, Tilak, February 1982, "Data Flow Systems", Computer, pp. 10-13. Babb, Robert G ., July 1984...34Parallel Processing with Large-Grain Data Flow Techniques," IEEE Computer 17, 7, pp. 55-61. Babb, Robert G ., II, Lise Storc, and William C. Ragsdale
Bayesian exponential random graph modelling of interhospital patient referral networks.
Caimo, Alberto; Pallotti, Francesca; Lomi, Alessandro
2017-08-15
Using original data that we have collected on referral relations between 110 hospitals serving a large regional community, we show how recently derived Bayesian exponential random graph models may be adopted to illuminate core empirical issues in research on relational coordination among healthcare organisations. We show how a rigorous Bayesian computation approach supports a fully probabilistic analytical framework that alleviates well-known problems in the estimation of model parameters of exponential random graph models. We also show how the main structural features of interhospital patient referral networks that prior studies have described can be reproduced with accuracy by specifying the system of local dependencies that produce - but at the same time are induced by - decentralised collaborative arrangements between hospitals. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Fault management for data systems
NASA Technical Reports Server (NTRS)
Boyd, Mark A.; Iverson, David L.; Patterson-Hine, F. Ann
1993-01-01
Issues related to automating the process of fault management (fault diagnosis and response) for data management systems are considered. Substantial benefits are to be gained by successful automation of this process, particularly for large, complex systems. The use of graph-based models to develop a computer assisted fault management system is advocated. The general problem is described and the motivation behind choosing graph-based models over other approaches for developing fault diagnosis computer programs is outlined. Some existing work in the area of graph-based fault diagnosis is reviewed, and a new fault management method which was developed from existing methods is offered. Our method is applied to an automatic telescope system intended as a prototype for future lunar telescope programs. Finally, an application of our method to general data management systems is described.
Corrected Mean-Field Model for Random Sequential Adsorption on Random Geometric Graphs
NASA Astrophysics Data System (ADS)
Dhara, Souvik; van Leeuwaarden, Johan S. H.; Mukherjee, Debankur
2018-03-01
A notorious problem in mathematics and physics is to create a solvable model for random sequential adsorption of non-overlapping congruent spheres in the d-dimensional Euclidean space with d≥ 2 . Spheres arrive sequentially at uniformly chosen locations in space and are accepted only when there is no overlap with previously deposited spheres. Due to spatial correlations, characterizing the fraction of accepted spheres remains largely intractable. We study this fraction by taking a novel approach that compares random sequential adsorption in Euclidean space to the nearest-neighbor blocking on a sequence of clustered random graphs. This random network model can be thought of as a corrected mean-field model for the interaction graph between the attempted spheres. Using functional limit theorems, we characterize the fraction of accepted spheres and its fluctuations.
Stracuzzi, David John; Brost, Randolph C.; Phillips, Cynthia A.; ...
2015-09-26
Geospatial semantic graphs provide a robust foundation for representing and analyzing remote sensor data. In particular, they support a variety of pattern search operations that capture the spatial and temporal relationships among the objects and events in the data. However, in the presence of large data corpora, even a carefully constructed search query may return a large number of unintended matches. This work considers the problem of calculating a quality score for each match to the query, given that the underlying data are uncertain. As a result, we present a preliminary evaluation of three methods for determining both match qualitymore » scores and associated uncertainty bounds, illustrated in the context of an example based on overhead imagery data.« less
The baryon vector current in the combined chiral and 1/Nc expansions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Flores-Mendieta, Ruben; Goity, Jose L
2014-12-01
The baryon vector current is computed at one-loop order in large-Nc baryon chiral perturbation theory, where Nc is the number of colors. Loop graphs with octet and decuplet intermediate states are systematically incorporated into the analysis and the effects of the decuplet-octet mass difference and SU(3) flavor symmetry breaking are accounted for. There are large-Nc cancellations between different one-loop graphs as a consequence of the large-Nc spin-flavor symmetry of QCD baryons. The results are compared against the available experimental data through several fits in order to extract information about the unknown parameters. The large-Nc baryon chiral perturbation theory predictions aremore » in very good agreement both with the expectations from the 1/Nc expansion and with the experimental data. The effect of SU(3) flavor symmetry breaking for the |Delta S|=1 vector current form factors f1(0) results in a reduction by a few percent with respect to the corresponding SU(3) symmetric values.« less
Dynamics of dense granular flows of small-and-large-grain mixtures in an ambient fluid.
Meruane, C; Tamburrino, A; Roche, O
2012-08-01
Dense grain flows in nature consist of a mixture of solid constituents that are immersed in an ambient fluid. In order to obtain a good representation of these flows, the interaction mechanisms between the different constituents of the mixture should be considered. In this article, we study the dynamics of a dense granular flow composed of a binary mixture of small and large grains immersed in an ambient fluid. In this context, we extend the two-phase approach proposed by Meruane et al. [J. Fluid Mech. 648, 381 (2010)] to the case of flowing dense binary mixtures of solid particles, by including in the momentum equations a constitutive relation that describes the interaction mechanisms between the solid constituents in a dense regime. These coupled equations are solved numerically and validated by comparing the numerical results with experimental measurements of the front speed of gravitational granular flows resulting from the collapse, in ambient air or water, of two-dimensional granular columns that consisted of mixtures of small and large spherical particles of equal mass density. Our results suggest that the model equations include the essential features that describe the dynamics of grains flows of binary mixtures in an ambient fluid. In particular, it is shown that segregation of small and large grains can increase the front speed because of the volumetric expansion of the flow. This increase in flow speed is damped by the interaction forces with the ambient fluid, and this behavior is more pronounced in water than in air.
Chung, Dongjun; Kim, Hang J; Zhao, Hongyu
2017-02-01
Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients with novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging as they are often affected by many genetic variants with small or moderate effects. There has been accumulating evidence suggesting that different complex traits share common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through a joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods were shown to improve statistical power for association mapping compared to separate analyses, they are still limited in the number of phenotypes that can be integrated. In order to address this challenge, in this paper, we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared to statistical methods based on smaller number of GWAS datasets. In addition, graph-GPA also promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.
Learning of Multimodal Representations With Random Walks on the Click Graph.
Wu, Fei; Lu, Xinyan; Song, Jun; Yan, Shuicheng; Zhang, Zhongfei Mark; Rui, Yong; Zhuang, Yueting
2016-02-01
In multimedia information retrieval, most classic approaches tend to represent different modalities of media in the same feature space. With the click data collected from the users' searching behavior, existing approaches take either one-to-one paired data (text-image pairs) or ranking examples (text-query-image and/or image-query-text ranking lists) as training examples, which do not make full use of the click data, particularly the implicit connections among the data objects. In this paper, we treat the click data as a large click graph, in which vertices are images/text queries and edges indicate the clicks between an image and a query. We consider learning a multimodal representation from the perspective of encoding the explicit/implicit relevance relationship between the vertices in the click graph. By minimizing both the truncated random walk loss as well as the distance between the learned representation of vertices and their corresponding deep neural network output, the proposed model which is named multimodal random walk neural network (MRW-NN) can be applied to not only learn robust representation of the existing multimodal data in the click graph, but also deal with the unseen queries and images to support cross-modal retrieval. We evaluate the latent representation learned by MRW-NN on a public large-scale click log data set Clickture and further show that MRW-NN achieves much better cross-modal retrieval performance on the unseen queries/images than the other state-of-the-art methods.
Varoquaux, G; Gramfort, A; Poline, J B; Thirion, B
2012-01-01
Correlations in the signal observed via functional Magnetic Resonance Imaging (fMRI), are expected to reveal the interactions in the underlying neural populations through hemodynamic response. In particular, they highlight distributed set of mutually correlated regions that correspond to brain networks related to different cognitive functions. Yet graph-theoretical studies of neural connections give a different picture: that of a highly integrated system with small-world properties: local clustering but with short pathways across the complete structure. We examine the conditional independence properties of the fMRI signal, i.e. its Markov structure, to find realistic assumptions on the connectivity structure that are required to explain the observed functional connectivity. In particular we seek a decomposition of the Markov structure into segregated functional networks using decomposable graphs: a set of strongly-connected and partially overlapping cliques. We introduce a new method to efficiently extract such cliques on a large, strongly-connected graph. We compare methods learning different graph structures from functional connectivity by testing the goodness of fit of the model they learn on new data. We find that summarizing the structure as strongly-connected networks can give a good description only for very large and overlapping networks. These results highlight that Markov models are good tools to identify the structure of brain connectivity from fMRI signals, but for this purpose they must reflect the small-world properties of the underlying neural systems. Copyright © 2012 Elsevier Ltd. All rights reserved.
Building dynamic population graph for accurate correspondence detection.
Du, Shaoyi; Guo, Yanrong; Sanroma, Gerard; Ni, Dong; Wu, Guorong; Shen, Dinggang
2015-12-01
In medical imaging studies, there is an increasing trend for discovering the intrinsic anatomical difference across individual subjects in a dataset, such as hand images for skeletal bone age estimation. Pair-wise matching is often used to detect correspondences between each individual subject and a pre-selected model image with manually-placed landmarks. However, the large anatomical variability across individual subjects can easily compromise such pair-wise matching step. In this paper, we present a new framework to simultaneously detect correspondences among a population of individual subjects, by propagating all manually-placed landmarks from a small set of model images through a dynamically constructed image graph. Specifically, we first establish graph links between models and individual subjects according to pair-wise shape similarity (called as forward step). Next, we detect correspondences for the individual subjects with direct links to any of model images, which is achieved by a new multi-model correspondence detection approach based on our recently-published sparse point matching method. To correct those inaccurate correspondences, we further apply an error detection mechanism to automatically detect wrong correspondences and then update the image graph accordingly (called as backward step). After that, all subject images with detected correspondences are included into the set of model images, and the above two steps of graph expansion and error correction are repeated until accurate correspondences for all subject images are established. Evaluations on real hand X-ray images demonstrate that our proposed method using a dynamic graph construction approach can achieve much higher accuracy and robustness, when compared with the state-of-the-art pair-wise correspondence detection methods as well as a similar method but using static population graph. Copyright © 2015 Elsevier B.V. All rights reserved.
Analyzing and synthesizing phylogenies using tree alignment graphs.
Smith, Stephen A; Brown, Joseph W; Hinchliff, Cody E
2013-01-01
Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertrees approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe.
Analyzing and Synthesizing Phylogenies Using Tree Alignment Graphs
Smith, Stephen A.; Brown, Joseph W.; Hinchliff, Cody E.
2013-01-01
Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertrees approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe. PMID:24086118
NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways.
Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Sand, Olivier; Janky, Rekin's; Vanderstocken, Gilles; Deville, Yves; van Helden, Jacques
2008-07-01
The network analysis tools (NeAT) (http://rsat.ulb.ac.be/neat/) provide a user-friendly web access to a collection of modular tools for the analysis of networks (graphs) and clusters (e.g. microarray clusters, functional classes, etc.). A first set of tools supports basic operations on graphs (comparison between two graphs, neighborhood of a set of input nodes, path finding and graph randomization). Another set of programs makes the connection between networks and clusters (graph-based clustering, cliques discovery and mapping of clusters onto a network). The toolbox also includes programs for detecting significant intersections between clusters/classes (e.g. clusters of co-expression versus functional classes of genes). NeAT are designed to cope with large datasets and provide a flexible toolbox for analyzing biological networks stored in various databases (protein interactions, regulation and metabolism) or obtained from high-throughput experiments (two-hybrid, mass-spectrometry and microarrays). The web interface interconnects the programs in predefined analysis flows, enabling to address a series of questions about networks of interest. Each tool can also be used separately by entering custom data for a specific analysis. NeAT can also be used as web services (SOAP/WSDL interface), in order to design programmatic workflows and integrate them with other available resources.
Resource utilization model for the algorithm to architecture mapping model
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Patel, Rakesh R.
1993-01-01
The analytical model for resource utilization and the variable node time and conditional node model for the enhanced ATAMM model for a real-time data flow architecture are presented in this research. The Algorithm To Architecture Mapping Model, ATAMM, is a Petri net based graph theoretic model developed at Old Dominion University, and is capable of modeling the execution of large-grained algorithms on a real-time data flow architecture. Using the resource utilization model, the resource envelope may be obtained directly from a given graph and, consequently, the maximum number of required resources may be evaluated. The node timing diagram for one iteration period may be obtained using the analytical resource envelope. The variable node time model, which describes the change in resource requirement for the execution of an algorithm under node time variation, is useful to expand the applicability of the ATAMM model to heterogeneous architectures. The model also describes a method of detecting the presence of resource limited mode and its subsequent prevention. Graphs with conditional nodes are shown to be reduced to equivalent graphs with time varying nodes and, subsequently, may be analyzed using the variable node time model to determine resource requirements. Case studies are performed on three graphs for the illustration of applicability of the analytical theories.
In-Memory Graph Databases for Web-Scale Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Morari, Alessandro; Weaver, Jesse R.
RDF databases have emerged as one of the most relevant way for organizing, integrating, and managing expo- nentially growing, often heterogeneous, and not rigidly structured data for a variety of scientific and commercial fields. In this paper we discuss the solutions integrated in GEMS (Graph database Engine for Multithreaded Systems), a software framework for implementing RDF databases on commodity, distributed-memory high-performance clusters. Unlike the majority of current RDF databases, GEMS has been designed from the ground up to primarily employ graph-based methods. This is reflected in all the layers of its stack. The GEMS framework is composed of: a SPARQL-to-C++more » compiler, a library of data structures and related methods to access and modify them, and a custom runtime providing lightweight software multithreading, network messages aggregation and a partitioned global address space. We provide an overview of the framework, detailing its component and how they have been closely designed and customized to address issues of graph methods applied to large-scale datasets on clusters. We discuss in details the principles that enable automatic translation of the queries (expressed in SPARQL, the query language of choice for RDF databases) to graph methods, and identify differences with respect to other RDF databases.« less
Phase transitions in Ising models on directed networks
NASA Astrophysics Data System (ADS)
Lipowski, Adam; Ferreira, António Luis; Lipowska, Dorota; Gontarek, Krzysztof
2015-11-01
We examine Ising models with heat-bath dynamics on directed networks. Our simulations show that Ising models on directed triangular and simple cubic lattices undergo a phase transition that most likely belongs to the Ising universality class. On the directed square lattice the model remains paramagnetic at any positive temperature as already reported in some previous studies. We also examine random directed graphs and show that contrary to undirected ones, percolation of directed bonds does not guarantee ferromagnetic ordering. Only above a certain threshold can a random directed graph support finite-temperature ferromagnetic ordering. Such behavior is found also for out-homogeneous random graphs, but in this case the analysis of magnetic and percolative properties can be done exactly. Directed random graphs also differ from undirected ones with respect to zero-temperature freezing. Only at low connectivity do they remain trapped in a disordered configuration. Above a certain threshold, however, the zero-temperature dynamics quickly drives the model toward a broken symmetry (magnetized) state. Only above this threshold, which is almost twice as large as the percolation threshold, do we expect the Ising model to have a positive critical temperature. With a very good accuracy, the behavior on directed random graphs is reproduced within a certain approximate scheme.
Multiscale weighted colored graphs for protein flexibility and rigidity analysis
NASA Astrophysics Data System (ADS)
Bramer, David; Wei, Guo-Wei
2018-02-01
Protein structural fluctuation, measured by Debye-Waller factors or B-factors, is known to correlate to protein flexibility and function. A variety of methods has been developed for protein Debye-Waller factor prediction and related applications to domain separation, docking pose ranking, entropy calculation, hinge detection, stability analysis, etc. Nevertheless, none of the current methodologies are able to deliver an accuracy of 0.7 in terms of the Pearson correlation coefficients averaged over a large set of proteins. In this work, we introduce a paradigm-shifting geometric graph model, multiscale weighted colored graph (MWCG), to provide a new generation of computational algorithms to significantly change the current status of protein structural fluctuation analysis. Our MWCG model divides a protein graph into multiple subgraphs based on interaction types between graph nodes and represents the protein rigidity by generalized centralities of subgraphs. MWCGs not only predict the B-factors of protein residues but also accurately analyze the flexibility of all atoms in a protein. The MWCG model is validated over a number of protein test sets and compared with many standard methods. An extensive numerical study indicates that the proposed MWCG offers an accuracy of over 0.8 and thus provides perhaps the first reliable method for estimating protein flexibility and B-factors. It also simultaneously predicts all-atom flexibility in a molecule.
Incremental k-core decomposition: Algorithms and evaluation
Sariyuce, Ahmet Erdem; Gedik, Bugra; Jacques-SIlva, Gabriela; ...
2016-02-01
A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-hard problems on real networks efficiently, like maximal clique finding. In many real-world applications, networks change over time. As a result, it is essential to develop efficient incremental algorithms for dynamic graph data. In this paper, we propose a suite of incremental k-core decomposition algorithms for dynamic graph data. These algorithms locate a small subgraph that ismore » guaranteed to contain the list of vertices whose maximum k-core values have changed and efficiently process this subgraph to update the k-core decomposition. We present incremental algorithms for both insertion and deletion operations, and propose auxiliary vertex state maintenance techniques that can further accelerate these operations. Our results show a significant reduction in runtime compared to non-incremental alternatives. We illustrate the efficiency of our algorithms on different types of real and synthetic graphs, at varying scales. Furthermore, for a graph of 16 million vertices, we observe relative throughputs reaching a million times, relative to the non-incremental algorithms.« less
Community structure in networks
NASA Astrophysics Data System (ADS)
Newman, Mark
2004-03-01
Many networked systems, including physical, biological, social, and technological networks, appear to contain ``communities'' -- groups of nodes within which connections are dense, but between which they are sparser. The ability to find such communities in an automated fashion could be of considerable use. Communities in a web graph for instance might correspond to sets of web sites dealing with related topics, while communities in a biochemical network or an electronic circuit might correspond to functional units of some kind. We present a number of new methods for community discovery, including methods based on ``betweenness'' measures and methods based on modularity optimization. We also give examples of applications of these methods to both computer-generated and real-world network data, and show how our techniques can be used to shed light on the sometimes dauntingly complex structure of networked systems.
NASA Technical Reports Server (NTRS)
Araki, Suguru
1991-01-01
The kinetic theory of planetary rings developed by Araki and Tremaine (1986) and Araki (1988) is extended and refined, with a focus on the implications of finite particle size: (1) nonlocal collisions and (2) finite filling factors. Consideration is given to the derivation of the equations for the local steady state, the low-optical-depth limit, and the steady state at finite filling factors (including the effects of collision inelasticity, spin degrees of freedom, and self-gravity). Numerical results are presented in extensive graphs and characterized in detail. The importance of distinguishing effects (1) and (2) at low optical depths is stressed, and the existence of vertical density profiles with layered structures at high filling factors is demonstrated.
Epidemic spreading on complex networks with community structures
Stegehuis, Clara; van der Hofstad, Remco; van Leeuwaarden, Johan S. H.
2016-01-01
Many real-world networks display a community structure. We study two random graph models that create a network with similar community structure as a given network. One model preserves the exact community structure of the original network, while the other model only preserves the set of communities and the vertex degrees. These models show that community structure is an important determinant of the behavior of percolation processes on networks, such as information diffusion or virus spreading: the community structure can both enforce as well as inhibit diffusion processes. Our models further show that it is the mesoscopic set of communities that matters. The exact internal structures of communities barely influence the behavior of percolation processes across networks. This insensitivity is likely due to the relative denseness of the communities. PMID:27440176
Lei, Chunyang; Bie, Hongxia; Fang, Gengfa; Gaura, Elena; Brusey, James; Zhang, Xuekun; Dutkiewicz, Eryk
2016-07-18
Super dense wireless sensor networks (WSNs) have become popular with the development of Internet of Things (IoT), Machine-to-Machine (M2M) communications and Vehicular-to-Vehicular (V2V) networks. While highly-dense wireless networks provide efficient and sustainable solutions to collect precise environmental information, a new channel access scheme is needed to solve the channel collision problem caused by the large number of competing nodes accessing the channel simultaneously. In this paper, we propose a space-time random access method based on a directional data transmission strategy, by which collisions in the wireless channel are significantly decreased and channel utility efficiency is greatly enhanced. Simulation results show that our proposed method can decrease the packet loss rate to less than 2 % in large scale WSNs and in comparison with other channel access schemes for WSNs, the average network throughput can be doubled.
Colak, Recep; Moser, Flavia; Chu, Jeffrey Shih-Chieh; Schönhuth, Alexander; Chen, Nansheng; Ester, Martin
2010-10-25
Computational prediction of functionally related groups of genes (functional modules) from large-scale data is an important issue in computational biology. Gene expression experiments and interaction networks are well studied large-scale data sources, available for many not yet exhaustively annotated organisms. It has been well established, when analyzing these two data sources jointly, modules are often reflected by highly interconnected (dense) regions in the interaction networks whose participating genes are co-expressed. However, the tractability of the problem had remained unclear and methods by which to exhaustively search for such constellations had not been presented. We provide an algorithmic framework, referred to as Densely Connected Biclustering (DECOB), by which the aforementioned search problem becomes tractable. To benchmark the predictive power inherent to the approach, we computed all co-expressed, dense regions in physical protein and genetic interaction networks from human and yeast. An automatized filtering procedure reduces our output which results in smaller collections of modules, comparable to state-of-the-art approaches. Our results performed favorably in a fair benchmarking competition which adheres to standard criteria. We demonstrate the usefulness of an exhaustive module search, by using the unreduced output to more quickly perform GO term related function prediction tasks. We point out the advantages of our exhaustive output by predicting functional relationships using two examples. We demonstrate that the computation of all densely connected and co-expressed regions in interaction networks is an approach to module discovery of considerable value. Beyond confirming the well settled hypothesis that such co-expressed, densely connected interaction network regions reflect functional modules, we open up novel computational ways to comprehensively analyze the modular organization of an organism based on prevalent and largely available large-scale datasets. Software and data sets are available at http://www.sfu.ca/~ester/software/DECOB.zip.
Path Network Recovery Using Remote Sensing Data and Geospatial-Temporal Semantic Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
William C. McLendon III; Brost, Randy C.
Remote sensing systems produce large volumes of high-resolution images that are difficult to search. The GeoGraphy (pronounced Geo-Graph-y) framework [2, 20] encodes remote sensing imagery into a geospatial-temporal semantic graph representation to enable high level semantic searches to be performed. Typically scene objects such as buildings and trees tend to be shaped like blocks with few holes, but other shapes generated from path networks tend to have a large number of holes and can span a large geographic region due to their connectedness. For example, we have a dataset covering the city of Philadelphia in which there is a singlemore » road network node spanning a 6 mile x 8 mile region. Even a simple question such as "find two houses near the same street" might give unexpected results. More generally, nodes arising from networks of paths (roads, sidewalks, trails, etc.) require additional processing to make them useful for searches in GeoGraphy. We have assigned the term Path Network Recovery to this process. Path Network Recovery is a three-step process involving (1) partitioning the network node into segments, (2) repairing broken path segments interrupted by occlusions or sensor noise, and (3) adding path-aware search semantics into GeoQuestions. This report covers the path network recovery process, how it is used, and some example use cases of the current capabilities.« less
Usability-driven pruning of large ontologies: the case of SNOMED CT
Boeker, Martin; Illarramendi, Arantza; Schulz, Stefan
2012-01-01
Objectives To study ontology modularization techniques when applied to SNOMED CT in a scenario in which no previous corpus of information exists and to examine if frequency-based filtering using MEDLINE can reduce subset size without discarding relevant concepts. Materials and Methods Subsets were first extracted using four graph-traversal heuristics and one logic-based technique, and were subsequently filtered with frequency information from MEDLINE. Twenty manually coded discharge summaries from cardiology patients were used as signatures and test sets. The coverage, size, and precision of extracted subsets were measured. Results Graph-traversal heuristics provided high coverage (71–96% of terms in the test sets of discharge summaries) at the expense of subset size (17–51% of the size of SNOMED CT). Pre-computed subsets and logic-based techniques extracted small subsets (1%), but coverage was limited (24–55%). Filtering reduced the size of large subsets to 10% while still providing 80% coverage. Discussion Extracting subsets to annotate discharge summaries is challenging when no previous corpus exists. Ontology modularization provides valuable techniques, but the resulting modules grow as signatures spread across subhierarchies, yielding a very low precision. Conclusion Graph-traversal strategies and frequency data from an authoritative source can prune large biomedical ontologies and produce useful subsets that still exhibit acceptable coverage. However, a clinical corpus closer to the specific use case is preferred when available. PMID:22268217
Interacting particle systems on graphs
NASA Astrophysics Data System (ADS)
Sood, Vishal
In this dissertation, the dynamics of socially or biologically interacting populations are investigated. The individual members of the population are treated as particles that interact via links on a social or biological network represented as a graph. The effect of the structure of the graph on the properties of the interacting particle system is studied using statistical physics techniques. In the first chapter, the central concepts of graph theory and social and biological networks are presented. Next, interacting particle systems that are drawn from physics, mathematics and biology are discussed in the second chapter. In the third chapter, the random walk on a graph is studied. The mean time for a random walk to traverse between two arbitrary sites of a random graph is evaluated. Using an effective medium approximation it is found that the mean first-passage time between pairs of sites, as well as all moments of this first-passage time, are insensitive to the density of links in the graph. The inverse of the mean-first passage time varies non-monotonically with the density of links near the percolation transition of the random graph. Much of the behavior can be understood by simple heuristic arguments. Evolutionary dynamics, by which mutants overspread an otherwise uniform population on heterogeneous graphs, are studied in the fourth chapter. Such a process underlies' epidemic propagation, emergence of fads, social cooperation or invasion of an ecological niche by a new species. The first part of this chapter is devoted to neutral dynamics, in which the mutant genotype does not have a selective advantage over the resident genotype. The time to extinction of one of the two genotypes is derived. In the second part of this chapter, selective advantage or fitness is introduced such that the mutant genotype has a higher birth rate or a lower death rate. This selective advantage leads to a dynamical competition in which selection dominates for large populations, while for small populations the dynamics are similar to the neutral case. The likelihood for the fitter mutants to drive the resident genotype to extinction is calculated.
Jiang, Yuyi; Shao, Zhiqing; Guo, Yi
2014-01-01
A complex computing problem can be solved efficiently on a system with multiple computing nodes by dividing its implementation code into several parallel processing modules or tasks that can be formulated as directed acyclic graph (DAG) problems. The DAG jobs may be mapped to and scheduled on the computing nodes to minimize the total execution time. Searching an optimal DAG scheduling solution is considered to be NP-complete. This paper proposed a tuple molecular structure-based chemical reaction optimization (TMSCRO) method for DAG scheduling on heterogeneous computing systems, based on a very recently proposed metaheuristic method, chemical reaction optimization (CRO). Comparing with other CRO-based algorithms for DAG scheduling, the design of tuple reaction molecular structure and four elementary reaction operators of TMSCRO is more reasonable. TMSCRO also applies the concept of constrained critical paths (CCPs), constrained-critical-path directed acyclic graph (CCPDAG) and super molecule for accelerating convergence. In this paper, we have also conducted simulation experiments to verify the effectiveness and efficiency of TMSCRO upon a large set of randomly generated graphs and the graphs for real world problems. PMID:25143977
Jiang, Yuyi; Shao, Zhiqing; Guo, Yi
2014-01-01
A complex computing problem can be solved efficiently on a system with multiple computing nodes by dividing its implementation code into several parallel processing modules or tasks that can be formulated as directed acyclic graph (DAG) problems. The DAG jobs may be mapped to and scheduled on the computing nodes to minimize the total execution time. Searching an optimal DAG scheduling solution is considered to be NP-complete. This paper proposed a tuple molecular structure-based chemical reaction optimization (TMSCRO) method for DAG scheduling on heterogeneous computing systems, based on a very recently proposed metaheuristic method, chemical reaction optimization (CRO). Comparing with other CRO-based algorithms for DAG scheduling, the design of tuple reaction molecular structure and four elementary reaction operators of TMSCRO is more reasonable. TMSCRO also applies the concept of constrained critical paths (CCPs), constrained-critical-path directed acyclic graph (CCPDAG) and super molecule for accelerating convergence. In this paper, we have also conducted simulation experiments to verify the effectiveness and efficiency of TMSCRO upon a large set of randomly generated graphs and the graphs for real world problems.
Craig, Hugh; Berretta, Regina; Moscato, Pablo
2016-01-01
In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graph by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays. PMID:27571416
Contact Graph Routing Enhancements Developed in ION for DTN
NASA Technical Reports Server (NTRS)
Segui, John S.; Burleigh, Scott
2013-01-01
The Interplanetary Overlay Network (ION) software suite is an open-source, flight-ready implementation of networking protocols including the Delay/Disruption Tolerant Networking (DTN) Bundle Protocol (BP), the CCSDS (Consultative Committee for Space Data Systems) File Delivery Protocol (CFDP), and many others including the Contact Graph Routing (CGR) DTN routing system. While DTN offers the capability to tolerate disruption and long signal propagation delays in transmission, without an appropriate routing protocol, no data can be delivered. CGR was built for space exploration networks with scheduled communication opportunities (typically based on trajectories and orbits), represented as a contact graph. Since CGR uses knowledge of future connectivity, the contact graph can grow rather large, and so efficient processing is desired. These enhancements allow CGR to scale to predicted NASA space network complexities and beyond. This software improves upon CGR by adopting an earliest-arrival-time cost metric and using the Dijkstra path selection algorithm. Moving to Dijkstra path selection also enables construction of an earliest- arrival-time tree for multicast routing. The enhancements have been rolled into ION 3.0 available on sourceforge.net.
Functional connectivity and graph theory in preclinical Alzheimer's disease.
Brier, Matthew R; Thomas, Jewell B; Fagan, Anne M; Hassenstab, Jason; Holtzman, David M; Benzinger, Tammie L; Morris, John C; Ances, Beau M
2014-04-01
Alzheimer's disease (AD) has a long preclinical phase in which amyloid and tau cerebral pathology accumulate without producing cognitive symptoms. Resting state functional connectivity magnetic resonance imaging has demonstrated that brain networks degrade during symptomatic AD. It is unclear to what extent these degradations exist before symptomatic onset. In this study, we investigated graph theory metrics of functional integration (path length), functional segregation (clustering coefficient), and functional distinctness (modularity) as a function of disease severity. Further, we assessed whether these graph metrics were affected in cognitively normal participants with cerebrospinal fluid evidence of preclinical AD. Clustering coefficient and modularity, but not path length, were reduced in AD. Cognitively normal participants who harbored AD biomarker pathology also showed reduced values in these graph measures, demonstrating brain changes similar to, but smaller than, symptomatic AD. Only modularity was significantly affected by age. We also demonstrate that AD has a particular effect on hub-like regions in the brain. We conclude that AD causes large-scale disconnection that is present before onset of symptoms. Copyright © 2014 Elsevier Inc. All rights reserved.
Tensor Spectral Clustering for Partitioning Higher-order Network Structures.
Benson, Austin R; Gleich, David F; Leskovec, Jure
2015-01-01
Spectral graph theory-based methods represent an important class of tools for studying the structure of networks. Spectral methods are based on a first-order Markov chain derived from a random walk on the graph and thus they cannot take advantage of important higher-order network substructures such as triangles, cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm that allows for modeling higher-order network structures in a graph partitioning framework. Our TSC algorithm allows the user to specify which higher-order network structures (cycles, feed-forward loops, etc.) should be preserved by the network clustering. Higher-order network structures of interest are represented using a tensor, which we then partition by developing a multilinear spectral method. Our framework can be applied to discovering layered flows in networks as well as graph anomaly detection, which we illustrate on synthetic networks. In directed networks, a higher-order structure of particular interest is the directed 3-cycle, which captures feedback loops in networks. We demonstrate that our TSC algorithm produces large partitions that cut fewer directed 3-cycles than standard spectral clustering algorithms.
Tensor Spectral Clustering for Partitioning Higher-order Network Structures
Benson, Austin R.; Gleich, David F.; Leskovec, Jure
2016-01-01
Spectral graph theory-based methods represent an important class of tools for studying the structure of networks. Spectral methods are based on a first-order Markov chain derived from a random walk on the graph and thus they cannot take advantage of important higher-order network substructures such as triangles, cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm that allows for modeling higher-order network structures in a graph partitioning framework. Our TSC algorithm allows the user to specify which higher-order network structures (cycles, feed-forward loops, etc.) should be preserved by the network clustering. Higher-order network structures of interest are represented using a tensor, which we then partition by developing a multilinear spectral method. Our framework can be applied to discovering layered flows in networks as well as graph anomaly detection, which we illustrate on synthetic networks. In directed networks, a higher-order structure of particular interest is the directed 3-cycle, which captures feedback loops in networks. We demonstrate that our TSC algorithm produces large partitions that cut fewer directed 3-cycles than standard spectral clustering algorithms. PMID:27812399
Rorres, Chris; Romano, Maria; Miller, Jennifer A; Mossey, Jana M; Grubesic, Tony H; Zellner, David E; Smith, Gary
2018-06-01
Contact tracing is a crucial component of the control of many infectious diseases, but is an arduous and time consuming process. Procedures that increase the efficiency of contact tracing increase the chance that effective controls can be implemented sooner and thus reduce the magnitude of the epidemic. We illustrate a procedure using Graph Theory in the context of infectious disease epidemics of farmed animals in which the epidemics are driven mainly by the shipment of animals between farms. Specifically, we created a directed graph of the recorded shipments of deer between deer farms in Pennsylvania over a timeframe and asked how the properties of the graph could be exploited to make contact tracing more efficient should Chronic Wasting Disease (a prion disease of deer) be discovered in one of the farms. We show that the presence of a large strongly connected component in the graph has a significant impact on the number of contacts that can arise. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Visual traffic jam analysis based on trajectory data.
Wang, Zuchao; Lu, Min; Yuan, Xiaoru; Zhang, Junping; van de Wetering, Huub
2013-12-01
In this work, we present an interactive system for visual analysis of urban traffic congestion based on GPS trajectories. For these trajectories we develop strategies to extract and derive traffic jam information. After cleaning the trajectories, they are matched to a road network. Subsequently, traffic speed on each road segment is computed and traffic jam events are automatically detected. Spatially and temporally related events are concatenated in, so-called, traffic jam propagation graphs. These graphs form a high-level description of a traffic jam and its propagation in time and space. Our system provides multiple views for visually exploring and analyzing the traffic condition of a large city as a whole, on the level of propagation graphs, and on road segment level. Case studies with 24 days of taxi GPS trajectories collected in Beijing demonstrate the effectiveness of our system.
Fast determination of structurally cohesive subgroups in large networks
Sinkovits, Robert S.; Moody, James; Oztan, B. Tolga; White, Douglas R.
2016-01-01
Structurally cohesive subgroups are a powerful and mathematically rigorous way to characterize network robustness. Their strength lies in the ability to detect strong connections among vertices that not only have no neighbors in common, but that may be distantly separated in the graph. Unfortunately, identifying cohesive subgroups is a computationally intensive problem, which has limited empirical assessments of cohesion to relatively small graphs of at most a few thousand vertices. We describe here an approach that exploits the properties of cliques, k-cores and vertex separators to iteratively reduce the complexity of the graph to the point where standard algorithms can be used to complete the analysis. As a proof of principle, we apply our method to the cohesion analysis of a 29,462-vertex biconnected component extracted from a 128,151-vertex co-authorship data set. PMID:28503215
Skeletal Mechanism Generation of Surrogate Jet Fuels for Aeropropulsion Modeling
NASA Astrophysics Data System (ADS)
Sung, Chih-Jen; Niemeyer, Kyle E.
2010-05-01
A novel implementation for the skeletal reduction of large detailed reaction mechanisms using the directed relation graph with error propagation and sensitivity analysis (DRGEPSA) is developed and presented with skeletal reductions of two important hydrocarbon components, n-heptane and n-decane, relevant to surrogate jet fuel development. DRGEPSA integrates two previously developed methods, directed relation graph-aided sensitivity analysis (DRGASA) and directed relation graph with error propagation (DRGEP), by first applying DRGEP to efficiently remove many unimportant species prior to sensitivity analysis to further remove unimportant species, producing an optimally small skeletal mechanism for a given error limit. It is illustrated that the combination of the DRGEP and DRGASA methods allows the DRGEPSA approach to overcome the weaknesses of each previous method, specifically that DRGEP cannot identify all unimportant species and that DRGASA shields unimportant species from removal.
Pathview: an R/Bioconductor package for pathway-based data integration and visualization.
Luo, Weijun; Brouwer, Cory
2013-07-15
Pathview is a novel tool set for pathway-based data integration and visualization. It maps and renders user data on relevant pathway graphs. Users only need to supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps and integrates user data onto the pathway and renders pathway graphs with the mapped data. Although built as a stand-alone program, Pathview may seamlessly integrate with pathway and functional analysis tools for large-scale and fully automated analysis pipelines. The package is freely available under the GPLv3 license through Bioconductor and R-Forge. It is available at http://bioconductor.org/packages/release/bioc/html/pathview.html and at http://Pathview.r-forge.r-project.org/. luo_weijun@yahoo.com Supplementary data are available at Bioinformatics online.
NASA Astrophysics Data System (ADS)
Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.
2017-07-01
Calculation of the matrix-vector multiplication in the real-world problems often involves large matrix with arbitrary size. Therefore, parallelization is needed to speed up the calculation process that usually takes a long time. Graph partitioning techniques that have been discussed in the previous studies cannot be used to complete the parallelized calculation of matrix-vector multiplication with arbitrary size. This is due to the assumption of graph partitioning techniques that can only solve the square and symmetric matrix. Hypergraph partitioning techniques will overcome the shortcomings of the graph partitioning technique. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).
Evolutionary dynamics of social dilemmas in structured heterogeneous populations.
Santos, F C; Pacheco, J M; Lenaerts, Tom
2006-02-28
Real populations have been shown to be heterogeneous, in which some individuals have many more contacts than others. This fact contrasts with the traditional homogeneous setting used in studies of evolutionary game dynamics. We incorporate heterogeneity in the population by studying games on graphs, in which the variability in connectivity ranges from single-scale graphs, for which heterogeneity is small and associated degree distributions exhibit a Gaussian tale, to scale-free graphs, for which heterogeneity is large with degree distributions exhibiting a power-law behavior. We study the evolution of cooperation, modeled in terms of the most popular dilemmas of cooperation. We show that, for all dilemmas, increasing heterogeneity favors the emergence of cooperation, such that long-term cooperative behavior easily resists short-term noncooperative behavior. Moreover, we show how cooperation depends on the intricate ties between individuals in scale-free populations.
Modeling and optimum time performance for concurrent processing
NASA Technical Reports Server (NTRS)
Mielke, Roland R.; Stoughton, John W.; Som, Sukhamoy
1988-01-01
The development of a new graph theoretic model for describing the relation between a decomposed algorithm and its execution in a data flow environment is presented. Called ATAMM, the model consists of a set of Petri net marked graphs useful for representing decision-free algorithms having large-grained, computationally complex primitive operations. Performance time measures which determine computing speed and throughput capacity are defined, and the ATAMM model is used to develop lower bounds for these times. A concurrent processing operating strategy for achieving optimum time performance is presented and illustrated by example.
Performance simulation for the design of solar heating and cooling systems
NASA Technical Reports Server (NTRS)
Mccormick, P. O.
1975-01-01
Suitable approaches for evaluating the performance and the cost of a solar heating and cooling system are considered, taking into account the value of a computer simulation concerning the entire system in connection with the large number of parameters involved. Operational relations concerning the collector efficiency in the case of a new improved collector and a reference collector are presented in a graph. Total costs for solar and conventional heating, ventilation, and air conditioning systems as a function of time are shown in another graph.
Phase-locked patterns of the Kuramoto model on 3-regular graphs
NASA Astrophysics Data System (ADS)
DeVille, Lee; Ermentrout, Bard
2016-09-01
We consider the existence of non-synchronized fixed points to the Kuramoto model defined on sparse networks: specifically, networks where each vertex has degree exactly three. We show that "most" such networks support multiple attracting phase-locked solutions that are not synchronized and study the depth and width of the basins of attraction of these phase-locked solutions. We also show that it is common in "large enough" graphs to find phase-locked solutions where one or more of the links have angle difference greater than π/2.
Phase-locked patterns of the Kuramoto model on 3-regular graphs.
DeVille, Lee; Ermentrout, Bard
2016-09-01
We consider the existence of non-synchronized fixed points to the Kuramoto model defined on sparse networks: specifically, networks where each vertex has degree exactly three. We show that "most" such networks support multiple attracting phase-locked solutions that are not synchronized and study the depth and width of the basins of attraction of these phase-locked solutions. We also show that it is common in "large enough" graphs to find phase-locked solutions where one or more of the links have angle difference greater than π/2.
Large fluctuations in anti-coordination games on scale-free graphs
NASA Astrophysics Data System (ADS)
Sabsovich, Daniel; Mobilia, Mauro; Assaf, Michael
2017-05-01
We study the influence of the complex topology of scale-free graphs on the dynamics of anti-coordination games (e.g. snowdrift games). These reference models are characterized by the coexistence (evolutionary stable mixed strategy) of two competing species, say ‘cooperators’ and ‘defectors’, and, in finite systems, by metastability and large-fluctuation-driven fixation. In this work, we use extensive computer simulations and an effective diffusion approximation (in the weak selection limit) to determine under which circumstances, depending on the individual-based update rules, the topology drastically affects the long-time behavior of anti-coordination games. In particular, we compute the variance of the number of cooperators in the metastable state and the mean fixation time when the dynamics is implemented according to the voter model (death-first/birth-second process) and the link dynamics (birth/death or death/birth at random). For the voter update rule, we show that the scale-free topology effectively renormalizes the population size and as a result the statistics of observables depend on the network’s degree distribution. In contrast, such a renormalization does not occur with the link dynamics update rule and we recover the same behavior as on complete graphs.
Aligning Biomolecular Networks Using Modular Graph Kernels
NASA Astrophysics Data System (ADS)
Towfic, Fadi; Greenlee, M. Heather West; Honavar, Vasant
Comparative analysis of biomolecular networks constructed using measurements from different conditions, tissues, and organisms offer a powerful approach to understanding the structure, function, dynamics, and evolution of complex biological systems. We explore a class of algorithms for aligning large biomolecular networks by breaking down such networks into subgraphs and computing the alignment of the networks based on the alignment of their subgraphs. The resulting subnetworks are compared using graph kernels as scoring functions. We provide implementations of the resulting algorithms as part of BiNA, an open source biomolecular network alignment toolkit. Our experiments using Drosophila melanogaster, Saccharomyces cerevisiae, Mus musculus and Homo sapiens protein-protein interaction networks extracted from the DIP repository of protein-protein interaction data demonstrate that the performance of the proposed algorithms (as measured by % GO term enrichment of subnetworks identified by the alignment) is competitive with some of the state-of-the-art algorithms for pair-wise alignment of large protein-protein interaction networks. Our results also show that the inter-species similarity scores computed based on graph kernels can be used to cluster the species into a species tree that is consistent with the known phylogenetic relationships among the species.
NASA Astrophysics Data System (ADS)
Khunjua, T. G.; Klimenko, K. G.; Zhokhov, R. N.
2018-03-01
In this paper the phase structure of dense quark matter has been investigated at zero temperature in the presence of baryon, isospin and chiral isospin chemical potentials in the framework of massless (3 +1 )-dimensional Nambu-Jona-Lasinio model with two quark flavors. It has been shown that in the large-Nc limit (Nc is the number of colors of quarks) there exists a duality correspondence between the chiral symmetry breaking phase and the charged pion condensation one. The key conclusion of our studies is the fact that chiral isospin chemical potential generates charged pion condensation in dense quark matter with isotopic asymmetry.
Dashti, Ali; Komarov, Ivan; D'Souza, Roshan M
2013-01-01
This paper presents an implementation of the brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large high-dimensional data cloud. The proposed method uses Graphics Processing Units (GPUs) and is scalable with multi-levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing as compared with an optimized method running on a cluster of CPUs and bring a hitherto impossible [Formula: see text]-NNG generation for a dataset of twenty million images with 15 k dimensionality into the realm of practical possibility.
Big Data Analytics with Datalog Queries on Spark.
Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo
2016-01-01
There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics.
Big Data Analytics with Datalog Queries on Spark
Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo
2017-01-01
There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics. PMID:28626296
Large-eddy simulation of dense gas dispersion over a simplified urban area
NASA Astrophysics Data System (ADS)
Wingstedt, E. M. M.; Osnes, A. N.; Åkervik, E.; Eriksson, D.; Reif, B. A. Pettersson
2017-03-01
Dispersion of neutral and dense gas over a simplified urban area, comprising four cubes, has been investigated by the means of large-eddy simulations (LES). The results have been compared to wind tunnel experiments and both mean and fluctuating quantities of velocity and concentration are in very good agreement. High-quality inflow profiles are necessary to achieve physically realistic LES results. In this study, profiles matching the atmospheric boundary layer flow in the wind tunnel, are generated by means of a separate precursor simulation. Emission of dense gas dramatically alters the flow in the near source region and introduces an upstream dispersion. The resulting dispersion patterns of neutral and dense gas differ significantly, where the plume in the latter case is wider and shallower. The dense gas is highly affected by the cube array, which seems to act as a barrier, effectively deflecting the plume. This leads to higher concentrations outside of the array than inside. On the contrary, the neutral gas plume has a Gaussian-type shape, with highest concentrations along the centreline. It is found that the dense gas reduces the vertical and spanwise turbulent momentum transport and, as a consequence, the turbulence kinetic energy. The reduction coincides with the area where the gradient Richardson number exceeds its critical value, i.e. where the flow may be characterized as stably stratified. Interestingly, this region does not correspond to where the concentration of dense gas is the highest (close to the ground), as this is also where the largest velocity gradients are to be found. Instead there is a layer in the middle of the dense gas cloud where buoyancy is dynamically dominant.
Possible Climate Change Influences on Continued Reduction of Dense Fog in Southern California
NASA Astrophysics Data System (ADS)
Ladochy, S.; Witiw, M.
2010-07-01
Dense fog appears to be decreasing in many parts of the world, especially in cities. An earlier study showed that dense fog (visibility < 400 m) was disappearing in the urban southern California area as well. Using hourly data from 1948 to the present, we looked at the relationship between fog events and contributing factors in the region along with trends over time. We showed that the decrease in dense fog events could be explained mainly by declining particulate levels, Pacific SSTs, and increased urban warming. Dense fog is most prevalent along the coast and decreases rapidly inland, so the influence of the Pacific should be large. In particular, the Pacific Decadal Oscillation (PDO) and the Southern Oscillation signals can be seen in fog frequencies as well as in the contributing factors. Results show a decrease in the occurrence of dense fog at two airports in close proximity to the Pacific Ocean, LAX and LGB. Occurrence of the frequency of low visibilities at these two locations was highly correlated with the phases of the PDO. While examining data from LAX, we saw a frequency of dense fog that reached over 300 hours in 1950, but occurrence was down to zero in 1997. Since 1997, there has been a bit of a recovery with both 2008 and 2009 recording over 30 hours of dense fog each. In the present study, we continue to examine the relationships that control the frequency of dense fog in coastal southern California. To remove urban influence, we also included Vandenberg Air Force Base, located in a relatively sparsely populated area. While particulates, urban heat island and Pacific SSTs are all contributing factors, we now speculate on the direct and indirect influences of climate change on continued decreases in dense fog. Case studies of local and regional dense fog in southern California point to the importance of strong, low inversions and to a lesser contributor, Santa Ana winds. Both are associated with large-scale atmospheric circulation patterns, which have changed markedly over the period of study.
Dense-body aggregates as plastic structures supporting tension in smooth muscle cells.
Zhang, Jie; Herrera, Ana M; Paré, Peter D; Seow, Chun Y
2010-11-01
The wall of hollow organs of vertebrates is a unique structure able to generate active tension and maintain a nearly constant passive stiffness over a large volume range. These properties are predominantly attributable to the smooth muscle cells that line the organ wall. Although smooth muscle is known to possess plasticity (i.e., the ability to adapt to large changes in cell length through structural remodeling of contractile apparatus and cytoskeleton), the detailed structural basis for the plasticity is largely unknown. Dense bodies, one of the most prominent structures in smooth muscle cells, have been regarded as the anchoring sites for actin filaments, similar to the Z-disks in striated muscle. Here, we show that the dense bodies and intermediate filaments formed cable-like structures inside airway smooth muscle cells and were able to adjust the cable length according to cell length and tension. Stretching the muscle cell bundle in the relaxed state caused the cables to straighten, indicating that these intracellular structures were connected to the extracellular matrix and could support passive tension. These plastic structures may be responsible for the ability of smooth muscle to maintain a nearly constant tensile stiffness over a large length range. The finding suggests that the structural plasticity of hollow organs may originate from the dense-body cables within the smooth muscle cells.
Osawa, Yoko; Fujita, Kazuhiko; Umezawa, Yu; Kayanne, Hajime; Ide, Yoichi; Nagaoka, Tatsutoshi; Miyajima, Toshihiro; Yamano, Hiroya
2010-08-01
Human impacts on sand-producing, large benthic foraminifers were investigated on ocean reef flats at the northeast Majuro Atoll, Marshall Islands, along a human population gradient. The densities of dominant foraminifers Calcarina and Amphistegina declined with distance from densely populated islands. Macrophyte composition on ocean reef flats differed between locations near sparsely or densely populated islands. Nutrient concentrations in reef-flat seawater and groundwater were high near or on densely populated islands. delta(15)N values in macroalgal tissues indicated that macroalgae in nearshore lagoons assimilate wastewater-derived nitrogen, whereas those on nearshore ocean reef flats assimilate nitrogen from other sources. These results suggest that increases in the human population result in high nutrient loading in groundwater and possibly into nearshore waters. High nutrient inputs into ambient seawater may have both direct and indirect negative effects on sand-producing foraminifers through habitat changes and/or the collapse of algal symbiosis. Copyright 2010 Elsevier Ltd. All rights reserved.
Design and Evaluation of Perceptual-based Object Group Selection Techniques
NASA Astrophysics Data System (ADS)
Dehmeshki, Hoda
Selecting groups of objects is a frequent task in graphical user interfaces. It is required prior to many standard operations such as deletion, movement, or modification. Conventional selection techniques are lasso, rectangle selection, and the selection and de-selection of items through the use of modifier keys. These techniques may become time-consuming and error-prone when target objects are densely distributed or when the distances between target objects are large. Perceptual-based selection techniques can considerably improve selection tasks when targets have a perceptual structure, for example when arranged along a line. Current methods to detect such groups use ad hoc grouping algorithms that are not based on results from perception science. Moreover, these techniques do not allow selecting groups with arbitrary arrangements or permit modifying a selection. This dissertation presents two domain-independent perceptual-based systems that address these issues. Based on established group detection models from perception research, the proposed systems detect perceptual groups formed by the Gestalt principles of good continuation and proximity. The new systems provide gesture-based or click-based interaction techniques for selecting groups with curvilinear or arbitrary structures as well as clusters. Moreover, the gesture-based system is adapted for the graph domain to facilitate path selection. This dissertation includes several user studies that show the proposed systems outperform conventional selection techniques when targets form salient perceptual groups and are still competitive when targets are semi-structured.
Improvement of Automated POST Case Success Rate Using Support Vector Machines
NASA Technical Reports Server (NTRS)
Zwack, Matthew R.; Dees, Patrick D.
2017-01-01
During early conceptual design of complex systems, concept down selection can have a large impact upon program life-cycle cost. Therefore, any concepts selected during early design will inherently commit program costs and affect the overall probability of program success. For this reason it is important to consider as large a design space as possible in order to better inform the down selection process. For conceptual design of launch vehicles, trajectory analysis and optimization often presents the largest obstacle to evaluating large trade spaces. This is due to the sensitivity of the trajectory discipline to changes in all other aspects of the vehicle design. Small deltas in the performance of other subsystems can result in relatively large fluctuations in the ascent trajectory because the solution space is non-linear and multi-modal [1]. In order to help capture large design spaces for new launch vehicles, the authors have performed previous work seeking to automate the execution of the industry standard tool, Program to Optimize Simulated Trajectories (POST). This work initially focused on implementation of analyst heuristics to enable closure of cases in an automated fashion, with the goal of applying the concepts of design of experiments (DOE) and surrogate modeling to enable near instantaneous throughput of vehicle cases [2]. Additional work was then completed to improve the DOE process by utilizing a graph theory based approach to connect similar design points [3]. The conclusion of the previous work illustrated the utility of the graph theory approach for completing a DOE through POST. However, this approach was still dependent upon the use of random repetitions to generate seed points for the graph. As noted in [3], only 8% of these random repetitions resulted in converged trajectories. This ultimately affects the ability of the random reps method to confidently approach the global optima for a given vehicle case in a reasonable amount of time. With only an 8% pass rate, tens or hundreds of thousands of reps may be needed to be confident that the best repetition is at least close to the global optima. However, typical design study time constraints require that fewer repetitions be attempted, sometimes resulting in seed points that have only a handful of successful completions. If a small number of successful repetitions are used to generate a seed point, the graph method may inherit some inaccuracies as it chains DOE cases from the non-global-optimal seed points. This creates inherent noise in the graph data, which can limit the accuracy of the resulting surrogate models. For this reason, the goal of this work is to improve the seed point generation method and ultimately the accuracy of the resulting POST surrogate model. The work focuses on increasing the case pass rate for seed point generation.
Networks of genetic loci and the scientific literature
NASA Astrophysics Data System (ADS)
Semeiks, J. R.; Grate, L. R.; Mian, I. S.
This work considers biological information graphs, networks in which nodes corre-spond to genetic loci (or "genes") and an (undirected) edge signifies that two genes are discussed in the same article(s) in the scientific literature ("documents"). Operations that utilize the topology of these graphs can assist researchers in the scientific discovery process. For example, a shortest path between two nodes defines an ordered series of genes and documents that can be used to explore the relationship(s) between genes of interest. This work (i) describes how topologies in which edges are likely to reflect genuine relationship(s) can be constructed from human-curated corpora of genes an-notated with documents (or vice versa), and (ii) illustrates the potential of biological information graphs in synthesizing knowledge in order to formulate new hypotheses and generate novel predictions for subsequent experimental study. In particular, the well-known LocusLink corpus is used to construct a biological information graph consisting of 10,297 nodes and 21,910 edges. The large-scale statistical properties of this gene-document network suggest that it is a new example of a power-law network. The segregation of genes on the basis of species and encoded protein molecular function indicate the presence of assortativity, the preference for nodes with similar attributes to be neighbors in a network. The practical utility of a gene-document network is illustrated by using measures such as shortest paths and centrality to analyze a subset of nodes corresponding to genes implicated in aging. Each release of a curated biomedical corpus defines a particular static graph. The topology of a gene-document network changes over time as curators add and/or remove nodes and/or edges. Such a dynamic, evolving corpus provides both the foundation for analyzing the growth and behavior of large complex networks and a substrate for examining trends in biological research.
Spatio-Semantic Comparison of Large 3d City Models in Citygml Using a Graph Database
NASA Astrophysics Data System (ADS)
Nguyen, S. H.; Yao, Z.; Kolbe, T. H.
2017-10-01
A city may have multiple CityGML documents recorded at different times or surveyed by different users. To analyse the city's evolution over a given period of time, as well as to update or edit the city model without negating modifications made by other users, it is of utmost importance to first compare, detect and locate spatio-semantic changes between CityGML datasets. This is however difficult due to the fact that CityGML elements belong to a complex hierarchical structure containing multi-level deep associations, which can basically be considered as a graph. Moreover, CityGML allows multiple syntactic ways to define an object leading to syntactic ambiguities in the exchange format. Furthermore, CityGML is capable of including not only 3D urban objects' graphical appearances but also their semantic properties. Since to date, no known algorithm is capable of detecting spatio-semantic changes in CityGML documents, a frequent approach is to replace the older models completely with the newer ones, which not only costs computational resources, but also loses track of collaborative and chronological changes. Thus, this research proposes an approach capable of comparing two arbitrarily large-sized CityGML documents on both semantic and geometric level. Detected deviations are then attached to their respective sources and can easily be retrieved on demand. As a result, updating a 3D city model using this approach is much more efficient as only real changes are committed. To achieve this, the research employs a graph database as the main data structure for storing and processing CityGML datasets in three major steps: mapping, matching and updating. The mapping process transforms input CityGML documents into respective graph representations. The matching process compares these graphs and attaches edit operations on the fly. Found changes can then be executed using the Web Feature Service (WFS), the standard interface for updating geographical features across the web.
Efficient Synthesis of Graph Methods: a Dynamically Scheduled Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Minutoli, Marco; Castellana, Vito G.; Tumeo, Antonino
RDF databases naturally map to a graph representation and employ languages, such as SPARQL, that implements queries as graph pattern matching routines. Graph methods exhibit an irregular behavior: they present unpredictable, fine-grained data accesses, and are synchronization inten- sive. Graph data structures expose large amounts of dy- namic parallelism, but are difficult to partition without gen- erating load unbalance. In this paper, we present a novel ar- chitecture to improve the synthesis of graph methods. Our design addresses the issues of these algorithms with two com- ponents: a Dynamic Task Scheduler (DTS), which reduces load unbalance and maximize resource utilization,more » and a Hi- erarchical Memory Interface controller (HMI), which pro- vides support for concurrent memory operations on multi- ported/multi-banked shared memories. We evaluate our ap- proach by generating the accelerators for a set of SPARQL queries from the Lehigh University Benchmark (LUBM). We first analyze the load unbalance of these queries, showing that execution time among tasks can differ even of order of magnitudes. We then synthesize the queries and com- pare the performance of the resulting accelerators against the current state of the art. Experimental results show that our solution provides a speedup over the serial implementa- tion close to the theoretical maximum and a speedup up to 3.45 over a baseline parallel implementation. We conclude our study by exploring the design space to achieve maximum memory channels utilization. The best design used at least three of the four memory channels for more than 90% of the execution time.« less
Small-world bias of correlation networks: From brain to climate
NASA Astrophysics Data System (ADS)
Hlinka, Jaroslav; Hartman, David; Jajcay, Nikola; Tomeček, David; Tintěra, Jaroslav; Paluš, Milan
2017-03-01
Complex systems are commonly characterized by the properties of their graph representation. Dynamical complex systems are then typically represented by a graph of temporal dependencies between time series of state variables of their subunits. It has been shown recently that graphs constructed in this way tend to have relatively clustered structure, potentially leading to spurious detection of small-world properties even in the case of systems with no or randomly distributed true interactions. However, the strength of this bias depends heavily on a range of parameters and its relevance for real-world data has not yet been established. In this work, we assess the relevance of the bias using two examples of multivariate time series recorded in natural complex systems. The first is the time series of local brain activity as measured by functional magnetic resonance imaging in resting healthy human subjects, and the second is the time series of average monthly surface air temperature coming from a large reanalysis of climatological data over the period 1948-2012. In both cases, the clustering in the thresholded correlation graph is substantially higher compared with a realization of a density-matched random graph, while the shortest paths are relatively short, showing thus distinguishing features of small-world structure. However, comparable or even stronger small-world properties were reproduced in correlation graphs of model processes with randomly scrambled interconnections. This suggests that the small-world properties of the correlation matrices of these real-world systems indeed do not reflect genuinely the properties of the underlying interaction structure, but rather result from the inherent properties of correlation matrix.
Small-world bias of correlation networks: From brain to climate.
Hlinka, Jaroslav; Hartman, David; Jajcay, Nikola; Tomeček, David; Tintěra, Jaroslav; Paluš, Milan
2017-03-01
Complex systems are commonly characterized by the properties of their graph representation. Dynamical complex systems are then typically represented by a graph of temporal dependencies between time series of state variables of their subunits. It has been shown recently that graphs constructed in this way tend to have relatively clustered structure, potentially leading to spurious detection of small-world properties even in the case of systems with no or randomly distributed true interactions. However, the strength of this bias depends heavily on a range of parameters and its relevance for real-world data has not yet been established. In this work, we assess the relevance of the bias using two examples of multivariate time series recorded in natural complex systems. The first is the time series of local brain activity as measured by functional magnetic resonance imaging in resting healthy human subjects, and the second is the time series of average monthly surface air temperature coming from a large reanalysis of climatological data over the period 1948-2012. In both cases, the clustering in the thresholded correlation graph is substantially higher compared with a realization of a density-matched random graph, while the shortest paths are relatively short, showing thus distinguishing features of small-world structure. However, comparable or even stronger small-world properties were reproduced in correlation graphs of model processes with randomly scrambled interconnections. This suggests that the small-world properties of the correlation matrices of these real-world systems indeed do not reflect genuinely the properties of the underlying interaction structure, but rather result from the inherent properties of correlation matrix.
Drawing road networks with focus regions.
Haunert, Jan-Henrik; Sering, Leon
2011-12-01
Mobile users of maps typically need detailed information about their surroundings plus some context information about remote places. In order to avoid that the map partly gets too dense, cartographers have designed mapping functions that enlarge a user-defined focus region--such functions are sometimes called fish-eye projections. The extra map space occupied by the enlarged focus region is compensated by distorting other parts of the map. We argue that, in a map showing a network of roads relevant to the user, distortion should preferably take place in those areas where the network is sparse. Therefore, we do not apply a predefined mapping function. Instead, we consider the road network as a graph whose edges are the road segments. We compute a new spatial mapping with a graph-based optimization approach, minimizing the square sum of distortions at edges. Our optimization method is based on a convex quadratic program (CQP); CQPs can be solved in polynomial time. Important requirements on the output map are expressed as linear inequalities. In particular, we show how to forbid edge crossings. We have implemented our method in a prototype tool. For instances of different sizes, our method generated output maps that were far less distorted than those generated with a predefined fish-eye projection. Future work is needed to automate the selection of roads relevant to the user. Furthermore, we aim at fast heuristics for application in real-time systems. © 2011 IEEE
Completing the gaps in Kilauea's Father's Day InSAR displacement signature with ScanSAR
NASA Astrophysics Data System (ADS)
Bertran Ortiz, A.; Pepe, A.; Lanari, R.; Lundgren, P.; Rosen, P. A.
2009-12-01
Currently there are gaps in the known displacement signature obtained with InSAR at Kilauea between 2002 and 2009. InSAR data can be richer than GPS because of denser spatial cover. However, to better model rapidly varying and non-steady geophysical events InSAR is limited because of its less dense time observations of the area under study. The ScanSAR mode currently available in several satellites mitigates this effect because the satellite may illuminate a given area more than once within an orbit cycle. The Kilauea displacement graph below from Instituto per Il Rilevamento Electromagnetico dell'Ambiente (IREA) is a cut in space of the displacement signature obtained from a time series of several stripmap-to-stripmap interferograms. It shows that critical information is missing, especially between 2006 and 2007. The displacement is expected to be non-linear judging from the 2007-2008 displacement signature, thus simple interpolation would not suffice. The gap can be filled by incorporating Envisat stripmap-to-ScanSAR interferograms available during that time period. We propose leveraging JPL's new ROI-PAC ScanSAR module to create stripmap-to-ScanSAR interferograms. The new interferograms will be added to the stripmap ones in order to extend the existing stripmap time series generated by using the Small BAseline Subset (SBAS) technique. At AGU we will present denser graphs that better capture Kilauea's displacement between 2003 and 2009.
Do Circumnuclear Dense Gas Disks Drive Mass Accretion onto Supermassive Black Holes?
NASA Astrophysics Data System (ADS)
Izumi, Takuma; Kawakatu, Nozomu; Kohno, Kotaro
2016-08-01
We present a positive correlation between the mass of dense molecular gas ({M}{{dense}}) of ˜100 pc scale circumnuclear disks (CNDs) and the black hole mass accretion rate ({\\dot{M}}{{BH}}) in a total of 10 Seyfert galaxies, based on data compiled from the literature and an archive (median aperture θ med = 220 pc). A typical {M}{{dense}} of CNDs is 107-8 {M}⊙ , estimated from the luminosity of the dense gas tracer, the HCN(1-0) emission line. Because dense molecular gas is the site of star formation, this correlation is virtually equivalent to the one between the nuclear star-formation rate and {\\dot{M}}{{BH}} revealed previously. Moreover, the {M}{{dense}}{--}{\\dot{M}}{{BH}} correlation was tighter for CND-scale gas than for the gas on kiloparsec or larger scales. This indicates that CNDs likely play an important role in fueling black holes, whereas greater than kiloparesec scale gas does not. To demonstrate a possible approach for studying the CND-scale accretion process with the Atacama Large Millimeter/submillimeter Array, we used a mass accretion model where angular momentum loss due to supernova explosions is vital. Based on the model prediction, we suggest that only the partial fraction of the mass accreted from the CND ({\\dot{M}}{{acc}}) is consumed as {\\dot{M}}{{BH}}. However, {\\dot{M}}{{acc}} agrees well with the total nuclear mass flow rate (I.e., {\\dot{M}}{{BH}} + outflow rate). Although these results are still tentative with large uncertainties, they support the view that star formation in CNDs can drive mass accretion onto supermassive black holes in Seyfert galaxies.
The minimal-ABC trees with B1-branches.
Dimitrov, Darko; Du, Zhibin; Fonseca, Carlos M da
2018-01-01
The atom-bond connectivity index (or, for short, ABC index) is a molecular structure descriptor bridging chemistry to graph theory. It is probably the most studied topological index among all numerical parameters of a graph that characterize its topology. For a given graph G = (V, E), the ABC index of G is defined as [Formula: see text], where di denotes the degree of the vertex i, and ij is the edge incident to the vertices i and j. A combination of physicochemical and the ABC index properties are commonly used to foresee the bioactivity of different chemical composites. Additionally, the applicability of the ABC index in chemical thermodynamics and other areas of chemistry, such as in dendrimer nanostars, benzenoid systems, fluoranthene congeners, and phenylenes is well studied in the literature. While finding of the graphs with the greatest ABC-value is a straightforward assignment, the characterization of the tree(s) with minimal ABC index is a problem largely open and has recently given rise to numerous studies and conjectures. A B1-branch of a graph is a pendent path of order 2. In this paper, we provide an important step forward to the full characterization of these minimal trees. Namely, we show that a minimal-ABC tree contains neither 4 nor 3 B1-branches. The case when the number of B1-branches is 2 is also considered.
A Graph-Algorithmic Approach for the Study of Metastability in Markov Chains
NASA Astrophysics Data System (ADS)
Gan, Tingyue; Cameron, Maria
2017-06-01
Large continuous-time Markov chains with exponentially small transition rates arise in modeling complex systems in physics, chemistry, and biology. We propose a constructive graph-algorithmic approach to determine the sequence of critical timescales at which the qualitative behavior of a given Markov chain changes, and give an effective description of the dynamics on each of them. This approach is valid for both time-reversible and time-irreversible Markov processes, with or without symmetry. Central to this approach are two graph algorithms, Algorithm 1 and Algorithm 2, for obtaining the sequences of the critical timescales and the hierarchies of Typical Transition Graphs or T-graphs indicating the most likely transitions in the system without and with symmetry, respectively. The sequence of critical timescales includes the subsequence of the reciprocals of the real parts of eigenvalues. Under a certain assumption, we prove sharp asymptotic estimates for eigenvalues (including pre-factors) and show how one can extract them from the output of Algorithm 1. We discuss the relationship between Algorithms 1 and 2 and explain how one needs to interpret the output of Algorithm 1 if it is applied in the case with symmetry instead of Algorithm 2. Finally, we analyze an example motivated by R. D. Astumian's model of the dynamics of kinesin, a molecular motor, by means of Algorithm 2.
Runaway electrons as a source of impurity and reduced fusion yield in the dense plasma focus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lerner, Eric J.; Yousefi, Hamid R.
2014-10-15
Impurities produced by the vaporization of metals in the electrodes may be a major cause of reduced fusion yields in high-current dense plasma focus devices. We propose here that a major, but hitherto-overlooked, cause of such impurities is vaporization by runaway electrons during the breakdown process at the beginning of the current pulse. This process is sufficient to account for the large amount of erosion observed in many dense plasma focus devices on the anode very near to the insulator. The erosion is expected to become worse with lower pressures, typical of machines with large electrode radii, and would explainmore » the plateauing of fusion yield observed in such machines at higher peak currents. Such runaway electron vaporization can be eliminated by the proper choice of electrode material, by reducing electrode radii and thus increasing fill gas pressure, or by using pre-ionization to eliminate the large fields that create runaway electrons. If these steps are combined with monolithic electrodes to eliminate arcing erosion, large reductions in impurities and large increases in fusion yield may be obtained, as the I{sup 4} scaling is extended to higher currents.« less
Laser Heating in a Dense Plasma Focus.
The report is divided in two parts. In the first part an account is given of the measurement of the momentum distribution of the deuterons ejected from a dense plasma focus . The results show the existence of a pronounced non-Maxwellian distribution and a small population of deuterons accelerated to the voltage of the condenser bank. In the second part theoretical calculation of laser heating establish the presence of large density gradient which probably accounts for the large currents detected in such plasmas. (Author)
Large-Scale Constraint-Based Pattern Mining
ERIC Educational Resources Information Center
Zhu, Feida
2009-01-01
We studied the problem of constraint-based pattern mining for three different data formats, item-set, sequence and graph, and focused on mining patterns of large sizes. Colossal patterns in each data formats are studied to discover pruning properties that are useful for direct mining of these patterns. For item-set data, we observed robustness of…
Human connectome module pattern detection using a new multi-graph MinMax cut model.
De, Wang; Wang, Yang; Nie, Feiping; Yan, Jingwen; Cai, Weidong; Saykin, Andrew J; Shen, Li; Huang, Heng
2014-01-01
Many recent scientific efforts have been devoted to constructing the human connectome using Diffusion Tensor Imaging (DTI) data for understanding the large-scale brain networks that underlie higher-level cognition in human. However, suitable computational network analysis tools are still lacking in human connectome research. To address this problem, we propose a novel multi-graph min-max cut model to detect the consistent network modules from the brain connectivity networks of all studied subjects. A new multi-graph MinMax cut model is introduced to solve this challenging computational neuroscience problem and the efficient optimization algorithm is derived. In the identified connectome module patterns, each network module shows similar connectivity patterns in all subjects, which potentially associate to specific brain functions shared by all subjects. We validate our method by analyzing the weighted fiber connectivity networks. The promising empirical results demonstrate the effectiveness of our method.
Influence analysis of Github repositories.
Hu, Yan; Zhang, Jun; Bai, Xiaomei; Yu, Shuo; Yang, Zhuo
2016-01-01
With the support of cloud computing techniques, social coding platforms have changed the style of software development. Github is now the most popular social coding platform and project hosting service. Software developers of various levels keep entering Github, and use Github to save their public and private software projects. The large amounts of software developers and software repositories on Github are posing new challenges to the world of software engineering. This paper tries to tackle one of the important problems: analyzing the importance and influence of Github repositories. We proposed a HITS based influence analysis on graphs that represent the star relationship between Github users and repositories. A weighted version of HITS is applied to the overall star graph, and generates a different set of top influential repositories other than the results from standard version of HITS algorithm. We also conduct the influential analysis on per-month star graph, and study the monthly influence ranking of top repositories.
Semantic super networks: A case analysis of Wikipedia papers
NASA Astrophysics Data System (ADS)
Kostyuchenko, Evgeny; Lebedeva, Taisiya; Goritov, Alexander
2017-11-01
An algorithm for constructing super-large semantic networks has been developed in current work. Algorithm was tested using the "Cosmos" category of the Internet encyclopedia "Wikipedia" as an example. During the implementation, a parser for the syntax analysis of Wikipedia pages was developed. A graph based on list of articles and categories was formed. On the basis of the obtained graph analysis, algorithms for finding domains of high connectivity in a graph were proposed and tested. Algorithms for constructing a domain based on the number of links and the number of articles in the current subject area is considered. The shortcomings of these algorithms are shown and explained, an algorithm is developed on their joint use. The possibility of applying a combined algorithm for obtaining the final domain is shown. The problem of instability of the received domain was discovered when starting an algorithm from two neighboring vertices related to the domain.
NASA Astrophysics Data System (ADS)
Shi, Xizhi; He, Chaoyu; Pickard, Chris J.; Tang, Chao; Zhong, Jianxin
2018-01-01
A method is introduced to stochastically generate crystal structures with defined structural characteristics. Reasonable quotient graphs for symmetric crystals are constructed using a random strategy combined with space group and graph theory. Our algorithm enables the search for large-size and complex crystal structures with a specified connectivity, such as threefold sp2 carbons, fourfold sp3 carbons, as well as mixed sp2-sp3 carbons. To demonstrate the method, we randomly construct initial structures adhering to space groups from 75 to 230 and a range of lattice constants, and we identify 281 new sp3 carbon crystals. First-principles optimization of these structures show that most of them are dynamically and mechanically stable and are energetically comparable to those previously proposed. Some of the new structures can be considered as candidates to explain the experimental cold compression of graphite.
Brain white matter fiber estimation and tractography using Q-ball imaging and Bayesian MODEL.
Lu, Meng
2015-01-01
Diffusion tensor imaging allows for the non-invasive in vivo mapping of the brain tractography. However, fiber bundles have complex structures such as fiber crossings, fiber branchings and fibers with large curvatures that tensor imaging (DTI) cannot accurately handle. This study presents a novel brain white matter tractography method using Q-ball imaging as the data source instead of DTI, because QBI can provide accurate information about multiple fiber crossings and branchings in a single voxel using an orientation distribution function (ODF). The presented method also uses graph theory to construct the Bayesian model-based graph, so that the fiber tracking between two voxels can be represented as the shortest path in a graph. Our experiment showed that our new method can accurately handle brain white matter fiber crossings and branchings, and reconstruct brain tractograhpy both in phantom data and real brain data.
NASA Astrophysics Data System (ADS)
Aleksanyan, Grayr; Shcherbakov, Ivan; Kucher, Artem; Sulyz, Andrew
2018-04-01
Continuous monitoring of the patient's breathing by the method of multi-angle electric impedance tomography allows to obtain images of conduction change in the chest cavity during the monitoring. Direct analysis of images is difficult due to the large amount of information and low resolution images obtained by multi-angle electrical impedance tomography. This work presents a method for obtaining a graph of respiratory activity of the lungs based on the results of continuous lung monitoring using the multi-angle electrical impedance tomography method. The method makes it possible to obtain a graph of the respiratory activity of the left and right lungs separately, as well as a summary graph, to which it is possible to apply methods of processing the results of spirography.
Social inertia and diversity in collaboration networks
NASA Astrophysics Data System (ADS)
Ramasco, J. J.
2007-04-01
Random graphs are useful tools to study social interactions. In particular, the use of weighted random graphs allows to handle a high level of information concerning which agents interact and in which degree the interactions take place. Taking advantage of this representation, we recently defined a magnitude, the Social Inertia, that measures the eagerness of agents to keep ties with previous partners. To study this magnitude, we used collaboration networks that are specially appropriate to obtain valid statitical results due to the large size of publically available databases. In this work, I study the Social Inertia in two of these empirical networks, IMDB movie database and condmat. More specifically, I focus on how the Inertia relates to other properties of the graphs, and show that the Inertia provides information on how the weight of neighboring edges correlates. A social interpretation of this effect is also offered.
Fast graph-based relaxed clustering for large data sets using minimal enclosing ball.
Qian, Pengjiang; Chung, Fu-Lai; Wang, Shitong; Deng, Zhaohong
2012-06-01
Although graph-based relaxed clustering (GRC) is one of the spectral clustering algorithms with straightforwardness and self-adaptability, it is sensitive to the parameters of the adopted similarity measure and also has high time complexity O(N(3)) which severely weakens its usefulness for large data sets. In order to overcome these shortcomings, after introducing certain constraints for GRC, an enhanced version of GRC [constrained GRC (CGRC)] is proposed to increase the robustness of GRC to the parameters of the adopted similarity measure, and accordingly, a novel algorithm called fast GRC (FGRC) based on CGRC is developed in this paper by using the core-set-based minimal enclosing ball approximation. A distinctive advantage of FGRC is that its asymptotic time complexity is linear with the data set size N. At the same time, FGRC also inherits the straightforwardness and self-adaptability from GRC, making the proposed FGRC a fast and effective clustering algorithm for large data sets. The advantages of FGRC are validated by various benchmarking and real data sets.
NASA Technical Reports Server (NTRS)
Cardelli, Jason A.; Clayton, Geoffrey C.
1991-01-01
The range of validity of the average absolute extinction law (AAEL) proposed by Cardelli et al. (1988 and 1989) is investigated, combining published visible and NIR data with IUE UV observations for three lines of sight through dense dark cloud environments with high values of total-to-selective extinction. The characteristics of the data sets and the reduction and parameterization methods applied are described in detail, and the results are presented in extensive tables and graphs. Good agreement with the AAEL is demonstrated for wavelengths from 3.4 microns to 250 nm, but significant deviations are found at shorter wavelengths (where previous studies of lines of sight through bright nebulosity found good agreement with the AAEL). These differences are attributed to the effects of coatings on small-bump and FUV grains.
Experimental damage detection of wind turbine blade using thin film sensor array
NASA Astrophysics Data System (ADS)
Downey, Austin; Laflamme, Simon; Ubertini, Filippo; Sarkar, Partha
2017-04-01
Damage detection of wind turbine blades is difficult due to their large sizes and complex geometries. Additionally, economic restraints limit the viability of high-cost monitoring methods. While it is possible to monitor certain global signatures through modal analysis, obtaining useful measurements over a blade's surface using off-the-shelf sensing technologies is difficult and typically not economical. A solution is to deploy dedicated sensor networks fabricated from inexpensive materials and electronics. The authors have recently developed a novel large-area electronic sensor measuring strain over very large surfaces. The sensing system is analogous to a biological skin, where local strain can be monitored over a global area. In this paper, we propose the utilization of a hybrid dense sensor network of soft elastomeric capacitors to detect, localize, and quantify damage, and resistive strain gauges to augment such dense sensor network with high accuracy data at key locations. The proposed hybrid dense sensor network is installed inside a wind turbine blade model and tested in a wind tunnel to simulate an operational environment. Damage in the form of changing boundary conditions is introduced into the monitored section of the blade. Results demonstrate the ability of the hybrid dense sensor network, and associated algorithms, to detect, localize, and quantify damage.
A Weighted Configuration Model and Inhomogeneous Epidemics
NASA Astrophysics Data System (ADS)
Britton, Tom; Deijfen, Maria; Liljeros, Fredrik
2011-12-01
A random graph model with prescribed degree distribution and degree dependent edge weights is introduced. Each vertex is independently equipped with a random number of half-edges and each half-edge is assigned an integer valued weight according to a distribution that is allowed to depend on the degree of its vertex. Half-edges with the same weight are then paired randomly to create edges. An expression for the threshold for the appearance of a giant component in the resulting graph is derived using results on multi-type branching processes. The same technique also gives an expression for the basic reproduction number for an epidemic on the graph where the probability that a certain edge is used for transmission is a function of the edge weight (reflecting how closely `connected' the corresponding vertices are). It is demonstrated that, if vertices with large degree tend to have large (small) weights on their edges and if the transmission probability increases with the edge weight, then it is easier (harder) for the epidemic to take off compared to a randomized epidemic with the same degree and weight distribution. A recipe for calculating the probability of a large outbreak in the epidemic and the size of such an outbreak is also given. Finally, the model is fitted to three empirical weighted networks of importance for the spread of contagious diseases and it is shown that R 0 can be substantially over- or underestimated if the correlation between degree and weight is not taken into account.
An Ancestral Recombination Graph for Diploid Populations with Skewed Offspring Distribution
Birkner, Matthias; Blath, Jochen; Eldon, Bjarki
2013-01-01
A large offspring-number diploid biparental multilocus population model of Moran type is our object of study. At each time step, a pair of diploid individuals drawn uniformly at random contributes offspring to the population. The number of offspring can be large relative to the total population size. Similar “heavily skewed” reproduction mechanisms have been recently considered by various authors (cf. e.g., Eldon and Wakeley 2006, 2008) and reviewed by Hedgecock and Pudovkin (2011). Each diploid parental individual contributes exactly one chromosome to each diploid offspring, and hence ancestral lineages can coalesce only when in distinct individuals. A separation-of-timescales phenomenon is thus observed. A result of Möhle (1998) is extended to obtain convergence of the ancestral process to an ancestral recombination graph necessarily admitting simultaneous multiple mergers of ancestral lineages. The usual ancestral recombination graph is obtained as a special case of our model when the parents contribute only one offspring to the population each time. Due to diploidy and large offspring numbers, novel effects appear. For example, the marginal genealogy at each locus admits simultaneous multiple mergers in up to four groups, and different loci remain substantially correlated even as the recombination rate grows large. Thus, genealogies for loci far apart on the same chromosome remain correlated. Correlation in coalescence times for two loci is derived and shown to be a function of the coalescence parameters of our model. Extending the observations by Eldon and Wakeley (2008), predictions of linkage disequilibrium are shown to be functions of the reproduction parameters of our model, in addition to the recombination rate. Correlations in ratios of coalescence times between loci can be high, even when the recombination rate is high and sample size is large, in large offspring-number populations, as suggested by simulations, hinting at how to distinguish between different population models. PMID:23150600
An Improved Heuristic Method for Subgraph Isomorphism Problem
NASA Astrophysics Data System (ADS)
Xiang, Yingzhuo; Han, Jiesi; Xu, Haijiang; Guo, Xin
2017-09-01
This paper focus on the subgraph isomorphism (SI) problem. We present an improved genetic algorithm, a heuristic method to search the optimal solution. The contribution of this paper is that we design a dedicated crossover algorithm and a new fitness function to measure the evolution process. Experiments show our improved genetic algorithm performs better than other heuristic methods. For a large graph, such as a subgraph of 40 nodes, our algorithm outperforms the traditional tree search algorithms. We find that the performance of our improved genetic algorithm does not decrease as the number of nodes in prototype graphs.
Magic bases, metric ansaetze and generalized graph theories in the Virasoro master equation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Halpern, M.B.; Obers, N.A.
1991-11-15
The authors define a class of magic Lie group bases in which the Virasoro master equation admits a class of simple metric ansaetze (g{sub metric}), whose structure is visible in the high-level expansion. When a magic basis is real on compact g, the corresponding g{sub metric} is a large system of unitary, generically irrational conformal field theories. Examples in this class include the graph-theory ansatz SO(n){sub diag} in the Cartesian basis of So(n) and the ansatz SU(n){sub metric} in the Pauli-like basis of SU(n). A new phenomenon is observed in the high-level comparison of SU(n){sub metric}: Due to the trigonometricmore » structure constants of the Pauli-like basis, irrational central charge is clearly visible at finite order of the expansion. They also define the sine-area graphs of SU(n), which label the conformal field theories of SU(n){sub metric} and note that, in a similar fashion, each magic basis of g defines a generalize graph theory on g which labels the conformal field theories of g{sub metric}.« less
Adaptation of pancreatic islet cyto-architecture during development
NASA Astrophysics Data System (ADS)
Striegel, Deborah A.; Hara, Manami; Periwal, Vipul
2016-04-01
Plasma glucose in mammals is regulated by hormones secreted by the islets of Langerhans embedded in the exocrine pancreas. Islets consist of endocrine cells, primarily α, β, and δ cells, which secrete glucagon, insulin, and somatostatin, respectively. β cells form irregular locally connected clusters within islets that act in concert to secrete insulin upon glucose stimulation. Varying demands and available nutrients during development produce changes in the local connectivity of β cells in an islet. We showed in earlier work that graph theory provides a framework for the quantification of the seemingly stochastic cyto-architecture of β cells in an islet. To quantify the dynamics of endocrine connectivity during development requires a framework for characterizing changes in the probability distribution on the space of possible graphs, essentially a Fokker-Planck formalism on graphs. With large-scale imaging data for hundreds of thousands of islets containing millions of cells from human specimens, we show that this dynamics can be determined quantitatively. Requiring that rearrangement and cell addition processes match the observed dynamic developmental changes in quantitative topological graph characteristics strongly constrained possible processes. Our results suggest that there is a transient shift in preferred connectivity for β cells between 1-35 weeks and 12-24 months.
OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms
Meng, Zhaoyi; Koniges, Alice; He, Yun Helen; ...
2016-09-21
In this paper, we investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification algorithms in serial mode. The methods leverage the Nystrom extension to calculate eigenvalue/eigenvectors of the graph Laplacian and this is a self-contained module that can be used in conjunction with other graph-Laplacian based methods such as spectral clustering. We use performance tools to collect the hotspots and memory access of the serial codes and use OpenMP as the parallelization language to parallelizemore » the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations and detail the performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel’s Knights Corner and Landing processors. We show both performance improvement and strong scaling behavior. Finally, a large number of optimization techniques and analyses are necessary before the algorithm reaches almost ideal scaling.« less
What can graph theory tell us about word learning and lexical retrieval?
Vitevitch, Michael S
2008-04-01
Graph theory and the new science of networks provide a mathematically rigorous approach to examine the development and organization of complex systems. These tools were applied to the mental lexicon to examine the organization of words in the lexicon and to explore how that structure might influence the acquisition and retrieval of phonological word-forms. Pajek, a program for large network analysis and visualization (V. Batagelj & A. Mvrar, 1998), was used to examine several characteristics of a network derived from a computerized database of the adult lexicon. Nodes in the network represented words, and a link connected two nodes if the words were phonological neighbors. The average path length and clustering coefficient suggest that the phonological network exhibits small-world characteristics. The degree distribution was fit better by an exponential rather than a power-law function. Finally, the network exhibited assortative mixing by degree. Some of these structural characteristics were also found in graphs that were formed by 2 simple stochastic processes suggesting that similar processes might influence the development of the lexicon. The graph theoretic perspective may provide novel insights about the mental lexicon and lead to future studies that help us better understand language development and processing.
NASA Astrophysics Data System (ADS)
Mailloux, B. J.; Kenna, T. C.
2008-12-01
The creation and accurate interpretation of graphs is becoming a lost art among students. The availability of numerous graphing software programs makes the act of graphing data easy but does not necessarily aide students in interpreting complex visual data. This is especially true for contour maps; which have become a critical skill in the earth sciences and everyday life. In multiple classes, we have incorporated a large-scale, hands-on, contouring exercise of temperature, salinity, and density data collected in the Hudson River Estuary. The exercise allows students to learn first-hand how to plot, analyze, and present three dimensional data. As part of a day-long sampling expedition aboard an 80' research vessel, students deploy a water profiling instrument (Seabird CTD). Data are collected along a transect between the Verrazano and George Washington Bridges. The data are then processed and binned at 0.5 meter intervals. The processed data is then used during a later laboratory period for the contouring exercise. In class, students work in groups of 2 to 4 people and are provided with the data, a set of contouring instructions, a piece of large (3' x 3') graph paper, a ruler, and a set of colored markers. We then let the groups work together to determine the details of the graphs. Important steps along the way are talking to the students about X and Y scales, interpolation, and choices of contour intervals and colors. Frustration and bottlenecks are common at the beginning when students are unsure how to even begin with the raw data. At some point during the exercise, students start to understand the contour concept and each group usually produces a finished contour map in an hour or so. Interestingly, the groups take pride in the coloring portion of the contouring as it indicates successful interpretation of the data. The exercise concludes with each group presenting and discussing their contour plot. In almost every case, the hands-on graphing has improved the "students" visualization skills. Contouring has been incorporated into the River Summer (www.riversumer.org, http://www.riversumer.org/) program and our Environmental Measurements laboratory course. This has resulted in the exercise being utilized with undergraduates, high-school teachers, graduate students, and college faculty. We are in the process of making this curricular module available online to educators.
A critical analysis of computational protein design with sparse residue interaction graphs
Georgiev, Ivelin S.
2017-01-01
Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or the full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies. PMID:28358804
Bim-Gis Integrated Geospatial Information Model Using Semantic Web and Rdf Graphs
NASA Astrophysics Data System (ADS)
Hor, A.-H.; Jadidi, A.; Sohn, G.
2016-06-01
In recent years, 3D virtual indoor/outdoor urban modelling becomes a key spatial information framework for many civil and engineering applications such as evacuation planning, emergency and facility management. For accomplishing such sophisticate decision tasks, there is a large demands for building multi-scale and multi-sourced 3D urban models. Currently, Building Information Model (BIM) and Geographical Information Systems (GIS) are broadly used as the modelling sources. However, data sharing and exchanging information between two modelling domains is still a huge challenge; while the syntactic or semantic approaches do not fully provide exchanging of rich semantic and geometric information of BIM into GIS or vice-versa. This paper proposes a novel approach for integrating BIM and GIS using semantic web technologies and Resources Description Framework (RDF) graphs. The novelty of the proposed solution comes from the benefits of integrating BIM and GIS technologies into one unified model, so-called Integrated Geospatial Information Model (IGIM). The proposed approach consists of three main modules: BIM-RDF and GIS-RDF graphs construction, integrating of two RDF graphs, and query of information through IGIM-RDF graph using SPARQL. The IGIM generates queries from both the BIM and GIS RDF graphs resulting a semantically integrated model with entities representing both BIM classes and GIS feature objects with respect to the target-client application. The linkage between BIM-RDF and GIS-RDF is achieved through SPARQL endpoints and defined by a query using set of datasets and entity classes with complementary properties, relationships and geometries. To validate the proposed approach and its performance, a case study was also tested using IGIM system design.
An Expert System toward Buiding An Earth Science Knowledge Graph
NASA Astrophysics Data System (ADS)
Zhang, J.; Duan, X.; Ramachandran, R.; Lee, T. J.; Bao, Q.; Gatlin, P. N.; Maskey, M.
2017-12-01
In this ongoing work, we aim to build foundations of Cognitive Computing for Earth Science research. The goal of our project is to develop an end-to-end automated methodology for incrementally constructing Knowledge Graphs for Earth Science (KG4ES). These knowledge graphs can then serve as the foundational components for building cognitive systems in Earth science, enabling researchers to uncover new patterns and hypotheses that are virtually impossible to identify today. In addition, this research focuses on developing mining algorithms needed to exploit these constructed knowledge graphs. As such, these graphs will free knowledge from publications that are generated in a very linear, deterministic manner, and structure knowledge in a way that users can both interact and connect with relevant pieces of information. Our major contributions are two-fold. First, we have developed an end-to-end methodology for constructing Knowledge Graphs for Earth Science (KG4ES) using existing corpus of journal papers and reports. One of the key challenges in any machine learning, especially deep learning applications, is the need for robust and large training datasets. We have developed techniques capable of automatically retraining models and incrementally building and updating KG4ES, based on ever evolving training data. We also adopt the evaluation instrument based on common research methodologies used in Earth science research, especially in Atmospheric Science. Second, we have developed an algorithm to infer new knowledge that can exploit the constructed KG4ES. In more detail, we have developed a network prediction algorithm aiming to explore and predict possible new connections in the KG4ES and aid in new knowledge discovery.
LSG: An External-Memory Tool to Compute String Graphs for Next-Generation Sequencing Data Assembly.
Bonizzoni, Paola; Vedova, Gianluca Della; Pirola, Yuri; Previtali, Marco; Rizzi, Raffaella
2016-03-01
The large amount of short read data that has to be assembled in future applications, such as in metagenomics or cancer genomics, strongly motivates the investigation of disk-based approaches to index next-generation sequencing (NGS) data. Positive results in this direction stimulate the investigation of efficient external memory algorithms for de novo assembly from NGS data. Our article is also motivated by the open problem of designing a space-efficient algorithm to compute a string graph using an indexing procedure based on the Burrows-Wheeler transform (BWT). We have developed a disk-based algorithm for computing string graphs in external memory: the light string graph (LSG). LSG relies on a new representation of the FM-index that is exploited to use an amount of main memory requirement that is independent from the size of the data set. Moreover, we have developed a pipeline for genome assembly from NGS data that integrates LSG with the assembly step of SGA (Simpson and Durbin, 2012 ), a state-of-the-art string graph-based assembler, and uses BEETL for indexing the input data. LSG is open source software and is available online. We have analyzed our implementation on a 875-million read whole-genome dataset, on which LSG has built the string graph using only 1GB of main memory (reducing the memory occupation by a factor of 50 with respect to SGA), while requiring slightly more than twice the time than SGA. The analysis of the entire pipeline shows an important decrease in memory usage, while managing to have only a moderate increase in the running time.
Decomposition Algorithm for Global Reachability on a Time-Varying Graph
NASA Technical Reports Server (NTRS)
Kuwata, Yoshiaki
2010-01-01
A decomposition algorithm has been developed for global reachability analysis on a space-time grid. By exploiting the upper block-triangular structure, the planning problem is decomposed into smaller subproblems, which is much more scalable than the original approach. Recent studies have proposed the use of a hot-air (Montgolfier) balloon for possible exploration of Titan and Venus because these bodies have thick haze or cloud layers that limit the science return from an orbiter, and the atmospheres would provide enough buoyancy for balloons. One of the important questions that needs to be addressed is what surface locations the balloon can reach from an initial location, and how long it would take. This is referred to as the global reachability problem, where the paths from starting locations to all possible target locations must be computed. The balloon could be driven with its own actuation, but its actuation capability is fairly limited. It would be more efficient to take advantage of the wind field and ride the wind that is much stronger than what the actuator could produce. It is possible to pose the path planning problem as a graph search problem on a directed graph by discretizing the spacetime world and the vehicle actuation. The decomposition algorithm provides reachability analysis of a time-varying graph. Because the balloon only moves in the positive direction in time, the adjacency matrix of the graph can be represented with an upper block-triangular matrix, and this upper block-triangular structure can be exploited to decompose a large graph search problem. The new approach consumes a much smaller amount of memory, which also helps speed up the overall computation when the computing resource has a limited physical memory compared to the problem size.
Multigraph: Interactive Data Graphs on the Web
NASA Astrophysics Data System (ADS)
Phillips, M. B.
2010-12-01
Many aspects of geophysical science involve time dependent data that is often presented in the form of a graph. Considering that the web has become a primary means of communication, there are surprisingly few good tools and techniques available for presenting time-series data on the web. The most common solution is to use a desktop tool such as Excel or Matlab to create a graph which is saved as an image and then included in a web page like any other image. This technique is straightforward, but it limits the user to one particular view of the data, and disconnects the graph from the data in a way that makes updating a graph with new data an often cumbersome manual process. This situation is somewhat analogous to the state of mapping before the advent of GIS. Maps existed only in printed form, and creating a map was a laborious process. In the last several years, however, the world of mapping has experienced a revolution in the form of web-based and other interactive computer technologies, so that it is now commonplace for anyone to easily browse through gigabytes of geographic data. Multigraph seeks to bring a similar ease of access to time series data. Multigraph is a program for displaying interactive time-series data graphs in web pages that includes a simple way of configuring the appearance of the graph and the data to be included. It allows multiple data sources to be combined into a single graph, and allows the user to explore the data interactively. Multigraph lets users explore and visualize "data space" in the same way that interactive mapping applications such as Google Maps facilitate exploring and visualizing geography. Viewing a Multigraph graph is extremely simple and intuitive, and requires no instructions. Creating a new graph for inclusion in a web page involves writing a simple XML configuration file and requires no programming. Multigraph can read data in a variety of formats, and can display data from a web service, allowing users to "surf" through large data sets, downloading only those the parts of the data that are needed for display. Multigraph is currently in use on several web sites including the US Drought Portal (www.drought.gov), the NOAA Climate Services Portal (www.climate.gov), the Climate Reference Network (www.ncdc.noaa.gov/crn), NCDC's State of the Climate Report (www.ncdc.noaa.gov/sotc), and the US Forest Service's Forest Change Assessment Viewer (ews.forestthreats.org/NPDE/NPDE.html). More information about Multigraph is available from the web site www.multigraph.org. Interactive Graph of Global Temperature Anomalies from ClimateWatch Magazine (http://www.climatewatch.noaa.gov/2009/articles/climate-change-global-temperature)
Multi-phase simultaneous segmentation of tumor in lung 4D-CT data with context information.
Shen, Zhengwen; Wang, Huafeng; Xi, Weiwen; Deng, Xiaogang; Chen, Jin; Zhang, Yu
2017-01-01
Lung 4D computed tomography (4D-CT) plays an important role in high-precision radiotherapy because it characterizes respiratory motion, which is crucial for accurate target definition. However, the manual segmentation of a lung tumor is a heavy workload for doctors because of the large number of lung 4D-CT data slices. Meanwhile, tumor segmentation is still a notoriously challenging problem in computer-aided diagnosis. In this paper, we propose a new method based on an improved graph cut algorithm with context information constraint to find a convenient and robust approach of lung 4D-CT tumor segmentation. We combine all phases of the lung 4D-CT into a global graph, and construct a global energy function accordingly. The sub-graph is first constructed for each phase. A context cost term is enforced to achieve segmentation results in every phase by adding a context constraint between neighboring phases. A global energy function is finally constructed by combining all cost terms. The optimization is achieved by solving a max-flow/min-cut problem, which leads to simultaneous and robust segmentation of the tumor in all the lung 4D-CT phases. The effectiveness of our approach is validated through experiments on 10 different lung 4D-CT cases. The comparison with the graph cut without context constraint, the level set method and the graph cut with star shape prior demonstrates that the proposed method obtains more accurate and robust segmentation results.
VIGOR: Interactive Visual Exploration of Graph Query Results.
Pienta, Robert; Hohman, Fred; Endert, Alex; Tamersoy, Acar; Roundy, Kevin; Gates, Chris; Navathe, Shamkant; Chau, Duen Horng
2018-01-01
Finding patterns in graphs has become a vital challenge in many domains from biological systems, network security, to finance (e.g., finding money laundering rings of bankers and business owners). While there is significant interest in graph databases and querying techniques, less research has focused on helping analysts make sense of underlying patterns within a group of subgraph results. Visualizing graph query results is challenging, requiring effective summarization of a large number of subgraphs, each having potentially shared node-values, rich node features, and flexible structure across queries. We present VIGOR, a novel interactive visual analytics system, for exploring and making sense of query results. VIGOR uses multiple coordinated views, leveraging different data representations and organizations to streamline analysts sensemaking process. VIGOR contributes: (1) an exemplar-based interaction technique, where an analyst starts with a specific result and relaxes constraints to find other similar results or starts with only the structure (i.e., without node value constraints), and adds constraints to narrow in on specific results; and (2) a novel feature-aware subgraph result summarization. Through a collaboration with Symantec, we demonstrate how VIGOR helps tackle real-world problems through the discovery of security blindspots in a cybersecurity dataset with over 11,000 incidents. We also evaluate VIGOR with a within-subjects study, demonstrating VIGOR's ease of use over a leading graph database management system, and its ability to help analysts understand their results at higher speed and make fewer errors.
Red Shifts with Obliquely Approaching Light Sources.
ERIC Educational Resources Information Center
Head, C. E.; Moore-Head, M. E.
1988-01-01
Refutes the Doppler effect as the explanation of large red shifts in the spectra of distant galaxies and explains the relativistic effects in which the light sources approach the observer obliquely. Provides several diagrams and graphs. (YP)
A novel adaptive Cuckoo search for optimal query plan generation.
Gomathi, Ramalingam; Sharmila, Dhandapani
2014-01-01
The emergence of multiple web pages day by day leads to the development of the semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To enhance the efficiency in the execution time for querying large RDF graphs, the evolving metaheuristic algorithms become an alternate to the traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating optimal query plan for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying number of predicates. The experimental results have exposed that the proposed approach has provided significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.
Dexter, Alex; Race, Alan M; Steven, Rory T; Barnes, Jennifer R; Hulme, Heather; Goodwin, Richard J A; Styles, Iain B; Bunch, Josephine
2017-11-07
Clustering is widely used in MSI to segment anatomical features and differentiate tissue types, but existing approaches are both CPU and memory-intensive, limiting their application to small, single data sets. We propose a new approach that uses a graph-based algorithm with a two-phase sampling method that overcomes this limitation. We demonstrate the algorithm on a range of sample types and show that it can segment anatomical features that are not identified using commonly employed algorithms in MSI, and we validate our results on synthetic MSI data. We show that the algorithm is robust to fluctuations in data quality by successfully clustering data with a designed-in variance using data acquired with varying laser fluence. Finally, we show that this method is capable of generating accurate segmentations of large MSI data sets acquired on the newest generation of MSI instruments and evaluate these results by comparison with histopathology.
Unified Photo Enhancement by Discovering Aesthetic Communities From Flickr.
Hong, Richang; Zhang, Luming; Tao, Dacheng
2016-03-01
Photo enhancement refers to the process of increasing the aesthetic appeal of a photo, such as changing the photo aspect ratio and spatial recomposition. It is a widely used technique in the printing industry, graphic design, and cinematography. In this paper, we propose a unified and socially aware photo enhancement framework which can leverage the experience of photographers with various aesthetic topics (e.g., portrait and landscape). We focus on photos from the image hosting site Flickr, which has 87 million users and to which more than 3.5 million photos are uploaded daily. First, a tagwise regularized topic model is proposed to describe the aesthetic topic of each Flickr user, and coherent and interpretable topics are discovered by leveraging both the visual features and tags of photos. Next, a graph is constructed to describe the similarities in aesthetic topics between the users. Noticeably, densely connected users have similar aesthetic topics, which are categorized into different communities by a dense subgraph mining algorithm. Finally, a probabilistic model is exploited to enhance the aesthetic attractiveness of a test photo by leveraging the photographic experiences of Flickr users from the corresponding communities of that photo. Paired-comparison-based user studies show that our method performs competitively on photo retargeting and recomposition. Moreover, our approach accurately detects aesthetic communities in a photo set crawled from nearly 100000 Flickr users.
Turbulent dusty boundary layer in an ANFO surface-burst explosion
NASA Astrophysics Data System (ADS)
Kuhl, A. L.; Ferguson, R. E.; Chien, K. Y.; Collins, J. P.
1992-01-01
This paper describes the results of numerical simulations of the dusty, turbulent boundary layer created by a surface burst explosion. The blast wave was generated by the detonation of a 600-T hemisphere of ANFO, similar to those used in large-scale field tests. The surface was assumed to be ideally noncratering but contained an initial loose layer of dust. The dust-air mixture in this fluidized bed was modeled as a dense gas (i.e., an equilibrium model, valid for very small-diameter dust particles). The evolution of the flow was calculated by a high-order Godunov code that solves the nonsteady conservation laws. Shock interactions with dense layer generated vorticity near the wall, a result that is similar to viscous, no-slip effects found in clean flows. The resulting wall shear layer was unstable, and rolled up into large-scale rotational structures. These structures entrained dense material from the wall layer and created a chaotically striated flow. The boundary layer grew due to merging of the large-scale structures and due to local entrainment of the dense material from the fluidized bed. The chaotic flow was averaged along similarity lines (i.e., lines of constant values of x = r/Rs and y = z/Rs where R(sub s) = ct(exp alpha)) to establish the mean-flow profiles and the r.m.s. fluctuating-flow profiles of the boundary layer.
Super-resolving random-Gaussian apodized photon sieve.
Sabatyan, Arash; Roshaninejad, Parisa
2012-09-10
A novel apodized photon sieve is presented in which random dense Gaussian distribution is implemented to modulate the pinhole density in each zone. The random distribution in dense Gaussian distribution causes intrazone discontinuities. Also, the dense Gaussian distribution generates a substantial number of pinholes in order to form a large degree of overlap between the holes in a few innermost zones of the photon sieve; thereby, clear zones are formed. The role of the discontinuities on the focusing properties of the photon sieve is examined as well. Analysis shows that secondary maxima have evidently been suppressed, transmission has increased enormously, and the central maxima width is approximately unchanged in comparison to the dense Gaussian distribution. Theoretical results have been completely verified by experiment.
Propagation of light through small clouds of cold interacting atoms
NASA Astrophysics Data System (ADS)
Jennewein, S.; Sortais, Y. R. P.; Greffet, J.-J.; Browaeys, A.
2016-11-01
We demonstrate experimentally that a dense cloud of cold atoms with a size comparable to the wavelength of light can induce large group delays on a laser pulse when the laser is tightly focused on it and is close to an atomic resonance. Delays as large as -10 ns are observed, corresponding to "superluminal" propagation with negative group velocities as low as -300 m /s . Strikingly, this large delay is associated with a moderate extinction owing to the very small size of the dense cloud. It implies that a large phase shift is imprinted on the continuous laser beam. Our system may thus be useful for applications to quantum technologies, such as variable delay line for individual photons or phase imprint between two beams at the single-photon level.
Burrows, Nilka R.; Geiss, Linda S.
2014-01-01
The Diabetes Interactive Atlas is a recently released Web-based collection of maps that allows users to view geographic patterns and examine trends in diabetes and its risk factors over time across the United States and within states. The atlas provides maps, tables, graphs, and motion charts that depict national, state, and county data. Large amounts of data can be viewed in various ways simultaneously. In this article, we describe the design and technical issues for developing the atlas and provide an overview of the atlas’ maps and graphs. The Diabetes Interactive Atlas improves visualization of geographic patterns, highlights observation of trends, and demonstrates the concomitant geographic and temporal growth of diabetes and obesity. PMID:24503340
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-06-01
meraculous2 is a whole genome shotgun assembler for short-reads that is capable of assembling large, polymorphic genomes with modest computational requirements. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (deBruijn) graph of oligonucleotides with unique high quality extensions in the dataset, avoiding an explicit error correction step as used in other short-read assemblers. Additional features include (1) handling of allelic variation using "bubble" structures within the deBruijn graph, (2) gap closing of repetitive and low quality regions using localized assemblies, and (3) an improved scaffolding algorithm that produces more complete assemblies without compromising onmore » scaffolding accuracy« less
Feynman graphs and the large dimensional limit of multipartite entanglement
NASA Astrophysics Data System (ADS)
Di Martino, Sara; Facchi, Paolo; Florio, Giuseppe
2018-01-01
In this paper, we extend the analysis of multipartite entanglement, based on techniques from classical statistical mechanics, to a system composed of n d-level parties (qudits). We introduce a suitable partition function at a fictitious temperature with the average local purity of the system as Hamiltonian. In particular, we analyze the high-temperature expansion of this partition function, prove the convergence of the series, and study its asymptotic behavior as d → ∞. We make use of a diagrammatic technique, classify the graphs, and study their degeneracy. We are thus able to evaluate their contributions and estimate the moments of the distribution of the local purity.
Markovian Search Games in Heterogeneous Spaces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Griffin, Christopher H
2009-01-01
We consider how to search for a mobile evader in a large heterogeneous region when sensors are used for detection. Sensors are modeled using probability of detection. Due to environmental effects, this probability will not be constant over the entire region. We map this problem to a graph search problem and, even though deterministic graph search is NP-complete, we derive a tractable, optimal, probabilistic search strategy. We do this by defining the problem as a differential game played on a Markov chain. We prove that this strategy is optimal in the sense of Nash. Simulations of an example problem illustratemore » our approach and verify our claims.« less
Teaching the Assessment of Normality Using Large Easily-Generated Real Data Sets
ERIC Educational Resources Information Center
Kulp, Christopher W.; Sprechini, Gene D.
2016-01-01
A classroom activity is presented, which can be used in teaching students statistics with an easily generated, large, real world data set. The activity consists of analyzing a video recording of an object. The colour data of the recorded object can then be used as a data set to explore variation in the data using graphs including histograms,…
Structural network efficiency is associated with cognitive impairment in small-vessel disease.
Lawrence, Andrew J; Chung, Ai Wern; Morris, Robin G; Markus, Hugh S; Barrick, Thomas R
2014-07-22
To characterize brain network connectivity impairment in cerebral small-vessel disease (SVD) and its relationship with MRI disease markers and cognitive impairment. A cross-sectional design applied graph-based efficiency analysis to deterministic diffusion tensor tractography data from 115 patients with lacunar infarction and leukoaraiosis and 50 healthy individuals. Structural connectivity was estimated between 90 cortical and subcortical brain regions and efficiency measures of resulting graphs were analyzed. Networks were compared between SVD and control groups, and associations between efficiency measures, conventional MRI disease markers, and cognitive function were tested. Brain diffusion tensor tractography network connectivity was significantly reduced in SVD: networks were less dense, connection weights were lower, and measures of network efficiency were significantly disrupted. The degree of brain network disruption was associated with MRI measures of disease severity and cognitive function. In multiple regression models controlling for confounding variables, associations with cognition were stronger for network measures than other MRI measures including conventional diffusion tensor imaging measures. A total mediation effect was observed for the association between fractional anisotropy and mean diffusivity measures and executive function and processing speed. Brain network connectivity in SVD is disturbed, this disturbance is related to disease severity, and within a mediation framework fully or partly explains previously observed associations between MRI measures and SVD-related cognitive dysfunction. These cross-sectional results highlight the importance of network disruption in SVD and provide support for network measures as a disease marker in treatment studies. © 2014 American Academy of Neurology.
Structural network efficiency is associated with cognitive impairment in small-vessel disease
Chung, Ai Wern; Morris, Robin G.; Markus, Hugh S.; Barrick, Thomas R.
2014-01-01
Objective: To characterize brain network connectivity impairment in cerebral small-vessel disease (SVD) and its relationship with MRI disease markers and cognitive impairment. Methods: A cross-sectional design applied graph-based efficiency analysis to deterministic diffusion tensor tractography data from 115 patients with lacunar infarction and leukoaraiosis and 50 healthy individuals. Structural connectivity was estimated between 90 cortical and subcortical brain regions and efficiency measures of resulting graphs were analyzed. Networks were compared between SVD and control groups, and associations between efficiency measures, conventional MRI disease markers, and cognitive function were tested. Results: Brain diffusion tensor tractography network connectivity was significantly reduced in SVD: networks were less dense, connection weights were lower, and measures of network efficiency were significantly disrupted. The degree of brain network disruption was associated with MRI measures of disease severity and cognitive function. In multiple regression models controlling for confounding variables, associations with cognition were stronger for network measures than other MRI measures including conventional diffusion tensor imaging measures. A total mediation effect was observed for the association between fractional anisotropy and mean diffusivity measures and executive function and processing speed. Conclusions: Brain network connectivity in SVD is disturbed, this disturbance is related to disease severity, and within a mediation framework fully or partly explains previously observed associations between MRI measures and SVD-related cognitive dysfunction. These cross-sectional results highlight the importance of network disruption in SVD and provide support for network measures as a disease marker in treatment studies. PMID:24951477
The Continued Reduction in Dense Fog in the Southern California Region: Possible Causes
NASA Astrophysics Data System (ADS)
LaDochy, S.; Witiw, M.
2012-05-01
Dense fog appears to be decreasing in many parts of the world, especially in western cities. Dense fog (visibility <400 m) is disappearing in the urban southern California area also. There the decrease in dense fog events can be explained mainly by declining particulate levels, Pacific sea surface temperatures (SST), and increased urban warming. Using hourly data from 1948 to the present, we looked at the relationship between fog events in the region and contributing factors and trends over time. Initially a strong relationship was suggested between the occurrence of dense fog and the phases of an atmosphere-ocean cycle: the Pacific Decadal Oscillation (PDO). However, closer analysis revealed the importance to fog variability of an increasing urban heat island and the amount of atmospheric suspended particulate matter. Results show a substantial decrease in the occurrence of very low visibilities (<400 m) at the two airport stations in close proximity to the Pacific Ocean, LAX (Los Angeles International) and LGB (Long Beach International). A downward trend in particulate concentrations, coupled with an upward trend in urban temperatures were associated with the decrease in dense fog occurrence at both LAX and LGB. LAX dense fog that reached over 300 h in 1950 dropped steadily, with 0 h recorded in 1997. Since 1997, there has been a slight recovery with both 2008 and 2009 recording over 30 h of dense fog at both locations. In this study we examine whether the upturn is a temporary reversal of the trend. To remove the urban effect, we also included fog data from Vandenberg Air Force Base (VBG), located in a relatively sparsely populated area approximately 200 km to the north of metropolitan Los Angeles. Particulates, urban heat island, and Pacific SSTs all seem to be contributing factors to the decrease in fog in southern California, along with large-scale atmosphere-ocean interaction cycles. Case studies of local and regional dense fog in southern California point to the importance of strong, low inversions and to a lesser contributor, Santa Ana winds. Both are associated with large-scale atmospheric circulation patterns, which have changed markedly over the period of studied. These changes point to continued decreases in dense fog in the region.
Reactions of N(+) (3P) ions with normal, para, and deuterated hydrogens at low temperatures
NASA Astrophysics Data System (ADS)
Marquette, J. B.; Rebrion, C.; Rowe, B. R.
1988-08-01
The method of reaction kinetics in uniform supersonic flow (French designation CRESU; Dupeyrat et al., 1985) is used to study the reactions of N(+) (3P) with n-H2, p-H2, HD, and D2 at temperatures 8, 20, 27, 45, 68, 163, and 300 K. Results obtained using either He or N2 as the buffer gas are presented in tables and graphs and discussed in detail. The reaction with p-H2 is shown to be significantly slower than that with n-H2, a finding attributed to rotational-energy and spin-orbit-energy driving of these processes. It is pointed out that these results are in agreement with those of Barlow et al. (1986) and Herbst et al. (1987), apparently ruling out these reactions as the source of the NH3 observed in dense interstellar cloud cores.
NASA Astrophysics Data System (ADS)
Tadić, Bosiljka; Thurner, Stefan; Rodgers, G. J.
2004-03-01
We study the microscopic time fluctuations of traffic load and the global statistical properties of a dense traffic of particles on scale-free cyclic graphs. For a wide range of driving rates R the traffic is stationary and the load time series exhibits antipersistence due to the regulatory role of the superstructure associated with two hub nodes in the network. We discuss how the superstructure affects the functioning of the network at high traffic density and at the jamming threshold. The degree of correlations systematically decreases with increasing traffic density and eventually disappears when approaching a jamming density Rc. Already before jamming we observe qualitative changes in the global network-load distributions and the particle queuing times. These changes are related to the occurrence of temporary crises in which the network-load increases dramatically, and then slowly falls back to a value characterizing free flow.
Uniform, dense arrays of vertically aligned, large-diameter single-walled carbon nanotubes.
Han, Zhao Jun; Ostrikov, Kostya
2012-04-04
Precisely controlled reactive chemical vapor synthesis of highly uniform, dense arrays of vertically aligned single-walled carbon nanotubes (SWCNTs) using tailored trilayered Fe/Al(2)O(3)/SiO(2) catalyst is demonstrated. More than 90% population of thick nanotubes (>3 nm in diameter) can be produced by tailoring the thickness and microstructure of the secondary catalyst supporting SiO(2) layer, which is commonly overlooked. The proposed model based on the atomic force microanalysis suggests that this tailoring leads to uniform and dense arrays of relatively large Fe catalyst nanoparticles on which the thick SWCNTs nucleate, while small nanotubes and amorphous carbon are effectively etched away. Our results resolve a persistent issue of selective (while avoiding multiwalled nanotubes and other carbon nanostructures) synthesis of thick vertically aligned SWCNTs whose easily switchable thickness-dependent electronic properties enable advanced applications in nanoelectronic, energy, drug delivery, and membrane technologies.
Hybrid Propulsion Technology Program, phase 1. Volume 2: Technical discussion
NASA Technical Reports Server (NTRS)
1989-01-01
Information on hybrid propulsion system concepts is given largely in the form of outlines, charts and graphs. Included are the concept definition, trade study data generation, concept evaluation and selection, conceptual design definition, and technology definition.
Couple Graph Based Label Propagation Method for Hyperspectral Remote Sensing Data Classification
NASA Astrophysics Data System (ADS)
Wang, X. P.; Hu, Y.; Chen, J.
2018-04-01
Graph based semi-supervised classification method are widely used for hyperspectral image classification. We present a couple graph based label propagation method, which contains both the adjacency graph and the similar graph. We propose to construct the similar graph by using the similar probability, which utilize the label similarity among examples probably. The adjacency graph was utilized by a common manifold learning method, which has effective improve the classification accuracy of hyperspectral data. The experiments indicate that the couple graph Laplacian which unite both the adjacency graph and the similar graph, produce superior classification results than other manifold Learning based graph Laplacian and Sparse representation based graph Laplacian in label propagation framework.
Multi-Centrality Graph Spectral Decompositions and Their Application to Cyber Intrusion Detection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Pin-Yu; Choudhury, Sutanay; Hero, Alfred
Many modern datasets can be represented as graphs and hence spectral decompositions such as graph principal component analysis (PCA) can be useful. Distinct from previous graph decomposition approaches based on subspace projection of a single topological feature, e.g., the centered graph adjacency matrix (graph Laplacian), we propose spectral decomposition approaches to graph PCA and graph dictionary learning that integrate multiple features, including graph walk statistics, centrality measures and graph distances to reference nodes. In this paper we propose a new PCA method for single graph analysis, called multi-centrality graph PCA (MC-GPCA), and a new dictionary learning method for ensembles ofmore » graphs, called multi-centrality graph dictionary learning (MC-GDL), both based on spectral decomposition of multi-centrality matrices. As an application to cyber intrusion detection, MC-GPCA can be an effective indicator of anomalous connectivity pattern and MC-GDL can provide discriminative basis for attack classification.« less
Graphs, matrices, and the GraphBLAS: Seven good reasons
Kepner, Jeremy; Bader, David; Buluç, Aydın; ...
2015-01-01
The analysis of graphs has become increasingly important to a wide range of applications. Graph analysis presents a number of unique challenges in the areas of (1) software complexity, (2) data complexity, (3) security, (4) mathematical complexity, (5) theoretical analysis, (6) serial performance, and (7) parallel performance. Implementing graph algorithms using matrix-based approaches provides a number of promising solutions to these challenges. The GraphBLAS standard (istcbigdata.org/GraphBlas) is being developed to bring the potential of matrix based graph algorithms to the broadest possible audience. The GraphBLAS mathematically defines a core set of matrix-based graph operations that can be used to implementmore » a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the GraphBLAS and describes how the GraphBLAS can be used to address many of the challenges associated with analysis of graphs.« less
Zhang, Qin; Yao, Quanying
2018-05-01
The dynamic uncertain causality graph (DUCG) is a newly presented framework for uncertain causality representation and probabilistic reasoning. It has been successfully applied to online fault diagnoses of large, complex industrial systems, and decease diagnoses. This paper extends the DUCG to model more complex cases than what could be previously modeled, e.g., the case in which statistical data are in different groups with or without overlap, and some domain knowledge and actions (new variables with uncertain causalities) are introduced. In other words, this paper proposes to use -mode, -mode, and -mode of the DUCG to model such complex cases and then transform them into either the standard -mode or the standard -mode. In the former situation, if no directed cyclic graph is involved, the transformed result is simply a Bayesian network (BN), and existing inference methods for BNs can be applied. In the latter situation, an inference method based on the DUCG is proposed. Examples are provided to illustrate the methodology.
Visual texture perception via graph-based semi-supervised learning
NASA Astrophysics Data System (ADS)
Zhang, Qin; Dong, Junyu; Zhong, Guoqiang
2018-04-01
Perceptual features, for example direction, contrast and repetitiveness, are important visual factors for human to perceive a texture. However, it needs to perform psychophysical experiment to quantify these perceptual features' scale, which requires a large amount of human labor and time. This paper focuses on the task of obtaining perceptual features' scale of textures by small number of textures with perceptual scales through a rating psychophysical experiment (what we call labeled textures) and a mass of unlabeled textures. This is the scenario that the semi-supervised learning is naturally suitable for. This is meaningful for texture perception research, and really helpful for the perceptual texture database expansion. A graph-based semi-supervised learning method called random multi-graphs, RMG for short, is proposed to deal with this task. We evaluate different kinds of features including LBP, Gabor, and a kind of unsupervised deep features extracted by a PCA-based deep network. The experimental results show that our method can achieve satisfactory effects no matter what kind of texture features are used.
A SPECTRAL GRAPH APPROACH TO DISCOVERING GENETIC ANCESTRY1
Lee, Ann B.; Luca, Diana; Roeder, Kathryn
2010-01-01
Mapping human genetic variation is fundamentally interesting in fields such as anthropology and forensic inference. At the same time, patterns of genetic diversity confound efforts to determine the genetic basis of complex disease. Due to technological advances, it is now possible to measure hundreds of thousands of genetic variants per individual across the genome. Principal component analysis (PCA) is routinely used to summarize the genetic similarity between subjects. The eigenvectors are interpreted as dimensions of ancestry. We build on this idea using a spectral graph approach. In the process we draw on connections between multidimensional scaling and spectral kernel methods. Our approach, based on a spectral embedding derived from the normalized Laplacian of a graph, can produce more meaningful delineation of ancestry than by using PCA. The method is stable to outliers and can more easily incorporate different similarity measures of genetic data than PCA. We illustrate a new algorithm for genetic clustering and association analysis on a large, genetically heterogeneous sample. PMID:20689656
PyBoolNet: a python package for the generation, analysis and visualization of boolean networks.
Klarner, Hannes; Streck, Adam; Siebert, Heike
2017-03-01
The goal of this project is to provide a simple interface to working with Boolean networks. Emphasis is put on easy access to a large number of common tasks including the generation and manipulation of networks, attractor and basin computation, model checking and trap space computation, execution of established graph algorithms as well as graph drawing and layouts. P y B ool N et is a Python package for working with Boolean networks that supports simple access to model checking via N u SMV, standard graph algorithms via N etwork X and visualization via dot . In addition, state of the art attractor computation exploiting P otassco ASP is implemented. The package is function-based and uses only native Python and N etwork X data types. https://github.com/hklarner/PyBoolNet. hannes.klarner@fu-berlin.de. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
NASA Astrophysics Data System (ADS)
Biazzo, Indaco; Braunstein, Alfredo; Zecchina, Riccardo
2012-08-01
We study the behavior of an algorithm derived from the cavity method for the prize-collecting steiner tree (PCST) problem on graphs. The algorithm is based on the zero temperature limit of the cavity equations and as such is formally simple (a fixed point equation resolved by iteration) and distributed (parallelizable). We provide a detailed comparison with state-of-the-art algorithms on a wide range of existing benchmarks, networks, and random graphs. Specifically, we consider an enhanced derivative of the Goemans-Williamson heuristics and the dhea solver, a branch and cut integer linear programming based approach. The comparison shows that the cavity algorithm outperforms the two algorithms in most large instances both in running time and quality of the solution. Finally we prove a few optimality properties of the solutions provided by our algorithm, including optimality under the two postprocessing procedures defined in the Goemans-Williamson derivative and global optimality in some limit cases.
NASA Astrophysics Data System (ADS)
Ren, Jie
2017-12-01
The process by which a kinesin motor couples its ATPase activity with concerted mechanical hand-over-hand steps is a foremost topic of molecular motor physics. Two major routes toward elucidating kinesin mechanisms are the motility performance characterization of velocity and run length, and single-molecular state detection experiments. However, these two sets of experimental approaches are largely uncoupled to date. Here, we introduce an integrative motility state analysis based on a theorized kinetic graph theory for kinesin, which, on one hand, is validated by a wealth of accumulated motility data, and, on the other hand, allows for rigorous quantification of state occurrences and chemomechanical cycling probabilities. An interesting linear scaling for kinesin motility performance across species is discussed as well. An integrative kinetic graph theory analysis provides a powerful tool to bridge motility and state characterization experiments, so as to forge a unified effort for the elucidation of the working mechanisms of molecular motors.
Concordant Chemical Reaction Networks and the Species-Reaction Graph
Shinar, Guy; Feinberg, Martin
2015-01-01
In a recent paper it was shown that, for chemical reaction networks possessing a subtle structural property called concordance, dynamical behavior of a very circumscribed (and largely stable) kind is enforced, so long as the kinetics lies within the very broad and natural weakly monotonic class. In particular, multiple equilibria are precluded, as are degenerate positive equilibria. Moreover, under certain circumstances, also related to concordance, all real eigenvalues associated with a positive equilibrium are negative. Although concordance of a reaction network can be decided by readily available computational means, we show here that, when a nondegenerate network’s Species-Reaction Graph satisfies certain mild conditions, concordance and its dynamical consequences are ensured. These conditions are weaker than earlier ones invoked to establish kinetic system injectivity, which, in turn, is just one ramification of network concordance. Because the Species-Reaction Graph resembles pathway depictions often drawn by biochemists, results here expand the possibility of inferring significant dynamical information directly from standard biochemical reaction diagrams. PMID:22940368
Adjusting protein graphs based on graph entropy.
Peng, Sheng-Lung; Tsay, Yu-Wei
2014-01-01
Measuring protein structural similarity attempts to establish a relationship of equivalence between polymer structures based on their conformations. In several recent studies, researchers have explored protein-graph remodeling, instead of looking a minimum superimposition for pairwise proteins. When graphs are used to represent structured objects, the problem of measuring object similarity become one of computing the similarity between graphs. Graph theory provides an alternative perspective as well as efficiency. Once a protein graph has been created, its structural stability must be verified. Therefore, a criterion is needed to determine if a protein graph can be used for structural comparison. In this paper, we propose a measurement for protein graph remodeling based on graph entropy. We extend the concept of graph entropy to determine whether a graph is suitable for representing a protein. The experimental results suggest that when applied, graph entropy helps a conformational on protein graph modeling. Furthermore, it indirectly contributes to protein structural comparison if a protein graph is solid.
Adjusting protein graphs based on graph entropy
2014-01-01
Measuring protein structural similarity attempts to establish a relationship of equivalence between polymer structures based on their conformations. In several recent studies, researchers have explored protein-graph remodeling, instead of looking a minimum superimposition for pairwise proteins. When graphs are used to represent structured objects, the problem of measuring object similarity become one of computing the similarity between graphs. Graph theory provides an alternative perspective as well as efficiency. Once a protein graph has been created, its structural stability must be verified. Therefore, a criterion is needed to determine if a protein graph can be used for structural comparison. In this paper, we propose a measurement for protein graph remodeling based on graph entropy. We extend the concept of graph entropy to determine whether a graph is suitable for representing a protein. The experimental results suggest that when applied, graph entropy helps a conformational on protein graph modeling. Furthermore, it indirectly contributes to protein structural comparison if a protein graph is solid. PMID:25474347
What Can Graph Theory Tell Us About Word Learning and Lexical Retrieval?
Vitevitch, Michael S.
2008-01-01
Purpose Graph theory and the new science of networks provide a mathematically rigorous approach to examine the development and organization of complex systems. These tools were applied to the mental lexicon to examine the organization of words in the lexicon and to explore how that structure might influence the acquisition and retrieval of phonological word-forms. Method Pajek, a program for large network analysis and visualization (V. Batagelj & A. Mvrar, 1998), was used to examine several characteristics of a network derived from a computerized database of the adult lexicon. Nodes in the network represented words, and a link connected two nodes if the words were phonological neighbors. Results The average path length and clustering coefficient suggest that the phonological network exhibits small-world characteristics. The degree distribution was fit better by an exponential rather than a power-law function. Finally, the network exhibited assortative mixing by degree. Some of these structural characteristics were also found in graphs that were formed by 2 simple stochastic processes suggesting that similar processes might influence the development of the lexicon. Conclusions The graph theoretic perspective may provide novel insights about the mental lexicon and lead to future studies that help us better understand language development and processing. PMID:18367686
Scale free effects in world currency exchange network
NASA Astrophysics Data System (ADS)
Górski, A. Z.; Drożdż, S.; Kwapień, J.
2008-11-01
A large collection of daily time series for 60 world currencies' exchange rates is considered. The correlation matrices are calculated and the corresponding Minimal Spanning Tree (MST) graphs are constructed for each of those currencies used as reference for the remaining ones. It is shown that multiplicity of the MST graphs' nodes to a good approximation develops a power like, scale free distribution with the scaling exponent similar as for several other complex systems studied so far. Furthermore, quantitative arguments in favor of the hierarchical organization of the world currency exchange network are provided by relating the structure of the above MST graphs and their scaling exponents to those that are derived from an exactly solvable hierarchical network model. A special status of the USD during the period considered can be attributed to some departures of the MST features, when this currency (or some other tied to it) is used as reference, from characteristics typical to such a hierarchical clustering of nodes towards those that correspond to the random graphs. Even though in general the basic structure of the MST is robust with respect to changing the reference currency some trace of a systematic transition from somewhat dispersed - like the USD case - towards more compact MST topology can be observed when correlations increase.
Optimal Multiple Surface Segmentation With Shape and Context Priors
Bai, Junjie; Garvin, Mona K.; Sonka, Milan; Buatti, John M.; Wu, Xiaodong
2014-01-01
Segmentation of multiple surfaces in medical images is a challenging problem, further complicated by the frequent presence of weak boundary evidence, large object deformations, and mutual influence between adjacent objects. This paper reports a novel approach to multi-object segmentation that incorporates both shape and context prior knowledge in a 3-D graph-theoretic framework to help overcome the stated challenges. We employ an arc-based graph representation to incorporate a wide spectrum of prior information through pair-wise energy terms. In particular, a shape-prior term is used to penalize local shape changes and a context-prior term is used to penalize local surface-distance changes from a model of the expected shape and surface distances, respectively. The globally optimal solution for multiple surfaces is obtained by computing a maximum flow in a low-order polynomial time. The proposed method was validated on intraretinal layer segmentation of optical coherence tomography images and demonstrated statistically significant improvement of segmentation accuracy compared to our earlier graph-search method that was not utilizing shape and context priors. The mean unsigned surface positioning errors obtained by the conventional graph-search approach (6.30 ± 1.58 μm) was improved to 5.14 ± 0.99 μm when employing our new method with shape and context priors. PMID:23193309
Introducing the slime mold graph repository
NASA Astrophysics Data System (ADS)
Dirnberger, M.; Mehlhorn, K.; Mehlhorn, T.
2017-07-01
We introduce the slime mold graph repository or SMGR, a novel data collection promoting the visibility, accessibility and reuse of experimental data revolving around network-forming slime molds. By making data readily available to researchers across multiple disciplines, the SMGR promotes novel research as well as the reproduction of original results. While SMGR data may take various forms, we stress the importance of graph representations of slime mold networks due to their ease of handling and their large potential for reuse. Data added to the SMGR stands to gain impact beyond initial publications or even beyond its domain of origin. We initiate the SMGR with the comprehensive Kist Europe data set focusing on the slime mold Physarum polycephalum, which we obtained in the course of our original research. It contains sequences of images documenting growth and network formation of the organism under constant conditions. Suitable image sequences depicting the typical P. polycephalum network structures are used to compute sequences of graphs faithfully capturing them. Given such sequences, node identities are computed, tracking the development of nodes over time. Based on this information we demonstrate two out of many possible ways to begin exploring the data. The entire data set is well-documented, self-contained and ready for inspection at http://smgr.mpi-inf.mpg.de.
Parallel Algorithms for Switching Edges in Heterogeneous Graphs.
Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav
2017-06-01
An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
Disconnection of network hubs and cognitive impairment after traumatic brain injury.
Fagerholm, Erik D; Hellyer, Peter J; Scott, Gregory; Leech, Robert; Sharp, David J
2015-06-01
Traumatic brain injury affects brain connectivity by producing traumatic axonal injury. This disrupts the function of large-scale networks that support cognition. The best way to describe this relationship is unclear, but one elegant approach is to view networks as graphs. Brain regions become nodes in the graph, and white matter tracts the connections. The overall effect of an injury can then be estimated by calculating graph metrics of network structure and function. Here we test which graph metrics best predict the presence of traumatic axonal injury, as well as which are most highly associated with cognitive impairment. A comprehensive range of graph metrics was calculated from structural connectivity measures for 52 patients with traumatic brain injury, 21 of whom had microbleed evidence of traumatic axonal injury, and 25 age-matched controls. White matter connections between 165 grey matter brain regions were defined using tractography, and structural connectivity matrices calculated from skeletonized diffusion tensor imaging data. This technique estimates injury at the centre of tract, but is insensitive to damage at tract edges. Graph metrics were calculated from the resulting connectivity matrices and machine-learning techniques used to select the metrics that best predicted the presence of traumatic brain injury. In addition, we used regularization and variable selection via the elastic net to predict patient behaviour on tests of information processing speed, executive function and associative memory. Support vector machines trained with graph metrics of white matter connectivity matrices from the microbleed group were able to identify patients with a history of traumatic brain injury with 93.4% accuracy, a result robust to different ways of sampling the data. Graph metrics were significantly associated with cognitive performance: information processing speed (R(2) = 0.64), executive function (R(2) = 0.56) and associative memory (R(2) = 0.25). These results were then replicated in a separate group of patients without microbleeds. The most influential graph metrics were betweenness centrality and eigenvector centrality, which provide measures of the extent to which a given brain region connects other regions in the network. Reductions in betweenness centrality and eigenvector centrality were particularly evident within hub regions including the cingulate cortex and caudate. Our results demonstrate that betweenness centrality and eigenvector centrality are reduced within network hubs, due to the impact of traumatic axonal injury on network connections. The dominance of betweenness centrality and eigenvector centrality suggests that cognitive impairment after traumatic brain injury results from the disconnection of network hubs by traumatic axonal injury. © The Author (2015). Published by Oxford University Press on behalf of the Guarantors of Brain.
Detecting recurrent gene mutation in interaction network context using multi-scale graph diffusion.
Babaei, Sepideh; Hulsman, Marc; Reinders, Marcel; de Ridder, Jeroen
2013-01-23
Delineating the molecular drivers of cancer, i.e. determining cancer genes and the pathways which they deregulate, is an important challenge in cancer research. In this study, we aim to identify pathways of frequently mutated genes by exploiting their network neighborhood encoded in the protein-protein interaction network. To this end, we introduce a multi-scale diffusion kernel and apply it to a large collection of murine retroviral insertional mutagenesis data. The diffusion strength plays the role of scale parameter, determining the size of the network neighborhood that is taken into account. As a result, in addition to detecting genes with frequent mutations in their genomic vicinity, we find genes that harbor frequent mutations in their interaction network context. We identify densely connected components of known and putatively novel cancer genes and demonstrate that they are strongly enriched for cancer related pathways across the diffusion scales. Moreover, the mutations in the clusters exhibit a significant pattern of mutual exclusion, supporting the conjecture that such genes are functionally linked. Using multi-scale diffusion kernel, various infrequently mutated genes are found to harbor significant numbers of mutations in their interaction network neighborhood. Many of them are well-known cancer genes. The results demonstrate the importance of defining recurrent mutations while taking into account the interaction network context. Importantly, the putative cancer genes and networks detected in this study are found to be significant at different diffusion scales, confirming the necessity of a multi-scale analysis.