Dynamic graph cuts for efficient inference in Markov Random Fields.
Kohli, Pushmeet; Torr, Philip H S
2007-12-01
In this paper we present a fast new fully dynamic algorithm for the st-mincut/max-flow problem. We show how this algorithm can be used to efficiently compute MAP solutions for certain dynamically changing MRF models in computer vision such as image segmentation. Specifically, given the solution of the max-flow problem on a graph, the dynamic algorithm efficiently computes the maximum flow in a modified version of the graph. The time taken by it is roughly proportional to the total amount of change in the edge weights of the graph. Our experiments show that, when the number of changes in the graph is small, the dynamic algorithm is significantly faster than the best known static graph cut algorithm. We test the performance of our algorithm on one particular problem: the object-background segmentation problem for video. It should be noted that the application of our algorithm is not limited to the above problem; the algorithm is generic and can be used to yield similar improvements in many other cases that involve dynamic change.
Efficient enumeration of monocyclic chemical graphs with given path frequencies
2014-01-01
Background: The enumeration of chemical graphs (molecular graphs) satisfying given constraints is one of the fundamental problems in chemoinformatics and bioinformatics because it leads to a variety of useful applications including structure determination and development of novel chemical compounds. Results: We consider the problem of enumerating chemical graphs with monocyclic structure (a graph structure that contains exactly one cycle) from a given set of feature vectors, where a feature vector represents the frequency of the prescribed paths in a chemical compound to be constructed and the set is specified by a pair of upper and lower feature vectors. To enumerate all tree-like (acyclic) chemical graphs from a given set of feature vectors, Shimizu et al. and Suzuki et al. proposed efficient branch-and-bound algorithms based on a fast tree enumeration algorithm. In this study, we devise a novel method for extending these algorithms to enumeration of chemical graphs with monocyclic structure by designing a fast algorithm for testing uniqueness. The results of computational experiments reveal that the computational efficiency of the new algorithm is as good as that of the algorithms for enumeration of tree-like chemical compounds. Conclusions: We succeed in expanding the class of chemical graphs that can be enumerated efficiently. PMID: 24955135
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.
Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal
2010-11-15
Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories based on the data structures they employ. The first class uses an overlap/string graph and the second uses a de Bruijn graph. With the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm was given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model, and in this case it has an optimal I/O complexity of Θ((n log(n/B)) / (B log(M/B))) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on an SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster, both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem.
The bi-directed de Bruijn graph is a fundamental data structure for any sequence assembly program based on the Eulerian approach. Our algorithms for constructing bi-directed de Bruijn graphs are efficient in parallel and out-of-core settings. These algorithms can be used in building large-scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms. Finally, our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.
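As a rough illustration of the data structure discussed in this abstract, the following Python sketch builds a small in-memory de Bruijn graph whose nodes are canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement). It is a sequential toy under stated assumptions, not the parallel or out-of-core algorithm of the paper: the function names are hypothetical, reads are assumed to contain only A, C, G and T, and the orientation labels that a full bi-directed graph attaches to each edge are omitted.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (assumes only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def build_debruijn(reads, k):
    """Map each canonical k-mer to the set of canonical k-mers it overlaps.

    Two k-mers are linked when they occur at consecutive positions of a
    read, i.e. they overlap in k - 1 characters. Edge orientations are
    deliberately left out of this sketch.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            a = canonical(read[i:i + k])
            b = canonical(read[i + 1:i + k + 1])
            graph[a].add(b)
            graph[b].add(a)
    return graph


if __name__ == "__main__":
    # The second read is the reverse complement of the first, so both
    # reads collapse onto the same pair of canonical nodes.
    g = build_debruijn(["AAGC", "GCTT"], k=3)
    for node in sorted(g):
        print(node, sorted(g[node]))
```

Using canonical k-mers is what lets reads from either strand contribute to the same node; a production assembler would also record which end of each node an edge attaches to.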
1987-03-31
processors. The symmetry-breaking algorithms give efficient ways to convert probabilistic algorithms to deterministic algorithms. Some of the...techniques have been applied to construct several efficient linear-processor algorithms for graph problems, including an O(lg* n)-time algorithm for (A + 1...On n-node graphs, the algorithm works in O(log² n) time using only n processors, in contrast to the previous best algorithm which used about n³
Incremental k-core decomposition: Algorithms and evaluation
Sariyuce, Ahmet Erdem; Gedik, Bugra; Jacques-Silva, Gabriela; ...
2016-02-01
A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-hard problems on real networks efficiently, like maximal clique finding. In many real-world applications, networks change over time. As a result, it is essential to develop efficient incremental algorithms for dynamic graph data. In this paper, we propose a suite of incremental k-core decomposition algorithms for dynamic graph data. These algorithms locate a small subgraph that is guaranteed to contain the list of vertices whose maximum k-core values have changed and efficiently process this subgraph to update the k-core decomposition. We present incremental algorithms for both insertion and deletion operations, and propose auxiliary vertex state maintenance techniques that can further accelerate these operations. Our results show a significant reduction in runtime compared to non-incremental alternatives. We illustrate the efficiency of our algorithms on different types of real and synthetic graphs, at varying scales. Furthermore, for a graph of 16 million vertices, we observe relative throughputs reaching a million times, relative to the non-incremental algorithms.
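To make the notion of a vertex's maximum k-core value concrete, here is a minimal Python sketch of the classic static peeling procedure, which is the kind of non-incremental baseline the abstract compares against. It is not one of the incremental algorithms proposed in the paper; the function name and the edge-list input format are assumptions made for illustration, and the minimum-degree lookup is kept deliberately simple rather than fast.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected simple graph.

    Repeatedly removes a vertex of minimum remaining degree. The core
    number of a vertex is the largest minimum degree seen up to the
    moment it is removed.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Linear scan for clarity; bucket queues make this step O(1).
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


if __name__ == "__main__":
    # A triangle a-b-c with a pendant vertex d attached to c.
    print(core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]))
```

In the example, the pendant vertex gets core number 1 and the triangle vertices get 2. An incremental method would avoid rerunning this whole loop after each edge insertion or deletion.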
2013-01-01
Background: Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. Results: Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. Conclusions: Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. PMID: 24564333
Efficient solution for finding Hamilton cycles in undirected graphs.
Alhalabi, Wadee; Kitanneh, Omar; Alharbi, Amira; Balfakih, Zain; Sarirete, Akila
2016-01-01
The Hamilton cycle problem is closely related to a series of famous problems and puzzles (traveling salesman problem, Icosian game) and, because it is NP-complete, it has been extensively studied and many different algorithms have been proposed to solve it. The most efficient algorithm is not known. In this paper, a necessary condition for an arbitrary undirected graph to have a Hamilton cycle is proposed. Based on this condition, a mathematical solution for this problem is developed and several proofs and an algorithmic approach are introduced. The algorithm is successfully implemented on many Hamiltonian and non-Hamiltonian graphs. This provides a new effective approach to solve a problem that is fundamental in graph theory and can influence the manner in which the existing applications are used and improved.
Reproducibility of graph metrics of human brain structural networks.
Duda, Jeffrey T; Cook, Philip A; Gee, James C
2014-01-01
Recent interest in human brain connectivity has led to the application of graph theoretical analysis to human brain structural networks, in particular white matter connectivity inferred from diffusion imaging and fiber tractography. While these methods have been used to study a variety of patient populations, there has been less examination of the reproducibility of these methods. A number of tractography algorithms exist and many of these are known to be sensitive to user-selected parameters. The methods used to derive a connectivity matrix from fiber tractography output may also influence the resulting graph metrics. Here we examine how these algorithm and parameter choices influence the reproducibility of proposed graph metrics on a publicly available test-retest dataset consisting of 21 healthy adults. The dice coefficient is used to examine topological similarity of constant density subgraphs both within and between subjects. Seven graph metrics are examined here: mean clustering coefficient, characteristic path length, largest connected component size, assortativity, global efficiency, local efficiency, and rich club coefficient. The reproducibility of these network summary measures is examined using the intraclass correlation coefficient (ICC). Graph curves are created by treating the graph metrics as functions of a parameter such as graph density. Functional data analysis techniques are used to examine differences in graph measures that result from the choice of fiber tracking algorithm. The graph metrics consistently showed good levels of reproducibility as measured with ICC, with the exception of some instability at low graph density levels. The global and local efficiency measures were the most robust to the choice of fiber tracking algorithm.
Azad, Ariful; Buluç, Aydın
2016-05-16
We describe parallel algorithms for computing maximal cardinality matching in a bipartite graph on distributed-memory systems. Unlike traditional algorithms that match one vertex at a time, our algorithms process many unmatched vertices simultaneously using a matrix-algebraic formulation of maximal matching. This generic matrix-algebraic framework is used to develop three efficient maximal matching algorithms with minimal changes. The newly developed algorithms have two benefits over existing graph-based algorithms. First, unlike existing parallel algorithms, cardinality of matching obtained by the new algorithms stays constant with increasing processor counts, which is important for predictable and reproducible performance. Second, relying on bulk-synchronous matrix operations, these algorithms expose a higher degree of parallelism on distributed-memory platforms than existing graph-based algorithms. We report high-performance implementations of three maximal matching algorithms using hybrid OpenMP-MPI and evaluate the performance of these algorithms using more than 35 real and randomly generated graphs. On real instances, our algorithms achieve up to 200× speedup on 2048 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs where our algorithms show good scaling on up to 16,384 cores.
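For readers unfamiliar with the term, a maximal matching is one to which no further edge can be added, which is weaker than a maximum matching. The Python sketch below shows the traditional sequential greedy approach that matches one vertex at a time, which is the style of algorithm the abstract contrasts itself with. It does not reproduce the matrix-algebraic formulation; the function name and the adjacency-dictionary input are assumptions chosen for illustration.

```python
def greedy_maximal_matching(adj):
    """Greedy maximal matching in a bipartite graph.

    `adj` maps each left vertex to an iterable of its right neighbours.
    Returns a dict {left vertex: matched right vertex}.
    """
    match_left = {}
    match_right = {}
    for u, neighbours in adj.items():
        for v in neighbours:
            if v not in match_right:
                # Take the first free neighbour and move on.
                match_left[u] = v
                match_right[v] = u
                break
    return match_left


if __name__ == "__main__":
    graph = {"a": ["x", "y"], "b": ["x"], "c": ["y", "z"]}
    print(greedy_maximal_matching(graph))
```

In the example, the greedy pass matches a with x and c with y, leaving b unmatched. The result is maximal, since b's only neighbour is taken, but not maximum: matching a-y, b-x and c-z would cover all three left vertices.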
Efficient and Scalable Graph Similarity Joins in MapReduce
Chen, Yifan; Zhang, Weiming; Tang, Jiuyang
2014-01-01
Along with the emergence of massive graph-modeled data, it is of great importance to investigate graph similarity joins due to their wide applications for multiple purposes, including data cleaning and near-duplicate detection. This paper considers graph similarity joins with edit distance constraints, which return pairs of graphs such that their edit distances are no larger than a given threshold. Leveraging the MapReduce programming model, we propose MGSJoin, a scalable algorithm following the filtering-verification framework for efficient graph similarity joins. It relies on counting overlapping graph signatures for filtering out nonpromising candidates. To address the potential issue of too many key-value pairs in the filtering phase, spectral Bloom filters are introduced to reduce the number of key-value pairs. Furthermore, we integrate the multiway join strategy to boost the verification, where a MapReduce-based method is proposed for GED calculation. The superior efficiency and scalability of the proposed algorithms are demonstrated by extensive experimental results. PMID: 25121135
Efficient and scalable graph similarity joins in MapReduce.
Chen, Yifan; Zhao, Xiang; Xiao, Chuan; Zhang, Weiming; Tang, Jiuyang
2014-01-01
Along with the emergence of massive graph-modeled data, it is of great importance to investigate graph similarity joins due to their wide applications for multiple purposes, including data cleaning and near-duplicate detection. This paper considers graph similarity joins with edit distance constraints, which return pairs of graphs such that their edit distances are no larger than a given threshold. Leveraging the MapReduce programming model, we propose MGSJoin, a scalable algorithm following the filtering-verification framework for efficient graph similarity joins. It relies on counting overlapping graph signatures for filtering out nonpromising candidates. To address the potential issue of too many key-value pairs in the filtering phase, spectral Bloom filters are introduced to reduce the number of key-value pairs. Furthermore, we integrate the multiway join strategy to boost the verification, where a MapReduce-based method is proposed for GED calculation. The superior efficiency and scalability of the proposed algorithms are demonstrated by extensive experimental results.
Mining connected global and local dense subgraphs for bigdata
NASA Astrophysics Data System (ADS)
Wu, Bo; Shen, Haiying
2016-01-01
The problem of discovering connected dense subgraphs of natural graphs is important in data analysis. Discovering dense subgraphs that do not contain denser subgraphs or are not contained in denser subgraphs (called significant dense subgraphs) is also critical for wide-ranging applications. In spite of many works on discovering dense subgraphs, there are no algorithms that can guarantee the connectivity of the returned subgraphs or discover significant dense subgraphs. Hence, in this paper, we define two subgraph discovery problems to discover connected and significant dense subgraphs, propose polynomial-time algorithms and theoretically prove their validity. We also propose an algorithm to further improve the time and space efficiency of our basic algorithm for discovering significant dense subgraphs in big data by taking advantage of the unique features of large natural graphs. In the experiments, we use massive natural graphs to evaluate our algorithms in comparison with previous algorithms. The experimental results show the effectiveness of our algorithms for the two problems and their efficiency. This work is also the first that reveals the physical significance of significant dense subgraphs in natural graphs from different domains.
An Adiabatic Quantum Algorithm for Determining Gracefulness of a Graph
NASA Astrophysics Data System (ADS)
Hosseini, Sayed Mohammad; Davoudi Darareh, Mahdi; Janbaz, Shahrooz; Zaghian, Ali
2017-07-01
Graph labelling is a well-studied topic in combinatorics and graph theory. A graceful labelling of a graph G with e edges labels the vertices of G with distinct values from 0, 1, …, e such that, if each edge is assigned the absolute difference between the labels of its two ends, then each of 1, 2, …, e appears exactly once as an edge label. For a given graph, there are still few efficient classical algorithms that determine whether it is graceful or not, even for trees, a well-known class of graphs. In this paper, we introduce an adiabatic quantum algorithm, which for a graceful graph G finds a graceful labelling. Also, this algorithm can determine if G is not graceful. Numerical simulations of the algorithm reveal that its time complexity has a polynomial behaviour with the problem size up to the range of 15 qubits. A general sufficient condition for a combinatorial optimization problem to have a satisfying adiabatic solution is also derived.
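The definition of a graceful labelling translates directly into a checker, sketched below in Python together with an exhaustive classical search. This is only an illustration of the definition: it has nothing to do with the adiabatic quantum algorithm of the paper, the function names are hypothetical, and the brute-force search is exponential and usable only on very small graphs.

```python
from itertools import permutations


def is_graceful(edges, labels):
    """Check whether `labels` ({vertex: int}) is a graceful labelling.

    Vertex labels must be distinct values in 0..e, and the absolute
    differences along the e edges must be exactly 1, 2, ..., e.
    """
    e = len(edges)
    values = list(labels.values())
    if len(set(values)) != len(values):
        return False
    if any(not 0 <= x <= e for x in values):
        return False
    diffs = sorted(abs(labels[u] - labels[v]) for u, v in edges)
    return diffs == list(range(1, e + 1))


def find_graceful_labelling(vertices, edges):
    """Try every injective labelling; return one that is graceful, or None."""
    e = len(edges)
    for perm in permutations(range(e + 1), len(vertices)):
        labels = dict(zip(vertices, perm))
        if is_graceful(edges, labels):
            return labels
    return None


if __name__ == "__main__":
    path = [("a", "b"), ("b", "c")]
    print(find_graceful_labelling(["a", "b", "c"], path))

    # The 5-cycle has no graceful labelling, so the search returns None.
    c5 = [(i, (i + 1) % 5) for i in range(5)]
    print(find_graceful_labelling(list(range(5)), c5))
```

The two examples cover both outcomes mentioned in the abstract: finding a labelling for a graceful graph, and reporting that a graph is not graceful.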
Fast generation of sparse random kernel graphs
Hagberg, Aric; Lemons, Nathan; Du, Wen-Bo
2015-09-10
The development of kernel-based inhomogeneous random graphs has provided models that are flexible enough to capture many observed characteristics of real networks, and that are also mathematically tractable. We specify a class of inhomogeneous random graph models, called random kernel graphs, that produces sparse graphs with tunable graph properties, and we develop an efficient generation algorithm to sample random instances from this model. As real-world networks are usually large, it is essential that the run-time of generation algorithms scales better than quadratically in the number of vertices n. We show that for many practical kernels our algorithm runs in time at most O(n(log n)²). As an example, we show how to generate samples of power-law degree distribution graphs with tunable assortativity.
On a programming language for graph algorithms
NASA Technical Reports Server (NTRS)
Rheinboldt, W. C.; Basili, V. R.; Mesztenyi, C. K.
1971-01-01
An algorithmic language, GRAAL, is presented for describing and implementing graph algorithms of the type primarily arising in applications. The language is based on a set algebraic model of graph theory which defines the graph structure in terms of morphisms between certain set algebraic structures over the node set and arc set. GRAAL is modular in the sense that the user specifies which of these mappings are available with any graph. This allows flexibility in the selection of the storage representation for different graph structures. In line with its set theoretic foundation, the language introduces sets as a basic data type and provides for the efficient execution of all set and graph operators. At present, GRAAL is defined as an extension of ALGOL 60 (revised) and its formal description is given as a supplement to the syntactic and semantic definition of ALGOL. Several typical graph algorithms are written in GRAAL to illustrate various features of the language and to show its applicability.
Approximate Computing Techniques for Iterative Graph Algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Panyala, Ajay R.; Subasi, Omer; Halappanavar, Mahantesh
Approximate computing enables processing of large-scale graphs by trading off quality for performance. Approximate computing techniques have become critical not only due to the emergence of parallel architectures but also the availability of large scale datasets enabling data-driven discovery. Using two prototypical graph algorithms, PageRank and community detection, we present several approximate computing heuristics to scale the performance with minimal loss of accuracy. We present several heuristics including loop perforation, data caching, incomplete graph coloring and synchronization, and evaluate their efficiency. We demonstrate performance improvements of up to 83% for PageRank and up to 450x for community detection, with low impact on accuracy for both algorithms. We expect the proposed approximate techniques will enable scalable graph analytics on data of importance to several applications in science and their subsequent adoption to scale similar graph algorithms.
Label-based routing for a family of small-world Farey graphs.
Zhai, Yinhu; Wang, Yinhe
2016-05-11
We introduce an informative labelling method for vertices in a family of Farey graphs, and deduce a routing algorithm on all the shortest paths between any two vertices in Farey graphs. The label of a vertex encodes its precise position in the graph and the exact time step at which it was linked into the graph. All the shortest paths between any pair of vertices, whose number is exactly the product of two Fibonacci numbers, are determined only by their labels, and the time complexity of the algorithm is O(n). It is the first algorithm to figure out all the shortest paths between any pair of vertices in a class of deterministic graphs. For Farey networks, the existence of an efficient routing protocol is of interest to design practical communication algorithms in relation to dynamical processes (including synchronization and structural controllability) and also to understand the underlying mechanisms that have shaped their particular structure.
Label-based routing for a family of small-world Farey graphs
NASA Astrophysics Data System (ADS)
Zhai, Yinhu; Wang, Yinhe
2016-05-01
We introduce an informative labelling method for vertices in a family of Farey graphs, and deduce a routing algorithm on all the shortest paths between any two vertices in Farey graphs. The label of a vertex encodes its precise position in the graph and the exact time step at which it was linked into the graph. All the shortest paths between any pair of vertices, whose number is exactly the product of two Fibonacci numbers, are determined only by their labels, and the time complexity of the algorithm is O(n). It is the first algorithm to figure out all the shortest paths between any pair of vertices in a class of deterministic graphs. For Farey networks, the existence of an efficient routing protocol is of interest to design practical communication algorithms in relation to dynamical processes (including synchronization and structural controllability) and also to understand the underlying mechanisms that have shaped their particular structure.
Nagoor Gani, A; Latha, S R
2016-01-01
A Hamiltonian cycle in a graph is a cycle that visits each node/vertex exactly once. A graph containing a Hamiltonian cycle is called a Hamiltonian graph. There have been several studies on finding the number of Hamiltonian cycles of a Hamiltonian graph. As the number of vertices and edges grows, it becomes very difficult to keep track of all the different ways through which the vertices are connected. Hence, analysis of large graphs can be efficiently done with the assistance of a computer system that interprets graphs as matrices. And, of course, a good and well-written algorithm will expedite the analysis. The most convenient way to quickly test whether there is an edge between two vertices is to represent graphs using adjacency matrices. In this paper, a new algorithm is proposed to find a fuzzy Hamiltonian cycle using the adjacency matrix and the degree of the vertices of a fuzzy graph. A fuzzy graph structure is also modeled to illustrate the proposed algorithms with the selected air network of Indigo airlines.
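The point about adjacency matrices giving a constant-time edge test can be illustrated with a short Python sketch of a textbook backtracking search for a Hamiltonian cycle. This is not the algorithm proposed in the paper: it treats the graph as an ordinary (crisp) graph, ignores fuzzy membership values and vertex degrees, and uses a hypothetical function name. Its worst-case running time is exponential.

```python
def hamiltonian_cycle(adj):
    """Search for a Hamiltonian cycle in a graph given as an adjacency matrix.

    `adj[i][j]` is truthy when vertices i and j are joined by an edge.
    Returns the cycle as a list of vertex indices that starts and ends
    at vertex 0, or None if no Hamiltonian cycle exists.
    """
    n = len(adj)
    if n < 3:  # a simple graph needs at least three vertices for a cycle
        return None

    path = [0]
    visited = [False] * n
    visited[0] = True

    def extend():
        if len(path) == n:
            # All vertices used: the cycle closes only if the last
            # vertex is adjacent to the start.
            return bool(adj[path[-1]][path[0]])
        for v in range(n):
            if not visited[v] and adj[path[-1]][v]:  # O(1) edge test
                visited[v] = True
                path.append(v)
                if extend():
                    return True
                path.pop()
                visited[v] = False
        return False

    return path + [0] if extend() else None


if __name__ == "__main__":
    square = [
        [0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0],
    ]
    print(hamiltonian_cycle(square))
```

For a fuzzy graph, one natural extension would store membership degrees instead of 0/1 entries in the matrix, but how those values should enter the search is left to the paper itself.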
DOGMA: A Disk-Oriented Graph Matching Algorithm for RDF Databases
NASA Astrophysics Data System (ADS)
Bröcheler, Matthias; Pugliese, Andrea; Subrahmanian, V. S.
RDF is an increasingly important paradigm for the representation of information on the Web. As RDF databases increase in size to approach tens of millions of triples, and as sophisticated graph matching queries expressible in languages like SPARQL become increasingly important, scalability becomes an issue. To date, there is no graph-based indexing method for RDF data where the index was designed in a way that makes it disk-resident. There is therefore a growing need for indexes that can operate efficiently when the index itself resides on disk. In this paper, we first propose the DOGMA index for fast subgraph matching on disk and then develop a basic algorithm to answer queries over this index. This algorithm is then significantly sped up via an optimized algorithm that uses efficient (but correct) pruning strategies when combined with two different extensions of the index. We have implemented a preliminary system and tested it against four existing RDF database systems developed by others. Our experiments show that our algorithm performs very well compared to these systems, with orders of magnitude improvements for complex graph queries.
Novo, Leonardo; Chakraborty, Shantanav; Mohseni, Masoud; Neven, Hartmut; Omar, Yasser
2015-01-01
Continuous time quantum walks provide an important framework for designing new algorithms and modelling quantum transport and state transfer problems. Often, the graph representing the structure of a problem contains certain symmetries that confine the dynamics to a smaller subspace of the full Hilbert space. In this work, we use invariant subspace methods, that can be computed systematically using the Lanczos algorithm, to obtain the reduced set of states that encompass the dynamics of the problem at hand without the specific knowledge of underlying symmetries. First, we apply this method to obtain new instances of graphs where the spatial quantum search algorithm is optimal: complete graphs with broken links and complete bipartite graphs, in particular, the star graph. These examples show that regularity and high-connectivity are not needed to achieve optimal spatial search. We also show that this method considerably simplifies the calculation of quantum transport efficiencies. Furthermore, we observe improved efficiencies by removing a few links from highly symmetric graphs. Finally, we show that this reduction method also allows us to obtain an upper bound for the fidelity of a single qubit transfer on an XY spin network. PMID:26330082
Thread Graphs, Linear Rank-Width and Their Algorithmic Applications
NASA Astrophysics Data System (ADS)
Ganian, Robert
The introduction of tree-width by Robertson and Seymour [7] was a breakthrough in the design of graph algorithms. A lot of research since then has focused on obtaining a width measure which would be more general and still allow efficient algorithms for a wide range of NP-hard problems on graphs of bounded width. To this end, Oum and Seymour have proposed rank-width, which allows the solution of many such hard problems on less restricted graph classes (see e.g. [3,4]). But what about problems which are NP-hard even on graphs of bounded tree-width or even on trees? The parameter used most often for these exceptionally hard problems is path-width; however, it is extremely restrictive: for example, the graphs of path-width 1 are exactly paths.
An algorithm for automatic reduction of complex signal flow graphs
NASA Technical Reports Server (NTRS)
Young, K. R.; Hoberock, L. L.; Thompson, J. G.
1976-01-01
A computer algorithm is developed that provides efficient means to compute transmittances directly from a signal flow graph or a block diagram. Signal flow graphs are cast as directed graphs described by adjacency matrices. Nonsearch computation, designed for compilers without symbolic capability, is used to identify all arcs that are members of simple cycles for use with Mason's gain formula. The routine does not require the visual acumen of an interpreter to reduce the topology of the graph, and it is particularly useful for analyzing control systems described for computer analyses by means of interactive graphics.
GraphReduce: Processing Large-Scale Graphs on Accelerator-Based Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sengupta, Dipanjan; Song, Shuaiwen; Agarwal, Kapil
2015-11-15
Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device’s internal memory capacity. GraphReduce adopts a combination of edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degrees of parallelism in GPUs with efficient graph data movement between the host and device.
Planning Assembly Of Large Truss Structures In Outer Space
NASA Technical Reports Server (NTRS)
De Mello, Luiz S. Homem; Desai, Rajiv S.
1992-01-01
Report discusses developmental algorithm used in systematic planning of sequences of operations in which large truss structures assembled in outer space. Assembly sequence represented by directed graph called "assembly graph", in which each arc represents joining of two parts or subassemblies. Algorithm generates assembly graph, working backward from state of complete assembly to initial state, in which all parts disassembled. Working backward more efficient than working forward because it avoids intermediate dead ends.
A Graph Based Backtracking Algorithm for Solving General CSPs
NASA Technical Reports Server (NTRS)
Pang, Wanlin; Goodwin, Scott D.
2003-01-01
Many AI tasks can be formalized as constraint satisfaction problems (CSPs), which involve finding values for variables subject to constraints. While solving a CSP is an NP-complete task in general, tractable classes of CSPs have been identified based on the structure of the underlying constraint graphs. Much effort has been spent on exploiting structural properties of the constraint graph to improve the efficiency of finding a solution. These efforts contributed to development of a class of CSP solving algorithms called decomposition algorithms. The strength of CSP decomposition is that its worst-case complexity depends on the structural properties of the constraint graph and is usually better than the worst-case complexity of search methods. Its practical application is limited, however, since it cannot be applied if the CSP is not decomposable. In this paper, we propose a graph based backtracking algorithm called omega-CDBT, which shares merits and overcomes the weaknesses of both decomposition and search approaches.
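As a point of reference for the search side of this comparison, the following Python sketch shows plain chronological backtracking on a binary CSP. It is not omega-CDBT and performs no decomposition of the constraint graph; the function name and the representation of constraints as a dictionary from variable pairs to predicate functions are assumptions made for this example.

```python
def backtrack(variables, domains, constraints, assignment=None):
    """Solve a binary CSP by chronological backtracking.

    `domains` maps each variable to an iterable of candidate values.
    `constraints` maps a pair (a, b) to a predicate check(value_a, value_b).
    Returns a complete assignment as a dict, or None if none exists.
    """
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return dict(assignment)

    # Pick the first unassigned variable (no ordering heuristic here).
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        # Check every constraint whose two variables are both assigned.
        consistent = all(
            check(assignment[a], assignment[b])
            for (a, b), check in constraints.items()
            if a in assignment and b in assignment
        )
        if consistent:
            result = backtrack(variables, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]
    return None


if __name__ == "__main__":
    def different(p, q):
        return p != q

    variables = ["x", "y", "z"]
    triangle = {("x", "y"): different, ("y", "z"): different, ("x", "z"): different}

    # Colouring a triangle: three colours suffice, two do not.
    print(backtrack(variables, {v: [0, 1, 2] for v in variables}, triangle))
    print(backtrack(variables, {v: [0, 1] for v in variables}, triangle))
```

The worst case of this search is exponential in the number of variables regardless of the constraint graph's structure, which is the weakness that structure-aware methods aim to address.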
Pan, Yongke; Niu, Wenjia
2017-01-01
Semisupervised Discriminant Analysis (SDA) is a semisupervised dimensionality reduction algorithm, which can easily resolve the out-of-sample problem. Related works usually focus on the geometric relationships of data points, which are not obvious, to enhance the performance of SDA. Unlike these related works, we study regularized graph construction, which is important in graph-based semisupervised learning methods. In this paper, we propose a novel graph for Semisupervised Discriminant Analysis, which is called the combined low-rank and k-nearest neighbor (LRKNN) graph. In our LRKNN graph, we map the data to the LR feature space and then the kNN is adopted to satisfy the algorithmic requirements of SDA. Since the low-rank representation can capture the global structure and the k-nearest neighbor algorithm can maximally preserve the local geometrical structure of the data, the LRKNN graph can significantly improve the performance of SDA. Extensive experiments on several real-world databases show that the proposed LRKNN graph is an efficient graph constructor, which can largely outperform other commonly used baselines. PMID:28316616
Chaotic Traversal (CHAT): Very Large Graphs Traversal Using Chaotic Dynamics
NASA Astrophysics Data System (ADS)
Changaival, Boonyarit; Rosalie, Martin; Danoy, Grégoire; Lavangnananda, Kittichai; Bouvry, Pascal
2017-12-01
Graph traversal algorithms find applications in various fields such as routing problems, natural language processing or even database querying. The exploration can be considered as a first stepping stone into knowledge extraction from the graph, which is now a popular topic. Classical solutions such as Breadth First Search (BFS) and Depth First Search (DFS) require huge amounts of memory for exploring very large graphs. In this research, we present a novel memoryless graph traversal algorithm, Chaotic Traversal (CHAT), which integrates chaotic dynamics to traverse large unknown graphs via the Lozi map and the Rössler system. To compare the effects of various dynamics on our algorithm, we present an original way to perform the exploration of a parameter space using a bifurcation diagram with respect to the topological structure of attractors. The resulting algorithm is efficient and undemanding of resources, and is therefore very suitable for partial traversal of very large and/or unknown environment graphs. CHAT performance using the Lozi map is shown to be superior to the commonly known Random Walk in terms of number of nodes visited (coverage percentage) and computation time where the environment is unknown and memory usage is restricted.
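A minimal sketch of the idea of a memoryless walker driven by the Lozi map is given below. The map itself is standard (x' = 1 - a|x| + y, y' = b x); the rule that turns the chaotic state into a neighbour index, the parameter values a = 1.7 and b = 0.5, the initial state and the toy graph are all assumptions of this example and are not taken from the paper.

```python
def lozi_step(x, y, a=1.7, b=0.5):
    """One iteration of the Lozi map: x' = 1 - a*|x| + y, y' = b*x."""
    return 1.0 - a * abs(x) + y, b * x


def chaotic_walk(adjacency, start, steps, x=0.1, y=0.1):
    """Walk the graph, choosing each next node from the chaotic state.

    The walker stores no visited set; its only state is the current node
    and the pair (x, y). The returned path is kept for inspection only.
    """
    node = start
    path = [node]
    for _ in range(steps):
        neighbours = adjacency[node]
        x, y = lozi_step(x, y)
        # Assumed mapping from the chaotic state to a neighbour index.
        index = int(abs(x) * 1_000_000) % len(neighbours)
        node = neighbours[index]
        path.append(node)
    return path


# Hypothetical toy graph: a 6-node ring with one chord between nodes 0 and 3.
adjacency = {
    0: [1, 5, 3],
    1: [0, 2],
    2: [1, 3],
    3: [2, 4, 0],
    4: [3, 5],
    5: [4, 0],
}
path = chaotic_walk(adjacency, start=0, steps=200)
coverage = len(set(path)) / len(adjacency)
print(f"visited fraction: {coverage:.2f}")
```

Coverage is computed after the fact from the recorded path; the walker itself never consults it, which is what makes the traversal memoryless.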
GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sengupta, Dipanjan; Agarwal, Kapil; Song, Shuaiwen
2015-09-30
Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device’s internal memory capacity. GraphReduce adopts a combination of both edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degrees of parallelism in GPUs with efficient graph data movement between the host and the device.
NASA Astrophysics Data System (ADS)
Kruglov, V. E.; Malyshev, D. S.; Pochinka, O. V.
2018-01-01
Studying the dynamics of a flow on surfaces by partitioning the phase space into cells with the same limit behaviour of trajectories within a cell goes back to the classical papers of Andronov, Pontryagin, Leontovich and Maier. The types of cells (the number of which is finite) and how the cells adjoin one another completely determine the topological equivalence class of a flow with finitely many special trajectories. If one trajectory is chosen in every cell of a rough flow without periodic orbits, then the cells are partitioned into so-called triangular regions of the same type. A combinatorial description of such a partition gives rise to the three-colour Oshemkov-Sharko graph, the vertices of which correspond to the triangular regions, and the edges to separatrices connecting them. Oshemkov and Sharko proved that such flows are topologically equivalent if and only if the three-colour graphs of the flows are isomorphic, and described an algorithm of distinguishing three-colour graphs. But their algorithm is not efficient with respect to graph theory. In the present paper, we describe the dynamics of Ω-stable flows without periodic trajectories on surfaces in the language of four-colour graphs, present an efficient algorithm for distinguishing such graphs, and develop a realization of a flow from some abstract graph. Bibliography: 17 titles.
Exact and approximate graph matching using random walks.
Gori, Marco; Maggini, Marco; Sarti, Lorenzo
2005-07-01
In this paper, we propose a general framework for graph matching which is suitable for different problems of pattern recognition. The pattern representation we assume is at the same time highly structured, like for classic syntactic and structural approaches, and of subsymbolic nature with real-valued features, like for connectionist and statistic approaches. We show that random walk based models, inspired by Google's PageRank, give rise to a spectral theory that nicely enhances the graph topological features at node level. As a straightforward consequence, we derive a polynomial algorithm for the classic graph isomorphism problem, under the restriction of dealing with Markovian spectrally distinguishable graphs (MSD), a class of graphs that does not seem to be easily reducible to others proposed in the literature. The experimental results that we found on different test-beds of the TC-15 graph database show that the defined MSD class "almost always" covers the database, and that the proposed algorithm is significantly more efficient than top scoring VF algorithm on the same data. Most interestingly, the proposed approach is very well-suited for dealing with partial and approximate graph matching problems, derived for instance from image retrieval tasks. We consider the objects of the COIL-100 visual collection and provide a graph-based representation, whose node's labels contain appropriate visual features. We show that the adoption of classic bipartite graph matching algorithms offers a straightforward generalization of the algorithm given for graph isomorphism and, finally, we report very promising experimental results on the COIL-100 visual collection.
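To make the notion of a random-walk score at node level concrete, the following sketch computes PageRank by power iteration and uses the scores as simple node signatures. It is only an illustration of the underlying random-walk model, not the spectral matching procedure of the paper; the function name, damping value and toy graph are assumptions.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Power iteration for PageRank on a dict: node -> list of successors."""
    nodes = sorted(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            successors = adjacency[v]
            if successors:
                share = damping * rank[v] / len(successors)
                for w in successors:
                    new_rank[w] += share
            else:
                # A dangling node spreads its rank uniformly.
                for w in nodes:
                    new_rank[w] += damping * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical toy graph: an undirected path a - b - c, stored in both directions.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = pagerank(graph)
print(scores)
```

In this toy graph the two end nodes receive identical scores, so the score alone cannot tell them apart; graphs in which such ties do not occur are the easy case for signature-based matching.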
A parallel computing engine for a class of time critical processes.
Nabhan, T M; Zomaya, A Y
1997-01-01
This paper focuses on the efficient parallel implementation of systems of numerically intensive nature over loosely coupled multiprocessor architectures. These analytical models are of significant importance to many real-time systems that have to meet severe time constants. A parallel computing engine (PCE) has been developed in this work for the efficient simplification and the near optimal scheduling of numerical models over the different cooperating processors of the parallel computer. First, the analytical system is efficiently coded in its general form. The model is then simplified by using any available information (e.g., constant parameters). A task graph representing the interconnections among the different components (or equations) is generated. The graph can then be compressed to control the computation/communication requirements. The task scheduler employs a graph-based iterative scheme, based on the simulated annealing algorithm, to map the vertices of the task graph onto a Multiple-Instruction-stream Multiple-Data-stream (MIMD) type of architecture. The algorithm uses a nonanalytical cost function that properly considers the computation capability of the processors, the network topology, the communication time, and congestion possibilities. Moreover, the proposed technique is simple, flexible, and computationally viable. The efficiency of the algorithm is demonstrated by two case studies with good results.
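The scheduling step can be illustrated with a small simulated-annealing mapper. This is a sketch under strong assumptions: the cost below (heaviest processor load plus the weight of edges that cross processors) is a simplified stand-in for the paper's cost function, and the names, cooling schedule and task graph are hypothetical.

```python
import math
import random


def mapping_cost(mapping, work, edges, n_procs):
    """Heaviest processor load plus total weight of inter-processor edges."""
    load = [0.0] * n_procs
    for task, proc in mapping.items():
        load[proc] += work[task]
    comm = sum(w for (u, v), w in edges.items() if mapping[u] != mapping[v])
    return max(load) + comm


def anneal(work, edges, n_procs, steps=2000, t0=5.0, seed=0):
    """Simulated annealing over task-to-processor mappings; returns the best seen."""
    rng = random.Random(seed)
    tasks = sorted(work)
    current = {t: rng.randrange(n_procs) for t in tasks}
    cost = mapping_cost(current, work, edges, n_procs)
    best, best_cost = dict(current), cost
    for step in range(steps):
        temperature = t0 * (1.0 - step / steps) + 1e-9  # linear cooling
        task = rng.choice(tasks)
        old_proc = current[task]
        current[task] = rng.randrange(n_procs)  # propose moving one task
        new_cost = mapping_cost(current, work, edges, n_procs)
        accept = new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temperature)
        if accept:
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(current), cost
        else:
            current[task] = old_proc  # reject the move
    return best, best_cost


# Hypothetical task graph: per-task work and communication weights on edges.
work = {"t1": 4.0, "t2": 3.0, "t3": 2.0, "t4": 1.0}
edges = {("t1", "t2"): 1.0, ("t2", "t3"): 2.0, ("t3", "t4"): 0.5}
best, best_cost = anneal(work, edges, n_procs=2)
print(best, best_cost)
```

Because worse moves are sometimes accepted, the mapper keeps a copy of the best mapping seen rather than returning the final state.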
A critical analysis of computational protein design with sparse residue interaction graphs
Georgiev, Ivelin S.
2017-01-01
Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or the full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. 
This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies. PMID:28358804
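As an illustration of how a distance cutoff sparsifies the interaction graph, the sketch below keeps only residue pairs whose representative coordinates lie within a cutoff. It is a simplification: the study applies both distance and energy cutoffs, and the coordinates, cutoff value and function name here are hypothetical.

```python
import math
from itertools import combinations


def sparse_interaction_graph(coords, cutoff):
    """Return the set of residue pairs whose distance is at most cutoff."""
    edges = set()
    for (a, pa), (b, pb) in combinations(sorted(coords.items()), 2):
        if math.dist(pa, pb) <= cutoff:
            edges.add((a, b))
    return edges


# Hypothetical residue positions (one representative point per residue).
coords = {
    "R1": (0.0, 0.0, 0.0),
    "R2": (3.0, 0.0, 0.0),
    "R3": (10.0, 0.0, 0.0),
    "R4": (12.0, 0.0, 0.0),
}
sparse_edges = sparse_interaction_graph(coords, cutoff=4.0)
all_pairs = len(coords) * (len(coords) - 1) // 2
print(f"{len(sparse_edges)} of {all_pairs} pairs kept: {sorted(sparse_edges)}")
```

Every dropped pair is an interaction energy term that the sparse GMEC ignores, which is the source of the discrepancies the abstract analyzes.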
An Optimal CDS Construction Algorithm with Activity Scheduling in Ad Hoc Networks
Penumalli, Chakradhar; Palanichamy, Yogesh
2015-01-01
A new energy efficient optimal Connected Dominating Set (CDS) algorithm with activity scheduling for mobile ad hoc networks (MANETs) is proposed. This algorithm achieves energy efficiency by minimizing the Broadcast Storm Problem [BSP] and at the same time considering the node's remaining energy. The Connected Dominating Set is widely used as a virtual backbone or spine in mobile ad hoc networks [MANETs] or Wireless Sensor Networks [WSN]. The CDS of a graph representing a network has a significant impact on an efficient design of routing protocol in wireless networks. Here the CDS is a distributed algorithm with activity scheduling based on unit disk graph [UDG]. The node's mobility and residual energy (RE) are considered as parameters in the construction of stable optimal energy efficient CDS. The performance is evaluated at various node densities, various transmission ranges, and mobility rates. The theoretical analysis and simulation results of this algorithm are also presented which yield better results. PMID:26221627
Output-Sensitive Construction of Reeb Graphs.
Doraiswamy, H; Natarajan, V
2012-01-01
The Reeb graph of a scalar function represents the evolution of the topology of its level sets. This paper describes a near-optimal output-sensitive algorithm for computing the Reeb graph of scalar functions defined over manifolds or non-manifolds in any dimension. Key to the simplicity and efficiency of the algorithm is an alternate definition of the Reeb graph that considers equivalence classes of level sets instead of individual level sets. The algorithm works in two steps. The first step locates all critical points of the function in the domain. Critical points correspond to nodes in the Reeb graph. Arcs connecting the nodes are computed in the second step by a simple search procedure that works on a small subset of the domain that corresponds to a pair of critical points. The paper also describes a scheme for controlled simplification of the Reeb graph and two different graph layout schemes that help in the effective presentation of Reeb graphs for visual analysis of scalar fields. Finally, the Reeb graph is employed in four different applications: surface segmentation, spatially-aware transfer function design, visualization of interval volumes, and interactive exploration of time-varying data.
Caetano, Tibério S; McAuley, Julian J; Cheng, Li; Le, Quoc V; Smola, Alex J
2009-06-01
As a fundamental problem in pattern recognition, graph matching has applications in a variety of fields, from computer vision to computational biology. In graph matching, patterns are modeled as graphs and pattern recognition amounts to finding a correspondence between the nodes of different graphs. Many formulations of this problem can be cast in general as a quadratic assignment problem, where a linear term in the objective function encodes node compatibility and a quadratic term encodes edge compatibility. The main research focus in this theme is about designing efficient algorithms for approximately solving the quadratic assignment problem, since it is NP-hard. In this paper we turn our attention to a different question: how to estimate compatibility functions such that the solution of the resulting graph matching problem best matches the expected solution that a human would manually provide. We present a method for learning graph matching: the training examples are pairs of graphs and the 'labels' are matches between them. Our experimental results reveal that learning can substantially improve the performance of standard graph matching algorithms. In particular, we find that simple linear assignment with such a learning scheme outperforms Graduated Assignment with bistochastic normalisation, a state-of-the-art quadratic assignment relaxation algorithm.
BootGraph: probabilistic fiber tractography using bootstrap algorithms and graph theory.
Vorburger, Robert S; Reischauer, Carolin; Boesiger, Peter
2013-02-01
Bootstrap methods have recently been introduced to diffusion-weighted magnetic resonance imaging to estimate the measurement uncertainty of ensuing diffusion parameters directly from the acquired data without the necessity to assume a noise model. These methods have been previously combined with deterministic streamline tractography algorithms to allow for the assessment of connection probabilities in the human brain. Thereby, the local noise induced disturbance in the diffusion data is accumulated additively due to the incremental progression of streamline tractography algorithms. Graph based approaches have been proposed to overcome this drawback of streamline techniques. For this reason, the bootstrap method is in the present work incorporated into a graph setup to derive a new probabilistic fiber tractography method, called BootGraph. The acquired data set is thereby converted into a weighted, undirected graph by defining a vertex in each voxel and edges between adjacent vertices. By means of the cone of uncertainty, which is derived using the wild bootstrap, a weight is thereafter assigned to each edge. Two path finding algorithms are subsequently applied to derive connection probabilities. While the first algorithm is based on the shortest path approach, the second algorithm takes all existing paths between two vertices into consideration. Tracking results are compared to an established algorithm based on the bootstrap method in combination with streamline fiber tractography and to another graph based algorithm. The BootGraph shows a very good performance in crossing situations with respect to false negatives and permits incorporating additional constraints, such as a curvature threshold. 
By inheriting the advantages of the bootstrap method and graph theory, the BootGraph method provides a computationally efficient and flexible probabilistic tractography setup to compute connection probability maps and virtual fiber pathways without the drawbacks of streamline tractography algorithms or the assumption of a noise distribution. Moreover, the BootGraph can be applied to common DTI data sets without further modifications and shows a high repeatability. Thus, it is very well suited for longitudinal studies and meta-studies based on DTI. Copyright © 2012 Elsevier Inc. All rights reserved.
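The shortest-path variant can be sketched as a most-probable-path search: if each edge carries a probability, running Dijkstra on the weights -log(p) returns the path whose probability product is largest. The edge probabilities and node names below are hypothetical, and the derivation of edge weights from the cone of uncertainty is not reproduced here.

```python
import heapq
import math


def most_probable_path(edges, source, target):
    """Dijkstra on -log(p); edges maps node -> list of (neighbour, p) with 0 < p <= 1."""
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for neighbour, p in edges.get(node, []):
            new_cost = cost - math.log(p)
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, math.exp(-best[target])


# Hypothetical voxel graph, stored in both directions because it is undirected.
edges = {
    "A": [("B", 0.9), ("C", 0.5)],
    "B": [("A", 0.9), ("D", 0.8)],
    "C": [("A", 0.5), ("D", 0.99)],
    "D": [("B", 0.8), ("C", 0.99)],
}
path, probability = most_probable_path(edges, "A", "D")
print(path, round(probability, 3))
```

The second algorithm mentioned in the abstract, which accounts for all paths between two vertices, would need a different computation and is not covered by this sketch.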
Dong, Jianwu; Chen, Feng; Zhou, Dong; Liu, Tian; Yu, Zhaofei; Wang, Yi
2017-03-01
Existence of low SNR regions and rapid-phase variations pose challenges to spatial phase unwrapping algorithms. Global optimization-based phase unwrapping methods are widely used, but are significantly slower than greedy methods. In this paper, dual decomposition acceleration is introduced to speed up a three-dimensional graph cut-based phase unwrapping algorithm. The phase unwrapping problem is formulated as a global discrete energy minimization problem, whereas the technique of dual decomposition is used to increase the computational efficiency by splitting the full problem into overlapping subproblems and enforcing the congruence of overlapping variables. Using three dimensional (3D) multiecho gradient echo images from an agarose phantom and five brain hemorrhage patients, we compared this proposed method with an unaccelerated graph cut-based method. Experimental results show up to 18-fold acceleration in computation time. Dual decomposition significantly improves the computational efficiency of 3D graph cut-based phase unwrapping algorithms. Magn Reson Med 77:1353-1358, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
Geographic Gossip: Efficient Averaging for Sensor Networks
NASA Astrophysics Data System (ADS)
Dimakis, Alexandros D. G.; Sarwate, Anand D.; Wainwright, Martin J.
Gossip algorithms for distributed computation are attractive due to their simplicity, distributed nature, and robustness in noisy and uncertain environments. However, using standard gossip algorithms can lead to a significant waste in energy by repeatedly recirculating redundant information. For realistic sensor network model topologies like grids and random geometric graphs, the inefficiency of gossip schemes is related to the slow mixing times of random walks on the communication graph. We propose and analyze an alternative gossiping scheme that exploits geographic information. By utilizing geographic routing combined with a simple resampling method, we demonstrate substantial gains over previously proposed gossip protocols. For regular graphs such as the ring or grid, our algorithm improves standard gossip by factors of $n$ and $\sqrt{n}$ respectively. For the more challenging case of random geometric graphs, our algorithm computes the true average to accuracy $\epsilon$ using $O(\frac{n^{1.5}}{\sqrt{\log n}} \log \epsilon^{-1})$ radio transmissions, which yields a $\sqrt{\frac{n}{\log n}}$ factor improvement over standard gossip algorithms. We illustrate these theoretical results with experimental comparisons between our algorithm and standard methods as applied to various classes of random fields.
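For orientation, the standard nearest-neighbour gossip baseline that the paper improves on can be sketched as follows: at each round a random edge is chosen and its two endpoints replace their values with their mean. This is the baseline only, not the geographic scheme; the ring topology, round count and seed are assumptions of the example.

```python
import random


def pairwise_gossip(values, edges, rounds, seed=0):
    """Repeatedly average the two endpoints of a randomly chosen edge."""
    rng = random.Random(seed)
    state = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        mean = (state[i] + state[j]) / 2.0
        state[i] = state[j] = mean  # the sum of all values is preserved
    return state


# Hypothetical sensor ring with 8 nodes; the true average of the readings is 3.5.
n = 8
ring = [(i, (i + 1) % n) for i in range(n)]
readings = [float(i) for i in range(n)]
result = pairwise_gossip(readings, ring, rounds=2000)
print(max(result) - min(result))
```

Each exchange preserves the sum, so the common limit is the true average; the number of rounds needed on a ring grows quickly with n, which is the slow mixing that motivates routing messages to distant nodes instead.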
Top-k similar graph matching using TraM in biological networks.
Amin, Mohammad Shafkat; Finley, Russell L; Jamil, Hasan M
2012-01-01
Many emerging database applications entail sophisticated graph-based query manipulation, predominantly evident in large-scale scientific applications. To access the information embedded in graphs, efficient graph matching tools and algorithms have become of prime importance. Although the prohibitively expensive time complexity associated with exact subgraph isomorphism techniques has limited its efficacy in the application domain, approximate yet efficient graph matching techniques have received much attention due to their pragmatic applicability. Since public domain databases are noisy and incomplete in nature, inexact graph matching techniques have proven to be more promising in terms of inferring knowledge from numerous structural data repositories. In this paper, we propose a novel technique called TraM for approximate graph matching that off-loads a significant amount of its processing on to the database making the approach viable for large graphs. Moreover, the vector space embedding of the graphs and efficient filtration of the search space enables computation of approximate graph similarity at a throw-away cost. We annotate nodes of the query graphs by means of their global topological properties and compare them with neighborhood biased segments of the datagraph for proper matches. We have conducted experiments on several real data sets, and have demonstrated the effectiveness and efficiency of the proposed method.
Memoryless cooperative graph search based on the simulated annealing algorithm
NASA Astrophysics Data System (ADS)
Hou, Jian; Yan, Gang-Feng; Fan, Zhen
2011-04-01
We have studied the problem of reaching a globally optimal segment for a graph-like environment with a single autonomous mobile agent or a group of them. Firstly, two efficient simulated-annealing-like algorithms are given for a single agent to solve the problem in a partially known environment and an unknown environment, respectively. It is shown that under both proposed control strategies, the agent will eventually converge to a globally optimal segment with probability 1. Secondly, we use multi-agent searching to simultaneously reduce the computation complexity and accelerate convergence based on the algorithms we have given for a single agent. By exploiting graph partition, a gossip-consensus method based scheme is presented to update the key parameter, the radius of the graph, ensuring that the agents spend much less time finding a globally optimal segment.
Fat water decomposition using globally optimal surface estimation (GOOSE) algorithm.
Cui, Chen; Wu, Xiaodong; Newell, John D; Jacob, Mathews
2015-03-01
This article focuses on developing a novel noniterative fat water decomposition algorithm more robust to fat water swaps and related ambiguities. Field map estimation is reformulated as a constrained surface estimation problem to exploit the spatial smoothness of the field, thus minimizing the ambiguities in the recovery. Specifically, the differences in the field map-induced frequency shift between adjacent voxels are constrained to be in a finite range. The discretization of the above problem yields a graph optimization scheme, where each node of the graph is only connected with few other nodes. Thanks to the low graph connectivity, the problem is solved efficiently using a noniterative graph cut algorithm. The global minimum of the constrained optimization problem is guaranteed. The performance of the algorithm is compared with that of state-of-the-art schemes. Quantitative comparisons are also made against reference data. The proposed algorithm is observed to yield more robust fat water estimates with fewer fat water swaps and better quantitative results than other state-of-the-art algorithms in a range of challenging applications. The proposed algorithm is capable of considerably reducing the swaps in challenging fat water decomposition problems. The experiments demonstrate the benefit of using explicit smoothness constraints in field map estimation and solving the problem using a globally convergent graph-cut optimization algorithm. © 2014 Wiley Periodicals, Inc.
User-Assisted Store Recycling for Dynamic Task Graph Schedulers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kurt, Mehmet Can; Krishnamoorthy, Sriram; Agrawal, Gagan
The emergence of the multi-core era has led to increased interest in designing effective yet practical parallel programming models. Models based on task graphs that operate on single-assignment data are attractive in several ways: they can support dynamic applications and precisely represent the available concurrency. However, they also require nuanced algorithms for scheduling and memory management for efficient execution. In this paper, we consider memory-efficient dynamic scheduling of task graphs. Specifically, we present a novel approach for dynamically recycling the memory locations assigned to data items as they are produced by tasks. We develop algorithms to identify memory-efficient store recycling functions by systematically evaluating the validity of a set of (user-provided or automatically generated) alternatives. Because recycling functions can be input data-dependent, we have also developed support for continued correct execution of a task graph in the presence of a potentially incorrect store recycling function. Experimental evaluation demonstrates that our approach to automatic store recycling incurs little to no overheads, achieves memory usage comparable to the best manually derived solutions, often produces recycling functions valid across problem sizes and input parameters, and efficiently recovers from an incorrect choice of store recycling functions.
An asynchronous traversal engine for graph-based rich metadata management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dai, Dong; Carns, Philip; Ross, Robert B.
Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining). Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a "level synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This performs well in many problem domains; but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. Such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed, however. In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format. 
We outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. We further explore the effect of different graph partitioning strategies on the traversal performance for both synchronous and asynchronous traversal engines. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed. Furthermore, the asynchronous traversal engine is more adaptive to different graph partitioning strategies.
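The "level synchronous" pattern referred to above can be sketched on a single machine as follows. The whole frontier of one level is expanded before the next level starts, and in a distributed setting that boundary is where the global synchronization, and hence any straggler, blocks progress. The metadata graph and vertex names are hypothetical, and this is not the engine described in the paper.

```python
def level_synchronous_bfs(adjacency, source):
    """Breadth-first search that advances one complete level at a time."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency.get(node, []):
                if neighbour not in level:
                    level[neighbour] = depth + 1
                    next_frontier.append(neighbour)
        # In a distributed engine, all servers would wait here (barrier)
        # until every server has finished expanding its part of the frontier.
        frontier = next_frontier
        depth += 1
    return level


# Hypothetical provenance graph: a user ran two jobs, which wrote files.
metadata = {
    "user:alice": ["job:1", "job:2"],
    "job:1": ["file:a"],
    "job:2": ["file:a", "file:b"],
    "file:a": [],
    "file:b": [],
}
levels = level_synchronous_bfs(metadata, "user:alice")
print(levels)
```

An asynchronous engine removes that barrier and lets each server forward traversal steps as soon as they are ready, at the cost of more bookkeeping to detect completion.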
An asynchronous traversal engine for graph-based rich metadata management
Dai, Dong; Carns, Philip; Ross, Robert B.; ...
2016-06-23
Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining). Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a "level synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This performs well in many problem domains; but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. Such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed, however. In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format. 
We outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. We further explore the effect of different graph partitioning strategies on the traversal performance for both synchronous and asynchronous traversal engines. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed. Furthermore, the asynchronous traversal engine is more adaptive to different graph partitioning strategies.
2014-01-01
Background Integrating and analyzing heterogeneous genome-scale data is a huge algorithmic challenge for modern systems biology. Bipartite graphs can be useful for representing relationships across pairs of disparate data types, with the interpretation of these relationships accomplished through an enumeration of maximal bicliques. Most previously-known techniques are generally ill-suited to this foundational task, because they are relatively inefficient and without effective scaling. In this paper, a powerful new algorithm is described that produces all maximal bicliques in a bipartite graph. Unlike most previous approaches, the new method neither places undue restrictions on its input nor inflates the problem size. Efficiency is achieved through an innovative exploitation of bipartite graph structure, and through computational reductions that rapidly eliminate non-maximal candidates from the search space. An iterative selection of vertices for consideration based on non-decreasing common neighborhood sizes boosts efficiency and leads to more balanced recursion trees. Results The new technique is implemented and compared to previously published approaches from graph theory and data mining. Formal time and space bounds are derived. Experiments are performed on both random graphs and graphs constructed from functional genomics data. It is shown that the new method substantially outperforms the best previous alternatives. Conclusions The new method is streamlined, efficient, and particularly well-suited to the study of huge and diverse biological data. A robust implementation has been incorporated into GeneWeaver, an online tool for integrating and analyzing functional genomics experiments, available at http://geneweaver.org. The enormous increase in scalability it provides empowers users to study complex and previously unassailable gene-set associations between genes and their biological functions in a hierarchical fashion and on a genome-wide scale. 
This practical computational resource is adaptable to almost any applications environment in which bipartite graphs can be used to model relationships between pairs of heterogeneous entities. PMID:24731198
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, X; Belcher, AH; Wiersma, R
Purpose: In radiation therapy optimization, the constraints can be either hard constraints, which must be satisfied, or soft constraints, which are included but do not need to be satisfied exactly. Currently, the voxel dose constraints are viewed as soft constraints, included as part of the objective function, and the problem is approximated as an unconstrained one. However, in some treatment planning cases the constraints should be specified as hard constraints and solved by constrained optimization. The goal of this work is to present a computationally efficient graph-form alternating direction method of multipliers (ADMM) algorithm for constrained quadratic treatment planning optimization and compare it with several commonly used algorithms and toolboxes. Method: ADMM can be viewed as an attempt to blend the benefits of dual decomposition and augmented Lagrangian methods for constrained optimization. Various proximal operators were first constructed as applicable to quadratic IMRT constrained optimization, and the problem was formulated in a graph form of ADMM. A pre-iteration operation for the projection of a point to a graph was also proposed to further accelerate the computation. Result: The graph-form ADMM algorithm was tested on the Common Optimization for Radiation Therapy (CORT) dataset, including TG119, prostate, liver, and head & neck cases. Both unconstrained and constrained optimization problems were formulated for comparison purposes. All optimizations were solved by LBFGS, IPOPT, Matlab built-in toolbox, CVX (implementing SeDuMi) and Mosek solvers. For unconstrained optimization, it was found that LBFGS performs the best, and it was 3–5 times faster than graph-form ADMM. However, for constrained optimization, graph-form ADMM was 8–100 times faster than the other solvers. Conclusion: A graph-form ADMM can be applied to constrained quadratic IMRT optimization.
It is more computationally efficient than several other commercial and noncommercial optimizers, and it also used significantly less computer memory.
Maximal clique enumeration with data-parallel primitives
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lessley, Brenton; Perciano, Talita; Mathai, Manish
The enumeration of all maximal cliques in an undirected graph is a fundamental problem arising in several research areas. We consider maximal clique enumeration on shared-memory, multi-core architectures and introduce an approach consisting entirely of data-parallel operations, in an effort to achieve efficient and portable performance across different architectures. We study the performance of the algorithm via experiments varying over benchmark graphs and architectures. Overall, we observe that our algorithm achieves up to a 33-fold speedup and a 9-fold speedup over state-of-the-art distributed and serial algorithms, respectively, for graphs with higher ratios of maximal cliques to total cliques. Further, we attain additional speedups on a GPU architecture, demonstrating the portable performance of our data-parallel design.
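For orientation, the problem statement above can be illustrated with a minimal serial sketch. This is the classic Bron-Kerbosch recursion with pivoting, not the data-parallel method of the abstract; the function name `maximal_cliques` and the adjacency-dictionary input are assumptions made for this illustration only.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph.

    `adj` maps each vertex to the set of its neighbours (hypothetical input
    format chosen for this sketch).
    """
    def expand(r, p, x):
        # r: current clique, p: candidates that can extend it,
        # x: vertices already explored (used to detect non-maximal cliques).
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates;
        # only non-neighbours of the pivot need to be branched on.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Small example: a triangle {1, 2, 3} with a pendant vertex 4 attached to 3.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cliques = set(maximal_cliques(example))
```

A serial recursion like this one is the usual baseline against which parallel formulations are compared; it makes no attempt at the portability across architectures that the abstract targets.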
Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis
2015-01-01
...algorithms we proposed improve the time efficiency significantly for large scale datasets. In the last chapter, we also propose an incremental reseeding...plume detection in hyper-spectral video data. These graph based clustering algorithms we proposed improve the time efficiency significantly for large
Topology for efficient information dissemination in ad-hoc networking
NASA Technical Reports Server (NTRS)
Jennings, E.; Okino, C. M.
2002-01-01
In this paper, we explore the information dissemination problem in ad-hoc wireless networks. First, we analyze the probability of successful broadcast, assuming that the nodes are uniformly distributed, the available area has a lower bound relative to the total number of nodes, and there is zero knowledge of the overall topology of the network. By showing that the probability of such events is small, we are motivated to extract good graph topologies to minimize the overall transmissions. Three algorithms are used to generate topologies of the network with guaranteed connectivity. These are the minimum radius graph, the relative neighborhood graph and the minimum spanning tree. Our simulation shows that the relative neighborhood graph has certain good graph properties, which make it suitable for efficient information dissemination.
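As an illustration of one of the three topologies named above, the following minimal sketch builds a relative neighborhood graph under its standard definition: two nodes are joined unless some third node is strictly closer to both of them than they are to each other. The function name, the cubic-time brute-force loop and the sample coordinates are assumptions for this sketch and are not taken from the paper.

```python
from itertools import combinations
from math import dist


def relative_neighborhood_graph(points):
    """Return the RNG edges of `points` as index pairs (i, j) with i < j."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = dist(points[i], points[j])
        # The pair (i, j) is blocked if a third node k lies closer to both
        # endpoints than the endpoints are to each other.
        blocked = any(
            max(dist(points[i], points[k]), dist(points[j], points[k])) < d_ij
            for k in range(len(points))
            if k != i and k != j
        )
        if not blocked:
            edges.append((i, j))
    return edges


# Three hypothetical node positions: the long link 0-1 is dropped because
# node 2 sits between its endpoints.
nodes = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.5)]
rng_edges = relative_neighborhood_graph(nodes)
```

Dropping long links whose endpoints share a closer relay is what reduces redundant transmissions while keeping the node set connected.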
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Marquez, Andres; Choudhury, Sutanay
2012-09-01
Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis of large-scale graphs, we developed and optimized a triad census algorithm to efficiently execute on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code’s data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performances of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.
Dynamic airspace configuration algorithms for next generation air transportation system
NASA Astrophysics Data System (ADS)
Wei, Jian
The National Airspace System (NAS) is under great pressure to safely and efficiently handle the record-high air traffic volume nowadays, and will face even greater challenge to keep pace with the steady increase of future air travel demand, since the air travel demand is projected to increase to two to three times the current level by 2025. The inefficiency of traffic flow management initiatives causes severe airspace congestion and frequent flight delays, which cost billions of economic losses every year. To address the increasingly severe airspace congestion and delays, the Next Generation Air Transportation System (NextGen) is proposed to transform the current static and rigid radar based system to a dynamic and flexible satellite based system. New operational concepts such as Dynamic Airspace Configuration (DAC) have been under development to allow more flexibility required to mitigate the demand-capacity imbalances in order to increase the throughput of the entire NAS. In this dissertation, we address the DAC problem in the en route and terminal airspace under the framework of NextGen. We develop a series of algorithms to facilitate the implementation of innovative concepts relevant with DAC in both the en route and terminal airspace. We also develop a performance evaluation framework for comprehensive benefit analyses on different aspects of future sector design algorithms. First, we complete a graph based sectorization algorithm for DAC in the en route airspace, which models the underlying air route network with a weighted graph, converts the sectorization problem into the graph partition problem, partitions the weighted graph with an iterative spectral bipartition method, and constructs the sectors from the partitioned graph. The algorithm uses a graph model to accurately capture the complex traffic patterns of the real flights, and generates sectors with high efficiency while evenly distributing the workload among the generated sectors. 
We further improve the robustness and efficiency of the graph based DAC algorithm by incorporating the Multilevel Graph Partitioning (MGP) method into the graph model, and develop an MGP based sectorization algorithm for DAC in the en route airspace. In a comprehensive benefit analysis, the performance of the proposed algorithms is tested in numerical simulations with Enhanced Traffic Management System (ETMS) data. Simulation results demonstrate that the algorithmically generated sectorizations outperform the current sectorizations in different sectors for different time periods. Secondly, based on our experience with DAC in the en route airspace, we further study the sectorization problem for DAC in the terminal airspace. The differences between the en route and terminal airspace are identified, and their influence on the terminal sectorization is analyzed. After adjusting the graph model to better capture the unique characteristics of the terminal airspace and the requirements of terminal sectorization, we develop a graph based geometric sectorization algorithm for DAC in the terminal airspace. Moreover, the graph based model is combined with the region based sector design method to better handle the complicated geometric and operational constraints in the terminal sectorization problem. In the benefit analysis, we identify the contributing factors to terminal controller workload, define evaluation metrics, and develop a benefit analysis framework for terminal sectorization evaluation. With the evaluation framework developed, we demonstrate the improvements on the current sectorizations with real traffic data collected from several major international airports in the U.S., and conduct a detailed analysis on the potential benefits of dynamic reconfiguration in the terminal airspace.
Finally, in addition to the research on the macroscopic behavior of a large number of aircraft, we also study the dynamical behavior of individual aircraft from the perspective of traffic flow management. We formulate the mode-confusion problem as a hybrid estimation problem, and develop a state estimation algorithm for the linear hybrid system with continuous-state-dependent transitions based on sparse observations. We also develop an estimated time of arrival prediction algorithm based on the state-dependent transition hybrid estimation algorithm, whose performance is demonstrated with simulations on the landing procedure following the Continuous Descent Approach (CDA) profile.
Automatic classification of protein structures relying on similarities between alignments
2012-01-01
Background Identification of protein structural cores requires isolation of sets of proteins all sharing a same subset of structural motifs. In the context of an ever growing number of available 3D protein structures, standard and automatic clustering algorithms require adaptations so as to allow for efficient identification of such sets of proteins. Results When considering a pair of 3D structures, they are stated as similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented in a graph of similarities where a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Therefore, classifying proteins into structural families can be viewed as a graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may include in the same cluster a subset of 3D structures that do not share a common substructure. In order to overcome this drawback we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. Such a ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three involved protein structures do have some common substructure. We propose hereunder a modification algorithm that eliminates edges from the original graph of similarities and gives a reduced graph in which no ternary constraints are violated. Our approach is then first to build a graph of similarities, then to reduce the graph according to the modification algorithm, and finally to apply to the reduced graph a standard graph clustering algorithm. Such method was used for classifying ASTRAL-40 non-redundant protein domains, identifying significant pairwise similarities with Yakusa, a program devised for rapid 3D structure alignments. 
Conclusions We show that filtering similarities prior to standard graph based clustering process by applying ternary similarity constraints i) improves the separation of proteins of different classes and consequently ii) improves the classification quality of standard graph based clustering algorithms according to the reference classification SCOP. PMID:22974051
Efficient Algorithmic Frameworks via Structural Graph Theory
2016-10-28
centrally planned solution. Policy recommendation: Given a socioeconomic game among multiple parties (countries, armies, political parties, terrorist...etc.). 2 Graph Structure of Network Creation Games We completed the final versions of two of our papers about the graph structure inherent in...network creation games”, which appeared in the following venues: Erik D. Demaine, MohammadTaghi Hajiaghayi, Hamid Mahini, and Morteza Zadimoghaddam, “The
A novel adaptive Cuckoo search for optimal query plan generation.
Gomathi, Ramalingam; Sharmila, Dhandapani
2014-01-01
The day-by-day emergence of new web pages has led to the development of semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To reduce the execution time of queries over large RDF graphs, evolving metaheuristic algorithms have become an alternative to the traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating optimal query plans for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying numbers of predicates. The experimental results show that the proposed approach provides significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.
Interference graph-based dynamic frequency reuse in optical attocell networks
NASA Astrophysics Data System (ADS)
Liu, Huanlin; Xia, Peijie; Chen, Yong; Wu, Lan
2017-11-01
An indoor optical attocell network may achieve higher capacity than radio frequency (RF) or infrared (IR)-based wireless systems. It is proposed as a special type of visible light communication (VLC) system using Light Emitting Diodes (LEDs). However, the system spectral efficiency may be severely degraded owing to inter-cell interference (ICI), particularly in dense deployment scenarios. To address this issue, we construct the spectral interference graph for the indoor optical attocell network, and propose the Dynamic Frequency Reuse (DFR) and Weighted Dynamic Frequency Reuse (W-DFR) algorithms to decrease ICI and improve the spectral efficiency performance. The interference graph enables LEDs to transmit data without interference and to select the minimum number of sub-bands needed for frequency reuse. The DFR algorithm then reuses the system frequency equally across service-providing cells to mitigate spectrum interference, while the W-DFR algorithm reuses the system frequency according to the bandwidth weight (BW), which is defined based on the number of service users. Numerical results show that both of the proposed schemes can effectively improve the average spectral efficiency (ASE) of the system. Additionally, an improvement in the user data rate is observed by analyzing its cumulative distribution function (CDF).
Searching social networks for subgraph patterns
NASA Astrophysics Data System (ADS)
Ogaard, Kirk; Kase, Sue; Roy, Heather; Nagi, Rakesh; Sambhoos, Kedar; Sudit, Moises
2013-06-01
Software tools for Social Network Analysis (SNA) are being developed which support various types of analysis of social networks extracted from social media websites (e.g., Twitter). Once extracted and stored in a database such social networks are amenable to analysis by SNA software. This data analysis often involves searching for occurrences of various subgraph patterns (i.e., graphical representations of entities and relationships). The authors have developed the Graph Matching Toolkit (GMT) which provides an intuitive Graphical User Interface (GUI) for a heuristic graph matching algorithm called the Truncated Search Tree (TruST) algorithm. GMT is a visual interface for graph matching algorithms processing large social networks. GMT enables an analyst to draw a subgraph pattern by using a mouse to select categories and labels for nodes and links from drop-down menus. GMT then executes the TruST algorithm to find the top five occurrences of the subgraph pattern within the social network stored in the database. GMT was tested using a simulated counter-insurgency dataset consisting of cellular phone communications within a populated area of operations in Iraq. The results indicated GMT (when executing the TruST graph matching algorithm) is a time-efficient approach to searching large social networks. GMT's visual interface to a graph matching algorithm enables intelligence analysts to quickly analyze and summarize the large amounts of data necessary to produce actionable intelligence.
Multi-scale graph-cut algorithm for efficient water-fat separation.
Berglund, Johan; Skorpil, Mikael
2017-09-01
To improve the accuracy and robustness to noise in water-fat separation by unifying the multiscale and graph cut based approaches to B0-correction. A previously proposed water-fat separation algorithm that corrects for B0 field inhomogeneity in 3D by a single quadratic pseudo-Boolean optimization (QPBO) graph cut was incorporated into a multi-scale framework, where field map solutions are propagated from coarse to fine scales for voxels that are not resolved by the graph cut. The accuracy of the single-scale and multi-scale QPBO algorithms was evaluated against benchmark reference datasets. The robustness to noise was evaluated by adding noise to the input data prior to water-fat separation. Both algorithms achieved the highest accuracy when compared with seven previously published methods, while computation times were acceptable for implementation in clinical routine. The multi-scale algorithm was more robust to noise than the single-scale algorithm, while causing only a small increase (+10%) of the reconstruction time. The proposed 3D multi-scale QPBO algorithm offers accurate water-fat separation, robustness to noise, and fast reconstruction. The software implementation is freely available to the research community. Magn Reson Med 78:941-949, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
Parallel Algorithms for Switching Edges in Heterogeneous Graphs.
Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav
2017-06-01
An edge switch is an operation on a graph (or network) in which two edges are selected at random and one end vertex of each is swapped with the other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors, leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
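The edge switch operation and the simplicity requirement described above can be sketched serially as follows. This is a minimal single-process illustration, not the distributed-memory algorithm of the paper; the function name `try_edge_switch`, the edge-list representation and the sample graph are assumptions for this sketch.

```python
import random
from collections import Counter


def try_edge_switch(edges, rng):
    """Attempt one edge switch in place; return True if it was applied.

    `edges` is a list of (u, v) tuples describing a simple undirected graph.
    """
    i, j = rng.sample(range(len(edges)), 2)
    (u1, v1), (u2, v2) = edges[i], edges[j]
    # Swap one end vertex of each edge: (u1, v1), (u2, v2) -> (u1, v2), (u2, v1).
    new1, new2 = (u1, v2), (u2, v1)
    # Reject switches that would create a self-loop.
    if u1 == v2 or u2 == v1:
        return False
    # Reject switches that would create a parallel edge. Rebuilding the set on
    # every call is O(m) and is kept only for brevity.
    existing = {frozenset(e) for e in edges}
    if frozenset(new1) in existing or frozenset(new2) in existing:
        return False
    edges[i], edges[j] = new1, new2
    return True


rng = random.Random(0)  # fixed seed so the demo is repeatable
graph = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
degrees_before = Counter(v for e in graph for v in e)
for _ in range(200):
    try_edge_switch(graph, rng)
degrees_after = Counter(v for e in graph for v in e)
```

Every vertex keeps its degree after each accepted switch, which is why repeated switching is used to sample random networks with a given degree sequence. The shared `existing` check is also where the dependency between successive switches shows up: two concurrent switches cannot validate simplicity independently.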
A Constraint-Based Planner for Data Production
NASA Technical Reports Server (NTRS)
Pang, Wanlin; Golden, Keith
2005-01-01
This paper presents a graph-based backtracking algorithm designed to support constraint-based planning in data production domains. This algorithm performs backtracking at two nested levels: the outer backtracking follows the structure of the planning graph to select planner subgoals and actions to achieve them, and the inner backtracking works inside a subproblem associated with a selected action to find action parameter values. We show that this algorithm works well in a planner applied to automating data production in an ecological forecasting system. We also discuss how the idea of multi-level backtracking may improve the efficiency of solving semi-structured constraint problems.
Projected power iteration for network alignment
NASA Astrophysics Data System (ADS)
Onaran, Efe; Villar, Soledad
2017-08-01
The network alignment problem asks for the best correspondence between two given graphs, so that the largest possible number of edges are matched. This problem appears in many scientific problems (like the study of protein-protein interactions) and it is very closely related to the quadratic assignment problem, which has graph isomorphism, traveling salesman and minimum bisection problems as particular cases. The graph matching problem is NP-hard in general. However, under some restrictive models for the graphs, algorithms can approximate the alignment efficiently. In that spirit, the recent work by Feizi and collaborators introduces EigenAlign, a fast spectral method with convergence guarantees for Erdős-Rényi graphs. In this work we propose the algorithm Projected Power Alignment, which is a projected power iteration version of EigenAlign. We numerically show that it improves the recovery rates of EigenAlign, and we describe the theory that may be used to provide performance guarantees for Projected Power Alignment.
An Efficient Algorithm for Partitioning and Authenticating Problem-Solutions of eLearning Contents
ERIC Educational Resources Information Center
Dewan, Jahangir; Chowdhury, Morshed; Batten, Lynn
2013-01-01
Content authenticity and correctness is one of the important challenges in eLearning as there can be many solutions to one specific problem in cyber space. Therefore, the authors feel it is necessary to map problems to solutions using graph partition and weighted bipartite matching. This article proposes an efficient algorithm to partition…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rosmanis, Ansis
2011-02-15
I introduce a continuous-time quantum walk on graphs called the quantum snake walk, the basis states of which are fixed-length paths (snakes) in the underlying graph. First, I analyze the quantum snake walk on the line, and I show that, even though most states stay localized throughout the evolution, there are specific states that most likely move on the line as wave packets with momentum inversely proportional to the length of the snake. Next, I discuss how an algorithm based on the quantum snake walk might potentially be able to solve an extended version of the glued trees problem, which asks to find a path connecting both roots of the glued trees graph. To the best of my knowledge, no efficient quantum algorithm solving this problem is known yet.
Exploratory Item Classification Via Spectral Graph Clustering
Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang
2017-01-01
Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
AND/OR graph representation of assembly plans
NASA Astrophysics Data System (ADS)
Homem de Mello, Luiz S.; Sanderson, Arthur C.
1990-04-01
A compact representation of all possible assembly plans of a product using AND/OR graphs is presented as a basis for efficient planning algorithms that allow an intelligent robot to pick a course of action according to instantaneous conditions. The AND/OR graph is equivalent to a state transition graph but requires fewer nodes and simplifies the search for feasible plans. Three applications are discussed: (1) the preselection of the best assembly plan, (2) the recovery from execution errors, and (3) the opportunistic scheduling of tasks. An example of an assembly with four parts illustrates the use of the AND/OR graph representation in assembly-plan preselection, based on the weighting of operations according to complexity of manipulation and stability of subassemblies. A hypothetical error situation is discussed to show how a bottom-up search of the AND/OR graph leads to an efficient recovery.
AND/OR graph representation of assembly plans
NASA Technical Reports Server (NTRS)
Homem De Mello, Luiz S.; Sanderson, Arthur C.
1990-01-01
A compact representation of all possible assembly plans of a product using AND/OR graphs is presented as a basis for efficient planning algorithms that allow an intelligent robot to pick a course of action according to instantaneous conditions. The AND/OR graph is equivalent to a state transition graph but requires fewer nodes and simplifies the search for feasible plans. Three applications are discussed: (1) the preselection of the best assembly plan, (2) the recovery from execution errors, and (3) the opportunistic scheduling of tasks. An example of an assembly with four parts illustrates the use of the AND/OR graph representation in assembly-plan preselection, based on the weighting of operations according to complexity of manipulation and stability of subassemblies. A hypothetical error situation is discussed to show how a bottom-up search of the AND/OR graph leads to an efficient recovery.
[A retrieval method of drug molecules based on graph collapsing].
Qu, J W; Lv, X Q; Liu, Z M; Liao, Y; Sun, P H; Wang, B; Tang, Z
2018-04-18
To establish a compact and efficient hypergraph representation and a graph-similarity-based retrieval method of molecules to achieve effective and efficient medicine information retrieval. Chemical structural formula (CSF) was a primary search target as a unique and precise identifier for each compound at the molecular level in the research field of medicine information retrieval. To retrieve medicine information effectively and efficiently, a complete workflow of the graph-based CSF retrieval system was introduced. This system accepted the photos taken from smartphones and the sketches drawn on tablet personal computers as CSF inputs, and formalized the CSFs with the corresponding graphs. Then this paper proposed a compact and efficient hypergraph representation for molecules on the basis of analyzing factors that directly affected the efficiency of graph matching. According to the characteristics of CSFs, a hierarchical collapsing method combining graph isomorphism and frequent subgraph mining was adopted. There was yet a fundamental challenge, subgraph overlapping during the collapsing procedure, which hindered the method from establishing the correct compact hypergraph of an original CSF graph. Therefore, a graph-isomorphism-based algorithm was proposed to select dominant acyclic subgraphs on the basis of overlapping analysis. Finally, the spatial similarity among graphical CSFs was evaluated by multi-dimensional measures of similarity. To evaluate the performance of the proposed method, the proposed system was firstly compared with Wikipedia Chemical Structure Explorer (WCSE), the state-of-the-art system that allowed CSF similarity searching within Wikipedia molecules dataset, on retrieval accuracy. The system achieved higher values on mean average precision, discounted cumulative gain, rank-biased precision, and expected reciprocal rank than WCSE from the top-2 to the top-10 retrieved results. 
Specifically, the system achieved 10%, 1.41, 6.42%, and 1.32% higher than WCSE on these metrics for top-10 retrieval results, respectively. Moreover, several retrieval cases were presented to intuitively compare with WCSE. The results of the above comparative study demonstrated that the proposed method outperformed the existing method with regard to accuracy and effectiveness. This paper proposes a graph-similarity-based retrieval approach for medicine information. To obtain satisfactory retrieval results, an isomorphism-based algorithm is proposed for dominant subgraph selection based on the subgraph overlapping analysis, as well as an effective and efficient hypergraph representation of molecules. Experiment results demonstrate the effectiveness of the proposed approach.
Efficient Extraction of High Centrality Vertices in Distributed Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumbhare, Alok; Frincu, Marc; Raghavendra, Cauligi S.
2014-09-09
Betweenness centrality (BC) is an important measure for identifying high value or critical vertices in graphs, in a variety of domains such as communication networks, road networks, and social graphs. However, calculating betweenness values is prohibitively expensive and, more often, domain experts are interested only in the vertices with the highest centrality values. In this paper, we first propose a partition-centric algorithm (MS-BC) to calculate BC for a large distributed graph that optimizes resource utilization and improves overall performance. Further, we extend the notion of approximate BC by pruning the graph and removing a subset of edges and vertices that contribute the least to the betweenness values of other vertices (MSL-BC), which further improves the runtime performance. We evaluate the proposed algorithms using a mix of real-world and synthetic graphs on an HPC cluster and analyze their strengths and weaknesses. The experimental results show an improvement in performance of up to 12x for large sparse graphs as compared to the state-of-the-art, and at the same time highlight the need for better partitioning methods to enable a balanced workload across partitions for unbalanced graphs such as small-world or power-law graphs.
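To make the cost of exact betweenness concrete, here is a minimal serial sketch of the well-known Brandes algorithm for unweighted, undirected graphs. It is a standard baseline, not the partition-centric MS-BC or the pruned MSL-BC variant described above; the function name and the adjacency-dictionary input are assumptions for this illustration.

```python
from collections import deque


def betweenness_centrality(adj):
    """Exact betweenness for an unweighted, undirected graph.

    `adj` maps each vertex to an iterable of its neighbours.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) and
        # recording the predecessors of each vertex on those paths.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        depth = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in depth:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse breadth-first order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was counted from both sides.
    return {v: c / 2 for v, c in bc.items()}


# Path graph 0 - 1 - 2: only the middle vertex lies on a shortest path.
path_bc = betweenness_centrality({0: [1], 1: [0, 2], 2: [1]})
```

One full traversal per source vertex is what makes the exact computation expensive on large graphs, and it is the cost that partitioning and pruning strategies aim to reduce.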
Approximate ground states of the random-field Potts model from graph cuts
NASA Astrophysics Data System (ADS)
Kumar, Manoj; Kumar, Ravinder; Weigel, Martin; Banerjee, Varsha; Janke, Wolfhard; Puri, Sanjay
2018-05-01
While the ground-state problem for the random-field Ising model is polynomial, and can be solved using a number of well-known algorithms for maximum flow or graph cut, the analog random-field Potts model corresponds to a multiterminal flow problem that is known to be NP-hard. Hence an efficient exact algorithm is very unlikely to exist. As we show here, it is nevertheless possible to use an embedding of binary degrees of freedom into the Potts spins in combination with graph-cut methods to solve the corresponding ground-state problem approximately in polynomial time. We benchmark this heuristic algorithm using a set of quasiexact ground states found for small systems from long parallel tempering runs. For a not-too-large number q of Potts states, the method based on graph cuts finds the same solutions in a fraction of the time. We employ the new technique to analyze the breakup length of the random-field Potts model in two dimensions.
Efficient Approximation Algorithms for Weighted $b$-Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khan, Arif; Pothen, Alex; Mostofa Ali Patwary, Md.
2016-01-01
We describe a half-approximation algorithm, b-Suitor, for computing a b-Matching of maximum weight in a graph with weights on the edges. b-Matching is a generalization of the well-known Matching problem in graphs, where the objective is to choose a subset M of edges in the graph such that at most a specified number b(v) of edges in M are incident on each vertex v. Subject to this restriction we maximize the sum of the weights of the edges in M. We prove that the b-Suitor algorithm computes the same b-Matching as the one obtained by the greedy algorithm for the problem. We implement the algorithm on serial and shared-memory parallel processors, and compare its performance against a collection of approximation algorithms that have been proposed for the Matching problem. Our results show that the b-Suitor algorithm outperforms the Greedy and Locally Dominant edge algorithms by one to two orders of magnitude on a serial processor. The b-Suitor algorithm has a high degree of concurrency, and it scales well up to 240 threads on a shared memory multiprocessor. The b-Suitor algorithm outperforms the Locally Dominant edge algorithm by a factor of fourteen on 16 cores of an Intel Xeon multiprocessor.
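For orientation, the greedy baseline mentioned in this abstract can be sketched in a few lines. This is not b-Suitor itself, only the textbook greedy half-approximation for weighted b-Matching; the function name `greedy_b_matching`, the edge-list input format, and the toy graph are assumptions made for illustration.

```python
def greedy_b_matching(edges, b):
    """Greedy half-approximation for maximum-weight b-Matching.

    edges: list of (u, v, weight) tuples (hypothetical input format).
    b:     dict mapping each vertex v to its capacity b(v).
    """
    remaining = dict(b)  # residual capacity of every vertex
    matching = []
    # Visit edges from heaviest to lightest; ties are broken by endpoint names.
    for u, v, w in sorted(edges, key=lambda e: (-e[2], e[0], e[1])):
        if u != v and remaining.get(u, 0) > 0 and remaining.get(v, 0) > 0:
            matching.append((u, v, w))
            remaining[u] -= 1
            remaining[v] -= 1
    return matching


# Toy instance: vertex "b" may be matched twice, all others once.
edges = [("a", "b", 5.0), ("b", "c", 4.0), ("c", "d", 3.0),
         ("a", "c", 2.0), ("b", "d", 1.0)]
b = {"a": 1, "b": 2, "c": 1, "d": 1}
matching = greedy_b_matching(edges, b)
total_weight = sum(w for _, _, w in matching)
```

The sort dominates the running time, which is one reason a sequential greedy pass is a natural baseline for the parallel algorithm the abstract describes.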
Efficient structure from motion for oblique UAV images based on maximal spanning tree expansion
NASA Astrophysics Data System (ADS)
Jiang, San; Jiang, Wanshou
2017-10-01
The primary contribution of this paper is an efficient Structure from Motion (SfM) solution for oblique unmanned aerial vehicle (UAV) images. First, an algorithm, considering spatial relationship constraints between image footprints, is designed for match pair selection with the assistance of UAV flight control data and oblique camera mounting angles. Second, a topological connection network (TCN), represented by an undirected weighted graph, is constructed from initial match pairs, which encodes the overlap areas and intersection angles into edge weights. Then, an algorithm, termed MST-Expansion, is proposed to extract the match graph from the TCN, where the TCN is first simplified by a maximum spanning tree (MST). By further analysis of the local structure in the MST, expansion operations are performed on the vertices of the MST for match graph enhancement, which is achieved by introducing critical connections in the expansion directions. Finally, guided by the match graph, an efficient SfM is proposed. Under extensive analysis and comparison, its performance is verified by using three oblique UAV datasets captured with different multi-camera systems. Experimental results demonstrate that the efficiency of image matching is improved, with speedup ratios ranging from 19 to 35, and competitive orientation accuracy is achieved from both relative bundle adjustment (BA) without GCPs (Ground Control Points) and absolute BA with GCPs. At the same time, images in the three datasets are successfully oriented. For the orientation of oblique UAV images, the proposed method can be a more efficient solution.
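The simplification step described above, reducing a weighted undirected graph to a maximum spanning tree, can be illustrated with Kruskal's algorithm run on edges in descending weight order. This is a generic sketch, not the authors' MST-Expansion code; the function name, the integer vertex ids, and the edge weights in the toy network are assumptions.

```python
def maximum_spanning_tree(num_vertices, edges):
    """Kruskal's algorithm on edges sorted by descending weight.

    edges: list of (u, v, weight) with integer vertex ids in [0, num_vertices).
    Returns the list of edges kept in the maximum spanning tree (or forest).
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


# Toy connection network with 4 images; weights stand in for overlap scores.
tcn_edges = [(0, 1, 0.9), (1, 2, 0.8), (0, 2, 0.3), (2, 3, 0.7), (1, 3, 0.2)]
mst = maximum_spanning_tree(4, tcn_edges)
```

A tree keeps only one path between any two images, which is why the abstract follows the MST step with expansion operations that reintroduce critical connections.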
An algorithm for finding a similar subgraph of all Hamiltonian cycles
NASA Astrophysics Data System (ADS)
Wafdan, R.; Ihsan, M.; Suhaimi, D.
2018-01-01
This paper discusses an algorithm to find a similar subgraph called findSimSubG algorithm. A similar subgraph is a subgraph with a maximum number of edges, contains no isolated vertex and is contained in every Hamiltonian cycle of a Hamiltonian Graph. The algorithm runs only on Hamiltonian graphs with at least two Hamiltonian cycles. The algorithm works by examining whether the initial subgraph of the first Hamiltonian cycle is a subgraph of comparison graphs. If the initial subgraph is not in comparison graphs, the algorithm will remove edges and vertices of the initial subgraph that are not in comparison graphs. There are two main processes in the algorithm, changing Hamiltonian cycle into a cycle graph and removing edges and vertices of the initial subgraph that are not in comparison graphs. The findSimSubG algorithm can find the similar subgraph without using backtracking method. The similar subgraph cannot be found on certain graphs, such as an n-antiprism graph, complete bipartite graph, complete graph, 2n-crossed prism graph, n-crown graph, n-möbius ladder, prism graph, and wheel graph. The complexity of this algorithm is O(m|V|), where m is the number of Hamiltonian cycles and |V| is the number of vertices of a Hamiltonian graph.
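Read literally, the definition above says the similar subgraph consists of the edges shared by every Hamiltonian cycle, together with their endpoints. The sketch below computes that edge intersection directly; it is an illustration of the definition, not the findSimSubG procedure, and it assumes the Hamiltonian cycles are already given as vertex sequences. The function names are hypothetical.

```python
def cycle_edges(cycle):
    """Undirected edge set of a cycle given as a vertex sequence."""
    n = len(cycle)
    return {frozenset((cycle[i], cycle[(i + 1) % n])) for i in range(n)}


def common_subgraph(cycles):
    """Edges present in every cycle, plus the vertices they touch."""
    common = cycle_edges(cycles[0])
    for cycle in cycles[1:]:
        common &= cycle_edges(cycle)  # drop edges missing from this cycle
    vertices = set().union(*common) if common else set()
    return vertices, common


# Two Hamiltonian cycles on the same six vertices.
cycles = [[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 5, 4]]
vertices, common = common_subgraph(cycles)
```

Each cycle contributes |V| edges, so intersecting m cycles touches on the order of m|V| edges, in line with the complexity quoted in the abstract.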
A Class of Manifold Regularized Multiplicative Update Algorithms for Image Clustering.
Yang, Shangming; Yi, Zhang; He, Xiaofei; Li, Xuelong
2015-12-01
Multiplicative update algorithms are important tools for information retrieval, image processing, and pattern recognition. However, when the graph regularization is added to the cost function, different classes of sample data may be mapped to the same subspace, which leads to the increase of data clustering error rate. In this paper, an improved nonnegative matrix factorization (NMF) cost function is introduced. Based on the cost function, a class of novel graph regularized NMF algorithms is developed, which results in a class of extended multiplicative update algorithms with manifold structure regularization. Analysis shows that in the learning, the proposed algorithms can efficiently minimize the rank of the data representation matrix. Theoretical results presented in this paper are confirmed by simulations. For different initializations and data sets, variation curves of cost functions and decomposition data are presented to show the convergence features of the proposed update rules. Basis images, reconstructed images, and clustering results are utilized to present the efficiency of the new algorithms. Last, the clustering accuracies of different algorithms are also investigated, which shows that the proposed algorithms can achieve state-of-the-art performance in applications of image clustering.
Graph traversals, genes, and matroids: An efficient case of the travelling salesman problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gusfield, D.; Stelling, P.; Wang, Lusheng
1996-12-31
In this paper the authors consider graph traversal problems that arise from a particular technology for DNA sequencing - sequencing by hybridization (SBH). They first explain the connection of the graph problems to SBH and then focus on the traversal problems. They describe a practical polynomial time solution to the Travelling Salesman Problem in a rich class of directed graphs (including edge weighted binary de Bruijn graphs), and provide a bounded-error approximation algorithm for the maximum weight TSP in a superset of those directed graphs. The authors also establish the existence of a matroid structure defined on the set of Euler and Hamilton paths in the restricted class of graphs. 8 refs., 5 figs.
LSG: An External-Memory Tool to Compute String Graphs for Next-Generation Sequencing Data Assembly.
Bonizzoni, Paola; Vedova, Gianluca Della; Pirola, Yuri; Previtali, Marco; Rizzi, Raffaella
2016-03-01
The large amount of short read data that has to be assembled in future applications, such as in metagenomics or cancer genomics, strongly motivates the investigation of disk-based approaches to index next-generation sequencing (NGS) data. Positive results in this direction stimulate the investigation of efficient external memory algorithms for de novo assembly from NGS data. Our article is also motivated by the open problem of designing a space-efficient algorithm to compute a string graph using an indexing procedure based on the Burrows-Wheeler transform (BWT). We have developed a disk-based algorithm for computing string graphs in external memory: the light string graph (LSG). LSG relies on a new representation of the FM-index that is exploited to keep the main memory requirement independent of the size of the data set. Moreover, we have developed a pipeline for genome assembly from NGS data that integrates LSG with the assembly step of SGA (Simpson and Durbin, 2012), a state-of-the-art string graph-based assembler, and uses BEETL for indexing the input data. LSG is open source software and is available online. We have analyzed our implementation on an 875-million-read whole-genome dataset, on which LSG has built the string graph using only 1GB of main memory (reducing the memory occupation by a factor of 50 with respect to SGA), while requiring slightly more than twice the time of SGA. The analysis of the entire pipeline shows an important decrease in memory usage, while managing to have only a moderate increase in the running time.
Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler.
Jin, Ick Hoon; Yuan, Ying; Liang, Faming
2013-10-01
Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the intractable normalizing constant and model degeneracy. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the intractable normalizing constant and model degeneracy issues encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as a MCMC extension of the exchange algorithm, and it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, molecule synthetic network, and dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency.
Incremental isometric embedding of high-dimensional data using connected neighborhood graphs.
Zhao, Dongfang; Yang, Li
2009-01-01
Most nonlinear data embedding methods use bottom-up approaches for capturing the underlying structure of data distributed on a manifold in high dimensional space. These methods often share the first step, which defines neighbor points of every data point by building a connected neighborhood graph so that all data points can be embedded in a single coordinate system. These methods are required to work incrementally for dimensionality reduction in many applications. Because the input data stream may be under-sampled or skewed from time to time, building a connected neighborhood graph is crucial to the success of incremental data embedding using these methods. This paper presents algorithms for updating $k$-edge-connected and $k$-connected neighborhood graphs after a new data point is added or an old data point is deleted. It further utilizes a simple algorithm for updating all-pair shortest distances on the neighborhood graph. Together with incremental classical multidimensional scaling using iterative subspace approximation, this paper devises an incremental version of Isomap with enhancements to deal with under-sampled or unevenly distributed data. Experiments on both synthetic and real-world data sets show that the algorithm is efficient and maintains low dimensional configurations of high dimensional data under various data distributions.
NASA Astrophysics Data System (ADS)
Zhou, Lifan; Chai, Dengfeng; Xia, Yu; Ma, Peifeng; Lin, Hui
2018-01-01
Phase unwrapping (PU) is one of the key processes in reconstructing the digital elevation model of a scene from its interferometric synthetic aperture radar (InSAR) data. It is known that two-dimensional (2-D) PU problems can be formulated as maximum a posteriori estimation of Markov random fields (MRFs). However, considering that the traditional MRF algorithm is usually defined on a rectangular grid, it fails easily if large parts of the wrapped data are dominated by noise caused by large low-coherence area or rapid-topography variation. A PU solution based on sparse MRF is presented to extend the traditional MRF algorithm to deal with sparse data, which allows the unwrapping of InSAR data dominated by high phase noise. To speed up the graph cuts algorithm for sparse MRF, we designed dual elementary graphs and merged them to obtain the Delaunay triangle graph, which is used to minimize the energy function efficiently. The experiments on simulated and real data, compared with other existing algorithms, both confirm the effectiveness of the proposed MRF approach, which suffers less from decorrelation effects caused by large low-coherence area or rapid-topography variation.
Compacting de Bruijn graphs from sequencing data quickly and in low memory.
Chikhi, Rayan; Limasset, Antoine; Medvedev, Paul
2016-06-15
As the quantity of data per sequencing experiment increases, the challenges of fragment assembly are becoming increasingly computational. The de Bruijn graph is a widely used data structure in fragment assembly algorithms, used to represent the information from a set of reads. Compaction is an important data reduction step in most de Bruijn graph based algorithms where long simple paths are compacted into single vertices. Compaction has recently become the bottleneck in assembly pipelines, and improving its running time and memory usage is an important problem. We present an algorithm and a tool bcalm 2 for the compaction of de Bruijn graphs. bcalm 2 is a parallel algorithm that distributes the input based on a minimizer hashing technique, allowing for good balance of memory usage throughout its execution. For human sequencing data, bcalm 2 reduces the computational burden of compacting the de Bruijn graph to roughly an hour and 3 GB of memory. We also applied bcalm 2 to the 22 Gbp loblolly pine and 20 Gbp white spruce sequencing datasets. Compacted graphs were constructed from raw reads in less than 2 days and 40 GB of memory on a single machine. Hence, bcalm 2 is at least an order of magnitude more efficient than other available methods. Source code of bcalm 2 is freely available at: https://github.com/GATB/bcalm. Contact: rayan.chikhi@univ-lille1.fr. © The Author 2016. Published by Oxford University Press.
Spectral identification of topological domains
Chen, Jie; Hero, Alfred O.; Rajapakse, Indika
2016-01-01
Motivation: Topological domains have been proposed as the backbone of interphase chromosome structure. They are regions of high local contact frequency separated by sharp boundaries. Genes within a domain often have correlated transcription. In this paper, we present a computationally efficient spectral algorithm to identify topological domains from chromosome conformation data (Hi-C data). We consider the genome as a weighted graph with vertices defined by loci on a chromosome and the edge weights given by interaction frequency between two loci. Laplacian-based graph segmentation is then applied iteratively to obtain the domains at the given compactness level. Comparison with algorithms in the literature shows the advantage of the proposed strategy. Results: An efficient algorithm is presented to identify topological domains from the Hi-C matrix. Availability and Implementation: The Matlab source code and illustrative examples are available at http://bionetworks.ccmb.med.umich.edu/ Contact: indikar@med.umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153657
Identifying Vulnerabilities and Hardening Attack Graphs for Networked Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saha, Sudip; Vullinati, Anil K.; Halappanavar, Mahantesh
We investigate efficient security control methods for protecting against vulnerabilities in networked systems. A large number of interdependent vulnerabilities typically exist in the computing nodes of a cyber-system; as vulnerabilities get exploited, starting from low level ones, they open up the doors to more critical vulnerabilities. These cannot be understood just by a topological analysis of the network, and we use the attack graph abstraction of Dewri et al. to study these problems. In contrast to earlier approaches based on heuristics and evolutionary algorithms, we study rigorous methods for quantifying the inherent vulnerability and hardening cost for the system. We develop algorithms with provable approximation guarantees, and evaluate them for real and synthetic attack graphs.
Graph rigidity, cyclic belief propagation, and point pattern matching.
McAuley, Julian J; Caetano, Tibério S; Barbosa, Marconi S
2008-11-01
A recent paper [1] proposed a provably optimal polynomial time method for performing near-isometric point pattern matching by means of exact probabilistic inference in a chordal graphical model. Its fundamental result is that the chordal graph in question is shown to be globally rigid, implying that exact inference provides the same matching solution as exact inference in a complete graphical model. This implies that the algorithm is optimal when there is no noise in the point patterns. In this paper, we present a new graph that is also globally rigid but has an advantage over the graph proposed in [1]: Its maximal clique size is smaller, rendering inference significantly more efficient. However, this graph is not chordal, and thus, standard Junction Tree algorithms cannot be directly applied. Nevertheless, we show that loopy belief propagation in such a graph converges to the optimal solution. This allows us to retain the optimality guarantee in the noiseless case, while substantially reducing both memory requirements and processing time. Our experimental results show that the accuracy of the proposed solution is indistinguishable from that in [1] when there is noise in the point patterns.
Shi, Longxiang; Li, Shijian; Yang, Xiaoran; Qi, Jiaheng; Pan, Gang; Zhou, Binbin
2017-01-01
With the explosion of healthcare information, there has been a tremendous amount of heterogeneous textual medical knowledge (TMK), which plays an essential role in healthcare information systems. Existing works for integrating and utilizing the TMK mainly focus on establishing straightforward connections and pay less attention to making computers interpret and retrieve knowledge correctly and quickly. In this paper, we explore a novel model to organize and integrate the TMK into conceptual graphs. We then employ a framework to automatically retrieve knowledge in knowledge graphs with high precision. In order to perform reasonable inference on knowledge graphs, we propose a contextual inference pruning algorithm to achieve efficient chain inference. Our algorithm achieves a better inference result with precision and recall of 92% and 96%, respectively, which can avoid most of the meaningless inferences. In addition, we implement two prototypes and provide services, and the results show our approach is practical and effective. PMID:28299322
Martín H., José Antonio
2013-01-01
Many practical problems in almost all scientific and technological disciplines have been classified as computationally hard (NP-hard or even NP-complete). In life sciences, combinatorial optimization problems frequently arise in molecular biology, e.g., genome sequencing; global alignment of multiple genomes; identifying siblings or discovery of dysregulated pathways. In almost all of these problems, there is the need for proving a hypothesis about a certain property of an object that can be present if and only if it adopts some particular admissible structure (an NP-certificate) or be absent (no admissible structure). However, none of the standard approaches can discard the hypothesis when no solution can be found, since none can provide a proof that there is no admissible structure. This article presents an algorithm that introduces a novel type of solution method to “efficiently” solve the graph 3-coloring problem, an NP-complete problem. The proposed method provides certificates (proofs) in both cases, present or absent, so it is possible to accept or reject the hypothesis on the basis of a rigorous proof. It provides exact solutions and is polynomial-time (i.e., efficient), although parametric. The only requirement is sufficient computational power, which is controlled by the parameter . Nevertheless, here it is proved that the probability of requiring a value of to obtain a solution for a random graph decreases exponentially: , making tractable almost all problem instances. Thorough experimental analyses were performed. The algorithm was tested on random graphs, planar graphs and 4-regular planar graphs. The obtained experimental results are in accordance with the theoretical expected results. PMID:23349711
An efficient randomized algorithm for contact-based NMR backbone resonance assignment.
Kamisetty, Hetunandan; Bailey-Kellogg, Chris; Pandurangan, Gopal
2006-01-15
Backbone resonance assignment is a critical bottleneck in studies of protein structure, dynamics and interactions by nuclear magnetic resonance (NMR) spectroscopy. A minimalist approach to assignment, which we call 'contact-based', seeks to dramatically reduce experimental time and expense by replacing the standard suite of through-bond experiments with the through-space (nuclear Overhauser enhancement spectroscopy, NOESY) experiment. In the contact-based approach, spectral data are represented in a graph with vertices for putative residues (of unknown relation to the primary sequence) and edges for hypothesized NOESY interactions, such that observed spectral peaks could be explained if the residues were 'close enough'. Due to experimental ambiguity, several incorrect edges can be hypothesized for each spectral peak. An assignment is derived by identifying consistent patterns of edges (e.g. for alpha-helices and beta-sheets) within a graph and by mapping the vertices to the primary sequence. The key algorithmic challenge is to be able to uncover these patterns even when they are obscured by significant noise. This paper develops, analyzes and applies a novel algorithm for the identification of polytopes representing consistent patterns of edges in a corrupted NOESY graph. Our randomized algorithm aggregates simplices into polytopes and fixes inconsistencies with simple local modifications, called rotations, that maintain most of the structure already uncovered. In characterizing the effects of experimental noise, we employ an NMR-specific random graph model in proving that our algorithm gives optimal performance in expected polynomial time, even when the input graph is significantly corrupted. We confirm this analysis in simulation studies with graphs corrupted by up to 500% noise. Finally, we demonstrate the practical application of the algorithm on several experimental beta-sheet datasets. Our approach is able to eliminate a large majority of noise edges and to uncover large consistent sets of interactions. Our algorithm has been implemented in platform-independent Python code. The software can be freely obtained for academic use by request from the authors.
An Efficient Downlink Scheduling Strategy Using Normal Graphs for Multiuser MIMO Wireless Systems
NASA Astrophysics Data System (ADS)
Chen, Jung-Chieh; Wu, Cheng-Hsuan; Lee, Yao-Nan; Wen, Chao-Kai
Inspired by the success of the low-density parity-check (LDPC) codes in the field of error-control coding, in this paper we propose transforming the downlink multiuser multiple-input multiple-output scheduling problem into an LDPC-like problem using the normal graph. Based on the normal graph framework, soft information, which indicates the probability that each user will be scheduled to transmit packets at the access point through a specified angle-frequency sub-channel, is exchanged among the local processors to iteratively optimize the multiuser transmission schedule. Computer simulations show that the proposed algorithm can efficiently schedule simultaneous multiuser transmission which then increases the overall channel utilization and reduces the average packet delay.
Graph cuts for curvature based image denoising.
Bae, Egil; Shi, Juan; Tai, Xue-Cheng
2011-05-01
Minimization of total variation (TV) is a well-known method for image denoising. Recently, the relationship between TV minimization problems and binary MRF models has been much explored. This has resulted in some very efficient combinatorial optimization algorithms for the TV minimization problem in the discrete setting via graph cuts. To overcome limitations, such as staircasing effects, of the relatively simple TV model, variational models based upon higher order derivatives have been proposed. The Euler's elastica model is one such higher order model of central importance, which minimizes the curvature of all level lines in the image. Traditional numerical methods for minimizing the energy in such higher order models are complicated and computationally complex. In this paper, we will present an efficient minimization algorithm based upon graph cuts for minimizing the energy in the Euler's elastica model, by simplifying the problem to that of solving a sequence of easy graph representable problems. This sequence has connections to the gradient flow of the energy function, and converges to a minimum point. The numerical experiments show that our new approach is more effective in maintaining smooth visual results while preserving sharp features better than TV models.
Artistic image analysis using graph-based learning approaches.
Carneiro, Gustavo
2013-08-01
We introduce a new methodology for the problem of artistic image analysis, which among other tasks, involves the automatic identification of visual classes present in an art work. In this paper, we advocate the idea that artistic image analysis must explore a graph that captures the network of artistic influences by computing the similarities in terms of appearance and manual annotation. One of the novelties of our methodology is the proposed formulation that is a principled way of combining these two similarities in a single graph. Using this graph, we show that an efficient random walk algorithm based on an inverted label propagation formulation produces more accurate annotation and retrieval results compared with the following baseline algorithms: bag of visual words, label propagation, matrix completion, and structural learning. We also show that the proposed approach leads to a more efficient inference and training procedures. This experiment is run on a database containing 988 artistic images (with 49 visual classification problems divided into a multiclass problem with 27 classes and 48 binary problems), where we show the inference and training running times, and quantitative comparisons with respect to several retrieval and annotation performance measures.
Many-core graph analytics using accelerated sparse linear algebra routines
NASA Astrophysics Data System (ADS)
Kozacik, Stephen; Paolini, Aaron L.; Fox, Paul; Kelmelis, Eric
2016-05-01
Graph analytics is a key component in identifying emerging trends and threats in many real-world applications. Large-scale graph analytics frameworks provide a convenient and highly-scalable platform for developing algorithms to analyze large datasets. Although conceptually scalable, these techniques exhibit poor performance on modern computational hardware. Another model of graph computation has emerged that promises improved performance and scalability by using abstract linear algebra operations as the basis for graph analysis as laid out by the GraphBLAS standard. By using sparse linear algebra as the basis, existing highly efficient algorithms can be adapted to perform computations on the graph. This approach, however, is often less intuitive to graph analytics experts, who are accustomed to vertex-centric APIs such as Giraph, GraphX, and Tinkerpop. We are developing an implementation of the high-level operations supported by these APIs in terms of linear algebra operations. This implementation is backed by many-core implementations of the fundamental GraphBLAS operations required, and offers the advantages of both the intuitive programming model of a vertex-centric API and the performance of a sparse linear algebra implementation. This technology can reduce the number of nodes required, as well as the run-time for a graph analysis problem, enabling customers to perform more complex analysis with less hardware at lower cost. All of this can be accomplished without the requirement for the customer to make any changes to their analytics code, thanks to the compatibility with existing graph APIs.
Probabilistic fusion of stereo with color and contrast for bilayer segmentation.
Kolmogorov, Vladimir; Criminisi, Antonio; Blake, Andrew; Cross, Geoffrey; Rother, Carsten
2006-09-01
This paper describes models and algorithms for the real-time segmentation of foreground from background layers in stereo video sequences. Automatic separation of layers from color/contrast or from stereo alone is known to be error-prone. Here, color, contrast, and stereo matching information are fused to infer layers accurately and efficiently. The first algorithm, Layered Dynamic Programming (LDP), solves stereo in an extended six-state space that represents both foreground/background layers and occluded regions. The stereo-match likelihood is then fused with a contrast-sensitive color model that is learned on-the-fly and stereo disparities are obtained by dynamic programming. The second algorithm, Layered Graph Cut (LGC), does not directly solve stereo. Instead, the stereo match likelihood is marginalized over disparities to evaluate foreground and background hypotheses and then fused with a contrast-sensitive color model like the one used in LDP. Segmentation is solved efficiently by ternary graph cut. Both algorithms are evaluated with respect to ground truth data and found to have similar performance, substantially better than either stereo or color/contrast alone. However, their characteristics with respect to computational efficiency are rather different. The algorithms are demonstrated in the application of background substitution and shown to give good quality composite video output.
Finding minimum-quotient cuts in planar graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, J.K.; Phillips, C.A.
Given a graph $G = (V, E)$ where each vertex $v \in V$ is assigned a weight $w(v)$ and each edge $e \in E$ is assigned a cost $c(e)$, the quotient of a cut partitioning the vertices of $V$ into sets $S$ and $\bar{S}$ is $c(S, \bar{S})/\min\{w(S), w(\bar{S})\}$, where $c(S, \bar{S})$ is the sum of the costs of the edges crossing the cut and $w(S)$ and $w(\bar{S})$ are the sums of the weights of the vertices in $S$ and $\bar{S}$, respectively. The problem of finding a cut whose quotient is minimum for a graph has in recent years attracted considerable attention, due in large part to the work of Rao and of Leighton and Rao. They have shown that an algorithm (exact or approximation) for the minimum-quotient-cut problem can be used to obtain an approximation algorithm for the more famous minimum $b$-balanced-cut problem, which requires finding a cut $(S, \bar{S})$ minimizing $c(S, \bar{S})$ subject to the constraint $bW \le w(S) \le (1 - b)W$, where $W$ is the total vertex weight and $b$ is some fixed balance in the range $0 < b \le 1/2$. Unfortunately, the minimum-quotient-cut problem is strongly NP-hard for general graphs, and the best polynomial-time approximation algorithm known for the general problem guarantees only a cut whose quotient is at most $O(\lg n)$ times optimal, where $n$ is the size of the graph. However, for planar graphs, the minimum-quotient-cut problem appears more tractable, as Rao has developed several efficient approximation algorithms for the planar version of the problem capable of finding a cut whose quotient is at most some constant times optimal. In this paper, we improve Rao's algorithms, both in terms of accuracy and speed. As our first result, we present two pseudopolynomial-time exact algorithms for the planar minimum-quotient-cut problem. As Rao's most accurate approximation algorithm for the problem (also a pseudopolynomial-time algorithm) guarantees only a 1.5-times-optimal cut, our algorithms represent a significant advance.
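The quotient defined in this abstract is simple to evaluate for a given cut. The following is a minimal sketch of that definition only (not of the exact or approximation algorithms the paper presents); the function name `cut_quotient` and the dictionary-based input format are assumptions chosen for illustration.

```python
def cut_quotient(edge_cost, weight, S):
    """Quotient c(S, S-bar) / min(w(S), w(S-bar)) of the cut (S, S-bar).

    edge_cost: dict mapping an undirected edge (u, v) to its cost c(e).
    weight:    dict mapping each vertex v to its weight w(v).
    S:         iterable of vertices forming one side of the cut.
    """
    S = set(S)
    S_bar = set(weight) - S
    if not S or not S_bar:
        raise ValueError("both sides of the cut must be non-empty")
    # An edge crosses the cut when exactly one endpoint lies in S.
    crossing = sum(c for (u, v), c in edge_cost.items() if (u in S) != (v in S))
    w_S = sum(weight[v] for v in S)
    w_S_bar = sum(weight[v] for v in S_bar)
    return crossing / min(w_S, w_S_bar)
```

For a 4-cycle a-b-c-d with the costs and weights used in the check below, isolating vertex a cuts edges of total cost 3 and leaves weight 1 on the lighter side, so the quotient is 3.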
Efficient path-based computations on pedigree graphs with compact encodings
2012-01-01
A pedigree is a diagram of family relationships, and it is often used to determine the mode of inheritance (dominant, recessive, etc.) of genetic diseases. Along with rapidly growing knowledge of genetics and accumulation of genealogy information, pedigree data is becoming increasingly important. In large pedigree graphs, path-based methods for efficiently computing genealogical measurements, such as inbreeding and kinship coefficients of individuals, depend on efficient identification and processing of paths. In this paper, we propose a new compact path encoding scheme on large pedigrees, accompanied by an efficient algorithm for identifying paths. We demonstrate the use of our proposed method by applying it to the inbreeding coefficient computation. We present a time and space complexity analysis, and also demonstrate the efficiency of our method for evaluating inbreeding coefficients, compared to previous methods, through experiments on pedigree graphs with real and synthetic data. Both theoretical and experimental results demonstrate that our method is more scalable and efficient than previous methods in terms of time and space requirements. PMID:22536898
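For orientation, the inbreeding coefficient of an individual equals the kinship coefficient of its parents. The sketch below uses the textbook recursive kinship formulation, not the compact path encoding proposed in the paper; the pedigree format (a dict listing parents before children) and the function name are assumptions made for this example.

```python
from functools import lru_cache

def inbreeding_coefficients(pedigree):
    """Inbreeding coefficient of every individual via recursive kinship.

    pedigree: dict {individual: (father, mother)}, with None for an unknown
    parent, listed so that parents appear before their children (assumed).
    """
    order = {ind: i for i, ind in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + kinship(father, mother))
        if order[a] < order[b]:
            a, b = b, a  # always recurse through the younger individual
        father, mother = pedigree[a]
        return 0.5 * (kinship(father, b) + kinship(mother, b))

    return {ind: kinship(*pedigree[ind]) for ind in pedigree}
```

For the offspring of a full-sibling mating this gives the familiar value of 0.25. Memoized recursion of this kind becomes expensive on very large pedigrees, which is the scalability problem that path-based methods aim to address.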
Finding minimum spanning trees more efficiently for tile-based phase unwrapping
NASA Astrophysics Data System (ADS)
Sawaf, Firas; Tatam, Ralph P.
2006-06-01
The tile-based phase unwrapping method employs an algorithm for finding the minimum spanning tree (MST) in each tile. We first examine the properties of a tile's representation from a graph theory viewpoint, observing that it is possible to make use of a more efficient class of MST algorithms. We then describe a novel linear time algorithm which reduces the size of the MST problem by half at the least, and solves it completely at best. We also show how this algorithm can be applied to a tile using a sliding window technique. Finally, we show how the reduction algorithm can be combined with any other standard MST algorithm to achieve a more efficient hybrid, using Prim's algorithm for empirical comparison and noting that the reduction algorithm takes only 0.1% of the time taken by the overall hybrid.
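The abstract uses Prim's algorithm as the standard MST method for its empirical comparison. A minimal heap-based sketch of Prim's algorithm follows; it does not implement the paper's linear-time reduction step, and the adjacency-list format and the name `prim_mst` are assumptions for illustration.

```python
import heapq

def prim_mst(n, adj):
    """Prim's algorithm on a connected undirected graph with vertices 0..n-1.

    adj: dict {u: [(weight, v), ...]} listing each edge in both directions.
    Returns the tree as a list of (parent, child, weight) triples.
    """
    in_tree = [False] * n
    tree = []
    heap = [(0, 0, -1)]  # (edge weight, vertex, parent); start from vertex 0
    while heap:
        w, u, parent = heapq.heappop(heap)
        if in_tree[u]:
            continue  # stale entry: u was already reached by a lighter edge
        in_tree[u] = True
        if parent >= 0:
            tree.append((parent, u, w))
        for w_uv, v in adj[u]:
            if not in_tree[v]:
                heapq.heappush(heap, (w_uv, v, u))
    return tree
```

In a hybrid of the kind the abstract describes, a routine like this would be run on whatever part of the tile's graph the reduction step leaves unsolved.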
Parallel Algorithms for Switching Edges in Heterogeneous Graphs
Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav
2017-01-01
An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors. PMID:28757680
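The edge switch operation itself, including the requirement that the graph stay simple, can be sketched sequentially as below. This is only a single-process illustration of the operation, not the distributed-memory algorithms of the paper; the function name `switch_edges` and its parameters are hypothetical.

```python
import random

def switch_edges(edges, num_attempts, seed=0):
    """Apply random edge switches to a simple undirected graph.

    edges: list of (u, v) pairs with no self-loops or parallel edges.
    Returns a new edge list with the same degree sequence.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    present = {frozenset(e) for e in edge_list}
    for _ in range(num_attempts):
        i, j = rng.sample(range(len(edge_list)), 2)
        u1, v1 = edge_list[i]
        u2, v2 = edge_list[j]
        if rng.random() < 0.5:
            u2, v2 = v2, u2  # choose which end vertices are exchanged
        new1, new2 = (u1, v2), (u2, v1)
        # Reject the switch if it would create a self-loop or a parallel edge.
        if u1 == v2 or u2 == v1:
            continue
        if frozenset(new1) in present or frozenset(new2) in present:
            continue
        present -= {frozenset(edge_list[i]), frozenset(edge_list[j])}
        present |= {frozenset(new1), frozenset(new2)}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list
```

Every accepted switch reads and updates the shared edge set, which hints at why successive switches depend on each other and why parallelizing them requires synchronization.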
Mabu, Shingo; Hirasawa, Kotaro; Hu, Jinglu
2007-01-01
This paper proposes a graph-based evolutionary algorithm called Genetic Network Programming (GNP). Our goal is to develop GNP, which can deal with dynamic environments efficiently and effectively, based on the distinguished expression ability of the graph (network) structure. The characteristics of GNP are as follows. 1) GNP programs are composed of a number of nodes which execute simple judgment/processing, and these nodes are connected by directed links to each other. 2) The graph structure enables GNP to re-use nodes, thus the structure can be very compact. 3) The node transition of GNP is executed according to its node connections without any terminal nodes, thus the past history of the node transition affects the current node to be used and this characteristic works as an implicit memory function. These structural characteristics are useful for dealing with dynamic environments. Furthermore, we propose an extended algorithm, "GNP with Reinforcement Learning (GNPRL)" which combines evolution and reinforcement learning in order to create effective graph structures and obtain better results in dynamic environments. In this paper, we applied GNP to the problem of determining agents' behavior to evaluate its effectiveness. Tileworld was used as the simulation environment. The results show some advantages for GNP over conventional methods.
Applying graph partitioning methods in measurement-based dynamic load balancing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhatele, Abhinav; Fourestier, Sebastien; Menon, Harshitha
Load imbalance leads to an increasing waste of resources as an application is scaled to more and more processors. Achieving the best parallel efficiency for a program requires optimal load balancing, which is an NP-hard problem. However, finding near-optimal solutions to this problem for complex computational science and engineering applications is becoming increasingly important. Charm++, a migratable-objects-based programming model, provides a measurement-based dynamic load balancing framework. This framework instruments and then migrates over-decomposed objects to balance computational load and communication at runtime. This paper explores the use of graph partitioning algorithms, traditionally used for partitioning physical domains/meshes, for measurement-based dynamic load balancing of parallel applications. In particular, we present repartitioning methods developed in a graph partitioning toolbox called SCOTCH that consider the previous mapping to minimize migration costs. We also discuss a new imbalance reduction algorithm for graphs with irregular load distributions. We compare several load balancing algorithms using microbenchmarks on Intrepid and Ranger and evaluate the effect of communication, number of cores and number of objects on the benefit achieved from load balancing. New algorithms developed in SCOTCH lead to better performance compared to the METIS partitioners for several cases, both in terms of the application execution time and the number of objects migrated.
Enumerating Substituted Benzene Isomers of Tree-Like Chemical Graphs.
Li, Jinghui; Nagamochi, Hiroshi; Akutsu, Tatsuya
2018-01-01
Enumeration of chemical structures is useful for drug design, which is one of the main targets of computational biology and bioinformatics. A chemical graph with no other cycles than benzene rings is called tree-like, and becomes a tree possibly with multiple edges if we contract each benzene ring into a single virtual atom of valence 6. All tree-like chemical graphs with a given tree representation T are called the substituted benzene isomers of T. When we replace each virtual atom in T with a benzene ring to obtain a substituted benzene isomer, distinct isomers of T are caused by the difference in arrangements of atom groups around a benzene ring. In this paper, we propose an efficient algorithm that enumerates all substituted benzene isomers of a given tree representation T. Our algorithm first counts the number of all the isomers of the tree representation T by a dynamic programming method. To enumerate all the isomers, for each index k, our algorithm then generates the k-th isomer by backtracking the counting phase of the dynamic programming. We also implemented our algorithm for computational experiments.
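The count-then-generate pattern described above (count objects by dynamic programming, then produce the k-th object by walking back through the counts) can be shown on a toy problem. The example below enumerates binary strings with no two adjacent 1s; it is an analogy chosen for brevity and has nothing to do with benzene isomers specifically. Both function names are hypothetical.

```python
def count_table(n):
    """c[i] = number of binary strings of length i with no two adjacent 1s."""
    c = [1, 2]
    for i in range(2, n + 1):
        c.append(c[i - 1] + c[i - 2])
    return c

def kth_string(n, k, c):
    """Return the k-th (0-based, lexicographic) valid string of length n."""
    out = []
    i = n
    while i > 0:
        if k < c[i - 1]:
            # The first c[i-1] strings of length i start with '0'.
            out.append("0")
            i -= 1
        else:
            # Skip them; the string starts with '1', which forces a '0' next.
            k -= c[i - 1]
            if i >= 2:
                out.append("10")
                i -= 2
            else:
                out.append("1")
                i -= 1
    return "".join(out)
```

Calling `kth_string(3, k, count_table(3))` for k = 0..4 yields 000, 001, 010, 100, 101, so every object is generated exactly once without storing the full list.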
A technology mapping based on graph of excitations and outputs for finite state machines
NASA Astrophysics Data System (ADS)
Kania, Dariusz; Kulisz, Józef
2017-11-01
A new, efficient technology mapping method of FSMs, dedicated for PAL-based PLDs is proposed. The essence of the method consists in searching for the minimal set of PAL-based logic blocks that cover a set of multiple-output implicants describing the transition and output functions of an FSM. The method is based on a new concept of graph: the Graph of Excitations and Outputs. The proposed algorithm was tested using the FSM benchmarks. The obtained results were compared with the classical technology mapping of FSM.
High performance genetic algorithm for VLSI circuit partitioning
NASA Astrophysics Data System (ADS)
Dinu, Simona
2016-12-01
Partitioning is one of the biggest challenges in computer-aided design for VLSI circuits (very large-scale integrated circuits). This work addresses the min-cut balanced circuit partitioning problem: dividing the graph that models the circuit into k sub-graphs of almost equal size while minimizing the number of edges cut, i.e., the number of edges connecting the sub-graphs. The problem may be formulated as a combinatorial optimization problem. The problem is known to be NP-hard, and thus it is important to design an efficient heuristic algorithm to solve it. The approach proposed in this study is a parallel implementation of a genetic algorithm, namely an island model. The information exchange between the evolving subpopulations is modeled using a fuzzy controller, which determines an optimal balance between exploration and exploitation of the solution space. The results of simulations show that the proposed algorithm outperforms the standard sequential genetic algorithm both in terms of solution quality and convergence speed. As a direction for future study, this research can be further extended to incorporate local search operators which should include problem-specific knowledge. In addition, the adaptive configuration of mutation and crossover rates is another direction for future research.
Analysis of Community Detection Algorithms for Large Scale Cyber Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mane, Prachita; Shanbhag, Sunanda; Kamath, Tanmayee
The aim of this project is to use existing community detection algorithms on an IP network dataset to create supernodes within the network. This study compares the performance of different algorithms on the network in terms of running time. The paper begins with an introduction to the concept of clustering and community detection followed by the research question that the team aimed to address. Further, the paper describes the graph metrics that were considered in order to shortlist algorithms, followed by a brief explanation of each algorithm with respect to the graph metric on which it is based. The next section in the paper describes the methodology used by the team in order to run the algorithms and determine which algorithm is most efficient with respect to running time. Finally, the last section of the paper includes the results obtained by the team and a conclusion based on those results as well as future work.
Principal curve detection in complicated graph images
NASA Astrophysics Data System (ADS)
Liu, Yuncai; Huang, Thomas S.
2001-09-01
Finding principal curves in an image is an important low-level processing task in computer vision and pattern recognition. Principal curves are those curves in an image that represent boundaries or contours of objects of interest. In general, a principal curve should be smooth, satisfy a certain length constraint, and allow either smooth or sharp turning. In this paper, we present a method that can efficiently detect principal curves in complicated map images. A given feature image, obtained from edge detection of an intensity image or a thinning operation on a pictorial map image, is first converted to a graph representation. In the graph image domain, principal curve detection is performed to identify useful image features. Shortest-path and directional-deviation schemes are used in our principal curve detection algorithm, which proves to be very efficient when working with real graph images.
Distributed-Memory Fast Maximal Independent Set
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew
The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby's seminal MIS algorithms, "Luby(A)" and "Luby(B)," to distributed-memory execution, and we evaluate their performance. We compare our results with the "Filtered MIS" implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
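A round-based randomized MIS computation in the spirit of Luby's algorithms can be sketched as follows. This is a sequential simulation of a random-priority variant and is not claimed to match "Luby(A)" or "Luby(B)" as defined in the paper; the function name and the dict-of-sets input format are assumptions.

```python
import random

def luby_style_mis(adj, seed=0):
    """Randomized maximal independent set, computed in synchronous rounds.

    adj: dict {vertex: set of neighbours} for a simple undirected graph.
    """
    rng = random.Random(seed)
    index = {v: i for i, v in enumerate(adj)}  # tie-breaker for equal draws
    active = set(adj)
    mis = set()
    while active:
        # Every active vertex draws a fresh random priority for this round.
        prio = {v: (rng.random(), index[v]) for v in adj if v in active}
        # A vertex joins the MIS if it beats all of its active neighbours.
        winners = {v for v in active
                   if all(prio[v] < prio[u] for u in adj[v] if u in active)}
        mis |= winners
        # Winners and their neighbours leave the graph before the next round.
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & active
        active -= removed
    return mis
```

In each round the decisions of different vertices depend only on their neighbours' priorities, which is what makes the scheme a candidate for parallel or distributed execution.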
Transforming graph states using single-qubit operations.
Dahlberg, Axel; Wehner, Stephanie
2018-07-13
Stabilizer states form an important class of states in quantum information, and are of central importance in quantum error correction. Here, we provide an algorithm for deciding whether one stabilizer (target) state can be obtained from another stabilizer (source) state by single-qubit Clifford operations (LC), single-qubit Pauli measurements (LPM) and classical communication (CC) between sites holding the individual qubits. What is more, we provide a recipe to obtain the sequence of LC+LPM+CC operations which prepare the desired target state from the source state, and show how these operations can be applied in parallel to reach the target state in constant time. Our algorithm has applications in quantum networks and quantum computing, and can also serve as a design tool, for example to find transformations between quantum error correcting codes. We provide a software implementation of our algorithm that makes this tool easier to apply. A key insight leading to our algorithm is to show that the problem is equivalent to one in graph theory, which is to decide whether some graph G' is a vertex-minor of another graph G. The vertex-minor problem is, in general, NP-complete, but can be solved efficiently on graphs which are not too complex. A measure of the complexity of a graph is the rank-width, which equals the Schmidt-rank width of a subclass of stabilizer states called graph states, and thus intuitively is a measure of entanglement. Here, we show that the vertex-minor problem can be solved in time O(|G|^3), where |G| is the size of the graph G, whenever the rank-width of G and the size of G' are bounded. Our algorithm is based on techniques by Courcelle for solving fixed parameter tractable problems, where here the relevant fixed parameter is the rank-width.
The second half of this paper serves as an accessible but far from exhaustive introduction to these concepts, which could be useful for many other problems in quantum information. This article is part of a discussion meeting issue 'Foundations of quantum mechanics and their impact on contemporary society'. © 2018 The Author(s).
DOE Office of Scientific and Technical Information (OSTI.GOV)
2010-09-30
The Umbra gbs (Graph-Based Search) library provides implementations of graph-based search/planning algorithms that can be applied to legacy graph data structures. Unlike some other graph algorithm libraries, this one does not require your graph class to inherit from a specific base class. Implementations of Dijkstra's Algorithm and A-Star search are included and can be used with graphs that are lazily-constructed.
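The idea of running a search over a graph that is never required to inherit from a base class, and that may be constructed lazily, can be illustrated with a Dijkstra routine that only needs a neighbour callback. This is a generic Python sketch and not the Umbra gbs API; `dijkstra`, `is_goal`, `neighbors` and `grid_neighbors` are all hypothetical names.

```python
import heapq
from itertools import count

def dijkstra(start, is_goal, neighbors):
    """Cost of a cheapest path from start to any node accepted by is_goal.

    neighbors(node) yields (next_node, cost) pairs on demand, so the graph
    is expanded lazily and needs no particular class or base type.
    """
    tie = count()  # tie-breaker so heap entries never compare nodes directly
    best = {start: 0}
    heap = [(0, next(tie), start)]
    settled = set()
    while heap:
        d, _, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if is_goal(node):
            return d
        for nxt, cost in neighbors(node):
            nd = d + cost
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, next(tie), nxt))
    return None  # goal unreachable

def grid_neighbors(cell, size=5):
    """Implicit 4-connected grid: neighbours are generated, never stored."""
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield (nx, ny), 1
```

For instance, `dijkstra((0, 0), lambda c: c == (4, 4), grid_neighbors)` explores only the cells it reaches. Adding an admissible heuristic to the heap key would turn the same skeleton into A-Star search.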
Optimizing spread dynamics on graphs by message passing
NASA Astrophysics Data System (ADS)
Altarelli, F.; Braunstein, A.; Dall'Asta, L.; Zecchina, R.
2013-09-01
Cascade processes are responsible for many important phenomena in natural and social sciences. Simple models of irreversible dynamics on graphs, in which nodes activate depending on the state of their neighbors, have been successfully applied to describe cascades in a large variety of contexts. Over the past decades, much effort has been devoted to understanding the typical behavior of the cascades arising from initial conditions extracted at random from some given ensemble. However, the problem of optimizing the trajectory of the system, i.e. of identifying appropriate initial conditions to maximize (or minimize) the final number of active nodes, is still considered to be practically intractable, with the only exception being models that satisfy a sort of diminishing returns property called submodularity. Submodular models can be approximately solved by means of greedy strategies, but by definition they lack cooperative characteristics which are fundamental in many real systems. Here we introduce an efficient algorithm based on statistical physics for the optimization of trajectories in cascade processes on graphs. We show that for a wide class of irreversible dynamics, even in the absence of submodularity, the spread optimization problem can be solved efficiently on large networks. Analytic and algorithmic results on random graphs are complemented by the solution of the spread maximization problem on a real-world network (the Epinions consumer reviews network).
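A simple instance of the irreversible dynamics described above is a threshold model, in which a node activates once enough of its neighbours are active. The sketch below only simulates the forward dynamics from given seeds; it does not implement the message-passing optimization of the paper. The function name and the threshold rule are assumptions chosen for illustration.

```python
def run_cascade(adj, thresholds, seeds):
    """Final active set of an irreversible threshold cascade.

    adj:        dict {node: set of neighbours}.
    thresholds: dict {node: number of active neighbours needed to activate}.
    seeds:      initially active nodes.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v in active:
                continue
            if sum(1 for u in adj[v] if u in active) >= thresholds[v]:
                active.add(v)  # activation is permanent
                changed = True
    return active
```

The spread optimization problem then asks which seed set makes the returned set as large (or as small) as possible. With thresholds above 1, nodes must cooperate to activate a neighbour, which is the kind of non-submodular behaviour the abstract refers to.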
Predicting and Detecting Emerging Cyberattack Patterns Using StreamWorks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Choudhury, Sutanay; Feo, John T.
2014-06-30
The number and sophistication of cyberattacks on industries and governments have dramatically grown in recent years. To counter this movement, new advanced tools and techniques are needed to detect cyberattacks in their early stages such that defensive actions may be taken to avert or mitigate potential damage. From a cybersecurity analysis perspective, detecting cyberattacks may be cast as a problem of identifying patterns in computer network traffic. Logically and intuitively, these patterns may take on the form of a directed graph that conveys how an attack or intrusion propagates through the computers of a network. Such cyberattack graphs could provide cybersecurity analysts with powerful conceptual representations that are natural to express and analyze. We have been researching and developing graph-centric approaches and algorithms for dynamic cyberattack detection. The advanced dynamic graph algorithms we are developing will be packaged into a streaming network analysis framework known as StreamWorks. With StreamWorks, a scientist or analyst may detect and identify precursor events and patterns as they emerge in complex networks. This analysis framework is intended to be used in a dynamic environment where network data is streamed in and is appended to a large-scale dynamic graph. Specific graphical query patterns are decomposed and collected into a graph query library. The individual decomposed subpatterns in the library are continuously and efficiently matched against the dynamic graph as it evolves to identify and detect early, partial subgraph patterns. The scalable emerging subgraph pattern algorithms will match on both structural and semantic network properties.
The Edge-Disjoint Path Problem on Random Graphs by Message-Passing.
Altarelli, Fabrizio; Braunstein, Alfredo; Dall'Asta, Luca; De Bacco, Caterina; Franz, Silvio
2015-01-01
We present a message-passing algorithm to solve a series of edge-disjoint path problems on graphs based on the zero-temperature cavity equations. Edge-disjoint path problems are important in the general context of routing, which can be defined by incorporating both traffic optimization and total path length minimization under a single framework. The computation of the cavity equations can be performed efficiently by exploiting a mapping of a generalized edge-disjoint path problem on a star graph onto a weighted maximum matching problem. We perform extensive numerical simulations on random graphs of various types to test the performance both in terms of path length minimization and maximization of the number of accommodated paths. In addition, we test the performance on benchmark instances on various graphs by comparison with state-of-the-art algorithms and results found in the literature. Our message-passing algorithm always outperforms the others in terms of the number of accommodated paths when considering nontrivial instances (otherwise it gives the same trivial results). Remarkably, the largest improvement in performance with respect to the other methods employed is found in the case of benchmarks with meshes, where the validity hypothesis behind message-passing is expected to worsen. In these cases, even though the exact message-passing equations do not converge, by introducing a reinforcement parameter to force convergence towards a suboptimal solution, we were able to always outperform the other algorithms, with a peak of 27% performance improvement in terms of accommodated paths. On random graphs, we numerically observe two separate regimes: one in which all paths can be accommodated and one in which this is not possible. We also investigate the behavior of both the number of paths to be accommodated and their minimum total length. PMID:26710102
Efficient quantum walk on a quantum processor
Qiang, Xiaogang; Loke, Thomas; Montanaro, Ashley; Aungskunsiri, Kanin; Zhou, Xiaoqi; O'Brien, Jeremy L.; Wang, Jingbo B.; Matthews, Jonathan C. F.
2016-01-01
The random walk formalism is used across a wide range of applications, from modelling share prices to predicting population genetics. Likewise, quantum walks have shown much potential as a framework for developing new quantum algorithms. Here we present explicit efficient quantum circuits for implementing continuous-time quantum walks on the circulant class of graphs. These circuits allow us to sample from the output probability distributions of quantum walks on circulant graphs efficiently. We also show that solving the same sampling problem for arbitrary circulant quantum circuits is intractable for a classical computer, assuming conjectures from computational complexity theory. This is a new link between continuous-time quantum walks and computational complexity theory and it indicates a family of tasks that could ultimately demonstrate quantum supremacy over classical computers. As a proof of principle, we experimentally implement the proposed quantum circuit on an example circulant graph using a two-qubit photonics quantum processor. PMID:27146471
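Circulant adjacency matrices are diagonalized by the discrete Fourier transform, which makes small instances of these walks easy to simulate classically. The sketch below computes the output distribution of the continuous-time walk $U(t) = e^{-iAt}$ started at vertex 0 on a small undirected circulant graph; it is a classical check for tiny graphs, not the quantum circuit construction of the paper, and the function name and input convention are assumptions.

```python
import cmath
import math

def circulant_walk_distribution(first_row, t):
    """Probabilities of a continuous-time quantum walk exp(-iAt) from vertex 0.

    first_row: first row of the circulant adjacency matrix A, assumed
    symmetric (first_row[j] == first_row[-j]) so that A is Hermitian.
    """
    n = len(first_row)
    omega = [cmath.exp(2j * math.pi * m / n) for m in range(n)]
    # Eigenvalues of a circulant matrix: the DFT of its first row.
    eig = [sum(first_row[j] * omega[(j * k) % n] for j in range(n)).real
           for k in range(n)]
    probs = []
    for x in range(n):
        # <x| exp(-iAt) |0> = (1/n) * sum_k exp(-i * eig_k * t) * omega^(k x)
        amp = sum(cmath.exp(-1j * eig[k] * t) * omega[(k * x) % n]
                  for k in range(n)) / n
        probs.append(abs(amp) ** 2)
    return probs
```

For the 4-cycle, `circulant_walk_distribution([0, 1, 0, 1], 0.7)` returns a normalized distribution that is symmetric between vertices 1 and 3.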
Algorithm of composing the schedule of construction and installation works
NASA Astrophysics Data System (ADS)
Nehaj, Rustam; Molotkov, Georgij; Rudchenko, Ivan; Grinev, Anatolij; Sekisov, Aleksandr
2017-10-01
An algorithm for scheduling works is developed in which the priority of a work corresponds to the total weight of its subordinate works (the vertices of the graph), and it is proved that the algorithm is optimal for graphs of the tree type. An algorithm is synthesized to reduce the search for solutions when drawing up schedules of construction and installation works, allocating a subset with the optimal solution of the problem of the minimum power, which is determined by the structure of its initial data and numerical values. An algorithm for scheduling construction and installation work is developed, taking into account the schedule for the movement of brigades, which makes it possible to efficiently minimize the time of work performance with respect to the parameters of organizational and technological reliability through the use of the branch-and-bound method. The program of the computational algorithm was written in MATLAB 2008. For the initial data of the matrix, random numbers were taken, uniformly distributed in the range from 1 to 100. It takes 0.5, 2.5, 7.5, and 27 minutes to solve the problem. Thus, the proposed method for estimating the lower bound of the solution is sufficiently accurate and allows efficient solution of the minimax task of scheduling construction and installation works.
Optimal Fungal Space Searching Algorithms.
Asenova, Elitsa; Lin, Hsin-Yu; Fu, Eileen; Nicolau, Dan V; Nicolau, Dan V
2016-10-01
Previous experiments have shown that fungi use an efficient natural algorithm for searching the space available for their growth in micro-confined networks, e.g., mazes. This natural "master" algorithm, which comprises two "slave" sub-algorithms, i.e., collision-induced branching and directional memory, has been shown to be more efficient than alternatives with one, the other, or both sub-algorithms turned off. In contrast, the present contribution compares the performance of the fungal natural algorithm against several standard artificial homologues. It was found that the space-searching fungal algorithm consistently outperforms uninformed algorithms, such as Depth-First-Search (DFS). Furthermore, while the natural algorithm is inferior to informed ones, such as A*, this under-performance does not increase significantly as the size of the maze grows. These findings suggest that a systematic effort of harvesting the natural space searching algorithms used by microorganisms is warranted and possibly overdue. These natural algorithms, if efficient, can be reverse-engineered for graph and tree search strategies.
Big Data Clustering via Community Detection and Hyperbolic Network Embedding in IoT Applications.
Karyotis, Vasileios; Tsitseklis, Konstantinos; Sotiropoulos, Konstantinos; Papavassiliou, Symeon
2018-04-15
In this paper, we present a novel data clustering framework for big sensory data produced by IoT applications. Based on a network representation of the relations among multi-dimensional data, data clustering is mapped to node clustering over the produced data graphs. To address the potential very large scale of such datasets/graphs that test the limits of state-of-the-art approaches, we map the problem of data clustering to a community detection one over the corresponding data graphs. Specifically, we propose a novel computational approach for enhancing the traditional Girvan-Newman (GN) community detection algorithm via hyperbolic network embedding. The data dependency graph is embedded in the hyperbolic space via Rigel embedding, allowing more efficient computation of edge-betweenness centrality needed in the GN algorithm. This allows for more efficient clustering of the nodes of the data graph in terms of modularity, without sacrificing considerable accuracy. In order to study the operation of our approach with respect to enhancing GN community detection, we employ various representative types of artificial complex networks, such as scale-free, small-world and random geometric topologies, and frequently-employed benchmark datasets for demonstrating its efficacy in terms of data clustering via community detection. Furthermore, we provide a proof-of-concept evaluation by applying the proposed framework over multi-dimensional datasets obtained from an operational smart-city/building IoT infrastructure provided by the Federated Interoperable Semantic IoT/cloud Testbeds and Applications (FIESTA-IoT) testbed federation. It is shown that the proposed framework can be indeed used for community detection/data clustering and exploited in various other IoT applications, such as performing more energy-efficient smart-city/building sensing. PMID:29662043
NASA Astrophysics Data System (ADS)
Roy, Priyanka; Gholami, Peyman; Kuppuswamy Parthasarathy, Mohana; Zelek, John; Lakshminarayanan, Vasudevan
2018-02-01
Segmentation of spectral-domain Optical Coherence Tomography (SD-OCT) images facilitates visualization and quantification of sub-retinal layers for diagnosis of retinal pathologies. However, manual segmentation is subjective, expertise dependent, and time-consuming, which limits applicability of SD-OCT. Efforts are therefore being made to implement active-contours, artificial intelligence, and graph-search to automatically segment retinal layers with accuracy comparable to that of manual segmentation, to ease clinical decision-making. Nevertheless, low optical contrast, heavy speckle noise, and pathologies pose challenges to automated segmentation. The graph-based image segmentation approach stands out from the rest because of its ability to minimize the cost function while maximising the flow. This study has developed and implemented a shortest-path based graph-search algorithm for automated intraretinal layer segmentation of SD-OCT images. The algorithm estimates the minimal-weight path between two graph-nodes based on their gradients. Boundary position indices (BPI) are computed from the transition between pixel intensities. The mean difference between BPIs of two consecutive layers quantifies individual layer thicknesses, which show statistically insignificant differences when compared to a previous study [for overall retina: p = 0.17, for individual layers: p > 0.05 (except one layer: p = 0.04)]. These results substantiate the accurate delineation of seven intraretinal boundaries in SD-OCT images by this algorithm, with a mean computation time of 0.93 seconds (64-bit Windows10, core i5, 8GB RAM). Besides being self-reliant for denoising, the algorithm is further computationally optimized to restrict segmentation within the user defined region-of-interest. The efficiency and reliability of this algorithm, even in noisy image conditions, make it clinically applicable.
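As a rough illustration of the shortest-path idea in the abstract above, the following Python sketch traces a single boundary across a small intensity image with Dijkstra's algorithm. The edge weight (2 - g_a - g_b + w_min on normalised vertical gradients), the left-to-right move set and the name `trace_boundary` are assumptions made for this example and are not taken from the study.

```python
import heapq

def trace_boundary(image, w_min=1e-5):
    """Return one row index per column for a minimal-weight left-to-right path.

    Assumed weighting: nodes on a strong dark-to-bright vertical gradient are
    cheap to traverse, so the path settles on the layer boundary.
    """
    rows, cols = len(image), len(image[0])
    # Vertical gradient (next row minus this row), clipped at zero.
    grad = [[max(image[min(r + 1, rows - 1)][c] - image[r][c], 0.0)
             for c in range(cols)] for r in range(rows)]
    g_max = max(max(row) for row in grad) or 1.0
    grad = [[g / g_max for g in row] for row in grad]

    # Any pixel in the first column may start the path at zero cost.
    dist = {(r, 0): 0.0 for r in range(rows)}
    prev = {}
    heap = [(0.0, r, 0) for r in range(rows)]
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist.get((r, c), float("inf")):
            continue  # stale heap entry
        if c == cols - 1:
            # First node settled in the last column ends a shortest path.
            path, node = [r], (r, c)
            while node in prev:
                node = prev[node]
                path.append(node[0])
            return path[::-1]
        for dr in (-1, 0, 1):  # move one column right, at most one row up/down
            nr = r + dr
            if 0 <= nr < rows:
                nd = d + 2.0 - grad[r][c] - grad[nr][c + 1] + w_min
                if nd < dist.get((nr, c + 1), float("inf")):
                    dist[(nr, c + 1)] = nd
                    prev[(nr, c + 1)] = (r, c)
                    heapq.heappush(heap, (nd, nr, c + 1))
    return []

# Toy image: two dark rows above three bright rows.
toy = [[0.0] * 4, [0.0] * 4, [1.0] * 4, [1.0] * 4, [1.0] * 4]
print(trace_boundary(toy))
```

On the toy image the strongest gradient lies between rows 1 and 2, so the traced path follows row 1 in every column.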
PuReD-MCL: a graph-based PubMed document clustering methodology.
Theodosiou, T; Darzentas, N; Angelis, L; Ouzounis, C A
2008-09-01
Biomedical literature is the principal repository of biomedical knowledge, with PubMed being the most complete database collecting, organizing and analyzing such textual knowledge. There are numerous efforts that attempt to exploit this information by using text mining and machine learning techniques. We developed a novel approach, called PuReD-MCL (Pubmed Related Documents-MCL), which is based on the graph clustering algorithm MCL and relevant resources from PubMed. PuReD-MCL avoids using natural language processing (NLP) techniques directly; instead, it takes advantage of existing resources, available from PubMed. PuReD-MCL then clusters documents efficiently using the MCL graph clustering algorithm, which is based on graph flow simulation. This process allows users to analyse the results by highlighting important clues, and finally to visualize the clusters and all relevant information using an interactive graph layout algorithm, for instance BioLayout Express 3D. The methodology was applied to two different datasets, previously used for the validation of the document clustering tool TextQuest. The first dataset involves the organisms Escherichia coli and yeast, whereas the second is related to Drosophila development. PuReD-MCL successfully reproduces the annotated results obtained from TextQuest, while at the same time providing additional insights into the clusters and the corresponding documents. Source code in Perl and R is available from http://tartara.csd.auth.gr/~theodos/
A learning approach to the bandwidth multicolouring problem
NASA Astrophysics Data System (ADS)
Akbari Torkestani, Javad
2016-05-01
In this article, we consider a generalisation of the vertex colouring problem known as the bandwidth multicolouring problem (BMCP), in which a set of colours is assigned to each vertex such that the difference between the colours assigned to each vertex and its neighbours is never less than a predefined threshold. It is shown that the proposed method can be applied to solve the bandwidth colouring problem (BCP) as well. BMCP is known to be NP-hard, and so a large number of approximation solutions, as well as exact algorithms, have been proposed to solve it. In this article, two learning automata-based approximation algorithms are proposed for estimating a near-optimal solution to the BMCP. We show, for the first proposed algorithm, that by choosing a proper learning rate, the algorithm finds the optimal solution with a probability close enough to unity. Moreover, we compute the worst-case time complexity of the first algorithm for finding a 1/(1-ɛ) optimal solution to the given problem. The main advantage of this method is that a trade-off between the running time of the algorithm and the colour set size (colouring optimality) can be made by a proper choice of the learning rate. Finally, it is shown that the running time of the proposed algorithm is independent of the graph size, and so it is a scalable algorithm for large graphs. The second proposed algorithm is compared with some well-known colouring algorithms and the results show the efficiency of the proposed algorithm in terms of the colour set size and running time.
A greedy, graph-based algorithm for the alignment of multiple homologous gene lists.
Fostier, Jan; Proost, Sebastian; Dhoedt, Bart; Saeys, Yvan; Demeester, Piet; Van de Peer, Yves; Vandepoele, Klaas
2011-03-15
Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such cases, the alphabet of the input sequences consists of complete genes, rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel, accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists. Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves a substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozen eukaryotic genomes. Availability: http://bioinformatics.psb.ugent.be/software. The algorithm is implemented as a part of the i-ADHoRe 3.0 package.
Anisotropic Laplace-Beltrami Eigenmaps: Bridging Reeb Graphs and Skeletons
Shi, Yonggang; Lai, Rongjie; Krishna, Sheila; Sicotte, Nancy; Dinov, Ivo; Toga, Arthur W.
2010-01-01
In this paper we propose a novel approach of computing skeletons of robust topology for simply connected surfaces with boundary by constructing Reeb graphs from the eigenfunctions of an anisotropic Laplace-Beltrami operator. Our work brings together the idea of Reeb graphs and skeletons by incorporating a flux-based weight function into the Laplace-Beltrami operator. Based on the intrinsic geometry of the surface, the resulting Reeb graph is pose independent and captures the global profile of surface geometry. Our algorithm is very efficient and it only takes several seconds to compute on neuroanatomical structures such as the cingulate gyrus and corpus callosum. In our experiments, we show that the Reeb graphs serve well as an approximate skeleton with consistent topology while following the main body of conventional skeletons quite accurately. PMID:21339850
Post-processing interstitialcy diffusion from molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Bhardwaj, U.; Bukkuru, S.; Warrier, M.
2016-01-01
An algorithm to rigorously trace the interstitialcy diffusion trajectory in crystals is developed. The algorithm incorporates unsupervised learning and graph optimization which obviate the need to input extra domain specific information depending on crystal or temperature of the simulation. The algorithm is implemented in a flexible framework as a post-processor to molecular dynamics (MD) simulations. We describe in detail the reduction of interstitialcy diffusion into known computational problems of unsupervised clustering and graph optimization. We also discuss the steps, computational efficiency and key components of the algorithm. Using the algorithm, thermal interstitialcy diffusion from low to near-melting point temperatures is studied. We encapsulate the algorithms in a modular framework with functionality to calculate diffusion coefficients, migration energies and other trajectory properties. The study validates the algorithm by establishing the conformity of output parameters with experimental values and provides detailed insights for the interstitialcy diffusion mechanism. The algorithm along with the help of supporting visualizations and analysis gives convincing details and a new approach to quantifying diffusion jumps, jump-lengths, time between jumps and to identify interstitials from lattice atoms.
Dexter, Alex; Race, Alan M; Steven, Rory T; Barnes, Jennifer R; Hulme, Heather; Goodwin, Richard J A; Styles, Iain B; Bunch, Josephine
2017-11-07
Clustering is widely used in MSI to segment anatomical features and differentiate tissue types, but existing approaches are both CPU and memory-intensive, limiting their application to small, single data sets. We propose a new approach that uses a graph-based algorithm with a two-phase sampling method that overcomes this limitation. We demonstrate the algorithm on a range of sample types and show that it can segment anatomical features that are not identified using commonly employed algorithms in MSI, and we validate our results on synthetic MSI data. We show that the algorithm is robust to fluctuations in data quality by successfully clustering data with a designed-in variance using data acquired with varying laser fluence. Finally, we show that this method is capable of generating accurate segmentations of large MSI data sets acquired on the newest generation of MSI instruments and evaluate these results by comparison with histopathology.
Graph-based normalization and whitening for non-linear data analysis.
Aaron, Catherine
2006-01-01
In this paper we construct a graph-based normalization algorithm for non-linear data analysis. The principle of this algorithm is to get a spherical average neighborhood with unit radius. First we present a class of global dispersion measures used for "global normalization"; we then adapt these measures using a weighted graph to build a local normalization called "graph-based" normalization. Then we give details of the graph-based normalization algorithm and illustrate some results. In the second part we present a graph-based whitening algorithm built by analogy between the "global" and the "local" problem.
LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.
El-Metwally, Sara; Zakaria, Magdi; Hamza, Taher
2016-11-01
The deluge of current sequenced data has exceeded Moore's Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive number of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in computer memory. LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache oblivious Bloom filters, one holding a uniform sample of [Formula: see text]-spaced sequenced [Formula: see text]-mers and the other holding [Formula: see text]-mers classified as likely correct, using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves comparable assembly accuracy and contiguity to other competing tools. Our method reduces the memory usage by [Formula: see text] compared to the resource-efficient assemblers using benchmark datasets from GAGE and Assemblathon projects. While LightAssembler can be considered as a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage. Availability: https://github.com/SaraEl-Metwally/LightAssembler. Contact: sarah_almetwally4@mans.edu.eg. Supplementary information: Supplementary data are available at Bioinformatics online.
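To make the Bloom-filter idea in the abstract above more concrete, here is a minimal Python sketch of a plain Bloom filter that stores gap-sampled k-mers. It is not the cache-oblivious variant the abstract mentions, and the class name `BloomFilter`, the helper `kmers`, the SHA-256-based hashing and the reading of "gap" as the number of skipped start positions are all assumptions made for illustration.

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting one cryptographic hash.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))

def kmers(read, k, gap=0):
    """Yield k-mers of `read`, skipping `gap` start positions between samples."""
    for i in range(0, len(read) - k + 1, gap + 1):
        yield read[i:i + k]

sampled = BloomFilter()
for kmer in kmers("ACGTACGTAC", k=4, gap=2):
    sampled.add(kmer)
print("ACGT" in sampled)  # True: inserted items are always reported present
```

Membership tests on items that were never inserted can occasionally return True; that false-positive rate is the price paid for the small, fixed memory footprint.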
Scalable Faceted Ranking in Tagging Systems
NASA Astrophysics Data System (ADS)
Orlicki, José I.; Alvarez-Hamelin, J. Ignacio; Fierens, Pablo I.
Nowadays, web collaborative tagging systems which allow users to upload, comment on and recommend contents, are growing. Such systems can be represented as graphs where nodes correspond to users and tagged-links to recommendations. In this paper we analyze the problem of computing a ranking of users with respect to a facet described as a set of tags. A straightforward solution is to compute a PageRank-like algorithm on a facet-related graph, but it is not feasible for online computation. We propose an alternative: (i) a ranking for each tag is computed offline on the basis of tag-related subgraphs; (ii) a faceted order is generated online by merging rankings corresponding to all the tags in the facet. Based on the graph analysis of YouTube and Flickr, we show that step (i) is scalable. We also present efficient algorithms for step (ii), which are evaluated by comparing their results with two gold standards.
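The two-step scheme in the abstract above (per-tag rankings computed offline, merged online for a facet) can be sketched as follows. The merge rule used here, summing each user's per-tag scores, is an assumption chosen for simplicity, and the names `faceted_ranking` and `per_tag_scores` are hypothetical; the paper's own merging algorithms may differ.

```python
from collections import defaultdict

def faceted_ranking(per_tag_scores, facet, top_k=3):
    """Merge precomputed per-tag scores into one ranking for a facet.

    per_tag_scores: {tag: {user: score}}, assumed to be computed offline.
    facet: iterable of tags describing the facet.
    """
    totals = defaultdict(float)
    for tag in facet:
        for user, score in per_tag_scores.get(tag, {}).items():
            totals[user] += score
    # Highest total first; ties broken by user name for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [user for user, _ in ranked[:top_k]]

offline = {
    "music": {"ana": 0.5, "bo": 0.25, "cy": 0.25},
    "live": {"bo": 0.5, "cy": 0.125, "dee": 0.125},
}
print(faceted_ranking(offline, ["music", "live"]))  # ['bo', 'ana', 'cy']
```

Only the cheap merge runs at query time, which is the point of moving the PageRank-like computation offline.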
Graph Design via Convex Optimization: Online and Distributed Perspectives
NASA Astrophysics Data System (ADS)
Meng, De
Networks and graphs have long been natural abstractions of relations in a variety of applications, e.g. transportation, power systems, social networks, communication, electrical circuits, etc. As a large number of computation and optimization problems are naturally defined on graphs, graph structures not only enable important properties of these problems, but also lead to highly efficient distributed and online algorithms. For example, graph separability enables parallelism in computation and operation as well as limits the size of local problems. More interestingly, graphs can be defined and constructed in order to take best advantage of those problem properties. This dissertation focuses on graph structure and design in newly proposed optimization problems, which establish a bridge between graph properties and optimization problem properties. We first study a new optimization problem called the Geodesic Distance Maximization Problem (GDMP). Given a graph with fixed edge weights, finding the shortest path, also known as the geodesic, between two nodes is a well-studied network flow problem. The GDMP is the problem of finding the edge weights that maximize the length of the geodesic subject to convex constraints on the weights. We show that GDMP is a convex optimization problem for a wide class of flow costs, and provide a physical interpretation using the dual. We present applications of the GDMP in various fields, including optical lens design, network interdiction, and resource allocation in the control of forest fires. We develop an Alternating Direction Method of Multipliers (ADMM) by exploiting specific problem structures to solve large-scale GDMP, and demonstrate its effectiveness in numerical examples. We then turn our attention to distributed optimization on graphs with only local communication. Distributed optimization arises in a variety of applications, e.g. distributed tracking and localization, estimation problems in sensor networks, and multi-agent coordination. Distributed optimization aims to optimize a global objective function formed by summation of coupled local functions over a graph via only local communication and computation. We developed a weighted proximal ADMM for distributed optimization using graph structure. This fully distributed, single-loop algorithm allows simultaneous updates and can be viewed as a generalization of existing algorithms. More importantly, we achieve faster convergence by jointly designing graph weights and algorithm parameters. Finally, we propose a new problem on networks called the Online Network Formation Problem: starting with a base graph and a set of candidate edges, at each round of the game, player one first chooses a candidate edge and reveals it to player two, then player two decides whether to accept it; player two can only accept a limited number of edges and makes online decisions with the goal of achieving the best properties of the synthesized network. The network properties considered include the number of spanning trees, algebraic connectivity and total effective resistance. These network formation games arise in a variety of cooperative multiagent systems. We propose a primal-dual algorithm framework for the general online network formation game, and analyze the algorithm performance by the competitive ratio and regret.
Random Walk Graph Laplacian-Based Smoothness Prior for Soft Decoding of JPEG Images.
Liu, Xianming; Cheung, Gene; Wu, Xiaolin; Zhao, Debin
2017-02-01
Given the prevalence of joint photographic experts group (JPEG) compressed images, optimizing image reconstruction from the compressed format remains an important problem. Instead of simply reconstructing a pixel block from the centers of indexed discrete cosine transform (DCT) coefficient quantization bins (hard decoding), soft decoding reconstructs a block by selecting appropriate coefficient values within the indexed bins with the help of signal priors. The challenge thus lies in how to define suitable priors and apply them effectively. In this paper, we combine three image priors (a Laplacian prior for DCT coefficients, a sparsity prior, and a graph-signal smoothness prior for image patches) to construct an efficient JPEG soft decoding algorithm. Specifically, we first use the Laplacian prior to compute a minimum mean square error initial solution for each code block. Next, we show that while the sparsity prior can reduce block artifacts, limiting the size of the overcomplete dictionary (to lower computation) would lead to poor recovery of high DCT frequencies. To alleviate this problem, we design a new graph-signal smoothness prior (desired signal has mainly low graph frequencies) based on the left eigenvectors of the random walk graph Laplacian matrix (LERaG). Compared with the previous graph-signal smoothness priors, LERaG has desirable image filtering properties with low computation overhead. We demonstrate how LERaG can facilitate recovery of high DCT frequencies of a piecewise smooth signal via an interpretation of low graph frequency components as relaxed solutions to normalized cut in spectral clustering. Finally, we construct a soft decoding algorithm using the three signal priors with appropriate prior weights. Experimental results show that our proposal noticeably outperforms the state-of-the-art soft decoding algorithms in both objective and subjective evaluations.
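The difference between hard and soft decoding described in the abstract above can be shown on a single DCT coefficient. The sketch below is only an illustration under simple assumptions: the "prior" is reduced to an externally supplied estimate that is clipped to the quantization bin, and the function names are hypothetical. The paper's algorithm instead optimises over three priors jointly.

```python
def quantize(coeff, q_step):
    """Encoder side: index of the quantization bin containing `coeff`."""
    return round(coeff / q_step)

def hard_decode(index, q_step):
    """Hard decoding: reconstruct at the centre of the indexed bin."""
    return index * q_step

def soft_decode(index, q_step, prior_estimate):
    """Soft decoding (simplified): take a prior-based estimate and keep it
    inside the indexed bin [(index - 0.5) * q_step, (index + 0.5) * q_step]."""
    lo = (index - 0.5) * q_step
    hi = (index + 0.5) * q_step
    return min(max(prior_estimate, lo), hi)

index = quantize(37.0, q_step=16)      # 37 / 16 = 2.3125 -> bin 2
print(hard_decode(index, 16))          # 32, the bin centre
print(soft_decode(index, 16, 36.2))    # 36.2, already inside [24, 40]
print(soft_decode(index, 16, 50.0))    # 40.0, clipped to the bin edge
```

The bin constraint is what keeps any soft-decoded value consistent with the compressed bitstream, whatever prior produced the estimate.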
NASA Astrophysics Data System (ADS)
Dangi, Shusil; Linte, Cristian A.
2017-03-01
Segmentation of right ventricle from cardiac MRI images can be used to build pre-operative anatomical heart models to precisely identify regions of interest during minimally invasive therapy. Furthermore, many functional parameters of right heart such as right ventricular volume, ejection fraction, myocardial mass and thickness can also be assessed from the segmented images. To obtain an accurate and computationally efficient segmentation of right ventricle from cardiac cine MRI, we propose a segmentation algorithm formulated as an energy minimization problem in a graph. Shape prior obtained by propagating label from an average atlas using affine registration is incorporated into the graph framework to overcome problems in ill-defined image regions. The optimal segmentation corresponding to the labeling with minimum energy configuration of the graph is obtained via graph-cuts and is iteratively refined to produce the final right ventricle blood pool segmentation. We quantitatively compare the segmentation results obtained from our algorithm to the provided gold-standard expert manual segmentation for 16 cine-MRI datasets available through the MICCAI 2012 Cardiac MR Right Ventricle Segmentation Challenge according to several similarity metrics, including Dice coefficient, Jaccard coefficient, Hausdorff distance, and Mean absolute distance error.
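Two of the overlap metrics named in the abstract above, the Dice and Jaccard coefficients, have standard definitions that are easy to state in code. The sketch below computes both for binary masks given as nested lists; the function name `dice_and_jaccard` and the convention of returning 1.0 for two empty masks are assumptions of this example.

```python
def dice_and_jaccard(pred, truth):
    """Dice = 2|P∩T| / (|P| + |T|); Jaccard = |P∩T| / |P∪T| for binary masks."""
    p = {(r, c) for r, row in enumerate(pred) for c, v in enumerate(row) if v}
    t = {(r, c) for r, row in enumerate(truth) for c, v in enumerate(row) if v}
    union = len(p | t)
    if union == 0:
        return 1.0, 1.0  # assumed convention: two empty masks agree perfectly
    inter = len(p & t)
    return 2 * inter / (len(p) + len(t)), inter / union

predicted = [[0, 1, 1],
             [0, 1, 0]]
manual = [[0, 1, 0],
          [0, 1, 1]]
dice, jaccard = dice_and_jaccard(predicted, manual)
print(round(dice, 3), jaccard)  # 0.667 0.5
```

The two scores are monotonically related (Dice = 2J / (1 + J)), so they rank segmentations identically; challenges often report both for comparability with earlier work.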
Power and Efficiency Optimized in Traveling-Wave Tubes Over a Broad Frequency Bandwidth
NASA Technical Reports Server (NTRS)
Wilson, Jeffrey D.
2001-01-01
A traveling-wave tube (TWT) is an electron beam device that is used to amplify electromagnetic communication waves at radio and microwave frequencies. TWT's are critical components in deep space probes, communication satellites, and high-power radar systems. Power conversion efficiency is of paramount importance for TWT's employed in deep space probes and communication satellites. A previous effort was very successful in increasing efficiency and power at a single frequency (ref. 1). Such an algorithm is sufficient for narrow bandwidth designs, but for optimal designs in applications that require high radiofrequency power over a wide bandwidth, such as high-density communications or high-resolution radar, the variation of the circuit response with respect to frequency must be considered. This work at the NASA Glenn Research Center is the first to develop techniques for optimizing TWT efficiency and output power over a broad frequency bandwidth (ref. 2). The techniques are based on simulated annealing, which has the advantage over conventional optimization techniques in that it enables the best possible solution to be obtained (ref. 3). Two new broadband simulated annealing algorithms were developed that optimize (1) minimum saturated power efficiency over a frequency bandwidth and (2) simultaneous bandwidth and minimum power efficiency over the frequency band with constant input power. The algorithms were incorporated into the NASA coupled-cavity TWT computer model (ref. 4) and used to design optimal phase velocity tapers using the 59- to 64-GHz Hughes 961HA coupled-cavity TWT as a baseline model. In comparison to the baseline design, the computational results of the first broad-band design algorithm show an improvement of 73.9 percent in minimum saturated efficiency (see the top graph). The second broadband design algorithm (see the bottom graph) improves minimum radiofrequency efficiency with constant input power drive by a factor of 2.7 at the high band edge (64 GHz) and increases simultaneous bandwidth by 500 MHz.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, Yubin; Shankar, Mallikarjun; Park, Byung H.
Designing a database system for both efficient data management and data services has been one of the enduring challenges in the healthcare domain. In many healthcare systems, data services and data management are often viewed as two orthogonal tasks; data services refer to retrieval and analytic queries such as search, joins, statistical data extraction, and simple data mining algorithms, while data management refers to building error-tolerant and non-redundant database systems. The gap between service and management has resulted in rigid database systems and schemas that do not support effective analytics. We compose a rich graph structure from an abstracted healthcare RDBMS to illustrate how we can fill this gap in practice. We show how a healthcare graph can be automatically constructed from a normalized relational database using the proposed 3NF Equivalent Graph (3EG) transformation. We discuss a set of real world graph queries such as finding self-referrals, shared providers, and collaborative filtering, and evaluate their performance over a relational database and its 3EG-transformed graph. Experimental results show that the graph representation serves as multiple de-normalized tables, thus reducing complexity in a database and enhancing data accessibility of users. Based on this finding, we propose an ensemble framework of databases for healthcare applications.
Graphing trillions of triangles.
Burkhardt, Paul
2017-07-01
The increasing size of Big Data is often heralded but how data are transformed and represented is also profoundly important to knowledge discovery, and this is exemplified in Big Graph analytics. Much attention has been placed on the scale of the input graph but the product of a graph algorithm can be many times larger than the input. This is true for many graph problems, such as listing all triangles in a graph. Enabling scalable graph exploration for Big Graphs requires new approaches to algorithms, architectures, and visual analytics. A brief tutorial is given to aid the argument for thoughtful representation of data in the context of graph analysis. Then a new algebraic method to reduce the arithmetic operations in counting and listing triangles in graphs is introduced. Additionally, a scalable triangle listing algorithm in the MapReduce model will be presented followed by a description of the experiments with that algorithm that led to the current largest and fastest triangle listing benchmarks to date. Finally, a method for identifying triangles in new visual graph exploration technologies is proposed.
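For readers unfamiliar with the triangle listing problem mentioned in the abstract above, the following Python sketch lists every triangle of a small undirected graph exactly once. It uses the well-known degree-ordered neighbour-intersection approach, which is an assumption of this example; it is neither the algebraic method nor the MapReduce algorithm introduced in the paper, and vertex identifiers are assumed to be mutually comparable (for example integers).

```python
def list_triangles(edges):
    """Return each triangle of an undirected graph once, as a sorted tuple."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Orient every edge from lower to higher (degree, id) rank so that each
    # triangle is discovered only from its lowest-ranked vertex.
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    triangles = []
    for u in adj:
        for v in out[u]:
            for w in out[u] & out[v]:
                triangles.append(tuple(sorted((u, v, w))))
    return sorted(triangles)

edges = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
print(list_triangles(edges))  # [(1, 2, 3), (2, 3, 4)]
```

Even in this toy case the output (two triples) is comparable in size to the input (five edges), which hints at why the listing can dwarf the input on large graphs.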
An efficient algorithm for planar drawing of RNA structures with pseudoknots of any type.
Byun, Yanga; Han, Kyungsook
2016-06-01
An RNA pseudoknot is a tertiary structural element in which bases of a loop pair with complementary bases outside the loop. A drawing of RNA secondary structures is a tree, but a drawing of RNA pseudoknots is a graph that has an inner cycle within a pseudoknot and possibly outer cycles formed between the pseudoknot and other structural elements. Visualizing a large-scale RNA structure with pseudoknots as a planar drawing is challenging because a planar drawing of an RNA structure requires both pseudoknots and an entire structure enclosing the pseudoknots to be embedded into a plane without overlapping or crossing. This paper presents an efficient heuristic algorithm for visualizing a pseudoknotted RNA structure as a planar drawing. The algorithm consists of several parts for finding crossing stems and page mapping the stems, for the layout of stem-loops and pseudoknots, and for overlap detection between structural elements and resolving it. Unlike previous algorithms, our algorithm generates a planar drawing for a large RNA structure with pseudoknots of any type and provides a bracket view of the structure. It generates a compact and aesthetic structure graph for a large pseudoknotted RNA structure in O([Formula: see text]) time, where n is the number of stems of the RNA structure.
The Container Problem in Bubble-Sort Graphs
NASA Astrophysics Data System (ADS)
Suzuki, Yasuto; Kaneko, Keiichi
Bubble-sort graphs are variants of Cayley graphs. A bubble-sort graph is suitable as a topology for massively parallel systems because of its simple and regular structure. Therefore, in this study, we focus on n-bubble-sort graphs and propose an algorithm to obtain n-1 disjoint paths between two arbitrary nodes in time bounded by a polynomial in n, the degree of the graph plus one. We estimate the time complexity of the algorithm and the sum of the path lengths after proving the correctness of the algorithm. In addition, we report the results of computer experiments evaluating the average performance of the algorithm.
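To make the structure concrete: in the usual definition, the nodes of an n-bubble-sort graph are the permutations of 1..n, and two nodes are adjacent when one is obtained from the other by swapping two neighbouring positions, so the graph has n! nodes of degree n-1. The sketch below only builds this adjacency; it does not implement the disjoint-path algorithm of the paper, and the function names are illustrative.

```python
from itertools import permutations


def bubble_sort_neighbors(perm):
    """Return the permutations reachable by one adjacent transposition."""
    result = []
    for i in range(len(perm) - 1):
        p = list(perm)
        p[i], p[i + 1] = p[i + 1], p[i]  # swap positions i and i+1
        result.append(tuple(p))
    return result


def bubble_sort_graph(n):
    """Adjacency lists of the n-bubble-sort graph (n! nodes, degree n-1)."""
    return {p: bubble_sort_neighbors(p) for p in permutations(range(1, n + 1))}


if __name__ == "__main__":
    g = bubble_sort_graph(4)
    print(len(g), {len(nbrs) for nbrs in g.values()})
```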
Graphs, matrices, and the GraphBLAS: Seven good reasons
Kepner, Jeremy; Bader, David; Buluç, Aydın; ...
2015-01-01
The analysis of graphs has become increasingly important to a wide range of applications. Graph analysis presents a number of unique challenges in the areas of (1) software complexity, (2) data complexity, (3) security, (4) mathematical complexity, (5) theoretical analysis, (6) serial performance, and (7) parallel performance. Implementing graph algorithms using matrix-based approaches provides a number of promising solutions to these challenges. The GraphBLAS standard (istcbigdata.org/GraphBlas) is being developed to bring the potential of matrix based graph algorithms to the broadest possible audience. The GraphBLAS mathematically defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the GraphBLAS and describes how the GraphBLAS can be used to address many of the challenges associated with analysis of graphs.
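The matrix-based formulation can be illustrated with breadth-first search written as a repeated vector-matrix product over the Boolean (OR, AND) semiring, with already-visited nodes masked out. The sketch uses plain Python lists and does not call any GraphBLAS library; the function name and the dense 0/1 adjacency-matrix input are assumptions made for illustration.

```python
def bfs_levels(A, source):
    """BFS levels from `source` using Boolean vector-matrix products.

    A is a dense adjacency matrix (A[i][j] truthy if there is an edge i -> j).
    Unreachable nodes keep level -1.
    """
    n = len(A)
    levels = [-1] * n
    frontier = [False] * n
    frontier[source] = True
    level = 0
    while any(frontier):
        for v in range(n):
            if frontier[v]:
                levels[v] = level
        # next[j] = OR_i (frontier[i] AND A[i][j]), masked to unvisited nodes
        frontier = [
            levels[j] == -1 and any(frontier[i] and A[i][j] for i in range(n))
            for j in range(n)
        ]
        level += 1
    return levels


if __name__ == "__main__":
    # path 0-1-2-3 plus an isolated node 4
    A = [
        [0, 1, 0, 0, 0],
        [1, 0, 1, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    print(bfs_levels(A, 0))
```

Swapping the semiring (for example to (min, +)) while keeping the same loop structure yields other graph algorithms, which is the kind of reuse a matrix-based formulation aims for.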
Scalable Static and Dynamic Community Detection Using Grappolo
DOE Office of Scientific and Technical Information (OSTI.GOV)
Halappanavar, Mahantesh; Lu, Hao; Kalyanaraman, Anantharaman
Graph clustering, popularly known as community detection, is a fundamental kernel for several applications of relevance to the Defense Advanced Research Projects Agency’s (DARPA) Hierarchical Identify Verify Exploit (HIVE) Program. Clusters or communities represent natural divisions within a network that are densely connected within a cluster and sparsely connected to the rest of the network. The need to compute clustering on large scale data necessitates the development of efficient algorithms that can exploit modern architectures that are fundamentally parallel in nature. However, due to their irregular and inherently sequential nature, many of the current algorithms for community detection are challenging to parallelize. In response to the HIVE Graph Challenge, we present several parallelization heuristics for fast community detection using the Louvain method as the serial template. We implement all the heuristics in a software library called Grappolo. Using the inputs from the HIVE Challenge, we demonstrate superior performance and high quality solutions based on four parallelization heuristics. We use Grappolo on static graphs as the first step towards community detection on streaming graphs.
An Improved Multi-Sensor Fusion Navigation Algorithm Based on the Factor Graph
Zeng, Qinghua; Chen, Weina; Liu, Jianye; Wang, Huizhe
2017-01-01
An integrated navigation system coupled with additional sensors can be used in the Micro Unmanned Aerial Vehicle (MUAV) applications because the multi-sensor information is redundant and complementary, which can markedly improve the system accuracy. How to deal with the information gathered from different sensors efficiently is an important problem. The fact that different sensors provide measurements asynchronously may complicate the processing of these measurements. In addition, the output signals of some sensors appear to have a non-linear character. In order to incorporate these measurements and calculate a navigation solution in real time, the multi-sensor fusion algorithm based on factor graph is proposed. The global optimum solution is factorized according to the chain structure of the factor graph, which allows for a more general form of the conditional probability density. It can convert the fusion matter into connecting factors defined by these measurements to the graph without considering the relationship between the sensor update frequency and the fusion period. An experimental MUAV system has been built and some experiments have been performed to prove the effectiveness of the proposed method. PMID:28335570
An Improved Multi-Sensor Fusion Navigation Algorithm Based on the Factor Graph.
Zeng, Qinghua; Chen, Weina; Liu, Jianye; Wang, Huizhe
2017-03-21
An integrated navigation system coupled with additional sensors can be used in the Micro Unmanned Aerial Vehicle (MUAV) applications because the multi-sensor information is redundant and complementary, which can markedly improve the system accuracy. How to deal with the information gathered from different sensors efficiently is an important problem. The fact that different sensors provide measurements asynchronously may complicate the processing of these measurements. In addition, the output signals of some sensors appear to have a non-linear character. In order to incorporate these measurements and calculate a navigation solution in real time, the multi-sensor fusion algorithm based on factor graph is proposed. The global optimum solution is factorized according to the chain structure of the factor graph, which allows for a more general form of the conditional probability density. It can convert the fusion matter into connecting factors defined by these measurements to the graph without considering the relationship between the sensor update frequency and the fusion period. An experimental MUAV system has been built and some experiments have been performed to prove the effectiveness of the proposed method.
A fast algorithm for vertex-frequency representations of signals on graphs
Jestrović, Iva; Coyle, James L.; Sejdić, Ervin
2016-01-01
The windowed Fourier transform (short time Fourier transform) and the S-transform are widely used signal processing tools for extracting frequency information from non-stationary signals. Previously, the windowed Fourier transform had been adopted for signals on graphs and has been shown to be very useful for extracting vertex-frequency information from graphs. However, high computational complexity makes these algorithms impractical. We sought to develop a fast windowed graph Fourier transform and a fast graph S-transform requiring significantly shorter computation time. The proposed schemes have been tested with synthetic test graph signals and real graph signals derived from electroencephalography recordings made during swallowing. The results showed that the proposed schemes provide significantly lower computation time in comparison with the standard windowed graph Fourier transform and the fast graph S-transform. Also, the results showed that noise has no effect on the results of the algorithm for the fast windowed graph Fourier transform or on the graph S-transform. Finally, we showed that graphs can be reconstructed from the vertex-frequency representations obtained with the proposed algorithms. PMID:28479645
Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sczyrba, Alex; Pratap, Abhishek; Canon, Shane
2011-03-22
Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor, based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
Kaindl, H; Kainz, G; Radda, K
2001-01-01
Most of the work on search in artificial intelligence (AI) deals with one search direction only (mostly forward search), although it is known that a structural asymmetry of the search graph causes differences in the efficiency of searching in the forward or the backward direction, respectively. In the case of symmetrical graph structure, however, current theory would not predict such differences in efficiency. In several classes of job sequencing problems, we observed a phenomenon of asymmetry in search that relates to the distribution of the arc costs in the search graph. This phenomenon can be utilized for improving the search efficiency by a new algorithm that automatically selects the search direction. We demonstrate for a class of job sequencing problems that, through the utilization of this phenomenon, much more difficult problems can be solved, to the best of our knowledge, than by the best published approach, and on the same problems, the running time is much reduced. As a consequence, we propose to check given problems for asymmetrical distribution of arc costs that may cause asymmetry in search.
Survey of Approaches to Generate Realistic Synthetic Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lim, Seung-Hwan; Lee, Sangkeun; Powers, Sarah S
A graph is a flexible data structure that can represent relationships between entities. As with other data analysis tasks, the use of realistic graphs is critical to obtaining valid research results. Unfortunately, using the actual ("real-world") graphs for research and new algorithm development is difficult due to the presence of sensitive information in the data or due to the scale of data. This results in practitioners developing algorithms and systems that employ synthetic graphs instead of real-world graphs. Generating realistic synthetic graphs that provide reliable statistical confidence to algorithmic analysis and system evaluation involves addressing technical hurdles in a broad set of areas. This report surveys the state of the art in approaches to generate realistic graphs that are derived from fitted graph models on real-world graphs.
Weights and topology: a study of the effects of graph construction on 3D image segmentation.
Grady, Leo; Jolly, Marie-Pierre
2008-01-01
Graph-based algorithms have become increasingly popular for medical image segmentation. The fundamental process for each of these algorithms is to use the image content to generate a set of weights for the graph and then set conditions for an optimal partition of the graph with respect to these weights. To date, the heuristics used for generating the weighted graphs from image intensities have largely been ignored, while the primary focus of attention has been on the details of providing the partitioning conditions. In this paper we empirically study the effects of graph connectivity and weighting function on the quality of the segmentation results. To control for algorithm-specific effects, we employ both the Graph Cuts and Random Walker algorithms in our experiments.
JavaGenes: Evolving Graphs with Crossover
NASA Technical Reports Server (NTRS)
Globus, Al; Atsatt, Sean; Lawton, John; Wipke, Todd
2000-01-01
Genetic algorithms usually use string or tree representations. We have developed a novel crossover operator for a directed and undirected graph representation, and used this operator to evolve molecules and circuits. Unlike strings or trees, a single point in the representation cannot divide every possible graph into two parts, because graphs may contain cycles. Thus, the crossover operator is non-trivial. A steady-state, tournament selection genetic algorithm code (JavaGenes) was written to implement and test the graph crossover operator. All runs were executed by cycle-scavenging on networked workstations using the Condor batch processing system. The JavaGenes code has evolved pharmaceutical drug molecules and simple digital circuits. Results to date suggest that JavaGenes can evolve moderate sized drug molecules and very small circuits in reasonable time. The algorithm has greater difficulty with somewhat larger circuits, suggesting that directed graphs (circuits) are more difficult to evolve than undirected graphs (molecules), although necessary differences in the crossover operator may also explain the results. In principle, JavaGenes should be able to evolve other graph-representable systems, such as transportation networks, metabolic pathways, and computer networks. However, large graphs evolve significantly slower than smaller graphs, presumably because the space-of-all-graphs explodes combinatorially with graph size. Since the representation strongly affects genetic algorithm performance, adding graphs to the evolutionary programmer's bag-of-tricks should be beneficial. Also, since graph evolution operates directly on the phenotype, the genotype-phenotype translation step, common in genetic algorithm work, is eliminated.
Graphing trillions of triangles
Burkhardt, Paul
2016-01-01
The increasing size of Big Data is often heralded but how data are transformed and represented is also profoundly important to knowledge discovery, and this is exemplified in Big Graph analytics. Much attention has been placed on the scale of the input graph but the product of a graph algorithm can be many times larger than the input. This is true for many graph problems, such as listing all triangles in a graph. Enabling scalable graph exploration for Big Graphs requires new approaches to algorithms, architectures, and visual analytics. A brief tutorial is given to aid the argument for thoughtful representation of data in the context of graph analysis. Then a new algebraic method to reduce the arithmetic operations in counting and listing triangles in graphs is introduced. Additionally, a scalable triangle listing algorithm in the MapReduce model will be presented followed by a description of the experiments with that algorithm that led to the current largest and fastest triangle listing benchmarks to date. Finally, a method for identifying triangles in new visual graph exploration technologies is proposed. PMID:28690426
Topological properties of the limited penetrable horizontal visibility graph family
NASA Astrophysics Data System (ADS)
Wang, Minggang; Vilela, André L. M.; Du, Ruijin; Zhao, Longfeng; Dong, Gaogao; Tian, Lixin; Stanley, H. Eugene
2018-05-01
The limited penetrable horizontal visibility graph algorithm was recently introduced to map time series in complex networks. In this work, we extend this algorithm to create a directed-limited penetrable horizontal visibility graph and an image-limited penetrable horizontal visibility graph. We define two algorithms and provide theoretical results on the topological properties of these graphs associated with different types of real-value series. We perform several numerical simulations to check the accuracy of our theoretical results. Finally, we present an application of the directed-limited penetrable horizontal visibility graph to measure real-value time series irreversibility and an application of the image-limited penetrable horizontal visibility graph that discriminates noise from chaos. We also propose a method to measure the systematic risk using the image-limited penetrable horizontal visibility graph, and the empirical results show the effectiveness of our proposed algorithms.
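As a concrete reading of the base construction, the sketch below builds the undirected limited penetrable horizontal visibility graph under the definition commonly given for it: time points i < j are linked when at most rho of the intermediate values are greater than or equal to min(x_i, x_j), so rho = 0 recovers the ordinary horizontal visibility graph. The parameter name `rho` and the function name are assumptions, and the directed and image variants introduced in the paper are not covered.

```python
def lphvg_edges(series, rho=0):
    """Edges (i, j), i < j, of the limited penetrable horizontal visibility graph.

    Assumed definition: i and j are linked when no more than `rho` intermediate
    values are >= min(series[i], series[j]).
    """
    n = len(series)
    edges = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            bar = min(series[i], series[j])
            # intermediate points that block the horizontal line of sight
            blockers = sum(1 for k in range(i + 1, j) if series[k] >= bar)
            if blockers <= rho:
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    x = [3, 1, 2, 4]
    print(lphvg_edges(x, rho=0))
    print(lphvg_edges(x, rho=1))
```

This brute-force version takes cubic time in the series length and is meant only to make the linking rule explicit.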
MIMO: an efficient tool for molecular interaction maps overlap
2013-01-01
Background Molecular pathways represent an ensemble of interactions occurring among molecules within the cell and between cells. The identification of similarities between molecular pathways across organisms and functions has a critical role in understanding complex biological processes. For the inference of such novel information, the comparison of molecular pathways must account for imperfect matches (flexibility) and efficiently handle complex network topologies. To date, these characteristics are only partially available in tools designed to compare molecular interaction maps. Results Our approach MIMO (Molecular Interaction Maps Overlap) addresses the first problem by allowing the introduction of gaps and mismatches between query and template pathways and permits, when necessary, supervised queries incorporating a priori biological information. It then addresses the second issue by relying directly on the rich graph topology described in the Systems Biology Markup Language (SBML) standard, and uses multidigraphs to efficiently handle multiple queries on biological graph databases. The algorithm has been successfully used here to highlight the contact point between various human pathways in the Reactome database. Conclusions MIMO offers a flexible and efficient graph-matching tool for comparing complex biological pathways. PMID:23672344
Automatic Data Distribution for CFD Applications on Structured Grids
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Yan, Jerry
2000-01-01
Data distribution is an important step in implementation of any parallel algorithm. The data distribution determines data traffic, utilization of the interconnection network and affects the overall code efficiency. In recent years a number of data distribution methods have been developed and used in real programs for improving data traffic. We use some of the methods for translating data dependence and affinity relations into data distribution directives. We describe an automatic data alignment and placement tool (ADAPT) which implements these methods and show its results for some CFD codes (NPB and ARC3D). Algorithms for program analysis and derivation of data distribution implemented in ADAPT are efficient three pass algorithms. Most algorithms have linear complexity with the exception of some graph algorithms having complexity O(n^4) in the worst case.
Automatic Data Distribution for CFD Applications on Structured Grids
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Yan, Jerry
1999-01-01
Data distribution is an important step in implementation of any parallel algorithm. The data distribution determines data traffic, utilization of the interconnection network and affects the overall code efficiency. In recent years a number of data distribution methods have been developed and used in real programs for improving data traffic. We use some of the methods for translating data dependence and affinity relations into data distribution directives. We describe an automatic data alignment and placement tool (ADAPT) which implements these methods and show its results for some CFD codes (NPB and ARC3D). Algorithms for program analysis and derivation of data distribution implemented in ADAPT are efficient three pass algorithms. Most algorithms have linear complexity with the exception of some graph algorithms having complexity O(n^4) in the worst case.
Solution to the SLAM problem in low dynamic environments using a pose graph and an RGB-D sensor.
Lee, Donghwa; Myung, Hyun
2014-07-11
In this study, we propose a solution to the simultaneous localization and mapping (SLAM) problem in low dynamic environments by using a pose graph and an RGB-D (red-green-blue depth) sensor. The low dynamic environments refer to situations in which the positions of objects change over long intervals. Therefore, in the low dynamic environments, robots have difficulty recognizing the repositioning of objects unlike in highly dynamic environments in which relatively fast-moving objects can be detected using a variety of moving object detection algorithms. The changes in the environments then cause groups of false loop closing when the same moved objects are observed for a while, which means that conventional SLAM algorithms produce incorrect results. To address this problem, we propose a novel SLAM method that handles low dynamic environments. The proposed method uses a pose graph structure and an RGB-D sensor. First, to prune the falsely grouped constraints efficiently, nodes of the graph, that represent robot poses, are grouped according to the grouping rules with noise covariances. Next, false constraints of the pose graph are pruned according to an error metric based on the grouped nodes. The pose graph structure is reoptimized after eliminating the false information, and the corrected localization and mapping results are obtained. The performance of the method was validated in real experiments using a mobile robot system.
Combinatorial-topological framework for the analysis of global dynamics.
Bush, Justin; Gameiro, Marcio; Harker, Shaun; Kokubu, Hiroshi; Mischaikow, Konstantin; Obayashi, Ippei; Pilarczyk, Paweł
2012-12-01
We discuss an algorithmic framework based on efficient graph algorithms and algebraic-topological computational tools. The framework is aimed at automatic computation of a database of global dynamics of a given m-parameter semidynamical system with discrete time on a bounded subset of the n-dimensional phase space. We introduce the mathematical background, which is based upon Conley's topological approach to dynamics, describe the algorithms for the analysis of the dynamics using rectangular grids both in phase space and parameter space, and show two sample applications.
Combinatorial-topological framework for the analysis of global dynamics
NASA Astrophysics Data System (ADS)
Bush, Justin; Gameiro, Marcio; Harker, Shaun; Kokubu, Hiroshi; Mischaikow, Konstantin; Obayashi, Ippei; Pilarczyk, Paweł
2012-12-01
We discuss an algorithmic framework based on efficient graph algorithms and algebraic-topological computational tools. The framework is aimed at automatic computation of a database of global dynamics of a given m-parameter semidynamical system with discrete time on a bounded subset of the n-dimensional phase space. We introduce the mathematical background, which is based upon Conley's topological approach to dynamics, describe the algorithms for the analysis of the dynamics using rectangular grids both in phase space and parameter space, and show two sample applications.
Decomposition Algorithm for Global Reachability on a Time-Varying Graph
NASA Technical Reports Server (NTRS)
Kuwata, Yoshiaki
2010-01-01
A decomposition algorithm has been developed for global reachability analysis on a space-time grid. By exploiting the upper block-triangular structure, the planning problem is decomposed into smaller subproblems, which is much more scalable than the original approach. Recent studies have proposed the use of a hot-air (Montgolfier) balloon for possible exploration of Titan and Venus because these bodies have thick haze or cloud layers that limit the science return from an orbiter, and the atmospheres would provide enough buoyancy for balloons. One of the important questions that needs to be addressed is what surface locations the balloon can reach from an initial location, and how long it would take. This is referred to as the global reachability problem, where the paths from starting locations to all possible target locations must be computed. The balloon could be driven with its own actuation, but its actuation capability is fairly limited. It would be more efficient to take advantage of the wind field and ride the wind that is much stronger than what the actuator could produce. It is possible to pose the path planning problem as a graph search problem on a directed graph by discretizing the spacetime world and the vehicle actuation. The decomposition algorithm provides reachability analysis of a time-varying graph. Because the balloon only moves in the positive direction in time, the adjacency matrix of the graph can be represented with an upper block-triangular matrix, and this upper block-triangular structure can be exploited to decompose a large graph search problem. The new approach consumes a much smaller amount of memory, which also helps speed up the overall computation when the computing resource has a limited physical memory compared to the problem size.
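The reason the block-triangular structure helps can be seen in a small sketch: because every edge leads from time step t to time step t+1, reachability can be propagated one time slice at a time, and only the current slice needs to be held in memory. This is a simplified illustration under stated assumptions, not the decomposition algorithm of the report; the function name, the `step_edges` format, and the toy location labels are hypothetical.

```python
def reachable_over_time(num_steps, start, step_edges):
    """Earliest arrival step for each location reachable from `start`.

    step_edges[t] maps a location at time t to the locations it can occupy at
    time t + 1 (staying in place must be listed explicitly).
    """
    frontier = {start}
    first_arrival = {start: 0}
    for t in range(num_steps):
        nxt = set()
        for loc in frontier:
            nxt.update(step_edges[t].get(loc, ()))
        for loc in nxt:
            first_arrival.setdefault(loc, t + 1)  # keep the earliest time only
        frontier = nxt  # earlier slices are no longer needed
    return first_arrival


if __name__ == "__main__":
    # hypothetical wind-driven moves between four surface cells
    step_edges = [
        {"A": ["A", "B"]},
        {"A": ["A"], "B": ["C"]},
        {"C": ["D"]},
    ]
    print(reachable_over_time(3, "A", step_edges))
```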
Moya, Nikolas; Falcão, Alexandre X; Ciesielski, Krzysztof C; Udupa, Jayaram K
2014-01-01
Graph-cut algorithms have been extensively investigated for interactive binary segmentation, when the simultaneous delineation of multiple objects can save considerable user's time. We present an algorithm (named DRIFT) for 3D multiple object segmentation based on seed voxels and Differential Image Foresting Transforms (DIFTs) with relaxation. DRIFT stands behind efficient implementations of some state-of-the-art methods. The user can add/remove markers (seed voxels) along a sequence of executions of the DRIFT algorithm to improve segmentation. Its first execution takes linear time with the image's size, while the subsequent executions for corrections take sublinear time in practice. At each execution, DRIFT first runs the DIFT algorithm, then it applies diffusion filtering to smooth boundaries between objects (and background) and, finally, it corrects possible objects' disconnection occurrences with respect to their seeds. We evaluate DRIFT in 3D CT-images of the thorax for segmenting the arterial system, esophagus, left pleural cavity, right pleural cavity, trachea and bronchi, and the venous system.
Belief propagation decoding of quantum channels by passing quantum messages
NASA Astrophysics Data System (ADS)
Renes, Joseph M.
2017-07-01
The belief propagation (BP) algorithm is a powerful tool in a wide range of disciplines from statistical physics to machine learning to computational biology, and is ubiquitous in decoding classical error-correcting codes. The algorithm works by passing messages between nodes of the factor graph associated with the code and enables efficient decoding of the channel, in some cases even up to the Shannon capacity. Here we construct the first BP algorithm which passes quantum messages on the factor graph and is capable of decoding the classical-quantum channel with pure state outputs. This gives explicit decoding circuits whose number of gates is quadratic in the code length. We also show that this decoder can be modified to work with polar codes for the pure state channel and as part of a decoder for transmitting quantum information over the amplitude damping channel. These represent the first explicit capacity-achieving decoders for non-Pauli channels.
Building occupancy simulation and data assimilation using a graph-based agent-oriented model
NASA Astrophysics Data System (ADS)
Rai, Sanish; Hu, Xiaolin
2018-07-01
Building occupancy simulation and estimation simulates the dynamics of occupants and estimates their real-time spatial distribution in a building. It requires a simulation model and an algorithm for data assimilation that assimilates real-time sensor data into the simulation model. Existing building occupancy simulation models include agent-based models and graph-based models. The agent-based models suffer high computation cost for simulating large numbers of occupants, and graph-based models overlook the heterogeneity and detailed behaviors of individuals. Recognizing the limitations of existing models, this paper presents a new graph-based agent-oriented model which can efficiently simulate large numbers of occupants in various kinds of building structures. To support real-time occupancy dynamics estimation, a data assimilation framework based on Sequential Monte Carlo Methods is also developed and applied to the graph-based agent-oriented model to assimilate real-time sensor data. Experimental results show the effectiveness of the developed model and the data assimilation framework. The major contributions of this work are to provide an efficient model for building occupancy simulation that can accommodate large numbers of occupants and an effective data assimilation framework that can provide real-time estimations of building occupancy from sensor data.
Optimizing Approximate Weighted Matching on Nvidia Kepler K40
DOE Office of Scientific and Technical Information (OSTI.GOV)
Naim, Md; Manne, Fredrik; Halappanavar, Mahantesh
Matching is a fundamental graph problem with numerous applications in science and engineering. While algorithms for computing optimal matchings are difficult to parallelize, approximation algorithms on the other hand generally compute high quality solutions and are amenable to parallelization. In this paper, we present efficient implementations of the current best algorithm for half-approximate weighted matching, the Suitor algorithm, on Nvidia Kepler K-40 platform. We develop four variants of the algorithm that exploit hardware features to address key challenges for a GPU implementation. We also experiment with different combinations of work assigned to a warp. Using an exhaustive set of 269 inputs, we demonstrate that the new implementation outperforms the previous best GPU algorithm by 10 to 100× for over 100 instances, and from 100 to 1000× for 15 instances. We also demonstrate up to 20× speedup relative to 2 threads, and up to 5× relative to 16 threads on Intel Xeon platform with 16 cores for the same algorithm. The new algorithms and implementations provided in this paper will have a direct impact on several applications that repeatedly use matching as a key compute kernel. Further, algorithm designs and insights provided in this paper will benefit other researchers implementing graph algorithms on modern GPU architectures.
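A serial sketch of the proposal-based idea behind the Suitor algorithm may help: each vertex proposes to its heaviest neighbour whose current offer is lighter, and a vertex whose proposal is displaced looks again. The version below assumes distinct positive edge weights (tie-breaking by vertex id is omitted), uses a hypothetical dictionary-of-dictionaries input, and is not any of the GPU variants described in the paper.

```python
def suitor_matching(adj):
    """Half-approximate weighted matching by suitor proposals (serial sketch).

    adj: {u: {v: weight}} for an undirected graph; weights are assumed to be
    distinct and positive.
    Returns a set of frozensets, one per matched pair.
    """
    suitor = {u: None for u in adj}  # who currently proposes to u
    ws = {u: 0 for u in adj}         # weight of that standing proposal

    for u in adj:
        current = u
        while current is not None:
            partner, heaviest = None, 0
            for v, w in adj[current].items():
                # only neighbours whose standing offer is lighter are eligible
                if w > heaviest and w > ws[v]:
                    partner, heaviest = v, w
            if partner is None:
                break  # no eligible neighbour: current stays unmatched
            displaced = suitor[partner]
            suitor[partner] = current
            ws[partner] = heaviest
            current = displaced  # a displaced vertex must propose again

    return {
        frozenset((u, v))
        for u, v in suitor.items()
        if v is not None and suitor[v] == u
    }


if __name__ == "__main__":
    # path a-b-c-d with weights 1, 3, 2
    adj = {
        "a": {"b": 1},
        "b": {"a": 1, "c": 3},
        "c": {"b": 3, "d": 2},
        "d": {"c": 2},
    }
    print(suitor_matching(adj))
```

With distinct weights the result coincides with the matching obtained by greedily taking edges in decreasing weight order, which is where the half-approximation guarantee comes from.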
Big Data Analytics with Datalog Queries on Spark.
Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo
2016-01-01
There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics.
Big Data Analytics with Datalog Queries on Spark
Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo
2017-01-01
There is great interest in exploiting the opportunity provided by cloud computing platforms for large-scale analytics. Among these platforms, Apache Spark is growing in popularity for machine learning and graph analytics. Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX). Our BigDatalog system addresses the problem by providing concise declarative specification of complex queries amenable to efficient evaluation. Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. We perform an experimental comparison with other state-of-the-art large-scale Datalog systems and verify the efficacy of our techniques and effectiveness of Spark in supporting Datalog-based analytics. PMID:28626296
Graph embedding and extensions: a general framework for dimensionality reduction.
Yan, Shuicheng; Xu, Dong; Zhang, Benyu; Zhang, Hong-Jiang; Yang, Qiang; Lin, Stephen
2007-01-01
Over the past few decades, a large family of algorithms - supervised or unsupervised; stemming from statistics or geometry theory - has been designed to provide different solutions to the problem of dimensionality reduction. Despite the different motivations of these algorithms, we present in this paper a general formulation known as graph embedding to unify them within a common framework. In graph embedding, each algorithm can be considered as the direct graph embedding or its linear/kernel/tensor extension of a specific intrinsic graph that describes certain desired statistical or geometric properties of a data set, with constraints from scale normalization or a penalty graph that characterizes a statistical or geometric property that should be avoided. Furthermore, the graph embedding framework can be used as a general platform for developing new dimensionality reduction algorithms. By utilizing this framework as a tool, we propose a new supervised dimensionality reduction algorithm called Marginal Fisher Analysis in which the intrinsic graph characterizes the intraclass compactness and connects each data point with its neighboring points of the same class, while the penalty graph connects the marginal points and characterizes the interclass separability. We show that MFA effectively overcomes the limitations of the traditional Linear Discriminant Analysis algorithm due to data distribution assumptions and available projection directions. Real face recognition experiments show the superiority of our proposed MFA in comparison to LDA, also for corresponding kernel and tensor extensions.
A Graph Summarization Algorithm Based on RFID Logistics
NASA Astrophysics Data System (ADS)
Sun, Yan; Hu, Kongfa; Lu, Zhipeng; Zhao, Li; Chen, Ling
Radio Frequency Identification (RFID) applications are set to play an essential role in object tracking and supply chain management systems. The volume of data generated by a typical RFID application will be enormous, as each item will generate a complete history of all the individual locations that it occupied at every point in time. The movement trails of such RFID data form a gigantic commodity flowgraph representing the locations and durations of the path stages traversed by each item. In this paper, we use graphs to construct a warehouse of RFID commodity flows, and introduce a database-style operation to summarize graphs, which produces a summary graph by grouping nodes based on user-selected node attributes and further allows users to control the hierarchy of summaries. It can cut down the size of graphs and makes it convenient for users to study only the reduced graph in which they are interested. Through extensive experiments, we demonstrate the effectiveness and efficiency of the proposed method.
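The grouping operation described in this abstract can be illustrated with a small Python sketch. It is only an assumed, simplified reading of "grouping nodes based on user-selected node attributes": nodes sharing an attribute value collapse into one summary node, and edges are counted between groups. The function name, the `site` attribute, and the data layout are hypothetical.

```python
from collections import Counter

def summarize(node_attr, edges, key):
    """Collapse nodes that share node_attr[node][key] into one summary node.

    Returns (group sizes, edge counts between groups) as plain dicts.
    """
    group = {node: attrs[key] for node, attrs in node_attr.items()}
    sizes = Counter(group.values())
    super_edges = Counter((group[u], group[v]) for u, v in edges)
    return dict(sizes), dict(super_edges)

# Hypothetical item movements between three kinds of site.
nodes = {
    "i1": {"site": "factory"},
    "i2": {"site": "factory"},
    "i3": {"site": "dock"},
    "i4": {"site": "store"},
}
moves = [("i1", "i3"), ("i2", "i3"), ("i3", "i4")]
group_sizes, group_edges = summarize(nodes, moves, "site")
```

Choosing a coarser or finer attribute as `key` is one simple way to move up or down a hierarchy of summaries.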
Modification of Prim’s algorithm on complete broadcasting graph
NASA Astrophysics Data System (ADS)
Dairina; Arif, Salmawaty; Munzir, Said; Halfiani, Vera; Ramli, Marwan
2017-09-01
Broadcasting is the dissemination of information from one object to another through communication between two objects in a network. Broadcasting to n objects can be accomplished with n - 1 communications, in a minimum number of time units given by ⌈log₂ n⌉. In this paper, weighted graph broadcasting is considered, and the minimum weight of a complete broadcasting graph is determined. A broadcasting graph is said to be complete if every vertex is connected. Thus, determining the minimum weight of a complete broadcasting graph is equivalent to determining the minimum spanning tree of a complete graph. Kruskal’s and Prim’s algorithms are used to determine the minimum weight of a complete broadcasting graph without regard to the minimum time unit ⌈log₂ n⌉, and a modified Prim’s algorithm is developed for the problem constrained by the minimum time unit ⌈log₂ n⌉. As an example case, the training of trainer problem is solved using these algorithms.
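As a point of reference for this abstract, the sketch below shows the standard (unmodified) Prim's algorithm on a complete graph given as a weight matrix, together with the minimum number of broadcast time units, the ceiling of the base-2 logarithm of n. It does not reproduce the authors' modified algorithm; function names and the example matrix are illustrative.

```python
import math

def prim_mst_weight(w):
    """Total weight of a minimum spanning tree of a complete graph.

    w: symmetric n x n matrix of edge weights. Uses the O(n^2) array
    form of Prim's algorithm, which suits dense graphs.
    """
    n = len(w)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    best[0] = 0
    total = 0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v]:
                best[v] = w[u][v]
    return total

def min_broadcast_rounds(n):
    """ceil(log2(n)) computed exactly with integer arithmetic (n >= 1)."""
    return (n - 1).bit_length()

weights = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 6],
    [3, 5, 6, 0],
]
mst_weight = prim_mst_weight(weights)  # edges 0-1, 1-2, 0-3
```

The rounds bound reflects that the number of informed objects can at most double in each time unit.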
Preliminary Evaluation of a Diagnostic Tool for Prosthetics
2017-10-01
volume change. Processing algorithms for data from the activity monitors were modified to run more efficiently so that large datasets could be...left) and blade style prostheses (right). Figure 4: Ankle ActiGraph correct position demonstrated for a left leg below-knee amputee cylindrical
An Energy-Efficient Target-Tracking Strategy for Mobile Sensor Networks.
Mahboubi, Hamid; Masoudimansour, Walid; Aghdam, Amir G; Sayrafian-Pour, Kamran
2017-02-01
In this paper, an energy-efficient strategy is proposed for tracking a moving target in an environment with obstacles, using a network of mobile sensors. Typically, the most dominant sources of energy consumption in a mobile sensor network are sensing, communication, and movement. The proposed algorithm first divides the field into a grid of sufficiently small cells. The grid is then represented by a graph whose edges are properly weighted to reflect the energy consumption of sensors. The proposed technique searches for near-optimal locations for the sensors in different time instants to route information from the target to destination, using a shortest path algorithm. Simulations confirm the efficacy of the proposed algorithm.
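The grid-to-graph step in this abstract can be sketched as follows. This is a simplified illustration, not the authors' method: it assumes a hypothetical per-cell cost standing in for energy consumption, treats obstacle cells as blocked, and runs Dijkstra's algorithm over 4-connected cells.

```python
import heapq

def cheapest_route(cell_cost, blocked, src, dst):
    """Minimum total cost of moving from src to dst on a 4-connected grid.

    cell_cost[r][c] is the (assumed) energy cost of entering cell (r, c);
    blocked is a set of obstacle cells. Returns None if dst is unreachable.
    """
    rows, cols = len(cell_cost), len(cell_cost[0])
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == dst:
            return d
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in blocked:
                nd = d + cell_cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None

uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
around_obstacle = cheapest_route(uniform, {(1, 1)}, (0, 0), (2, 2))
```

A finer grid gives routes closer to the continuous optimum at the price of a larger graph.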
Counting in Lattices: Combinatorial Problems from Statistical Mechanics.
NASA Astrophysics Data System (ADS)
Randall, Dana Jill
In this thesis we consider two classical combinatorial problems arising in statistical mechanics: counting matchings and self-avoiding walks in lattice graphs. The first problem arises in the study of the thermodynamical properties of monomers and dimers (diatomic molecules) in crystals. Fisher, Kasteleyn and Temperley discovered an elegant technique to exactly count the number of perfect matchings in two dimensional lattices, but it is not applicable for matchings of arbitrary size, or in higher dimensional lattices. We present the first efficient approximation algorithm for computing the number of matchings of any size in any periodic lattice in arbitrary dimension. The algorithm is based on Monte Carlo simulation of a suitable Markov chain and has rigorously derived performance guarantees that do not rely on any assumptions. In addition, we show that these results generalize to counting matchings in any graph which is the Cayley graph of a finite group. The second problem is counting self-avoiding walks in lattices. This problem arises in the study of the thermodynamics of long polymer chains in dilute solution. While there are a number of Monte Carlo algorithms used to count self-avoiding walks in practice, these are heuristic and their correctness relies on unproven conjectures. In contrast, we present an efficient algorithm which relies on a single, widely-believed conjecture that is simpler than preceding assumptions and, more importantly, is one which the algorithm itself can test. Thus our algorithm is reliable, in the sense that it either outputs answers that are guaranteed, with high probability, to be correct, or finds a counterexample to the conjecture. In either case we know we can trust our results and the algorithm is guaranteed to run in polynomial time. This is the first algorithm for counting self-avoiding walks in which the error bounds are rigorously controlled.
This work was supported in part by an AT&T graduate fellowship, a University of California dissertation year fellowship and Esprit working group "RAND". Part of this work was done while visiting ICSI and the University of Edinburgh.
Liu, Ruolin; Dickerson, Julie
2017-11-01
We propose a novel method and software tool, Strawberry, for transcript reconstruction and quantification from RNA-Seq data under the guidance of genome alignment and independent of gene annotation. Strawberry consists of two modules: assembly and quantification. The novelty of Strawberry is that the two modules use different optimization frameworks but utilize the same data graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data. The assembly module parses aligned reads into splicing graphs, and uses network flow algorithms to select the most likely transcripts. The quantification module uses a latent class model to assign read counts from the nodes of splicing graphs to transcripts. Strawberry simultaneously estimates the transcript abundances and corrects for sequencing bias through an EM algorithm. Based on simulations, Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracies. When evaluated on a real data set, the transcript expression estimated by Strawberry has the highest correlation with Nanostring probe counts, an independent experimental measure of transcript expression. Strawberry is written in C++14, and is available as open source software at https://github.com/ruolin/strawberry under the MIT license.
INDDGO: Integrated Network Decomposition & Dynamic programming for Graph Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Groer, Christopher S; Sullivan, Blair D; Weerapurage, Dinesh P
2012-10-01
It is well-known that dynamic programming algorithms can utilize tree decompositions to provide a way to solve some NP-hard problems on graphs where the complexity is polynomial in the number of nodes and edges in the graph, but exponential in the width of the underlying tree decomposition. However, there has been relatively little computational work done to determine the practical utility of such dynamic programming algorithms. We have developed software to construct tree decompositions using various heuristics and have created a fast, memory-efficient dynamic programming implementation for solving maximum weighted independent set. We describe our software and the algorithms we have implemented, focusing on memory saving techniques for the dynamic programming. We compare the running time and memory usage of our implementation with other techniques for solving maximum weighted independent set, including a commercial integer programming solver and a semi-definite programming solver. Our results indicate that it is possible to solve some instances where the underlying decomposition has width much larger than suggested by the literature. For certain types of problems, our dynamic programming code runs several times faster than these other methods.
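To make the dynamic programming idea in this abstract concrete, here is a sketch of the simplest case: maximum weighted independent set when the input graph is itself a tree, so the decomposition has width 1. This is an illustrative special case, not INDDGO's implementation, and the function name and data layout are assumptions.

```python
def mwis_on_tree(adj, weight, root):
    """Maximum weight of an independent set in a tree.

    adj: dict vertex -> list of neighbors (must describe a tree);
    weight: dict vertex -> non-negative weight.
    """
    # Breadth-first order from the root, recording each vertex's parent.
    parent = {root: None}
    order = [root]
    for u in order:  # the list grows while we iterate over it
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                order.append(v)
    take = dict(weight)           # best value in u's subtree if u is chosen
    skip = {u: 0 for u in adj}    # best value in u's subtree if u is not chosen
    for u in reversed(order):     # children are processed before parents
        p = parent[u]
        if p is not None:
            take[p] += skip[u]
            skip[p] += max(take[u], skip[u])
    return max(take[root], skip[root])

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
best_on_path = mwis_on_tree(path, {"a": 3, "b": 5, "c": 3}, "a")
```

With wider decompositions each bag stores a table over subsets of its vertices instead of the two values kept here, which is where the exponential dependence on width comes from.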
FGRAAL: FORTRAN extended graph algorithmic language
NASA Technical Reports Server (NTRS)
Basili, V. R.; Mesztenyi, C. K.; Rheinboldt, W. C.
1972-01-01
The FORTRAN version FGRAAL of the graph algorithmic language GRAAL as it has been implemented for the Univac 1108 is described. FGRAAL is an extension of FORTRAN 5 and is intended for describing and implementing graph algorithms of the type primarily arising in applications. The formal description contained in this report represents a supplement to the FORTRAN 5 manual for the Univac 1108 (UP-4060), that is, only the new features of the language are described. Several typical graph algorithms, written in FGRAAL, are included to illustrate various features of the language and to show its applicability.
Scaling Semantic Graph Databases in Size and Performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Castellana, Vito G.; Villa, Oreste
In this paper we present SGEM, a full software system for accelerating large-scale semantic graph databases on commodity clusters. Unlike current approaches, SGEM addresses semantic graph databases by only employing graph methods at all the levels of the stack. On one hand, this allows exploiting the space efficiency of graph data structures and the inherent parallelism of graph algorithms. These features adapt well to the increasing system memory and core counts of modern commodity clusters. On the other hand, however, these systems are optimized for regular computation and batched data transfers, while graph methods usually are irregular and generate fine-grained data accesses with poor spatial and temporal locality. Our framework comprises a SPARQL to data parallel C compiler, a library of parallel graph methods and a custom, multithreaded runtime system. We introduce our stack, motivate its advantages with respect to other solutions and show how we solved the challenges posed by irregular behaviors. We present the result of our software stack on the Berlin SPARQL benchmarks with datasets up to 10 billion triples (a triple corresponds to a graph edge), demonstrating scaling in dataset size and in performance as more nodes are added to the cluster.
Weighted graph cuts without eigenvectors: a multilevel approach.
Dhillon, Inderjit S; Guan, Yuqiang; Kulis, Brian
2007-11-01
A variety of clustering algorithms have recently been proposed to handle data that is not linearly separable; spectral clustering and kernel k-means are two of the main methods. In this paper, we discuss an equivalence between the objective functions used in these seemingly different methods--in particular, a general weighted kernel k-means objective is mathematically equivalent to a weighted graph clustering objective. We exploit this equivalence to develop a fast, high-quality multilevel algorithm that directly optimizes various weighted graph clustering objectives, such as the popular ratio cut, normalized cut, and ratio association criteria. This eliminates the need for any eigenvector computation for graph clustering problems, which can be prohibitive for very large graphs. Previous multilevel graph partitioning methods, such as Metis, have suffered from the restriction of equal-sized clusters; our multilevel algorithm removes this restriction by using kernel k-means to optimize weighted graph cuts. Experimental results show that our multilevel algorithm outperforms a state-of-the-art spectral clustering algorithm in terms of speed, memory usage, and quality. We demonstrate that our algorithm is applicable to large-scale clustering tasks such as image segmentation, social network analysis and gene network analysis.
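The graph clustering objectives named in this abstract can be evaluated directly for a given partition. The sketch below computes ratio cut and normalized cut using their usual definitions (cut weight of each cluster divided by its size or by its total degree, summed over clusters). It only scores a partition and does not implement the multilevel kernel k-means optimizer; it assumes every cluster has positive total degree.

```python
def cut_objectives(W, labels):
    """Return (ratio cut, normalized cut) of a partition.

    W: symmetric n x n affinity matrix; labels[i] is the cluster of node i.
    """
    n = len(W)
    ratio_cut = 0.0
    normalized_cut = 0.0
    for c in set(labels):
        inside = [i for i in range(n) if labels[i] == c]
        outside = [i for i in range(n) if labels[i] != c]
        cut = sum(W[i][j] for i in inside for j in outside)
        degree = sum(W[i][j] for i in inside for j in range(n))
        ratio_cut += cut / len(inside)
        normalized_cut += cut / degree
    return ratio_cut, normalized_cut

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0] * 6 for _ in range(6)]
for i, j in edges:
    W[i][j] = W[j][i] = 1
rc, nc = cut_objectives(W, [0, 0, 0, 1, 1, 1])
```

For this example the cut weight is 1 for each cluster, giving a ratio cut of 2/3 and a normalized cut of 2/7.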
Image processing meta-algorithm development via genetic manipulation of existing algorithm graphs
NASA Astrophysics Data System (ADS)
Schalkoff, Robert J.; Shaaban, Khaled M.
1999-07-01
Automatic algorithm generation for image processing applications is not a new idea; however, previous work is either restricted to morphological operators or impractical. In this paper, we show recent research results in the development and use of meta-algorithms, i.e. algorithms which lead to new algorithms. Although the concept is generally applicable, the application domain in this work is restricted to image processing. The meta-algorithm concept described in this paper is based upon our work in dynamic algorithms. The paper first presents the concept of dynamic algorithms which, on the basis of training and archived algorithmic experience embedded in an algorithm graph (AG), dynamically adjust the sequence of operations applied to the input image data. Each node in the tree-based representation of a dynamic algorithm with out-degree greater than 2 is a decision node. At these nodes, the algorithm examines the input data and determines which path will most likely achieve the desired results. This is currently done using nearest-neighbor classification. The details of this implementation are shown. The constrained perturbation of existing algorithm graphs, coupled with a suitable search strategy, is one mechanism to achieve meta-algorithms and offers rich potential for the discovery of new algorithms. In our work, a meta-algorithm autonomously generates new dynamic algorithm graphs via genetic recombination of existing algorithm graphs. The AG representation is well suited to this genetic-like perturbation, using a commonly employed technique in artificial neural network synthesis, namely the blueprint representation of graphs. A number of exam. One of the principal limitations of our current approach is the need for significant human input in the learning phase. Efforts to overcome this limitation are discussed. Future research directions are indicated.
Jiang, Yuyi; Shao, Zhiqing; Guo, Yi
2014-01-01
A complex computing problem can be solved efficiently on a system with multiple computing nodes by dividing its implementation code into several parallel processing modules or tasks that can be formulated as directed acyclic graph (DAG) problems. The DAG jobs may be mapped to and scheduled on the computing nodes to minimize the total execution time. Searching for an optimal DAG scheduling solution is considered to be NP-complete. This paper proposes a tuple molecular structure-based chemical reaction optimization (TMSCRO) method for DAG scheduling on heterogeneous computing systems, based on a very recently proposed metaheuristic method, chemical reaction optimization (CRO). Compared with other CRO-based algorithms for DAG scheduling, the design of the tuple reaction molecular structure and the four elementary reaction operators of TMSCRO is more reasonable. TMSCRO also applies the concepts of constrained critical paths (CCPs), constrained-critical-path directed acyclic graph (CCPDAG) and super molecule to accelerate convergence. In this paper, we have also conducted simulation experiments to verify the effectiveness and efficiency of TMSCRO upon a large set of randomly generated graphs and the graphs for real world problems. PMID:25143977
Detecting community structure via the maximal sub-graphs and belonging degrees in complex networks
NASA Astrophysics Data System (ADS)
Cui, Yaozu; Wang, Xingyuan; Eustace, Justine
2014-12-01
Community structure is a common phenomenon in complex networks, and it has been shown that some communities in complex networks often overlap each other. In this paper we therefore propose a new algorithm to detect overlapping community structure in complex networks. To identify the overlapping community structure, our algorithm first extracts fully connected sub-graphs, which are maximal sub-graphs, from the original networks. Then two maximal sub-graphs having the key pair-vertices can be merged into a new larger sub-graph using some belonging degree functions. Furthermore, we extend the modularity function to evaluate the proposed algorithm. In addition, overlapping nodes between communities are found successfully. Finally, we compare the modularity and the computational complexity of the proposed algorithm with those of some other existing algorithms. The experimental results show that the proposed algorithm gives satisfactory results.
Jothi, R; Mohanty, Sraban Kumar; Ojha, Aparajita
2016-04-01
Gene expression data clustering is an important biological process in DNA microarray analysis. Although there have been many clustering algorithms for gene expression analysis, finding a suitable and effective clustering algorithm is always a challenging problem due to the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using Eigenanalysis on Minimum Spanning Tree based neighborhood graph (E-MST). As the MST of a set of points reflects the similarity of the points with their neighborhood, the proposed algorithm employs a similarity graph obtained from k′ rounds of MST (k′-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from the k′-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.
A Ranking Approach on Large-Scale Graph With Multidimensional Heterogeneous Information.
Wei, Wei; Gao, Bin; Liu, Tie-Yan; Wang, Taifeng; Li, Guohui; Li, Hang
2016-04-01
Graph-based ranking has been extensively studied and frequently applied in many applications, such as webpage ranking. It aims at mining potentially valuable information from the raw graph-structured data. Recently, with the proliferation of rich heterogeneous information (e.g., node/edge features and prior knowledge) available in many real-world graphs, how to effectively and efficiently leverage all information to improve the ranking performance becomes a new challenging problem. Previous methods only utilize part of such information and attempt to rank graph nodes according to link-based methods, of which the ranking performances are severely affected by several well-known issues, e.g., over-fitting or high computational complexity, especially when the scale of graph is very large. In this paper, we address the large-scale graph-based ranking problem and focus on how to effectively exploit rich heterogeneous information of the graph to improve the ranking performance. Specifically, we propose an innovative and effective semi-supervised PageRank (SSP) approach to parameterize the derived information within a unified semi-supervised learning framework (SSLF-GR), then simultaneously optimize the parameters and the ranking scores of graph nodes. Experiments on the real-world large-scale graphs demonstrate that our method significantly outperforms the algorithms that consider such graph information only partially.
Model checking for linear temporal logic: An efficient implementation
NASA Technical Reports Server (NTRS)
Sherman, Rivi; Pnueli, Amir
1990-01-01
This report provides evidence to support the claim that model checking for linear temporal logic (LTL) is practically efficient. Two implementations of a linear temporal logic model checker are described. One is based on transforming the model checking problem into a satisfiability problem; the other checks an LTL formula for a finite model by computing the cross-product of the finite state transition graph of the program with a structure containing all possible models for the property. An experiment was conducted on a set of mutual exclusion algorithms, testing safety and liveness under fairness for these algorithms.
Insertion algorithms for network model database management systems
NASA Astrophysics Data System (ADS)
Mamadolimov, Abdurashid; Khikmat, Saburov
2017-12-01
The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, forms a partial order. When a database is large and query comparisons are expensive, the efficiency requirement for management algorithms is to minimize the number of query comparisons. We consider the update operation for network model database management systems. We develop a new sequential algorithm for the update operation and also suggest a distributed version of the algorithm.
Monkey search algorithm for ECE components partitioning
NASA Astrophysics Data System (ADS)
Kuliev, Elmar; Kureichik, Vladimir; Kureichik, Vladimir, Jr.
2018-05-01
The paper considers one of the important design problems – a partitioning of electronic computer equipment (ECE) components (blocks). It belongs to the NP-hard class of problems and has a combinatorial and logic nature. In the paper, a partitioning problem formulation can be found as a partition of graph into parts. To solve the given problem, the authors suggest using a bioinspired approach based on a monkey search algorithm. Based on the developed software, computational experiments were carried out that show the algorithm efficiency, as well as its recommended settings for obtaining more effective solutions in comparison with a genetic algorithm.
Fast Inbound Top-K Query for Random Walk with Restart.
Zhang, Chao; Jiang, Shan; Chen, Yucheng; Sun, Yidan; Han, Jiawei
2015-09-01
Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k, the Ink query aims at retrieving k nodes in the graph that have the largest weighted RWR scores to q. Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising. Nevertheless, none of the existing RWR computation techniques can accurately and efficiently process the Ink query in large graphs. We propose two algorithms, namely Squeeze and Ripple, both of which can accurately answer the Ink query in a fast and incremental manner. To identify the top-k nodes, Squeeze iteratively performs matrix-vector multiplication and estimates the lower and upper bounds for all the nodes in the graph. Ripple employs a more aggressive strategy by only estimating the RWR scores for the nodes falling in the vicinity of q; the nodes outside the vicinity do not need to be evaluated because their RWR scores are propagated from the boundary of the vicinity and thus upper bounded. Ripple incrementally expands the vicinity until the top-k result set can be obtained. Our extensive experiments on real-life graph data sets show that Ink queries can retrieve interesting results, and the proposed algorithms are orders of magnitude faster than the state-of-the-art method.
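To illustrate what an inbound top-k query asks for, the sketch below computes RWR scores by plain power iteration and answers the query by brute force, running one RWR computation per candidate node. This is the naive baseline, not the Squeeze or Ripple algorithms, and it uses unweighted scores; the restart probability 0.15, the iteration count, and the handling of dangling nodes are assumptions made for the example.

```python
def rwr_scores(out_edges, source, restart=0.15, iters=100):
    """RWR proximity from source to every node, by power iteration.

    out_edges: dict node -> list of out-neighbors (every node is a key).
    A walker at a node without out-edges jumps back to the source.
    """
    r = {v: 0.0 for v in out_edges}
    r[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in out_edges}
        nxt[source] = restart
        for u, nbrs in out_edges.items():
            mass = (1.0 - restart) * r[u]
            if not nbrs:
                nxt[source] += mass
                continue
            share = mass / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        r = nxt
    return r

def inbound_top_k(out_edges, q, k, restart=0.15):
    """Brute force: the k nodes (other than q) with the largest RWR score to q."""
    score_to_q = {u: rwr_scores(out_edges, u, restart)[q]
                  for u in out_edges if u != q}
    return sorted(score_to_q, key=score_to_q.get, reverse=True)[:k]

graph = {"a": ["b"], "b": ["a"], "c": ["a", "b"]}
closest_to_a = inbound_top_k(graph, "a", 1)
```

The brute-force cost of one full RWR computation per node is what makes the query expensive on large graphs.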
Online graphic symbol recognition using neural network and ARG matching
NASA Astrophysics Data System (ADS)
Yang, Bing; Li, Changhua; Xie, Weixing
2001-09-01
This paper proposes a novel method for on-line recognition of line-based graphic symbols. The input strokes are usually warped into a cursive form due to sundry drawing styles, and classifying them is very difficult. To deal with this, an ART-2 neural network is used to classify the input strokes. It has the advantages of a high recognition rate, a short recognition time and forming classes in a self-organized manner. The symbol recognition is achieved by an Attribute Relational Graph (ARG) matching algorithm. The ARG is very efficient for representing complex objects, but its computation cost is very high. To overcome this, we suggest a fast graph matching algorithm using symbol structure information. The experimental results show that the proposed method is effective for recognition of symbols with hierarchical structure.
Paving the Way Towards Reactive Planar Spanner Construction in Wireless Networks
NASA Astrophysics Data System (ADS)
Frey, Hannes; Rührup, Stefan
A spanner is a subgraph of a given graph that supports the original graph's shortest path lengths up to a constant factor. Planar spanners and their distributed construction are of particular interest for geographic routing, which is an efficient localized routing scheme for wireless ad hoc and sensor networks. Planarity of the network graph is a key criterion for guaranteed delivery, while the spanner property supports efficiency in terms of path length. We consider the problem of reactive local spanner construction, where a node's local topology is determined on demand. Known message-efficient reactive planarization algorithms do not preserve the spanner property, while reactive spanner constructions with a low message overhead have not been described so far. We introduce the concept of direct planarization which may be an enabler of efficient reactive spanner construction. Given an edge, nodes check for all incident intersecting edges a certain geometric criterion and withdraw the edge if this criterion is not satisfied. We use this concept to derive a generic reactive topology control mechanism and consider two geometric criteria. Simulation results show that direct planarization increases the performance of localized geographic routing by providing shorter paths than existing reactive approaches.
Discrete geometric analysis of message passing algorithm on graphs
NASA Astrophysics Data System (ADS)
Watanabe, Yusuke
2010-04-01
We often encounter probability distributions given as unnormalized products of non-negative functions. The factorization structures are represented by hypergraphs called factor graphs. Such distributions appear in various fields, including statistics, artificial intelligence, statistical physics, error correcting codes, etc. Given such a distribution, computations of marginal distributions and the normalization constant are often required. However, these computations are intractable because of their cost. One successful approximation method is the Loopy Belief Propagation (LBP) algorithm. The focus of this thesis is an analysis of the LBP algorithm. If the factor graph is a tree, i.e. having no cycle, the algorithm gives the exact quantities. If the factor graph has cycles, however, the LBP algorithm does not give exact results and possibly exhibits oscillatory and non-convergent behaviors. The thematic question of this thesis is "How are the behaviors of the LBP algorithm affected by the discrete geometry of the factor graph?" The primary contribution of this thesis is the discovery of a formula that establishes the relation between the LBP, the Bethe free energy and the graph zeta function. This formula provides new techniques for analysis of the LBP algorithm, connecting properties of the graph and of the LBP and the Bethe free energy. We demonstrate applications of the techniques to several problems including (non) convexity of the Bethe free energy, the uniqueness and stability of the LBP fixed point. We also discuss the loop series initiated by Chertkov and Chernyak. The loop series is a subgraph expansion of the normalization constant, or partition function, and reflects the graph geometry. We investigate the theoretical nature of the series. Moreover, we show a partial connection between the loop series and the graph zeta function.
MISAGA: An Algorithm for Mining Interesting Subgraphs in Attributed Graphs.
He, Tiantian; Chan, Keith C C
2018-05-01
An attributed graph contains vertices that are associated with a set of attribute values. Mining clusters or communities, which are interesting subgraphs in the attributed graph, is one of the most important tasks of graph analytics. Many problems can be defined as the mining of interesting subgraphs in attributed graphs. Algorithms that discover subgraphs based on predefined topologies cannot be used to tackle these problems. To discover interesting subgraphs in the attributed graph, we propose an algorithm called mining interesting subgraphs in attributed graph algorithm (MISAGA). MISAGA performs its tasks by first using a probabilistic measure to determine whether the strength of association between a pair of attribute values is strong enough to be interesting. Given the interesting pairs of attribute values, the degree of association is computed for each pair of vertices using an information theoretic measure. Based on the edge structure and degree of association between each pair of vertices, MISAGA identifies interesting subgraphs by formulating the task as a constrained optimization problem and solving it by identifying the optimal affiliation of subgraphs for the vertices in the attributed graph. MISAGA has been tested with several large-sized real graphs and is found to be potentially very useful for various applications.
EAGLE: 'EAGLE 'Is an' Algorithmic Graph Library for Exploration'
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-01-16
The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. Today there are no tools to conduct "graph mining" on RDF standard data sets. We address that need through implementation of popular iterative Graph Mining algorithms (Triangle count, Connected component analysis, degree distribution, diversity degree, PageRank, etc.). We implement these algorithms as SPARQL queries, wrapped within Python scripts, and call our software tool EAGLE. In RDF style, EAGLE stands for "EAGLE 'Is an' algorithmic graph library for exploration." EAGLE is like 'MATLAB' for 'Linked Data.'
The structured ancestral selection graph and the many-demes limit.
Slade, Paul F; Wakeley, John
2005-02-01
We show that the unstructured ancestral selection graph applies to part of the history of a sample from a population structured by restricted migration among subpopulations, or demes. The result holds in the limit as the number of demes tends to infinity with proportionately weak selection, and we have also made the assumptions of island-type migration and that demes are equivalent in size. After an instantaneous sample-size adjustment, this structured ancestral selection graph converges to an unstructured ancestral selection graph with a mutation parameter that depends inversely on the migration rate. In contrast, the selection parameter for the population is independent of the migration rate and is identical to the selection parameter in an unstructured population. We show analytically that estimators of the migration rate, based on pairwise sequence differences, derived under the assumption of neutrality should perform equally well in the presence of weak selection. We also modify an algorithm for simulating genealogies conditional on the frequencies of two selected alleles in a sample. This permits efficient simulation of stronger selection than was previously possible. Using this new algorithm, we simulate gene genealogies under the many-demes ancestral selection graph and identify some situations in which migration has a strong effect on the time to the most recent common ancestor of the sample. We find that a similar effect also increases the sensitivity of the genealogy to selection.
BFL: a node and edge betweenness based fast layout algorithm for large scale networks
Hashimoto, Tatsunori B; Nagasaki, Masao; Kojima, Kaname; Miyano, Satoru
2009-01-01
Background Network visualization would serve as a useful first step for analysis. However, current graph layout algorithms for biological pathways are insensitive to biologically important information, e.g. subcellular localization, biological node and graph attributes, and/or are not available for large scale networks, e.g. more than 10000 elements. Results To overcome these problems, we propose the use of a biologically important graph metric, betweenness, a measure of network flow. This metric is highly correlated with many biological phenomena such as lethality and clusters. We devise a new fast parallel algorithm calculating betweenness to minimize the preprocessing cost. Using this metric, we also invent a node and edge betweenness based fast layout algorithm (BFL). BFL places the high-betweenness nodes at optimal positions and allows the low-betweenness nodes to reach suboptimal positions. Furthermore, BFL reduces the runtime by combining a sequential insertion algorithm with betweenness. For a graph with n nodes, this approach reduces the expected runtime of the algorithm to O(n^2) when considering edge crossings, and to O(n log n) when considering only density and edge lengths. Conclusion Our BFL algorithm is compared against fast graph layout algorithms and approaches requiring intensive optimizations. For gene networks, we show that our algorithm is faster than all layout algorithms tested while providing readability on par with intensive optimization algorithms. We achieve a 1.4 second runtime for a graph with 4000 nodes and 12000 edges on a standard desktop computer. PMID:19146673
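For readers unfamiliar with the betweenness metric, the following is a minimal sequential sketch of node betweenness using the standard Brandes algorithm on an unweighted, undirected graph. It is not the authors' parallel algorithm; the function name `node_betweenness` and the adjacency-dictionary input are assumptions made for illustration.

```python
from collections import deque


def node_betweenness(adj):
    """Brandes' algorithm; adj maps each node to an iterable of neighbours."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}


# Hypothetical four-node path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
centrality = node_betweenness(path)
```

On the path graph the two inner nodes lie on every shortest path that crosses the middle, so they receive the highest scores; a layout in the spirit of BFL would place such nodes first.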
Redundancy management for efficient fault recovery in NASA's distributed computing system
NASA Technical Reports Server (NTRS)
Malek, Miroslaw; Pandya, Mihir; Yau, Kitty
1991-01-01
The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance.
Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold
2014-12-01
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
Learning locality preserving graph from data.
Zhang, Yan-Ming; Huang, Kaizhu; Hou, Xinwen; Liu, Cheng-Lin
2014-11-01
Machine learning based on graph representation, or manifold learning, has attracted great interest in recent years. As the discrete approximation of data manifold, the graph plays a crucial role in these kinds of learning approaches. In this paper, we propose a novel learning method for graph construction, which is distinct from previous methods in that it solves an optimization problem with the aim of directly preserving the local information of the original data set. We show that the proposed objective has close connections with the popular Laplacian Eigenmap problem, and is hence well justified. The optimization turns out to be a quadratic programming problem with n(n-1)/2 variables (n is the number of data points). Exploiting the sparsity of the graph, we further propose a more efficient cutting plane algorithm to solve the problem, making the method better scalable in practice. In the context of clustering and semi-supervised learning, we demonstrated the advantages of our proposed method by experiments.
Sequential visibility-graph motifs
NASA Astrophysics Data System (ADS)
Iacovacci, Jacopo; Lacasa, Lucas
2016-04-01
Visibility algorithms transform time series into graphs and encode dynamical information in their topology, paving the way for graph-theoretical time series analysis as well as building a bridge between nonlinear dynamics and network science. In this work we introduce and study the concept of sequential visibility-graph motifs, smaller substructures of n consecutive nodes that appear with characteristic frequencies. We develop a theory to compute in an exact way the motif profiles associated with general classes of deterministic and stochastic dynamics. We find that this simple property is indeed a highly informative and computationally efficient feature capable of distinguishing among different dynamics and robust against noise contamination. We finally confirm that it can be used in practice to perform unsupervised learning, by extracting motif profiles from experimental heart-rate series and being able, accordingly, to disentangle meditative from other relaxation states. Applications of this general theory include the automatic classification and description of physical, biological, and financial time series.
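A motif profile of this kind can be sketched in a few lines. The example below assumes the horizontal visibility criterion (two samples are linked when every sample between them is lower than both) and labels each motif by the set of edges among `size` consecutive nodes; the function names and this labeling are illustrative assumptions, not the paper's notation.

```python
from collections import Counter


def horizontal_visibility_edges(series):
    """Edges (i, j), i < j, where every value strictly between is lower than both ends."""
    edges = set()
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.add((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # nothing further to the right is visible from i
    return edges


def motif_profile(series, size=4):
    """Relative frequency of each edge pattern among `size` consecutive nodes."""
    edges = horizontal_visibility_edges(series)
    counts = Counter()
    for start in range(len(series) - size + 1):
        window = frozenset((i - start, j - start) for (i, j) in edges
                           if start <= i and j < start + size)
        counts[window] += 1
    total = sum(counts.values())
    return {motif: c / total for motif, c in counts.items()}
```

Two series can then be compared through the distance between their profiles, which is the kind of feature the abstract describes as informative and cheap to compute.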
Use of graph algorithms in the processing and analysis of images with focus on the biomedical data.
Zdimalova, M; Roznovjak, R; Weismann, P; El Falougy, H; Kubikova, E
2017-01-01
Image segmentation is a well-known problem in the field of image processing. A great number of methods based on different approaches to this problem have been created. One of these approaches uses results from graph theory. Our work focuses on segmentation using shortest paths in a graph. Specifically, we deal with "Intelligent Scissors" methods, which use Dijkstra's algorithm to find the shortest paths. We created new software in the C++/CLI language using the Visual C++ environment of Microsoft Visual Studio 2013. The result is an application with a graphical user interface for Windows, built on the .NET platform (version 4.5). The program was used for handling and processing the original medical data. The major disadvantage of the "Intelligent Scissors" method is the computation time of Dijkstra's algorithm. However, this problem could be alleviated by implementing a more efficient priority queue. We see the main advantage of this method in training, which enables it to adapt to the particular kind of edge that needs to be segmented. User involvement has a significant influence on the segmentation process and greatly helps to achieve high-quality results (Fig. 7, Ref. 13).
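The role of the priority queue can be illustrated with a minimal sketch of Dijkstra's algorithm on a pixel grid, written here in Python rather than the authors' C++/CLI. The cost grid, the 4-connectivity, and the function name `cheapest_path` are assumptions; in an "Intelligent Scissors" setting the per-pixel cost would be low along image edges.

```python
import heapq


def cheapest_path(cost, start, goal):
    """Dijkstra on a 4-connected grid; entering pixel (r, c) costs cost[r][c]."""
    rows, cols = len(cost), len(cost[0])
    best = {start: 0}
    parent = {}
    heap = [(0, start)]  # binary heap used as the priority queue
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > best[node]:
            continue  # stale queue entry, a shorter route was already found
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = nd
                    parent[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk the parent links back from the goal to recover the contour.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return best[goal], path[::-1]


# Hypothetical cost map: the middle column is expensive except at the bottom.
grid = [[1, 9, 1],
        [1, 9, 1],
        [1, 1, 1]]
total, contour = cheapest_path(grid, (0, 0), (0, 2))
```

With a binary heap each queue operation costs O(log n), compared with O(n) for a linear scan over an unsorted list, which is the kind of improvement the abstract alludes to.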
Performance Analysis of Evolutionary Algorithms for Steiner Tree Problems.
Lai, Xinsheng; Zhou, Yuren; Xia, Xiaoyun; Zhang, Qingfu
2017-01-01
The Steiner tree problem (STP) aims to determine some Steiner nodes such that the minimum spanning tree over these Steiner nodes and a given set of special nodes has the minimum weight, which is NP-hard. STP includes several important cases. The Steiner tree problem in graphs (GSTP) is one of them. Many heuristics have been proposed for STP, and some of them have proved to be performance guarantee approximation algorithms for this problem. Since evolutionary algorithms (EAs) are general and popular randomized heuristics, it is significant to investigate the performance of EAs for STP. Several empirical investigations have shown that EAs are efficient for STP. However, up to now, there is no theoretical work on the performance of EAs for STP. In this article, we reveal that the (1+1) EA achieves 3/2-approximation ratio for STP in a special class of quasi-bipartite graphs in expected runtime [Formula: see text], where [Formula: see text], [Formula: see text], and [Formula: see text] are, respectively, the number of Steiner nodes, the number of special nodes, and the largest weight among all edges in the input graph. We also show that the (1+1) EA is better than two other heuristics on two GSTP instances, and the (1+1) EA may be inefficient on a constructed GSTP instance.
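As a rough illustration of the heuristic under analysis, the sketch below runs a (1+1) EA on a bit string that selects Steiner nodes and scores a selection by the minimum spanning tree weight over the selected nodes plus the special nodes. The penalty for disconnected selections, the all-ones starting point, the step budget, and the tiny instance are assumptions made here; the paper's exact fitness function and graph class are not reproduced.

```python
import random


def mst_weight(nodes, edges):
    """Kruskal's algorithm restricted to `nodes`; None if they are not connected."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    total, used = 0, 0
    for w, u, v in sorted(edges):
        if u in parent and v in parent:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                total += w
                used += 1
    return total if used == len(parent) - 1 else None


def one_plus_one_ea(terminals, steiner, edges, steps=200, seed=0):
    rng = random.Random(seed)
    penalty = sum(w for w, _, _ in edges) + 1  # worse than any feasible tree

    def fitness(bits):
        chosen = set(terminals) | {s for s, b in zip(steiner, bits) if b}
        weight = mst_weight(chosen, edges)
        return penalty if weight is None else weight

    current = [1] * len(steiner)  # assumed start: every Steiner node selected
    history = [fitness(current)]
    for _ in range(steps):
        # Standard bit mutation: flip each bit independently with probability 1/n.
        child = [b ^ (rng.random() < 1 / len(steiner)) for b in current]
        value = fitness(child)
        if value <= history[-1]:  # accept the child if it is not worse
            current = child
            history.append(value)
    return current, history


# Hypothetical instance: special nodes a, b, c and candidate Steiner nodes s, t.
edges = [(1, "a", "s"), (1, "b", "s"), (1, "c", "s"),
         (2, "a", "b"), (2, "b", "c"), (5, "a", "t"), (5, "t", "c")]
selection, history = one_plus_one_ea(["a", "b", "c"], ["s", "t"], edges)
```

Because a child replaces the parent only when it is not worse, the recorded fitness values never increase; the runtime analysis in the paper concerns how many such steps are expected before a good approximation is reached.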
Collaborative mining and transfer learning for relational data
NASA Astrophysics Data System (ADS)
Levchuk, Georgiy; Eslami, Mohammed
2015-06-01
Many real-world problems, including human knowledge, communication, biological, and cyber network analysis, deal with data entities for which the essential information is contained in the relations among those entities. Such data must be modeled and analyzed as graphs, with attributes on both objects and relations that encode and differentiate their semantics. Traditional data mining algorithms were originally designed for analyzing discrete objects for which a set of features can be defined, and thus cannot be easily adapted to deal with graph data. This gave rise to the relational data mining field of research, of which graph pattern learning is a key sub-domain [11]. In this paper, we describe a model for learning graph patterns in a collaborative distributed manner. Distributed pattern learning is challenging due to dependencies between the nodes and relations in the graph, and variability across graph instances. We present three algorithms that trade off the benefits of parallelization and data aggregation, compare their performance to centralized graph learning, and discuss individual benefits and weaknesses of each model. The presented algorithms are designed for linear speedup in distributed computing environments, and learn graph patterns that are both closer to ground truth and provide higher detection rates than a centralized mining algorithm.
NASA Astrophysics Data System (ADS)
Ziemann, Amanda K.; Messinger, David W.; Albano, James A.; Basener, William F.
2012-06-01
Anomaly detection algorithms have historically been applied to hyperspectral imagery in order to identify pixels whose material content is incongruous with the background material in the scene. Typically, the application involves extracting man-made objects from natural and agricultural surroundings. A large challenge in designing these algorithms is determining which pixels initially constitute the background material within an image. The topological anomaly detection (TAD) algorithm constructs a graph theory-based, fully non-parametric topological model of the background in the image scene, and uses codensity to measure deviation from this background. In TAD, the initial graph theory structure of the image data is created by connecting an edge between any two pixel vertices x and y if the Euclidean distance between them is less than some resolution r. While this type of proximity graph is among the most well-known approaches to building a geometric graph based on a given set of data, there is a wide variety of different geometrically-based techniques. In this paper, we present a comparative test of the performance of TAD across four different constructs of the initial graph: mutual k-nearest neighbor graph, sigma-local graph for two different values of σ > 1, and the proximity graph originally implemented in TAD.
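Two of the graph constructions named above are easy to state in code. The sketch below builds the proximity graph (edge when the Euclidean distance is below a resolution `r`) and a mutual k-nearest-neighbour graph from a list of points; the function names and the toy one-dimensional layout are assumptions, and real hyperspectral pixels would be spectral vectors with many bands.

```python
from math import dist


def proximity_graph(points, r):
    """Edge between i and j when their Euclidean distance is below r."""
    n = len(points)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if dist(points[i], points[j]) < r}


def mutual_knn_graph(points, k):
    """Edge between i and j when each is among the other's k nearest neighbours."""
    n = len(points)
    nearest = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        nearest.append(set(others[:k]))
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if j in nearest[i] and i in nearest[j]}


# Hypothetical data: three clustered points and one distant outlier.
pixels = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
near = proximity_graph(pixels, r=1.6)
mutual = mutual_knn_graph(pixels, k=1)
```

In both constructions the distant point ends up with no edges, which is the sort of isolation a background model can use to flag anomalies; the two graphs differ in how they treat points of moderate density.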
featsel: A framework for benchmarking of feature selection algorithms and cost functions
NASA Astrophysics Data System (ADS)
Reis, Marcelo S.; Estrela, Gustavo; Ferreira, Carlos Eduardo; Barrera, Junior
In this paper, we introduce featsel, a framework for benchmarking of feature selection algorithms and cost functions. This framework allows the user to deal with the search space as a Boolean lattice and has its core coded in C++ for computational efficiency purposes. Moreover, featsel includes Perl scripts to add new algorithms and/or cost functions, generate random instances, plot graphs and organize results into tables. Besides, this framework already comes with dozens of algorithms and cost functions for benchmarking experiments. We also provide illustrative examples, in which featsel outperforms the popular Weka workbench in feature selection procedures on data sets from the UCI Machine Learning Repository.
Superpixel-based graph cuts for accurate stereo matching
NASA Astrophysics Data System (ADS)
Feng, Liting; Qin, Kaihuai
2017-06-01
Estimating the surface normal vector and disparity of a pixel simultaneously, also known as the three-dimensional label method, has been widely used in recent work on continuous stereo matching to achieve sub-pixel accuracy. However, due to the infinite label space, it’s extremely hard to assign each pixel an appropriate label. In this paper, we present an accurate and efficient algorithm, integrating patchmatch with graph cuts, to approach this critical computational problem. Besides, to get a robust and precise matching cost, we use a convolutional neural network to learn a similarity measure on small image patches. Compared with other MRF related methods, our method has several advantages: its sub-modular property ensures a sub-problem optimality which is easy to perform in parallel; graph cuts can simultaneously update multiple pixels, avoiding local minima caused by sequential optimizers like belief propagation; it uses segmentation results for better local expansion moves; local propagation and randomization can easily generate the initial solution without using external methods. Middlebury experiments show that our method can get higher accuracy than other MRF-based algorithms.
GraphMeta: Managing HPC Rich Metadata in Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dai, Dong; Chen, Yong; Carns, Philip
High-performance computing (HPC) systems face increasingly critical metadata management challenges, especially in the approaching exascale era. These challenges arise not only from exploding metadata volumes, but also from increasingly diverse metadata, which contains data provenance and arbitrary user-defined attributes in addition to traditional POSIX metadata. This ‘rich’ metadata is becoming critical to supporting advanced data management functionality such as data auditing and validation. In our prior work, we identified a graph-based model as a promising solution to uniformly manage HPC rich metadata due to its flexibility and generality. However, at the same time, graph-based HPC rich metadata management also introduces significant challenges to the underlying infrastructure. In this study, we first identify the challenges on the underlying infrastructure to support scalable, high-performance rich metadata management. Based on that, we introduce GraphMeta, a graph-based engine designed for this use case. It achieves performance scalability by introducing a new graph partitioning algorithm and a write-optimal storage engine. We evaluate GraphMeta under both synthetic and real HPC metadata workloads, compare it with other approaches, and demonstrate its advantages in terms of efficiency and usability for rich metadata management in HPC systems.
Collaborative mining of graph patterns from multiple sources
NASA Astrophysics Data System (ADS)
Levchuk, Georgiy; Colonna-Romanoa, John
2016-05-01
Intelligence analysts require automated tools to mine multi-source data, including answering queries, learning patterns of life, and discovering malicious or anomalous activities. Graph mining algorithms have recently attracted significant attention in the intelligence community, because text-derived knowledge can be efficiently represented as graphs of entities and relationships. However, graph mining models are limited to use-cases involving collocated data, and often make restrictive assumptions about the types of patterns that need to be discovered, the relationships between individual sources, and the availability of accurate data segmentation. In this paper we present a model to learn graph patterns from multiple relational data sources, when each source might have only a fragment (or subgraph) of the knowledge that needs to be discovered, and segmentation of data into training or testing instances is not available. Our model is based on distributed collaborative graph learning, and is effective in situations when the data is kept locally and cannot be moved to a centralized location. Our experiments show that the proposed collaborative learning achieves learning quality better than aggregated centralized graph learning, and has learning time comparable to traditional distributed learning in which knowledge of data segmentation is needed.
Faster Parameterized Algorithms for Minor Containment
NASA Astrophysics Data System (ADS)
Adler, Isolde; Dorn, Frederic; Fomin, Fedor V.; Sau, Ignasi; Thilikos, Dimitrios M.
The theory of Graph Minors by Robertson and Seymour is one of the deepest and most significant theories in modern Combinatorics. This theory also has a strong impact on the recent development of Algorithms, and several areas, like Parameterized Complexity, have roots in Graph Minors. Until very recently it was a common belief that Graph Minors Theory is mainly of theoretical importance. However, it appears that many deep results from Robertson and Seymour's theory can also be used in the design of practical algorithms. Minor containment testing is one of the algorithmically most important and technical parts of the theory, and minor containment in graphs of bounded branchwidth is a basic ingredient of this algorithm. In order to implement minor containment testing on graphs of bounded branchwidth, Hicks [NETWORKS 04] described an algorithm that, in time O(3^{k^2} \cdot (h+k-1)! \cdot m), decides if a graph G with m edges and branchwidth k contains a fixed graph H on h vertices as a minor. That algorithm follows the ideas introduced by Robertson and Seymour in [JCTSB 95]. In this work we improve the dependence on k of Hicks' result by showing that checking if H is a minor of G can be done in time O(2^{(2k+1) \cdot \log k} \cdot h^{2k} \cdot 2^{2h^2} \cdot m). Our approach is based on a combinatorial object called rooted packing, which captures the properties of the potential models of subgraphs of H that we seek in our dynamic programming algorithm. This formulation with rooted packings allows us to speed up the algorithm when G is embedded in a fixed surface, obtaining the first single-exponential algorithm for minor containment testing. Namely, it runs in time 2^{O(k)} \cdot h^{2k} \cdot 2^{O(h)} \cdot n, with n = |V(G)|. Finally, we show that slight modifications of our algorithm make it possible to solve some related problems within the same time bounds, like induced minor or contraction minor containment.
Graph-based optimization of epitope coverage for vaccine antigen design
Theiler, James Patrick; Korber, Bette Tina Marie
2017-01-29
Epigraph is a recently developed algorithm that enables the computationally efficient design of single or multi-antigen vaccines to maximize the potential epitope coverage for a diverse pathogen population. Potential epitopes are defined as short contiguous stretches of proteins, comparable in length to T-cell epitopes. This optimal coverage problem can be formulated in terms of a directed graph, with candidate antigens represented as paths that traverse this graph. Epigraph protein sequences can also be used as the basis for designing peptides for experimental evaluation of immune responses in natural infections to highly variable proteins. The epigraph tool suite also enables rapid characterization of populations of diverse sequences from an immunological perspective. Fundamental distance measures are based on immunologically relevant shared potential epitope frequencies, rather than simple Hamming or phylogenetic distances. Here, we provide a mathematical description of the epigraph algorithm, include a comparison of different heuristics that can be used when graphs are not acyclic, and we describe an additional tool we have added to the web-based epigraph tool suite that provides frequency summaries of all distinct potential epitopes in a population. Lastly, we also show examples of the graphical output and summary tables that can be generated using the epigraph tool suite and explain their content and applications.
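The directed-graph formulation can be illustrated with a small sketch. It is not the Epigraph implementation: it assumes that nodes are k-mers weighted by the number of input sequences containing them, that an edge joins two k-mers adjacent in some input sequence, and that the resulting graph is acyclic so a simple memoized recursion finds the path with the highest total weight. The function name `best_path` and the toy sequences are hypothetical.

```python
from collections import Counter
from functools import lru_cache


def best_path(sequences, k):
    """Highest-total-frequency path through the k-mer graph (assumed acyclic)."""
    freq = Counter()
    successors = {}
    for seq in sequences:
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        freq.update(set(kmers))  # count each k-mer once per sequence
        for a, b in zip(kmers, kmers[1:]):
            successors.setdefault(a, set()).add(b)

    @lru_cache(maxsize=None)
    def best_from(node):
        # Best (score, path) starting at node; terminates only on an acyclic graph.
        tails = [best_from(nxt) for nxt in sorted(successors.get(node, ()))]
        score, path = max(tails, default=(0, ()))
        return freq[node] + score, (node,) + path

    score, path = max(best_from(node) for node in sorted(freq))
    # Consecutive k-mers overlap by k - 1 characters, so they spell one sequence.
    return score, path[0] + "".join(kmer[-1] for kmer in path[1:])


# Hypothetical population of three short sequences.
coverage, antigen = best_path(["ABCDE", "ABCFE", "ABCDE"], k=3)
```

When the k-mer graph contains cycles this recursion would not terminate, which is where the heuristics mentioned in the abstract come in.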
Graph-based optimization of epitope coverage for vaccine antigen design
DOE Office of Scientific and Technical Information (OSTI.GOV)
Theiler, James Patrick; Korber, Bette Tina Marie
Epigraph is a recently developed algorithm that enables the computationally efficient design of single or multi-antigen vaccines to maximize the potential epitope coverage for a diverse pathogen population. Potential epitopes are defined as short contiguous stretches of proteins, comparable in length to T-cell epitopes. This optimal coverage problem can be formulated in terms of a directed graph, with candidate antigens represented as paths that traverse this graph. Epigraph protein sequences can also be used as the basis for designing peptides for experimental evaluation of immune responses in natural infections to highly variable proteins. The epigraph tool suite also enables rapid characterization of populations of diverse sequences from an immunological perspective. Fundamental distance measures are based on immunologically relevant shared potential epitope frequencies, rather than simple Hamming or phylogenetic distances. Here, we provide a mathematical description of the epigraph algorithm, include a comparison of different heuristics that can be used when graphs are not acyclic, and we describe an additional tool we have added to the web-based epigraph tool suite that provides frequency summaries of all distinct potential epitopes in a population. Lastly, we also show examples of the graphical output and summary tables that can be generated using the epigraph tool suite and explain their content and applications.
Bounded-Degree Approximations of Stochastic Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Quinn, Christopher J.; Pinar, Ali; Kiyavash, Negar
2017-06-01
We propose algorithms to approximate directed information graphs. Directed information graphs are probabilistic graphical models that depict causal dependencies between stochastic processes in a network. The proposed algorithms identify optimal and near-optimal approximations in terms of Kullback-Leibler divergence. The user-chosen sparsity trades off the quality of the approximation against visual conciseness and computational tractability. One class of approximations contains graphs with specified in-degrees. Another class additionally requires that the graph is connected. For both classes, we propose algorithms to identify the optimal approximations and also near-optimal approximations, using a novel relaxation of submodularity. We also propose algorithms to identify the r-best approximations among these classes, enabling robust decision making.
Learning to Predict Combinatorial Structures
NASA Astrophysics Data System (ADS)
Vembu, Shankar
2009-12-01
The major challenge in designing a discriminative learning algorithm for predicting structured data is to address the computational issues arising from the exponential size of the output space. Existing algorithms make different assumptions to ensure efficient, polynomial time estimation of model parameters. For several combinatorial structures, including cycles, partially ordered sets, permutations and other graph classes, these assumptions do not hold. In this thesis, we address the problem of designing learning algorithms for predicting combinatorial structures by introducing two new assumptions: (i) The first assumption is that a particular counting problem can be solved efficiently. The consequence is a generalisation of the classical ridge regression for structured prediction. (ii) The second assumption is that a particular sampling problem can be solved efficiently. The consequence is a new technique for designing and analysing probabilistic structured prediction models. These results can be applied to solve several complex learning problems including but not limited to multi-label classification, multi-category hierarchical classification, and label ranking.
Ababneh, Sufyan Y; Prescott, Jeff W; Gurcan, Metin N
2011-08-01
In this paper, a new, fully automated, content-based system is proposed for knee bone segmentation from magnetic resonance images (MRI). The purpose of the bone segmentation is to support the discovery and characterization of imaging biomarkers for the incidence and progression of osteoarthritis, a debilitating joint disease, which affects a large portion of the aging population. The segmentation algorithm includes a novel content-based, two-pass disjoint block discovery mechanism, which is designed to support automation, segmentation initialization, and post-processing. The block discovery is achieved by classifying the image content to bone and background blocks according to their similarity to the categories in the training data collected from typical bone structures. The classified blocks are then used to design an efficient graph-cut based segmentation algorithm. This algorithm requires constructing a graph using image pixel data followed by applying a maximum-flow algorithm which generates a minimum graph-cut that corresponds to an initial image segmentation. Content-based refinements and morphological operations are then applied to obtain the final segmentation. The proposed segmentation technique does not require any user interaction and can distinguish between bone and highly similar adjacent structures, such as fat tissues with high accuracy. The performance of the proposed system is evaluated by testing it on 376 MR images from the Osteoarthritis Initiative (OAI) database. This database included a selection of single images containing the femur and tibia from 200 subjects with varying levels of osteoarthritis severity. Additionally, a full three-dimensional segmentation of the bones from ten subjects with 14 slices each, and synthetic images with background having intensity and spatial characteristics similar to those of bone are used to assess the robustness and consistency of the developed algorithm. 
The results show an automatic bone detection rate of 0.99 and an average segmentation accuracy of 0.95 using the Dice similarity index. Copyright © 2011 Elsevier B.V. All rights reserved.
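The Dice similarity index used for the accuracy figure above is the standard overlap measure 2|A ∩ B| / (|A| + |B|). A minimal sketch follows, assuming masks are given as sets of pixel coordinates; the function name and the empty-mask convention are choices made here, not details from the paper.

```python
def dice_index(predicted, reference):
    """Dice similarity of two sets of pixel coordinates: 2|A & B| / (|A| + |B|)."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(predicted & reference) / (len(predicted) + len(reference))


# Hypothetical masks: a 2x2 block versus a partly overlapping 3-pixel region.
automatic = {(0, 0), (0, 1), (1, 0), (1, 1)}
manual = {(0, 1), (1, 1), (1, 2)}
score = dice_index(automatic, manual)
```

A value of 1 means the two masks coincide and 0 means they share no pixels, so an average of 0.95 indicates close agreement between the automatic and reference segmentations.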
Linear Time Algorithms to Restrict Insider Access using Multi-Policy Access Control Systems
Mell, Peter; Shook, James; Harang, Richard; Gavrila, Serban
2017-01-01
An important way to limit malicious insiders from distributing sensitive information is to limit their access to information as tightly as possible. This has always been the goal of access control mechanisms, but individual approaches have been shown to be inadequate. Ensemble approaches of multiple methods instantiated simultaneously have been shown to more tightly restrict access, but approaches to do so have had limited scalability (resulting in exponential calculations in some cases). In this work, we take the Next Generation Access Control (NGAC) approach standardized by the American National Standards Institute (ANSI) and demonstrate its scalability. The existing publicly available reference implementations all use cubic algorithms and thus NGAC was widely viewed as not scalable. The primary NGAC reference implementation took, for example, several minutes to simply display the set of files accessible to a user on a moderately sized system. In our approach, we take these cubic algorithms and make them linear. We do this by reformulating the set theoretic approach of the NGAC standard into a graph theoretic approach and then applying standard graph algorithms. We thus can answer important access control decision questions (e.g., which files are available to a user and which users can access a file) using linear time graph algorithms. We also provide a default linear time mechanism to visualize and review user access rights for an ensemble of access control mechanisms. Our visualization appears to be a simple file directory hierarchy but in reality is an automatically generated structure abstracted from the underlying access control graph that works with any set of simultaneously instantiated access control policies. It also provides an implicit mechanism for symbolic linking that provides a powerful access capability. Our work thus provides the first efficient implementation of NGAC while enabling user privilege review through a novel visualization approach. 
This may help transition from concept to reality the idea of using ensembles of simultaneously instantiated access control methodologies, thereby limiting insider threat. PMID:28758045
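As a rough illustration of the graph-theoretic reformulation described in this abstract, the sketch below answers both decision questions (which files a user can reach, and which users can reach a file) with a single breadth-first search each. The graph layout, the node names and the helper functions are hypothetical, and the sketch models plain reachability only; it does not reproduce the NGAC standard's policy classes, associations or prohibitions.

```python
from collections import deque

def reachable(graph, start):
    """Return every node reachable from start; runs in O(V + E)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph):
    """Build the graph with every edge flipped."""
    rev = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev

# Hypothetical toy policy: an edge points from a user or group
# to something it is allowed to access.
graph = {
    "alice": ["engineering"],
    "bob": ["finance"],
    "engineering": ["design.doc", "shared"],
    "finance": ["budget.xls", "shared"],
    "shared": ["handbook.pdf"],
}
files = {"design.doc", "budget.xls", "handbook.pdf"}
users = {"alice", "bob"}

def accessible_files(user):
    # Forward search from the user, filtered down to file nodes.
    return sorted(reachable(graph, user) & files)

def users_with_access(filename):
    # Same search on the reversed graph, filtered down to user nodes.
    return sorted(reachable(reverse(graph), filename) & users)
```

Each query touches every node and edge at most once, which is the sense in which such questions can be answered in linear time once the policy is held as a graph.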
Interprocedural Analysis and the Verification of Concurrent Programs
2009-01-01
SSPE ) problem is to compute a regular expression that represents paths(s, v) for all vertices v in the graph. The syntax of regular expressions is as...follows: r ::= ∅ | ε | e | r1 ∪ r2 | r1.r2 | r∗, where e stands for an edge in G. We can use any algorithm for SSPE to compute regular expressions for...a closed representation of loops provides an exponential speedup.2 Tarjan’s path-expression algorithm solves the SSPE problem efficiently. It uses
A Graph-Algorithmic Approach for the Study of Metastability in Markov Chains
NASA Astrophysics Data System (ADS)
Gan, Tingyue; Cameron, Maria
2017-06-01
Large continuous-time Markov chains with exponentially small transition rates arise in modeling complex systems in physics, chemistry, and biology. We propose a constructive graph-algorithmic approach to determine the sequence of critical timescales at which the qualitative behavior of a given Markov chain changes, and give an effective description of the dynamics on each of them. This approach is valid for both time-reversible and time-irreversible Markov processes, with or without symmetry. Central to this approach are two graph algorithms, Algorithm 1 and Algorithm 2, for obtaining the sequences of the critical timescales and the hierarchies of Typical Transition Graphs or T-graphs indicating the most likely transitions in the system without and with symmetry, respectively. The sequence of critical timescales includes the subsequence of the reciprocals of the real parts of eigenvalues. Under a certain assumption, we prove sharp asymptotic estimates for eigenvalues (including pre-factors) and show how one can extract them from the output of Algorithm 1. We discuss the relationship between Algorithms 1 and 2 and explain how one needs to interpret the output of Algorithm 1 if it is applied in the case with symmetry instead of Algorithm 2. Finally, we analyze an example motivated by R. D. Astumian's model of the dynamics of kinesin, a molecular motor, by means of Algorithm 2.
One-dimensional swarm algorithm packaging
NASA Astrophysics Data System (ADS)
Lebedev, Boris K.; Lebedev, Oleg B.; Lebedeva, Ekaterina O.
2018-05-01
The paper considers an algorithm for solving the problem of one-dimensional packaging based on the adaptive behavior model of an ant colony. The key role in the development of the ant algorithm is the choice of representation (interpretation) of the solution. The structure of the solution search graph, the procedure for finding solutions on the graph, the methods of deposition and evaporation of pheromone are described. Unlike the canonical paradigm of an ant algorithm, an ant on the solution search graph generates sets of elements distributed across blocks. Experimental studies were conducted on an IBM PC. Compared with the existing algorithms, the results are improved.
Dynamic extension of the Simulation Problem Analysis Kernel (SPANK)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sowell, E.F.; Buhl, W.F.
1988-07-15
The Simulation Problem Analysis Kernel (SPANK) is an object-oriented simulation environment for general simulation purposes. Among its unique features is use of the directed graph as the primary data structure, rather than the matrix. This allows straightforward use of graph algorithms for matching variables and equations, and reducing the problem graph for efficient numerical solution. The original prototype implementation demonstrated the principles for systems of algebraic equations, allowing simulation of steady-state, nonlinear systems (Sowell 1986). This paper describes how the same principles can be extended to include dynamic objects, allowing simulation of general dynamic systems. The theory is developed and an implementation is described. An example is taken from the field of building energy system simulation. 2 refs., 9 figs.
Graph theory as a proxy for spatially explicit population models in conservation planning.
Minor, Emily S; Urban, Dean L
2007-09-01
Spatially explicit population models (SEPMs) are often considered the best way to predict and manage species distributions in spatially heterogeneous landscapes. However, they are computationally intensive and require extensive knowledge of species' biology and behavior, limiting their application in many cases. An alternative to SEPMs is graph theory, which has minimal data requirements and efficient algorithms. Although only recently introduced to landscape ecology, graph theory is well suited to ecological applications concerned with connectivity or movement. This paper compares the performance of graph theory to a SEPM in selecting important habitat patches for Wood Thrush (Hylocichla mustelina) conservation. We use both models to identify habitat patches that act as population sources and persistent patches and also use graph theory to identify patches that act as stepping stones for dispersal. Correlations of patch rankings were very high between the two models. In addition, graph theory offers the ability to identify patches that are very important to habitat connectivity and thus long-term population persistence across the landscape. We show that graph theory makes very similar predictions in most cases and in other cases offers insight not available from the SEPM, and we conclude that graph theory is a suitable and possibly preferable alternative to SEPMs for species conservation in heterogeneous landscapes.
Application-Specific Graph Sampling for Frequent Subgraph Mining and Community Detection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Purohit, Sumit; Choudhury, Sutanay; Holder, Lawrence B.
Graph mining is an important data analysis methodology, but struggles as the input graph size increases. The scalability and usability challenges posed by such large graphs make it imperative to sample the input graph and reduce its size. The critical challenge in sampling is to identify the appropriate algorithm to ensure the resulting analysis does not suffer heavily from the data reduction. Predicting the expected performance degradation for a given graph and sampling algorithm is also useful. In this paper, we present different sampling approaches for graph mining applications such as Frequent Subgraph Mining (FSM) and Community Detection (CD). We explore graph metrics such as PageRank, Triangles, and Diversity to sample a graph and conclude that for heterogeneous graphs Triangles and Diversity perform better than degree-based metrics. We also present two new sampling variations for targeted graph mining applications. We present empirical results to show that knowledge of the target application, along with input graph properties, can be used to select the best sampling algorithm. We also conclude that performance degradation is an abrupt, rather than gradual, phenomenon as the sample size decreases. We present the empirical results to show that the performance degradation follows a logistic function.
He, Chenlong; Feng, Zuren; Ren, Zhigang
2018-01-01
In this paper, we propose a connectivity-preserving flocking algorithm for multi-agent systems in which the neighbor set of each agent is determined by the hybrid metric-topological distance so that the interaction topology can be represented as the range-limited Delaunay graph, which combines the properties of the commonly used disk graph and Delaunay graph. As a result, the proposed flocking algorithm has the following advantages over the existing ones. First, range-limited Delaunay graph is sparser than the disk graph so that the information exchange among agents is reduced significantly. Second, some links irrelevant to the connectivity can be dynamically deleted during the evolution of the system. Thus, the proposed flocking algorithm is more flexible than existing algorithms, where links are not allowed to be disconnected once they are created. Finally, the multi-agent system spontaneously generates a regular quasi-lattice formation without imposing the constraint on the ratio of the sensing range of the agent to the desired distance between two adjacent agents. With the interaction topology induced by the hybrid distance, the proposed flocking algorithm can still be implemented in a distributed manner. We prove that the proposed flocking algorithm can steer the multi-agent system to a stable flocking motion, provided the initial interaction topology of multi-agent systems is connected and the hysteresis in link addition is smaller than a derived upper bound. The correctness and effectiveness of the proposed algorithm are verified by extensive numerical simulations, where the flocking algorithms based on the disk and Delaunay graph are compared. PMID:29462217
Elmetwaly, Shereef; Schlick, Tamar
2014-01-01
Graph representations have been widely used to analyze and design various economic, social, military, political, and biological networks. In systems biology, networks of cells and organs are useful for understanding disease and medical treatments and, in structural biology, structures of molecules can be described, including RNA structures. In our RNA-As-Graphs (RAG) framework, we represent RNA structures as tree graphs by translating unpaired regions into vertices and helices into edges. Here we explore the modularity of RNA structures by applying graph partitioning known in graph theory to divide an RNA graph into subgraphs. To our knowledge, this is the first application of graph partitioning to biology, and the results suggest a systematic approach for modular design in general. The graph partitioning algorithms utilize mathematical properties of the Laplacian eigenvector (µ2) corresponding to the second eigenvalue (λ2) associated with the topology matrix defining the graph: λ2 describes the overall topology, and the sum of µ2′s components is zero. The three types of algorithms, termed median, sign, and gap cuts, divide a graph by determining nodes of cut by median, zero, and largest gap of µ2′s components, respectively. We apply these algorithms to 45 graphs corresponding to all solved RNA structures up through 11 vertices (∼220 nucleotides). While we observe that the median cut divides a graph into two similar-sized subgraphs, the sign and gap cuts partition a graph into two topologically-distinct subgraphs. We find that the gap cut produces the best biologically-relevant partitioning for RNA because it divides RNAs at less stable connections while maintaining junctions intact. The iterative gap cuts suggest basic modules and assembly protocols to design large RNA structures. Our graph substructuring thus suggests a systematic approach to explore the modularity of biological networks.
In our applications to RNA structures, subgraphs also suggest design strategies for novel RNA motifs. PMID:25188578
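The three cuts named in this abstract can be sketched as thresholding rules on the components of µ2. The code below is a simplified reading, not the authors' implementation: it assumes µ2 has already been computed, splits vertices by a threshold rather than identifying cut nodes in the RAG sense, and uses a made-up six-component vector (chosen only so that its components sum to zero) purely for illustration.

```python
def split(mu2, threshold):
    """Partition vertex indices by comparing each component to a threshold."""
    low = [i for i, x in enumerate(mu2) if x <= threshold]
    high = [i for i, x in enumerate(mu2) if x > threshold]
    return low, high

def median_cut(mu2):
    # Threshold at the median component (lower middle value for even sizes).
    s = sorted(mu2)
    return split(mu2, s[(len(s) - 1) // 2])

def sign_cut(mu2):
    # Threshold at zero: negative and zero components versus positive ones.
    return split(mu2, 0.0)

def gap_cut(mu2):
    # Threshold just below the largest gap between consecutive sorted components.
    s = sorted(mu2)
    gaps = [(s[i + 1] - s[i], i) for i in range(len(s) - 1)]
    _, i = max(gaps)
    return split(mu2, s[i])

# Hypothetical eigenvector components, one per vertex (they sum to zero).
mu2 = [-0.7, -0.1, 0.1, 0.2, 0.2, 0.3]

median_parts = median_cut(mu2)
sign_parts = sign_cut(mu2)
gap_parts = gap_cut(mu2)
```

On this vector the median cut yields two halves of equal size, while the sign and gap cuts yield unbalanced parts, which matches the qualitative behaviour the abstract reports.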
Graph Mining Meets the Semantic Web
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Sangkeun; Sukumar, Sreenivas R; Lim, Seung-Hwan
The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today, data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. We address that need through implementation of three popular iterative Graph Mining algorithms (Triangle count, Connected component analysis, and PageRank). We implement these algorithms as SPARQL queries, wrapped within Python scripts. We evaluate the performance of our implementation on 6 real world data sets and show graph mining algorithms (that have a linear-algebra formulation) can indeed be unleashed on data represented as RDF graphs using the SPARQL query interface.
Optimal Learning Paths in Information Networks
Rodi, G. C.; Loreto, V.; Servedio, V. D. P.; Tria, F.
2015-01-01
Each sphere of knowledge and information could be depicted as a complex mesh of correlated items. By properly exploiting these connections, innovative and more efficient navigation strategies could be defined, possibly leading to a faster learning process and an enduring retention of information. In this work we investigate how the topological structure embedding the items to be learned can affect the efficiency of the learning dynamics. To this end we introduce a general class of algorithms that simulate the exploration of knowledge/information networks standing on well-established findings on educational scheduling, namely the spacing and lag effects. While constructing their learning schedules, individuals move along connections, periodically revisiting some concepts, and sometimes jumping on very distant ones. In order to investigate the effect of networked information structures on the proposed learning dynamics we focused both on synthetic and real-world graphs such as subsections of Wikipedia and word-association graphs. We highlight the existence of optimal topological structures for the simulated learning dynamics whose efficiency is affected by the balance between hubs and the least connected items. Interestingly, the real-world graphs we considered lead naturally to almost optimal learning performances. PMID:26030508
Yan, Kang K; Zhao, Hongyu; Pang, Herbert
2017-12-06
High-throughput sequencing data are widely collected and analyzed in the study of complex diseases in quest of improving human health. Well-studied algorithms mostly deal with a single data source, and cannot fully utilize the potential of these multi-omics data sources. In order to provide a holistic understanding of human health and diseases, it is necessary to integrate multiple data sources. Several algorithms have been proposed so far; however, a comprehensive comparison of data integration algorithms for classification of binary traits is currently lacking. In this paper, we focus on two common classes of integration algorithms: graph-based algorithms, which depict relationships with subjects denoted by nodes and relationships denoted by edges, and kernel-based algorithms, which can generate a classifier in feature space. Our paper provides a comprehensive comparison of their performance in terms of various measurements of classification accuracy and computation time. Seven different integration algorithms, including graph-based semi-supervised learning, graph sharpening integration, composite association network, Bayesian network, semi-definite programming-support vector machine (SDP-SVM), relevance vector machine (RVM) and Ada-boost relevance vector machine are compared and evaluated with hypertension and two cancer data sets in our study. In general, kernel-based algorithms create more complex models and require longer computation time, but they tend to perform better than graph-based algorithms. Graph-based algorithms have the advantage of being computationally faster. The empirical results demonstrate that composite association network, relevance vector machine, and Ada-boost RVM are the better performers. We provide recommendations on how to choose an appropriate algorithm for integrating data from multiple sources.
Families of Graph Algorithms: SSSP Case Study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kanewala Appuhamilage, Thejaka Amila Jay; Zalewski, Marcin J.; Lumsdaine, Andrew
2017-08-28
Single-Source Shortest Paths (SSSP) is a well-studied graph problem. Examples of SSSP algorithms include the original Dijkstra’s algorithm and the parallel Δ-stepping and KLA-SSSP algorithms. In this paper, we use a novel Abstract Graph Machine (AGM) model to show that all these algorithms share a common logic and differ from one another by the order in which they perform work. We use the AGM model to thoroughly analyze the family of algorithms that arises from the common logic. We start with the basic algorithm without any ordering (Chaotic), and then we derive the existing and new algorithms by methodically exploring semantic and spatial ordering of work. Our experimental results show that new derived algorithms show better performance than the existing distributed memory parallel algorithms, especially at higher scales.
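The idea that SSSP algorithms share one relaxation step and differ only in how pending work is ordered can be illustrated with a small sequential sketch. This is not the AGM model itself: the `sssp` function, its `ordered` flag and the toy graph are assumptions made for illustration, with a FIFO queue standing in for "no ordering" and a binary heap standing in for Dijkstra-style ordering. Edge weights are assumed non-negative.

```python
import heapq
from collections import deque

INF = float("inf")

def sssp(graph, source, ordered):
    """Shortest distances from source plus the number of work items processed.

    ordered=True  -> smallest tentative distance first (Dijkstra-like)
    ordered=False -> plain FIFO, i.e. no distance-based ordering
    """
    dist = {source: 0}
    work = [(0, source)] if ordered else deque([(0, source)])
    processed = 0
    while work:
        d, u = heapq.heappop(work) if ordered else work.popleft()
        if d > dist.get(u, INF):
            continue  # stale item: a better distance for u was found already
        processed += 1
        # Common logic shared by both variants: relax every outgoing edge.
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                if ordered:
                    heapq.heappush(work, (nd, v))
                else:
                    work.append((nd, v))
    return dist, processed

# Hypothetical weighted digraph: vertex -> list of (neighbour, weight).
graph = {
    "s": [("a", 4), ("b", 1)],
    "b": [("a", 1)],
    "a": [("t", 1)],
}

dist_ordered, work_ordered = sssp(graph, "s", ordered=True)
dist_fifo, work_fifo = sssp(graph, "s", ordered=False)
```

Both variants return the same distances; the unordered one may process a vertex more than once because it can act on a tentative distance that is later improved.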
Partitioning sparse matrices with eigenvectors of graphs
NASA Technical Reports Server (NTRS)
Pothen, Alex; Simon, Horst D.; Liou, Kang-Pu
1990-01-01
The problem of computing a small vertex separator in a graph arises in the context of computing a good ordering for the parallel factorization of sparse, symmetric matrices. An algebraic approach for computing vertex separators is considered in this paper. It is shown that lower bounds on separator sizes can be obtained in terms of the eigenvalues of the Laplacian matrix associated with a graph. The Laplacian eigenvectors of grid graphs can be computed from Kronecker products involving the eigenvectors of path graphs, and these eigenvectors can be used to compute good separators in grid graphs. A heuristic algorithm is designed to compute a vertex separator in a general graph by first computing an edge separator in the graph from an eigenvector of the Laplacian matrix, and then using a maximum matching in a subgraph to compute the vertex separator. Results on the quality of the separators computed by the spectral algorithm are presented, and these are compared with separators obtained from other algorithms for computing separators. Finally, the time required to compute the Laplacian eigenvector is reported, and the accuracy with which the eigenvector must be computed to obtain good separators is considered. The spectral algorithm has the advantage that it can be implemented on a medium-size multiprocessor in a straightforward manner.
Connectivity algorithm with depth first search (DFS) on simple graphs
NASA Astrophysics Data System (ADS)
Riansanti, O.; Ihsan, M.; Suhaimi, D.
2018-01-01
This paper discusses an algorithm to detect connectivity of a simple graph using Depth First Search (DFS). The DFS implementation in this paper differs from other research in that it counts the number of visited vertices. The algorithm obtains s from the number of vertices and visits the source vertex, followed by its adjacent vertices until the last vertex adjacent to the previous source vertex. Any simple graph is connected if s equals 0 and disconnected if s is greater than 0. The complexity of the algorithm is $O(n^2)$.
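A minimal sketch of the counting idea in this abstract follows: s starts at the number of vertices and is decremented each time DFS visits a new vertex, so the graph is connected exactly when s reaches 0. The adjacency-list input and the iterative stack are assumptions of this sketch; the abstract does not say which representation the authors use, and its quoted bound would fit an adjacency-matrix scan.

```python
def is_connected(adj):
    """adj maps each vertex to an iterable of its neighbours (undirected simple graph)."""
    if not adj:
        return True  # convention chosen here: the empty graph counts as connected
    s = len(adj)              # vertices not yet visited
    start = next(iter(adj))   # any vertex can serve as the source
    visited = {start}
    stack = [start]
    s -= 1
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u not in visited:
                visited.add(u)
                s -= 1        # one more vertex reached from the source
                stack.append(u)
    return s == 0

path = {1: [2], 2: [1, 3], 3: [2]}
with_isolated_vertex = {1: [2], 2: [1], 3: []}
```

With adjacency lists the traversal touches each vertex and edge a constant number of times.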
Stavrakas, Vassilis; Melas, Ioannis N; Sakellaropoulos, Theodore; Alexopoulos, Leonidas G
2015-01-01
Modeling of signal transduction pathways is instrumental for understanding cells' function. People have been tackling modeling of signaling pathways in order to accurately represent the signaling events inside cells' biochemical microenvironment in a way meaningful for scientists in a biological field. In this article, we propose a method to interrogate such pathways in order to produce cell-specific signaling models. We integrate available prior knowledge of protein connectivity, in a form of a Prior Knowledge Network (PKN) with phosphoproteomic data to construct predictive models of the protein connectivity of the interrogated cell type. Several computational methodologies focusing on pathways' logic modeling using optimization formulations or machine learning algorithms have been published on this front over the past few years. Here, we introduce a light and fast approach that uses a breadth-first traversal of the graph to identify the shortest pathways and score proteins in the PKN, fitting the dependencies extracted from the experimental design. The pathways are then combined through a heuristic formulation to produce a final topology handling inconsistencies between the PKN and the experimental scenarios. Our results show that the algorithm we developed is efficient and accurate for the construction of medium and large scale signaling networks. We demonstrate the applicability of the proposed approach by interrogating a manually curated interaction graph model of EGF/TNFA stimulation against made up experimental data. To avoid the possibility of erroneous predictions, we performed a cross-validation analysis. Finally, we validate that the introduced approach generates predictive topologies, comparable to the ILP formulation. Overall, an efficient approach based on graph theory is presented herein to interrogate protein-protein interaction networks and to provide meaningful biological insights.
Graph-cut based discrete-valued image reconstruction.
Tuysuzoglu, Ahmet; Karl, W Clem; Stojanovic, Ivana; Castañòn, David; Ünlü, M Selim
2015-05-01
Efficient graph-cut methods have been used with great success for labeling and denoising problems occurring in computer vision. Unfortunately, the presence of linear image mappings has prevented the use of these techniques in most discrete-amplitude image reconstruction problems. In this paper, we develop a graph-cut based framework for the direct solution of discrete amplitude linear image reconstruction problems cast as regularized energy function minimizations. We first analyze the structure of discrete linear inverse problem cost functions to show that the obstacle to the application of graph-cut methods to their solution is the variable mixing caused by the presence of the linear sensing operator. We then propose to use a surrogate energy functional that overcomes the challenges imposed by the sensing operator yet can be utilized efficiently in existing graph-cut frameworks. We use this surrogate energy functional to devise a monotonic iterative algorithm for the solution of discrete valued inverse problems. We first provide experiments using local convolutional operators and show the robustness of the proposed technique to noise and stability to changes in regularization parameter. Then we focus on nonlocal, tomographic examples where we consider limited-angle data problems. We compare our technique with state-of-the-art discrete and continuous image reconstruction techniques. Experiments show that the proposed method outperforms state-of-the-art techniques in challenging scenarios involving discrete valued unknowns.
Querying graphs in protein-protein interactions networks using feedback vertex set.
Blin, Guillaume; Sikora, Florian; Vialette, Stéphane
2010-01-01
Recent techniques are rapidly increasing our knowledge of interactions between proteins. The interpretation of this new information depends on our ability to retrieve known substructures in the data, the Protein-Protein Interaction (PPI) networks. From an algorithmic point of view, this is a hard task since it often leads to NP-hard problems. To overcome this difficulty, many authors have provided tools for querying patterns with a restricted topology, i.e., paths or trees in PPI networks. Such restrictions lead to the development of fixed-parameter tractable (FPT) algorithms, which can be practicable for restricted sizes of queries. Unfortunately, Graph Homomorphism is a W[1]-hard problem, and hence no FPT algorithm is expected to exist when patterns are in the shape of general graphs. However, Dost et al. gave an algorithm (which is not implemented) to query graphs with a bounded treewidth in PPI networks (the treewidth of the query being involved in the time complexity). In this paper, we propose another algorithm for querying patterns in the shape of graphs, also based on dynamic programming and the color-coding technique. To transform graph queries into trees without loss of information, we use a feedback vertex set coupled with a node duplication mechanism. Hence, our algorithm is FPT for querying graphs with a bounded size of their feedback vertex set. It gives an alternative to the treewidth parameter, which can be better or worse for a given query. We provide a Python implementation which allows us to validate it on real data. In particular, we retrieve some human queries in the shape of graphs in the fly PPI network.
NASA Astrophysics Data System (ADS)
Mali, P.; Mukhopadhyay, A.; Manna, S. K.; Haldar, P. K.; Singh, G.
2017-03-01
Horizontal visibility graphs (HVGs) and the sandbox (SB) algorithm usually applied for multifractal characterization of complex network systems that are converted from time series measurements, are used to characterize the fluctuations in pseudorapidity densities of singly charged particles produced in high-energy nucleus-nucleus collisions. Besides obtaining the degree distribution associated with event-wise pseudorapidity distributions, the common set of observables, typical of any multifractality measurement, are studied in 16O-Ag/Br and 32S-Ag/Br interactions, each at an incident laboratory energy of 200 GeV/nucleon. For a better understanding, we systematically compare the experiment with a Monte Carlo model simulation based on the Ultra-relativistic Quantum Molecular Dynamics (UrQMD). Our results suggest that the HVG-SB technique is an efficient tool that can characterize multifractality in multiparticle emission data, and in some cases, it is even superior to other methods more commonly used in this regard.
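For readers unfamiliar with the construction, the sketch below builds a horizontal visibility graph from a short series and reads off the degree of each node. It assumes the standard HVG definition from the literature (two points are linked when every value strictly between them is smaller than both), which this abstract does not restate. The series is made up for illustration and is not pseudorapidity data; the sandbox algorithm is not covered.

```python
def horizontal_visibility_graph(series):
    """Return HVG edges (i, j), i < j, for a sequence of numbers."""
    edges = []
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")  # max of series[i+1 .. j-1]
        for j in range(i + 1, n):
            # i and j see each other if everything between is below both.
            if highest_between < min(series[i], series[j]):
                edges.append((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # a bar at least as tall as series[i] blocks the view
    return edges

def degrees(n, edges):
    """Degree of each of the n nodes."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

# Hypothetical toy series.
series = [3, 1, 2, 4]
edges = horizontal_visibility_graph(series)
deg = degrees(len(series), edges)
```

The degree list is the raw material for the degree distribution mentioned in the abstract; consecutive points are always linked, so every interior node has degree at least 2.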
Real-time path planning in dynamic virtual environments using multiagent navigation graphs.
Sud, Avneesh; Andersen, Erik; Curtis, Sean; Lin, Ming C; Manocha, Dinesh
2008-01-01
We present a novel approach for efficient path planning and navigation of multiple virtual agents in complex dynamic scenes. We introduce a new data structure, Multi-agent Navigation Graph (MaNG), which is constructed using first- and second-order Voronoi diagrams. The MaNG is used to perform route planning and proximity computations for each agent in real time. Moreover, we use the path information and proximity relationships for local dynamics computation of each agent by extending a social force model [Helbing05]. We compute the MaNG using graphics hardware and present culling techniques to accelerate the computation. We also address undersampling issues and present techniques to improve the accuracy of our algorithm. Our algorithm is used for real-time multi-agent planning in pursuit-evasion, terrain exploration and crowd simulation scenarios consisting of hundreds of moving agents, each with a distinct goal.
Enhancing Team Composition in Professional Networks: Problem Definitions and Fast Solutions
Li, Liangyue; Tong, Hanghang; Cao, Nan; Ehrlich, Kate; Lin, Yu-Ru; Buchler, Norbou
2017-01-01
In this paper, we study ways to enhance the composition of teams based on new requirements in a collaborative environment. We focus on recommending team members who can maintain the team’s performance by minimizing changes to the team’s skills and social structure. Our recommendations are based on computing team-level similarity, which includes skill similarity, structural similarity as well as the synergy between the two. Current heuristic approaches are one-dimensional and not comprehensive, as they consider the two aspects independently. To formalize team-level similarity, we adopt the notion of graph kernel of attributed graphs to encompass the two aspects and their interaction. To tackle the computational challenges, we propose a family of fast algorithms by (a) designing effective pruning strategies, and (b) exploring the smoothness between the existing and the new team structures. Extensive empirical evaluations on real world datasets validate the effectiveness and efficiency of our algorithms. PMID:29104408
Evaluation of All-Day-Efficiency for selected flat plate and evacuated tube collectors
NASA Technical Reports Server (NTRS)
1981-01-01
An evaluation of all day efficiency for selected flat plate and evacuated tube collectors is presented. Computations are based on a modified version of the NBSIR 78-1305A procedure for all day efficiency. The ASHMET and NOAA data bases for solar insolation are discussed. Details of the algorithm used to convert total (global) horizontal radiation to the collector tilt plane of the selected sites are given along with tables and graphs which show the results of the tests performed during this evaluation.
A genetic graph-based approach for partitional clustering.
Menéndez, Héctor D; Barrero, David F; Camacho, David
2014-05-01
Clustering is one of the most versatile tools for data analysis. In recent years, clustering that seeks the continuity of data (in opposition to classical centroid-based approaches) has attracted an increasing research interest. It is a challenging problem with a remarkable practical interest. The most popular continuity clustering method is the spectral clustering (SC) algorithm, which is based on graph cut: It initially generates a similarity graph using a distance measure and then studies its graph spectrum to find the best cut. This approach is sensitive to the parameters of the metric, and a correct parameter choice is critical to the quality of the cluster. This work proposes a new algorithm, inspired by SC, that reduces the parameter dependency while maintaining the quality of the solution. The new algorithm, named genetic graph-based clustering (GGC), takes an evolutionary approach introducing a genetic algorithm (GA) to cluster the similarity graph. The experimental validation shows that GGC increases robustness of SC and has competitive performance in comparison with classical clustering methods, at least on the synthetic and real datasets used in the experiments.
An experimental study of graph connectivity for unsupervised word sense disambiguation.
Navigli, Roberto; Lapata, Mirella
2010-04-01
Word sense disambiguation (WSD), the task of identifying the intended meanings (senses) of words in context, has been a long-standing research objective for natural language processing. In this paper, we are concerned with graph-based algorithms for large-scale WSD. Under this framework, finding the right sense for a given word amounts to identifying the most "important" node among the set of graph nodes representing its senses. We introduce a graph-based WSD algorithm which has few parameters and does not require sense-annotated data for training. Using this algorithm, we investigate several measures of graph connectivity with the aim of identifying those best suited for WSD. We also examine how the chosen lexicon and its connectivity influence WSD performance. We report results on standard data sets and show that our graph-based approach performs comparably to the state of the art.
The graph neural network model.
Scarselli, Franco; Gori, Marco; Tsoi, Ah Chung; Hagenbuchner, Markus; Monfardini, Gabriele
2009-01-01
Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains. This GNN model, which can directly process most of the practically useful types of graphs, e.g., acyclic, cyclic, directed, and undirected, implements a function τ(G, n) ∈ ℝ^m that maps a graph G and one of its nodes n into an m-dimensional Euclidean space. A supervised learning algorithm is derived to estimate the parameters of the proposed GNN model. The computational cost of the proposed algorithm is also considered. Some experimental results are shown to validate the proposed learning algorithm, and to demonstrate its generalization capabilities.
Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho
2014-01-01
The primary goal of this project is to implement an iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) as used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge of computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains, with the goal of eventually making it usable in a clinical setting. PMID:27081299
Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho
2014-11-01
The primary goal of this project is to implement an iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) as used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge of computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains, with the goal of eventually making it usable in a clinical setting.
Inference of Spatio-Temporal Functions Over Graphs via Multikernel Kriged Kalman Filtering
NASA Astrophysics Data System (ADS)
Ioannidis, Vassilis N.; Romero, Daniel; Giannakis, Georgios B.
2018-06-01
Inference of space-time varying signals on graphs emerges naturally in a plethora of network science related applications. A frequently encountered challenge pertains to reconstructing such dynamic processes, given their values over a subset of vertices and time instants. The present paper develops a graph-aware kernel-based kriged Kalman filter that accounts for the spatio-temporal variations, and offers efficient online reconstruction, even for dynamically evolving network topologies. The kernel-based learning framework bypasses the need for statistical information by capitalizing on the smoothness that graph signals exhibit with respect to the underlying graph. To address the challenge of selecting the appropriate kernel, the proposed filter is combined with a multi-kernel selection module. Such a data-driven method selects a kernel attuned to the signal dynamics on-the-fly within the linear span of a pre-selected dictionary. The novel multi-kernel learning algorithm exploits the eigenstructure of Laplacian kernel matrices to reduce computational complexity. Numerical tests with synthetic and real data demonstrate the superior reconstruction performance of the novel approach relative to state-of-the-art alternatives.
Multi-label literature classification based on the Gene Ontology graph.
Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua
2008-12-08
The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.
Retina verification system based on biometric graph matching.
Lajevardi, Seyed Mehdi; Arakala, Arathi; Davis, Stephen A; Horadam, Kathy J
2013-09-01
This paper presents an automatic retina verification framework based on the biometric graph matching (BGM) algorithm. The retinal vasculature is extracted using a family of matched filters in the frequency domain and morphological operators. Then, retinal templates are defined as formal spatial graphs derived from the retinal vasculature. The BGM algorithm, a noisy graph matching algorithm, robust to translation, non-linear distortion, and small rotations, is used to compare retinal templates. The BGM algorithm uses graph topology to define three distance measures between a pair of graphs, two of which are new. A support vector machine (SVM) classifier is used to distinguish between genuine and imposter comparisons. Using single as well as multiple graph measures, the classifier achieves complete separation on a training set of images from the VARIA database (60% of the data), equaling the state-of-the-art for retina verification. Because the available data set is small, kernel density estimation (KDE) of the genuine and imposter score distributions of the training set are used to measure performance of the BGM algorithm. In the one dimensional case, the KDE model is validated with the testing set. A 0 EER on testing shows that the KDE model is a good fit for the empirical distribution. For the multiple graph measures, a novel combination of the SVM boundary and the KDE model is used to obtain a fair comparison with the KDE model for the single measure. A clear benefit in using multiple graph measures over a single measure to distinguish genuine and imposter comparisons is demonstrated by a drop in theoretical error of between 60% and more than two orders of magnitude.
GDPC: Gravitation-based Density Peaks Clustering algorithm
NASA Astrophysics Data System (ADS)
Jiang, Jianhua; Hao, Dehao; Chen, Yujun; Parmar, Milan; Li, Keqin
2018-07-01
The Density Peaks Clustering algorithm, which we refer to as DPC, is a novel and efficient density-based clustering approach that was published in Science in 2014. DPC has the advantage of discovering clusters with varying sizes and varying densities, but has limitations in detecting the number of clusters and identifying anomalies. We develop an enhanced algorithm with an alternative decision graph based on gravitation theory and nearby distance to identify centroids and anomalies accurately. We apply our method to some UCI and synthetic data sets. We report comparative clustering performance using F-Measure and 2-dimensional visualization. We also compare our method to other clustering algorithms, such as K-Means, Affinity Propagation (AP) and DPC. We present F-Measure scores and clustering accuracies of our GDPC algorithm compared to K-Means, AP and DPC on different data sets. We show that GDPC has superior performance in its capability of: (1) detecting the number of clusters clearly; (2) aggregating clusters with varying sizes and varying densities efficiently; (3) identifying anomalies accurately.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-06-01
meraculous2 is a whole genome shotgun assembler for short reads that is capable of assembling large, polymorphic genomes with modest computational requirements. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (deBruijn) graph of oligonucleotides with unique high quality extensions in the dataset, avoiding an explicit error correction step as used in other short-read assemblers. Additional features include (1) handling of allelic variation using "bubble" structures within the deBruijn graph, (2) gap closing of repetitive and low quality regions using localized assemblies, and (3) an improved scaffolding algorithm that produces more complete assemblies without compromising on scaffolding accuracy.
Quantum speedup of the traveling-salesman problem for bounded-degree graphs
NASA Astrophysics Data System (ADS)
Moylett, Dominic J.; Linden, Noah; Montanaro, Ashley
2017-03-01
The traveling-salesman problem is one of the most famous problems in graph theory. However, little is currently known about the extent to which quantum computers could speed up algorithms for the problem. In this paper, we prove a quadratic quantum speedup when the degree of each vertex is at most 3 by applying a quantum backtracking algorithm to a classical algorithm by Xiao and Nagamochi. We then use similar techniques to accelerate a classical algorithm for when the degree of each vertex is at most 4, before speeding up higher-degree graphs via reductions to these instances.
Linear Algebra and Sequential Importance Sampling for Network Reliability
2011-12-01
first test case is an Erdős-Rényi graph with 100 vertices and 150 edges. Figure 1 depicts the relative variance of the three algorithms: Algorithm TOP... Figure 1: Relative variance of various algorithms on Erdős-Rényi graph, 100 vertices 250 edges. Key: Solid = TOP-DOWN algorithm
Detection of protein complex from protein-protein interaction network using Markov clustering
NASA Astrophysics Data System (ADS)
Ochieng, P. J.; Kusuma, W. A.; Haryanto, T.
2017-05-01
Detection of complexes, or groups of functionally related proteins, is an important challenge in the analysis of biological networks. However, existing algorithms to identify protein complexes are insufficient when applied to dense networks of experimentally derived interaction data. Therefore, we introduce a graph clustering method based on the Markov clustering (MCL) algorithm to identify protein complexes within highly interconnected protein-protein interaction (PPI) networks. A protein-protein interaction network was first constructed to develop a geometrical network, and the network was then partitioned using Markov clustering to detect protein complexes. The value of the proposed method is illustrated by its application to human proteins associated with type II diabetes mellitus. Flow simulation of the MCL algorithm was initially performed, and topological properties of the resultant network were analysed for detection of the protein complexes. The results indicate that the proposed method successfully detected a total of 34 complexes, with 11 complexes consisting of overlapping modules and 20 non-overlapping modules. The major complex consisted of 102 proteins and 521 interactions, with cluster modularity and density of 0.745 and 0.101, respectively. The comparison analysis revealed that MCL outperforms the AP, MCODE and SCPS algorithms, with a high clustering coefficient (0.751), network density and modularity index (0.630). This demonstrates that MCL was the most reliable and efficient graph clustering algorithm for detection of protein complexes from PPI networks.
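The flow simulation mentioned in this abstract can be sketched roughly as follows. This is a minimal pure-Python illustration of the standard MCL loop (expansion by matrix squaring, then inflation by element-wise powering and column normalization), not the authors' implementation. The function names, the added self-loops, the inflation value of 2.0, the fixed iteration count and the way clusters are read off the final matrix are all assumptions made for the example.

```python
def normalize_columns(M):
    """Rescale every column of a square matrix so that it sums to 1."""
    n = len(M)
    sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [[M[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    """Plain dense product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adj, inflation=2.0, n_iter=50):
    """Hypothetical MCL sketch on a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    # Assumption: add self-loops before normalizing, a common MCL convention.
    M = [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    M = normalize_columns(M)
    for _ in range(n_iter):
        M = matmul(M, M)  # expansion: flow spreads along longer walks
        M = normalize_columns([[v ** inflation for v in row] for row in M])  # inflation
    # Rows that keep mass act as attractors; the columns they reach form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if M[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two disjoint triangles should come out as two clusters.
triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(triangles))
```

A production run on a PPI network would use sparse matrices, pruning of small entries and a convergence check instead of a fixed number of iterations.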
Identifying patients with Alzheimer's disease using resting-state fMRI and graph theory.
Khazaee, Ali; Ebrahimzadeh, Ata; Babajani-Feremi, Abbas
2015-11-01
The study of brain networks on the basis of resting-state functional magnetic resonance imaging (fMRI) has provided promising results for investigating changes in connectivity among different brain regions caused by disease. Graph theory can efficiently characterize different aspects of the brain network by calculating measures of integration and segregation. In this study, we combine graph theoretical approaches with advanced machine learning methods to study functional brain network alteration in patients with Alzheimer's disease (AD). A support vector machine (SVM) was used to explore the ability of graph measures in the diagnosis of AD. We applied our method to the resting-state fMRI data of twenty patients with AD and twenty age- and gender-matched healthy subjects. The data were preprocessed and each subject's graph was constructed by parcellation of the whole brain into 90 distinct regions using the automated anatomical labeling (AAL) atlas. The graph measures were then calculated and used as the discriminating features. Extracted network-based features were fed to different feature selection algorithms to choose the most significant features. In addition to the machine learning approach, statistical analysis was performed on connectivity matrices to find altered connectivity patterns in patients with AD. Using the selected features, we were able to distinguish patients with AD from healthy subjects with an accuracy of 100%. Results of this study show that pattern recognition and graph analysis of the brain network, on the basis of resting-state fMRI data, can efficiently assist in the diagnosis of AD. Classification based on resting-state fMRI can be used as a non-invasive and automatic tool for the diagnosis of Alzheimer's disease. Copyright © 2015 International Federation of Clinical Neurophysiology. All rights reserved.
Melchert, O; Katzgraber, Helmut G; Novotny, M A
2016-04-01
We estimate the critical thresholds of bond and site percolation on nonplanar, effectively two-dimensional graphs with chimeralike topology. The building blocks of these graphs are complete and symmetric bipartite subgraphs of size 2n, referred to as K_{n,n} graphs. For the numerical simulations we use an efficient union-find-based algorithm and employ a finite-size scaling analysis to obtain the critical properties for both bond and site percolation. We report the respective percolation thresholds for different sizes of the bipartite subgraph and verify that the associated universality class is that of standard two-dimensional percolation. For the canonical chimera graph used in the D-Wave Systems Inc. quantum annealer (n=4), we discuss device failure in terms of network vulnerability, i.e., we determine the critical fraction of qubits and couplers that can be absent due to random failures prior to losing large-scale connectivity throughout the device.
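As a rough illustration of the union-find approach named in this abstract, the sketch below adds bonds one at a time in random order and tracks the fraction of vertices in the largest cluster. It is a hypothetical minimal version, not the authors' code: the class and function names, the single K_{n,n} cell used as input and the seeded random generator are assumptions, and the finite-size scaling analysis is not reproduced.

```python
import random


class UnionFind:
    """Disjoint sets with union by size and path halving."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1 if n else 0  # size of the largest cluster so far

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])
        return True


def k_nn_edges(n):
    """Edges of one complete bipartite K_{n,n} cell on vertices 0..2n-1."""
    return [(i, n + j) for i in range(n) for j in range(n)]


def bond_percolation_sweep(n_vertices, edges, rng):
    """Occupy bonds in random order; return largest-cluster fraction per step."""
    order = list(edges)
    rng.shuffle(order)
    uf = UnionFind(n_vertices)
    fractions = []
    for u, v in order:
        uf.union(u, v)
        fractions.append(uf.largest / n_vertices)
    return fractions


fractions = bond_percolation_sweep(8, k_nn_edges(4), random.Random(0))
print(fractions[0], fractions[-1])
```

Because every bond is added exactly once, one sweep yields the largest-cluster fraction for all bond densities at a cost that is nearly linear in the number of bonds, which is what makes this style of simulation efficient.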
Dinh, Hieu; Rajasekaran, Sanguthevar
2011-07-15
Exact-match overlap graphs have been broadly used in the context of DNA assembly and the shortest superstring problem, where the number of strings n ranges from thousands to billions. The length ℓ of the strings is from 25 to 1000, depending on the DNA sequencing technology. However, many DNA assemblers using overlap graphs suffer from the need for too much time and space in constructing the graphs. It is nearly impossible for these DNA assemblers to handle the huge amount of data produced by next-generation sequencing technologies, where the number n of strings could be several billion. If the overlap graph is explicitly stored, it would require Ω(n²) memory, which could be prohibitive in practice when n is greater than a hundred million. In this article, we propose a novel data structure with which the overlap graph can be compactly stored. This data structure requires only linear time to construct and linear memory to store. For a given set of input strings (also called reads), we can informally define an exact-match overlap graph as follows. Each read is represented as a node in the graph and there is an edge between two nodes if the corresponding reads overlap sufficiently. A formal description follows. The maximal exact-match overlap of two strings x and y, denoted by ov_max(x, y), is the longest string which is a suffix of x and a prefix of y. The exact-match overlap graph of n given strings of length ℓ is an edge-weighted graph in which each vertex is associated with a string and there is an edge (x, y) of weight ω = ℓ − |ov_max(x, y)| if and only if ω ≤ λ, where |ov_max(x, y)| is the length of ov_max(x, y) and λ is a given threshold. In this article, we show that the exact-match overlap graphs can be represented by a compact data structure that can be stored using at most (2λ−1)(2⌈log n⌉+⌈log λ⌉)n bits with a guarantee that the basic operation of accessing an edge takes O(log λ) time.
We also propose two algorithms for constructing the data structure for the exact-match overlap graph. The first algorithm runs in O(λℓn log n) worst-case time and requires O(λ) extra memory. The second one runs in O(λℓn) time and requires O(n) extra memory. Our experimental results on a huge amount of simulated data from sequence assembly show that the data structure can be constructed efficiently in time and memory. Our DNA sequence assembler that incorporates the data structure is freely available on the web at http://www.engr.uconn.edu/~htd06001/assembler/leap.zip
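The formal definition in this abstract can be made concrete with a small sketch. The code below computes ov_max(x, y) and builds the weighted edge set directly from the definition, with ω = ℓ − |ov_max(x, y)| and an edge kept only when ω ≤ λ. It is a naive quadratic construction for illustration only and is not the compact data structure the article proposes; the function names, the use of read indices as vertex labels and the exclusion of self-pairs are assumptions.

```python
def max_overlap(x, y):
    """Longest string that is both a suffix of x and a prefix of y."""
    for k in range(min(len(x), len(y)), 0, -1):
        if x.endswith(y[:k]):
            return y[:k]
    return ""


def overlap_graph(reads, lam):
    """Map (i, j) -> weight for every ordered pair of reads with weight <= lam.

    Assumes all reads have the same length, as in the definition above,
    and skips pairs of a read with itself (an assumption of this sketch).
    """
    edges = {}
    for i, x in enumerate(reads):
        for j, y in enumerate(reads):
            if i == j:
                continue
            weight = len(x) - len(max_overlap(x, y))
            if weight <= lam:
                edges[(i, j)] = weight
    return edges


reads = ["ACGT", "CGTA", "GTAC"]
print(max_overlap("ACGT", "CGTA"))
print(overlap_graph(reads, lam=2))
```

With n reads this loop performs on the order of n² overlap checks, which is exactly the cost that becomes prohibitive at the scales the abstract describes and that motivates a compact representation.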
A distributed query execution engine of big attributed graphs.
Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif
2016-01-01
A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path, where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation and performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine for G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets, in addition to its ability to outperform Apache Giraph, a popular distributed graph processing system, by orders of magnitude.
Mathematical foundations of the GraphBLAS
Kepner, Jeremy; Aaltonen, Peter; Bader, David; ...
2016-12-01
The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This study provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze while incidence matrices are often better for representing data. Fortunately, the two are easily connected by matrix multiplication. A key feature of matrix mathematics is that a very small number of matrix operations can be used to manipulate a very wide range of graphs. This composability of a small number of operations is the foundation of the GraphBLAS. A standard such as the GraphBLAS can only be effective if it has low performance overhead. Finally, performance measurements of prototype GraphBLAS implementations indicate that the overhead is low.
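The remark that adjacency and incidence matrices are connected by matrix multiplication can be illustrated with a short sketch. For a directed graph, let E_out have one row per edge with a 1 in the column of the vertex the edge leaves, and E_in have a 1 in the column of the vertex it enters; then the adjacency matrix is A = E_outᵀ E_in. The code below uses plain Python lists rather than any GraphBLAS binding, and all function names are hypothetical.

```python
def transpose(M):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Dense matrix product using ordinary + and *."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def incidence_matrices(n_vertices, edges):
    """Build the edge-by-vertex out- and in-incidence matrices."""
    e_out = [[0] * n_vertices for _ in edges]
    e_in = [[0] * n_vertices for _ in edges]
    for k, (u, v) in enumerate(edges):
        e_out[k][u] = 1  # edge k leaves vertex u
        e_in[k][v] = 1   # edge k enters vertex v
    return e_out, e_in


def adjacency_from_incidence(e_out, e_in):
    """A[u][v] counts the edges going from u to v."""
    return matmul(transpose(e_out), e_in)


edges = [(0, 1), (1, 2), (0, 2)]
e_out, e_in = incidence_matrices(3, edges)
print(adjacency_from_incidence(e_out, e_in))
```

Repeated edges simply add up in the product, so the same multiplication yields edge multiplicities for multigraphs, one reason incidence matrices are convenient for representing raw edge data.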
Computing Strongly Connected Components in the Streaming Model
NASA Astrophysics Data System (ADS)
Laura, Luigi; Santaroni, Federico
In this paper we present the first algorithm to compute the Strongly Connected Components of a graph in the datastream model (W-Stream), where the graph is represented by a stream of edges and we are allowed to produce intermediate output streams. The algorithm is simple, effective, and can be implemented with a few lines of code: it looks at each edge in the stream, and selects the appropriate action with respect to a tree T, representing the graph connectivity seen so far. We analyze the theoretical properties of the algorithm: correctness, memory occupation (O(n log n)), per-item processing time (bounded by the current height of T), and number of passes (bounded by the maximal height of T). We conclude by presenting a brief experimental evaluation of the algorithm against massive synthetic and real graphs that confirms its effectiveness: with graphs with up to 100M nodes and 4G edges, only a few passes are needed, and millions of edges per second are processed.
Real-time energy-saving metro train rescheduling with primary delay identification
Li, Keping; Schonfeld, Paul
2018-01-01
This paper aims to reschedule online metro trains in delay scenarios. A graph representation and a mixed integer programming model are proposed to formulate the optimization problem. The solution approach is a two-stage optimization method. In the first stage, based on a proposed train state graph and system analysis, the primary and flow-on delays are specifically analyzed and identified with a critical path algorithm. For the second stage a hybrid genetic algorithm is designed to optimize the schedule, with the delay identification results as input. Then, based on the infrastructure data of Beijing Subway Line 4 of China, case studies are presented to demonstrate the effectiveness and efficiency of the solution approach. The results show that the algorithm can quickly and accurately identify primary delays among different types of delays. The economic cost of energy consumption and total delay is considerably reduced (by more than 10% in each case). The computation time of the Hybrid-GA is low enough for rescheduling online. Sensitivity analyses further demonstrate that the proposed approach can be used as a decision-making support tool for operators. PMID:29474471
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grossman, Max; Pritchard Jr., Howard Porter; Budimlic, Zoran
2016-12-22
Graph500 [14] is an effort to offer a standardized benchmark across large-scale distributed platforms which captures the behavior of common communication-bound graph algorithms. Graph500 differs from other large-scale benchmarking efforts (such as HPL [6] or HPGMG [7]) primarily in the irregularity of its computation and data access patterns. The core computational kernel of Graph500 is a breadth-first search (BFS) implemented on an undirected graph. The output of Graph500 is a spanning tree of the input graph, usually represented by a predecessor mapping for every node in the graph. The Graph500 benchmark defines several pre-defined input sizes for implementers to test against. This report summarizes investigation into implementing the Graph500 benchmark on OpenSHMEM, and focuses on first building a strong and practical understanding of the strengths and limitations of past work before proposing and developing novel extensions.
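The kernel described in this abstract, a BFS over an undirected graph whose output is a predecessor mapping, can be sketched in a few lines. This is a serial illustration only and says nothing about the OpenSHMEM implementation studied in the report. The function names are hypothetical, and marking the root as its own predecessor is a convention assumed for this example.

```python
from collections import deque


def undirected_adjacency(edges):
    """Build an adjacency dict from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def bfs_predecessors(adj, root):
    """Return {vertex: predecessor} for every vertex reachable from root."""
    pred = {root: root}  # assumed convention: the root points to itself
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in pred:  # first visit fixes the tree edge u -> v
                pred[v] = u
                queue.append(v)
    return pred


adj = undirected_adjacency([(0, 1), (0, 2), (1, 3), (4, 5)])
print(bfs_predecessors(adj, 0))
```

Vertices in other connected components (4 and 5 here) never receive a predecessor, so the mapping describes a spanning tree of the root's component only.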
NASA Astrophysics Data System (ADS)
Komachi, Mamoru; Kudo, Taku; Shimbo, Masashi; Matsumoto, Yuji
Bootstrapping has a tendency, called semantic drift, to select instances unrelated to the seed instances as the iteration proceeds. We demonstrate that the semantic drift of Espresso-style bootstrapping has the same root as the topic drift of Kleinberg's HITS, using a simplified graph-based reformulation of bootstrapping. We confirm that two graph-based algorithms, the von Neumann kernels and the regularized Laplacian, can reduce the effect of semantic drift in the task of word sense disambiguation (WSD) on the Senseval-3 English Lexical Sample Task. The proposed algorithms achieve superior performance to Espresso and previous graph-based WSD methods, even though they have fewer parameters and are easy to calibrate.
A heuristic for efficient data distribution management in distributed simulation
NASA Astrophysics Data System (ADS)
Gupta, Pankaj; Guha, Ratan K.
2005-05-01
In this paper, we propose an algorithm for reducing the complexity of region matching and efficient multicasting in data distribution management component of High Level Architecture (HLA) Run Time Infrastructure (RTI). The current data distribution management (DDM) techniques rely on computing the intersection between the subscription and update regions. When a subscription region and an update region of different federates overlap, RTI establishes communication between the publisher and the subscriber. It subsequently routes the updates from the publisher to the subscriber. The proposed algorithm computes the update/subscription regions matching for dynamic allocation of multicast group. It provides new multicast routines that exploit the connectivity of federation by communicating updates regarding interactions and routes information only to those federates that require them. The region-matching problem in DDM reduces to clique-covering problem using the connections graph abstraction where the federations represent the vertices and the update/subscribe relations represent the edges. We develop an abstract model based on connection graph for data distribution management. Using this abstract model, we propose a heuristic for solving the region-matching problem of DDM. We also provide complexity analysis of the proposed heuristics.
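The region-matching step that this abstract starts from, intersecting update and subscription regions and linking the federates whose regions overlap, can be sketched as below. This shows only the baseline intersection test and the resulting connection edges, not the proposed clique-covering heuristic. Representing a region as a list of half-open (low, high) ranges per dimension, and every name in the code, are assumptions made for illustration.

```python
def regions_overlap(a, b):
    """True if two axis-aligned regions intersect in every dimension.

    Each region is a list of (low, high) ranges, treated as half-open
    intervals [low, high); this interval convention is an assumption.
    """
    return all(lo_a < hi_b and lo_b < hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(a, b))


def connection_edges(updates, subscriptions):
    """Return sorted (publisher, subscriber) pairs whose regions overlap.

    `updates` and `subscriptions` are lists of (federate_name, region).
    """
    edges = set()
    for publisher, update_region in updates:
        for subscriber, sub_region in subscriptions:
            if publisher != subscriber and regions_overlap(update_region, sub_region):
                edges.add((publisher, subscriber))
    return sorted(edges)


updates = [("A", [(0, 10), (0, 10)])]
subscriptions = [
    ("B", [(5, 15), (5, 15)]),   # overlaps A in both dimensions
    ("C", [(10, 20), (0, 10)]),  # only touches A at x = 10
]
print(connection_edges(updates, subscriptions))
```

This brute-force comparison costs one test per update/subscription pair, which is the matching complexity the abstract's algorithm aims to reduce.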
Two-layer symbolic representation for stochastic models with phase-type distributed events
NASA Astrophysics Data System (ADS)
Longo, Francesco; Scarpa, Marco
2015-07-01
Among the techniques that have been proposed for the analysis of non-Markovian models, the state space expansion approach has shown great flexibility in terms of modelling capacity. The principal drawback is the explosion of the state space. This paper proposes a two-layer symbolic method for efficiently storing the expanded reachability graph of a non-Markovian model in the case in which continuous phase-type distributions are associated with the firing times of system events, and different memory policies are considered. At the lower layer, the reachability graph is symbolically represented in the form of a set of Kronecker matrices, while, at the higher layer, all the information needed to correctly manage event memory is stored in a multi-terminal multi-valued decision diagram. Such information is collected by applying a symbolic algorithm, which is based on a pair of theorems. The efficiency of the proposed approach, in terms of memory occupation and execution time, is shown by applying it to a set of non-Markovian stochastic Petri nets and comparing it with a classical explicit expansion algorithm. Moreover, a comparison with a classical symbolic approach is performed whenever possible.
Semantic super networks: A case analysis of Wikipedia papers
NASA Astrophysics Data System (ADS)
Kostyuchenko, Evgeny; Lebedeva, Taisiya; Goritov, Alexander
2017-11-01
An algorithm for constructing super-large semantic networks has been developed in the current work. The algorithm was tested using the "Cosmos" category of the Internet encyclopedia "Wikipedia" as an example. During the implementation, a parser for the syntax analysis of Wikipedia pages was developed. A graph based on the list of articles and categories was formed. On the basis of the analysis of the obtained graph, algorithms for finding domains of high connectivity in a graph were proposed and tested. Algorithms for constructing a domain based on the number of links and on the number of articles in the current subject area are considered. The shortcomings of these algorithms are shown and explained, and an algorithm based on their joint use is developed. The possibility of applying the combined algorithm to obtain the final domain is shown. A problem of instability of the resulting domain was discovered when starting the algorithm from two neighboring vertices belonging to the domain.
NASA Astrophysics Data System (ADS)
Zheng, Yan
2015-03-01
The Internet of things (IoT), which focuses on providing users with information exchange and intelligent control, has attracted a great deal of attention from researchers all over the world since the beginning of this century. The IoT consists of a large number of sensor nodes and data processing units, and its most important features are energy constraints, efficient communication and high redundancy. As the number of sensor nodes grows, communication efficiency and available communication bandwidth become bottlenecks. Much existing research addresses cases in which the number of joins is small, which is not adequate for the growing number of multi-join queries across the Internet of things. To improve the communication efficiency between parallel units in a distributed sensor network, this paper proposes a parallel query optimization algorithm based on a distribution attributes cost graph. The algorithm takes into account the storage information relations and the network communication cost, and establishes an optimized information changing rule. The experimental results show that the algorithm performs well and makes effective use of the resources of each node in the distributed sensor network. Therefore, the execution efficiency of multi-join queries between different nodes can be improved.
Contact replacement for NMR resonance assignment.
Xiong, Fei; Pandurangan, Gopal; Bailey-Kellogg, Chris
2008-07-01
Complementing its traditional role in structural studies of proteins, nuclear magnetic resonance (NMR) spectroscopy is playing an increasingly important role in functional studies. NMR dynamics experiments characterize motions involved in target recognition, ligand binding, etc., while NMR chemical shift perturbation experiments identify and localize protein-protein and protein-ligand interactions. The key bottleneck in these studies is to determine the backbone resonance assignment, which allows spectral peaks to be mapped to specific atoms. This article develops a novel approach to address that bottleneck, exploiting an available X-ray structure or homology model to assign the entire backbone from a set of relatively fast and cheap NMR experiments. We formulate contact replacement for resonance assignment as the problem of computing correspondences between a contact graph representing the structure and an NMR graph representing the data; the NMR graph is a significantly corrupted, ambiguous version of the contact graph. We first show that by combining connectivity and amino acid type information, and exploiting the random structure of the noise, one can provably determine unique correspondences in polynomial time with high probability, even in the presence of significant noise (a constant number of noisy edges per vertex). We then detail an efficient randomized algorithm and show that, over a variety of experimental and synthetic datasets, it is robust to typical levels of structural variation (1-2 AA), noise (250-600%) and missings (10-40%). Our algorithm achieves very good overall assignment accuracy, above 80% in alpha-helices, 70% in beta-sheets and 60% in loop regions. Our contact replacement algorithm is implemented in platform-independent Python code. The software can be freely obtained for academic use by request from the authors.
Overlapping community detection based on link graph using distance dynamics
NASA Astrophysics Data System (ADS)
Chen, Lei; Zhang, Jing; Cai, Li-Jun
2018-01-01
The distance dynamics model was recently proposed to detect the disjoint community of a complex network. To identify the overlapping structure of a network using the distance dynamics model, an overlapping community detection algorithm, called L-Attractor, is proposed in this paper. The process of L-Attractor mainly consists of three phases. In the first phase, L-Attractor transforms the original graph to a link graph (a new edge graph) to assure that one node has multiple distances. In the second phase, using the improved distance dynamics model, a dynamic interaction process is introduced to simulate the distance dynamics (shrink or stretch). Through the dynamic interaction process, all distances converge, and the disjoint community structure of the link graph naturally manifests itself. In the third phase, a recovery method is designed to convert the disjoint community structure of the link graph to the overlapping community structure of the original graph. Extensive experiments are conducted on the LFR benchmark networks as well as real-world networks. Based on the results, our algorithm demonstrates higher accuracy and quality than other state-of-the-art algorithms.
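The first and third phases described above (building the link graph, then mapping disjoint link communities back to overlapping node communities) can be illustrated with a minimal sketch. The function names `link_graph` and `recover_overlapping` are hypothetical, and the link communities are supplied by hand here; the distance dynamics of the second phase are not reproduced.

```python
from itertools import combinations


def link_graph(edges):
    """Build the link (line) graph: one node per original edge.

    Two link-nodes are adjacent when their original edges share an endpoint.
    """
    links = [tuple(sorted(e)) for e in edges]
    incident = {}
    for link in links:
        for v in link:
            incident.setdefault(v, []).append(link)
    adjacency = {link: set() for link in links}
    for shared in incident.values():
        for a, b in combinations(shared, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def recover_overlapping(link_communities):
    """Map disjoint communities of links back to node sets, which may overlap."""
    return [{v for link in community for v in link}
            for community in link_communities]


# Two triangles that share node 3.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
lg = link_graph(edges)
# Hand-written disjoint link communities (assumed output of phase two).
communities = recover_overlapping(
    [[(1, 2), (2, 3), (1, 3)], [(3, 4), (4, 5), (3, 5)]]
)
```

Because each link belongs to exactly one community while a node can be an endpoint of links in several, node 3 ends up in both recovered communities.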
An intelligent allocation algorithm for parallel processing
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Homaifar, Abdollah; Ananthram, Kishan G.
1988-01-01
The problem of allocating nodes of a program graph to processors in a parallel processing architecture is considered. The algorithm is based on critical path analysis, some allocation heuristics, and the execution granularity of nodes in a program graph. These factors, and the structure of interprocessor communication network, influence the allocation. To achieve realistic estimations of the executive durations of allocations, the algorithm considers the fact that nodes in a program graph have to communicate through varying numbers of tokens. Coarse and fine granularities have been implemented, with interprocessor token-communication duration, varying from zero up to values comparable to the execution durations of individual nodes. The effect on allocation of communication network structures is demonstrated by performing allocations for crossbar (non-blocking) and star (blocking) networks. The algorithm assumes the availability of as many processors as it needs for the optimal allocation of any program graph. Hence, the focus of allocation has been on varying token-communication durations rather than varying the number of processors. The algorithm always utilizes as many processors as necessary for the optimal allocation of any program graph, depending upon granularity and characteristics of the interprocessor communication network.
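As a rough illustration of the critical path analysis mentioned above, the following sketch computes the length of the longest path through a program graph when each node has an execution duration and each edge a token-communication duration. The function name, the data layout, and the example numbers are assumptions for illustration and are not taken from the report.

```python
from collections import deque


def critical_path_length(durations, edges):
    """Longest finish time over a task DAG.

    durations: dict mapping node -> execution duration.
    edges: list of (u, v, comm) with comm the communication duration on u -> v.
    """
    succ = {n: [] for n in durations}
    indeg = {n: 0 for n in durations}
    for u, v, comm in edges:
        succ[u].append((v, comm))
        indeg[v] += 1
    # A node with no predecessors finishes after its own duration.
    finish = {n: durations[n] for n in durations}
    ready = deque(n for n in durations if indeg[n] == 0)
    while ready:  # Kahn-style topological sweep
        u = ready.popleft()
        for v, comm in succ[u]:
            finish[v] = max(finish[v], finish[u] + comm + durations[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return max(finish.values())


durations = {"a": 2, "b": 3, "c": 1, "d": 2}
with_comm = critical_path_length(
    durations, [("a", "b", 1), ("a", "c", 0), ("b", "d", 1), ("c", "d", 2)]
)
zero_comm = critical_path_length(
    durations, [("a", "b", 0), ("a", "c", 0), ("b", "d", 0), ("c", "d", 0)]
)
```

Comparing `with_comm` and `zero_comm` shows how nonzero token-communication durations lengthen the critical path, which is the quantity the abstract varies.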
GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simmhan, Yogesh; Kumbhare, Alok; Wickramaarachchi, Charith
2014-08-25
Large scale graph processing is a major research area for Big Data exploration. Vertex centric programming models like Pregel are gaining traction due to their simple abstraction that allows for scalable execution on distributed systems naturally. However, there are limitations to this approach which cause vertex centric algorithms to under-perform due to poor compute to communication overhead ratio and slow convergence of iterative supersteps. In this paper we introduce GoFFish, a scalable sub-graph centric framework co-designed with a distributed persistent graph storage for large scale graph analytics on commodity clusters. We introduce a sub-graph centric programming abstraction that combines the scalability of a vertex centric approach with the flexibility of shared memory sub-graph computation. We map Connected Components, SSSP and PageRank algorithms to this model to illustrate its flexibility. Further, we empirically analyze GoFFish using several real world graphs and demonstrate its significant performance improvement, orders of magnitude in some cases, compared to Apache Giraph, the leading open source vertex centric implementation.
CombiMotif: A new algorithm for network motifs discovery in protein-protein interaction networks
NASA Astrophysics Data System (ADS)
Luo, Jiawei; Li, Guanghui; Song, Dan; Liang, Cheng
2014-12-01
Discovering motifs in protein-protein interaction networks is becoming a major current challenge in computational biology, since the distribution of the number of network motifs can reveal significant systemic differences among species. However, this task can be computationally expensive because it involves graph isomorphism detection. In this paper, we present a new algorithm (CombiMotif) that incorporates combinatorial techniques to count non-induced occurrences of subgraph topologies in the form of trees. The efficiency of our algorithm is demonstrated by comparing the obtained results with the current state-of-the-art subgraph counting algorithms. We also show major differences between unicellular and multicellular organisms. The datasets and source code of CombiMotif are freely available upon request.
A distributed-memory approximation algorithm for maximum weight perfect bipartite matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azad, Ariful; Buluc, Aydin; Li, Xiaoye S.
We design and implement an efficient parallel approximation algorithm for the problem of maximum weight perfect matching in bipartite graphs, i.e. the problem of finding a set of non-adjacent edges that covers all vertices and has maximum weight. This problem differs from the maximum weight matching problem, for which scalable approximation algorithms are known. It is primarily motivated by finding good pivots in scalable sparse direct solvers before factorization where sequential implementations of maximum weight perfect matching algorithms, such as those available in MC64, are widely used due to the lack of scalable alternatives. To overcome this limitation, we propose a fully parallel distributed memory algorithm that first generates a perfect matching and then searches for weight-augmenting cycles of length four in parallel and iteratively augments the matching with a vertex disjoint set of such cycles. For most practical problems the weights of the perfect matchings generated by our algorithm are very close to the optimum. An efficient implementation of the algorithm scales up to 256 nodes (17,408 cores) on a Cray XC40 supercomputer and can solve instances that are too large to be handled by a single node using the sequential algorithm.
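The idea of augmenting a perfect matching with weight-increasing cycles of length four can be sketched sequentially as follows. This is only an illustration under assumptions: the function name `improve_with_4_cycles`, the dense weight matrix with `None` for missing edges, and the simple sequential sweep are hypothetical, and none of the parallel, distributed-memory machinery of the paper is shown.

```python
def improve_with_4_cycles(w, mate):
    """Greedily apply weight-increasing 4-cycles to a perfect matching.

    w[i][c] is the weight of edge (row i, column c), or None if absent.
    mate[i] is the column currently matched to row i.
    Rows i and j with columns mate[i] and mate[j] form a 4-cycle when the
    two "cross" edges exist; swapping the columns keeps the matching perfect.
    """
    mate = list(mate)
    n = len(mate)
    improved = True
    while improved:  # each swap strictly increases the total weight
        improved = False
        for i in range(n):
            for j in range(i + 1, n):
                ci, cj = mate[i], mate[j]
                if w[i][cj] is None or w[j][ci] is None:
                    continue
                gain = w[i][cj] + w[j][ci] - w[i][ci] - w[j][cj]
                if gain > 0:
                    mate[i], mate[j] = cj, ci
                    improved = True
    return mate


def matching_weight(w, mate):
    return sum(w[i][c] for i, c in enumerate(mate))


w = [[3, 9, None],
     [8, 2, 1],
     [None, 4, 7]]
start = [0, 1, 2]                  # the diagonal perfect matching
better = improve_with_4_cycles(w, start)
```

The result is a local optimum with respect to 4-cycle swaps, which is why the abstract describes the method as an approximation algorithm rather than an exact one.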
I/O-Efficient Scientific Computation Using TPIE
NASA Technical Reports Server (NTRS)
Vengroff, Darren Erik; Vitter, Jeffrey Scott
1996-01-01
In recent years, input/output (I/O)-efficient algorithms for a wide variety of problems have appeared in the literature. However, systems specifically designed to assist programmers in implementing such algorithms have remained scarce. TPIE is a system designed to support I/O-efficient paradigms for problems from a variety of domains, including computational geometry, graph algorithms, and scientific computation. The TPIE interface frees programmers from having to deal not only with explicit read and write calls, but also the complex memory management that must be performed for I/O-efficient computation. In this paper we discuss applications of TPIE to problems in scientific computation. We discuss algorithmic issues underlying the design and implementation of the relevant components of TPIE and present performance results of programs written to solve a series of benchmark problems using our current TPIE prototype. Some of the benchmarks we present are based on the NAS parallel benchmarks while others are of our own creation. We demonstrate that the central processing unit (CPU) overhead required to manage I/O is small and that even with just a single disk, the I/O overhead of I/O-efficient computation ranges from negligible to the same order of magnitude as CPU time. We conjecture that if we use a number of disks in parallel this overhead can be all but eliminated.
Connectivity: Performance Portable Algorithms for graph connectivity v. 0.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Slota, George; Rajamanickam, Sivasankaran; Madduri, Kamesh
Graphs occur in many real-world settings, from road networks and social networks to scientific simulations. Connectivity is a graph analysis software package for computing graph connectivity on modern architectures such as multicore CPUs, Xeon Phi, and GPUs.
Highly Asynchronous VisitOr Queue Graph Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pearce, R.
2012-10-01
HAVOQGT is a C++ framework that can be used to create highly parallel graph traversal algorithms. The framework stores the graph and algorithmic data structures on external memory that is typically mapped to high performance locally attached NAND FLASH arrays. The framework supports a vertex-centered visitor programming model. The framework has been used to implement breadth first search, connected components, and single source shortest path.
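To give a feel for what a vertex-centered visitor model looks like, here is a small single-threaded sketch of breadth first search driven by a visitor queue. It is written in Python for brevity and does not use or imitate the actual HAVOQGT API; the names `visitor_bfs`, `adj`, and the (vertex, proposed level) visitor tuple are assumptions for illustration.

```python
from collections import deque


def visitor_bfs(adj, source):
    """BFS where each queued item is a visitor: (vertex, proposed level)."""
    level = {}
    queue = deque([(source, 0)])
    while queue:
        vertex, proposed = queue.popleft()
        # Pre-visit check: drop visitors that cannot improve the vertex state.
        if vertex in level:
            continue
        level[vertex] = proposed
        # Visit: spawn one new visitor per outgoing edge.
        for neighbor in adj.get(vertex, []):
            queue.append((neighbor, proposed + 1))
    return level


adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
levels = visitor_bfs(adj, 0)
```

With a FIFO queue the first visitor to reach a vertex carries its smallest level, so later visitors for the same vertex are simply discarded.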
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boman, Erik G.; Catalyurek, Umit V.; Chevalier, Cedric
2015-01-16
This final progress report summarizes the work accomplished at the Combinatorial Scientific Computing and Petascale Simulations Institute. We developed Zoltan, a parallel mesh partitioning library that made use of accurate hypergraph models to provide load balancing in mesh-based computations. We developed several graph coloring algorithms for computing Jacobian and Hessian matrices and organized them into a software package called ColPack. We developed parallel algorithms for graph coloring and graph matching problems, and also designed multi-scale graph algorithms. Three PhD students graduated, six more are continuing their PhD studies, and four postdoctoral scholars were advised. Six of these students and Fellows have joined DOE Labs (Sandia, Berkeley), as staff scientists or as postdoctoral scientists. We also organized the SIAM Workshop on Combinatorial Scientific Computing (CSC) in 2007, 2009, and 2011 to continue to foster the CSC community.
GraDit: graph-based data repair algorithm for multiple data edits rule violations
NASA Astrophysics Data System (ADS)
Ode Zuhayeni Madjida, Wa; Gusti Bagus Baskara Nugraha, I.
2018-03-01
Constraint-based data cleaning captures violations of a set of rules called data quality rules. The rules consist of integrity constraints and data edits. Structurally they are similar: each rule contains a left-hand side and a right-hand side. Previous research proposed a data repair algorithm for integrity constraint violations. That algorithm uses an undirected hypergraph to represent rule violations. Nevertheless, it cannot be applied to data edits because of their different rule characteristics. This study proposes GraDit, a repair algorithm for data edit rules. First, we use a bipartite directed hypergraph to represent the overall set of defined rules. This representation is used to capture the interaction between violated rules and clean rules. In addition, we propose an undirected graph as the violation representation. Our experimental study showed that the algorithm with an undirected graph as the violation representation model gave better data quality than the algorithm with an undirected hypergraph as the representation model.
Human connectome module pattern detection using a new multi-graph MinMax cut model.
De, Wang; Wang, Yang; Nie, Feiping; Yan, Jingwen; Cai, Weidong; Saykin, Andrew J; Shen, Li; Huang, Heng
2014-01-01
Many recent scientific efforts have been devoted to constructing the human connectome using Diffusion Tensor Imaging (DTI) data for understanding the large-scale brain networks that underlie higher-level cognition in humans. However, suitable computational network analysis tools are still lacking in human connectome research. To address this problem, we propose a novel multi-graph min-max cut model to detect the consistent network modules from the brain connectivity networks of all studied subjects. A new multi-graph MinMax cut model is introduced to solve this challenging computational neuroscience problem and an efficient optimization algorithm is derived. In the identified connectome module patterns, each network module shows similar connectivity patterns in all subjects, which are potentially associated with specific brain functions shared by all subjects. We validate our method by analyzing the weighted fiber connectivity networks. The promising empirical results demonstrate the effectiveness of our method.
Graph-based real-time fault diagnostics
NASA Technical Reports Server (NTRS)
Padalkar, S.; Karsai, G.; Sztipanovits, J.
1988-01-01
A real-time fault detection and diagnosis capability is absolutely crucial in the design of large-scale space systems. Some of the existing AI-based fault diagnostic techniques like expert systems and qualitative modelling are frequently ill-suited for this purpose. Expert systems are often inadequately structured, difficult to validate and suffer from knowledge acquisition bottlenecks. Qualitative modelling techniques sometimes generate a large number of failure source alternatives, thus hampering speedy diagnosis. In this paper we present a graph-based technique which is well suited for real-time fault diagnosis, structured knowledge representation and acquisition and testing and validation. A Hierarchical Fault Model of the system to be diagnosed is developed. At each level of hierarchy, there exist fault propagation digraphs denoting causal relations between failure modes of subsystems. The edges of such a digraph are weighted with fault propagation time intervals. Efficient and restartable graph algorithms are used for on-line speedy identification of failure source components.
Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs
LeGault, Laura H.; Dewey, Colin N.
2013-01-01
Motivation: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large numbers of alternative transcripts is currently challenging due to efficiency, identifiability and representation issues. Results: We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate these issues. We prove that our models are often identifiable and demonstrate that our inference methods for quantification and differential processing detection are efficient and accurate. Availability: Software implementing our methods is available at http://deweylab.biostat.wisc.edu/psginfer. Contact: cdewey@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23846746
Efficient parallel architecture for highly coupled real-time linear system applications
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Homaifar, Abdollah; Barua, Soumavo
1988-01-01
A systematic procedure is developed for exploiting the parallel constructs of computation in a highly coupled, linear system application. An overall top-down design approach is adopted. Differential equations governing the application under consideration are partitioned into subtasks on the basis of a data flow analysis. The interconnected task units constitute a task graph which has to be computed in every update interval. Multiprocessing concepts utilizing parallel integration algorithms are then applied for efficient task graph execution. A simple scheduling routine is developed to handle task allocation while in the multiprocessor mode. Results of simulation and scheduling are compared on the basis of standard performance indices. Processor timing diagrams are developed on the basis of program output accruing to an optimal set of processors. Basic architectural attributes for implementing the system are discussed together with suggestions for processing element design. Emphasis is placed on flexible architectures capable of accommodating widely varying application specifics.
Beyond Hosting Capacity: Using Shortest Path Methods to Minimize Upgrade Cost Pathways: Preprint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gensollen, Nicolas; Horowitz, Kelsey A; Palmintier, Bryan S
We present in this paper a graph based forward-looking algorithm applied to distribution planning in the context of distributed PV penetration. We study the target hosting capacity (THC) problem where the objective is to find the cheapest sequence of system upgrades to reach a predefined hosting capacity target value. We show in this paper that commonly used short-term cost minimization approaches lead most of the time to suboptimal solutions. By comparing our method against such myopic techniques on real distribution systems, we show that our algorithm is able to reduce the overall integration costs by looking at future decisions. Because hosting capacity is hard to compute, this problem requires efficient methods to search the space. We demonstrate here that heuristics using domain specific knowledge can be efficiently used to improve the algorithm performance such that real distribution systems can be studied.
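The contrast between a myopic, cheapest-next-step strategy and a forward-looking shortest path search can be shown on a toy upgrade graph. Everything in this sketch is hypothetical: the state names, the costs, and the use of plain Dijkstra search stand in for the paper's method, which additionally relies on domain-specific heuristics and on hosting capacity computations not modeled here.

```python
import heapq


def cheapest_upgrade_path(graph, start, target):
    """Dijkstra search over system states; edges are upgrades with costs.

    Assumes the target is reachable from the start state.
    """
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [target]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


def myopic_path(graph, start, target):
    """Always take the cheapest immediate upgrade (short-term minimization)."""
    total, node, path = 0, start, [start]
    while node != target:
        node, cost = min(graph[node], key=lambda e: e[1])
        total += cost
        path.append(node)
    return total, path


# Toy state graph: A is the current system, T meets the hosting capacity target.
graph = {"A": [("B", 1), ("C", 2)], "B": [("T", 10)], "C": [("T", 3)]}
forward_looking = cheapest_upgrade_path(graph, "A", "T")
myopic = myopic_path(graph, "A", "T")
```

In this toy case the cheapest first upgrade leads into an expensive follow-up, so the myopic route costs more than the route found by looking ahead.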
A sampling algorithm for segregation analysis
Tier, Bruce; Henshall, John
2001-01-01
Methods for detecting Quantitative Trait Loci (QTL) without markers have generally used iterative peeling algorithms for determining genotype probabilities. These algorithms have considerable shortcomings in complex pedigrees. A Monte Carlo Markov chain (MCMC) method which samples the pedigree of the whole population jointly is described. Simultaneous sampling of the pedigree was achieved by sampling descent graphs using the Metropolis-Hastings algorithm. A descent graph describes the inheritance state of each allele and provides pedigrees guaranteed to be consistent with Mendelian sampling. Sampling descent graphs overcomes most, if not all, of the limitations incurred by iterative peeling algorithms. The algorithm was able to find the QTL in most of the simulated populations. However, when the QTL was not modeled or found then its effect was ascribed to the polygenic component. No QTL were detected when they were not simulated. PMID:11742631
Labeling RDF Graphs for Linear Time and Space Querying
NASA Astrophysics Data System (ADS)
Furche, Tim; Weinzierl, Antonius; Bry, François
Indices and data structures for web querying have mostly considered tree shaped data, reflecting the view of XML documents as tree-shaped. However, for RDF (and when querying ID/IDREF constraints in XML) data is indisputably graph-shaped. In this chapter, we first study existing indexing and labeling schemes for RDF and other graph data with focus on support for efficient adjacency and reachability queries. For XML, labeling schemes are an important part of the widespread adoption of XML, in particular for mapping XML to existing (relational) database technology. However, the existing indexing and labeling schemes for RDF (and graph data in general) sacrifice one of the most attractive properties of XML labeling schemes, the constant time (and per-node space) test for adjacency (child) and reachability (descendant). In the second part, we introduce the first labeling scheme for RDF data that retains this property and thus achieves linear time and space processing of acyclic RDF queries on a significantly larger class of graphs than previous approaches (which are mostly limited to tree-shaped data). Finally, we show how this labeling scheme can be applied to (acyclic) SPARQL queries to obtain an evaluation algorithm with time and space complexity linear in the number of resources in the queried RDF graph.
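The constant-time descendant test that tree-shaped (XML-style) labeling schemes offer can be illustrated with a classic interval labeling. This sketch covers only the tree case that the abstract uses as its point of comparison; it is not the chapter's labeling scheme for graph-shaped RDF data, and the names `label_tree` and `is_descendant` are hypothetical.

```python
def label_tree(children, root):
    """Assign each node an interval (start, end) from a depth-first walk.

    start is the node's preorder number; end is the largest preorder
    number found in its subtree.
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        labels[node] = (start, counter - 1)

    visit(root)
    return labels


def is_descendant(labels, ancestor, node):
    """Constant-time test: is node a proper descendant of ancestor?"""
    a_start, a_end = labels[ancestor]
    n_start, n_end = labels[node]
    return a_start < n_start and n_end <= a_end


children = {"a": ["b", "c"], "b": ["d"]}
labels = label_tree(children, "a")
```

Each node stores two integers, and a reachability query compares two labels without touching the tree, which is the property the abstract says general graph schemes usually give up.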
The Fragment Network: A Chemistry Recommendation Engine Built Using a Graph Database.
Hall, Richard J; Murray, Christopher W; Verdonk, Marcel L
2017-07-27
The hit validation stage of a fragment-based drug discovery campaign involves probing the SAR around one or more fragment hits. This often requires a search for similar compounds in a corporate collection or from commercial suppliers. The Fragment Network is a graph database that allows a user to efficiently search chemical space around a compound of interest. The result set is chemically intuitive, naturally grouped by substitution pattern and meaningfully sorted according to the number of observations of each transformation in medicinal chemistry databases. This paper describes the algorithms used to construct and search the Fragment Network and provides examples of how it may be used in a drug discovery context.
Constructing Dense Graphs with Unique Hamiltonian Cycles
ERIC Educational Resources Information Center
Lynch, Mark A. M.
2012-01-01
It is not difficult to construct dense graphs containing Hamiltonian cycles, but it is difficult to generate dense graphs that are guaranteed to contain a unique Hamiltonian cycle. This article presents an algorithm for generating arbitrarily large simple graphs containing "unique" Hamiltonian cycles. These graphs can be turned into dense graphs…
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azad, Ariful; Buluc, Aydn; Pothen, Alex
It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting path is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.
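For readers unfamiliar with augmenting-path matching, the following is a minimal sequential sketch of the baseline the abstract builds on: a single-source BFS from each unmatched left vertex, followed by flipping the edges along the discovered path. It deliberately omits the multiple-source searches, tree grafting, and direction-optimizing BFS described above; the function name and data layout are assumptions.

```python
from collections import deque


def max_bipartite_matching(adj, n_right):
    """adj[u] lists the right vertices adjacent to left vertex u."""
    match_left = [-1] * len(adj)
    match_right = [-1] * n_right
    for source in range(len(adj)):
        parent = {}            # right vertex -> left vertex that reached it
        queue = deque([source])
        end = -1               # free right vertex ending an augmenting path
        while queue and end == -1:
            u = queue.popleft()
            for v in adj[u]:
                if v in parent:
                    continue
                parent[v] = u
                if match_right[v] == -1:
                    end = v
                    break
                # Follow the matched edge back to the left side.
                queue.append(match_right[v])
        # Flip matched and unmatched edges along the path back to the source.
        while end != -1:
            u = parent[end]
            previous = match_left[u]
            match_left[u] = end
            match_right[end] = u
            end = previous
    return match_left


adj = [[0, 1], [0], [1, 2]]
matching = max_bipartite_matching(adj, 3)
```

Each BFS here starts from one source and its tree is thrown away afterwards; the abstract's point is that running many such searches at once exposes more parallelism but requires extra care to avoid redundant traversals.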
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting
Azad, Ariful; Buluc, Aydn; Pothen, Alex
2016-03-24
It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting path is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.
Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, Xuanhua; Luo, Xuan; Liang, Junling
GPUs have been increasingly used to accelerate graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate the iterative convergence. Unfortunately, consistent asynchronous computing requires locking or atomic operations, leading to significant penalties/overheads when implemented on GPUs. As such, coloring algorithms are adopted to separate the vertices with potential updating conflicts, guaranteeing the consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because of the large number of colors generally required for processing a large-scale graph with billions of vertices. We propose a light-weight asynchronous processing framework called Frog with a preprocessing/hybrid coloring model. The fundamental idea is based on the Pareto principle (or 80-20 rule) about coloring algorithms, as we observed through masses of real-world graph coloring cases. We find that a majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated with a very high degree of parallelism without violating sequential consistency. Accordingly, our solution separates the processing of the vertices based on the distribution of colors. In this work, we mainly answer three questions: (1) how to partition the vertices in a sparse graph with maximized parallelism, (2) how to process large-scale graphs that cannot fit into GPU memory, and (3) how to reduce the overhead of data transfers on PCIe while processing each partition. We conduct experiments on real-world data (Amazon, DBLP, YouTube, RoadNet-CA, WikiTalk and Twitter) to evaluate our approach and make comparisons with well-known non-preprocessed (such as Totem, Medusa, MapGraph and Gunrock) and preprocessed (Cusha) approaches, by testing four classical algorithms (BFS, PageRank, SSSP and CC).
On all the tested applications and datasets, Frog is able to significantly outperform existing GPU-based graph processing systems except Gunrock and MapGraph. MapGraph gets better performance than Frog when running BFS on RoadNet-CA. The comparison between Gunrock and Frog is inconclusive. Frog can outperform Gunrock by more than 1.04X when running PageRank and SSSP, while the advantage of Frog is not obvious when running BFS and CC on some datasets, especially RoadNet-CA.
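The role of coloring in separating conflicting updates can be sketched with a plain greedy coloring that groups vertices into per-color batches. This is an assumption-laden illustration, not Frog's hybrid coloring model: the largest-degree-first ordering, the function names, and the tiny example graph are all hypothetical, and nothing here runs on a GPU.

```python
def greedy_coloring(adj):
    """Give each vertex the smallest color not used by an already colored neighbor."""
    color = {}
    # Largest-degree-first order; sorted() is stable, so ties keep input order.
    for v in sorted(adj, key=lambda x: len(adj[x]), reverse=True):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def color_batches(color):
    """Group vertices by color; a batch contains no two adjacent vertices."""
    batches = {}
    for v, c in color.items():
        batches.setdefault(c, []).append(v)
    return [sorted(batches[c]) for c in sorted(batches)]


# Undirected example graph as symmetric adjacency lists.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0], 4: []}
color = greedy_coloring(adj)
batches = color_batches(color)
```

Vertices within one batch share no edge, so they could be updated concurrently without locks or atomics; batches are then processed one after another.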
Yan, Fei; Christmas, William; Kittler, Josef
2008-10-01
In this paper, we propose a multilayered data association scheme with graph-theoretic formulation for tracking multiple objects that undergo switching dynamics in clutter. The proposed scheme takes as input object candidates detected in each frame. At the object candidate level, "tracklets" are "grown" from sets of candidates that have high probabilities of containing only true positives. At the tracklet level, a directed and weighted graph is constructed, where each node is a tracklet, and the edge weight between two nodes is defined according to the "compatibility" of the two tracklets. The association problem is then formulated as an all-pairs shortest path (APSP) problem in this graph. Finally, at the path level, by analyzing the APSPs, all object trajectories are identified, and track initiation and track termination are automatically dealt with. By exploiting a special topological property of the graph, we have also developed a more efficient APSP algorithm than the general-purpose ones. The proposed data association scheme is applied to tennis sequences to track tennis balls. Experiments show that it works well on sequences where other data association methods perform poorly or fail completely.
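The abstract does not state which topological property is exploited. One plausible reading, adopted here purely as an assumption, is that edges only run forward in time, so the tracklet graph is acyclic and its nodes can be indexed in time order. Under that assumption, all-pairs shortest paths reduce to one forward sweep per source, as in this sketch (the function name and the toy weights are hypothetical):

```python
import math


def dag_all_pairs_shortest_paths(n, edges):
    """APSP on a DAG whose nodes 0..n-1 are already in topological (time) order.

    edges: list of (u, v, weight) with u < v, i.e. edges only go forward.
    """
    out = [[] for _ in range(n)]
    for u, v, weight in edges:
        if not u < v:
            raise ValueError("edges must go forward in the node order")
        out[u].append((v, weight))
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0.0
        # One sweep suffices: dist[s][u] is final by the time u is reached.
        for u in range(s, n):
            if dist[s][u] == math.inf:
                continue
            for v, weight in out[u]:
                if dist[s][u] + weight < dist[s][v]:
                    dist[s][v] = dist[s][u] + weight
    return dist


edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0), (2, 3, 1.0)]
dist = dag_all_pairs_shortest_paths(4, edges)
```

This costs on the order of n times the number of edges, compared with the cubic cost of a general-purpose method such as Floyd-Warshall.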
Probabilistic graphs as a conceptual and computational tool in hydrology and water management
NASA Astrophysics Data System (ADS)
Schoups, Gerrit
2014-05-01
Originally developed in the fields of machine learning and artificial intelligence, probabilistic graphs constitute a general framework for modeling complex systems in the presence of uncertainty. The framework consists of three components: 1. Representation of the model as a graph (or network), with nodes depicting random variables in the model (e.g. parameters, states, etc), which are joined together by factors. Factors are local probabilistic or deterministic relations between subsets of variables, which, when multiplied together, yield the joint distribution over all variables. 2. Consistent use of probability theory for quantifying uncertainty, relying on basic rules of probability for assimilating data into the model and expressing unknown variables as a function of observations (via the posterior distribution). 3. Efficient, distributed approximation of the posterior distribution using general-purpose algorithms that exploit model structure encoded in the graph. These attributes make probabilistic graphs potentially useful as a conceptual and computational tool in hydrology and water management (and beyond). Conceptually, they can provide a common framework for existing and new probabilistic modeling approaches (e.g. by drawing inspiration from other fields of application), while computationally they can make probabilistic inference feasible in larger hydrological models. The presentation explores, via examples, some of these benefits.
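The first two components above (a joint distribution formed as a product of local factors, and a posterior obtained by conditioning on observations) can be made concrete with a tiny discrete example. The variables, the probability values, and the function names are invented for illustration, and the brute-force enumeration used here is exactly what the efficient graph-exploiting algorithms of the third component are meant to avoid.

```python
from itertools import product


def joint(assignment, factors):
    """Unnormalized joint probability: the product of all local factors."""
    p = 1.0
    for scope, table in factors:
        p *= table[tuple(assignment[v] for v in scope)]
    return p


def posterior(query, evidence, variables, factors):
    """Posterior over a binary query variable by summing out hidden variables."""
    hidden = [v for v in variables if v not in evidence]
    totals = {}
    for values in product((0, 1), repeat=len(hidden)):
        assignment = {**evidence, **dict(zip(hidden, values))}
        key = assignment[query]
        totals[key] = totals.get(key, 0.0) + joint(assignment, factors)
    z = sum(totals.values())
    return {k: v / z for k, v in totals.items()}


# Hypothetical model: rain (0/1) and an observed wet-soil indicator (0/1).
factors = [
    (("rain",), {(0,): 0.8, (1,): 0.2}),                    # prior on rain
    (("rain", "wet"), {(0, 0): 0.9, (0, 1): 0.1,
                       (1, 0): 0.1, (1, 1): 0.9}),          # P(wet | rain)
]
post = posterior("rain", {"wet": 1}, ["rain", "wet"], factors)
```

Observing wet soil raises the probability of rain from the prior 0.2 to 0.18 / 0.26, which is about 0.69.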
A novel line segment detection algorithm based on graph search
NASA Astrophysics Data System (ADS)
Zhao, Hong-dan; Liu, Guo-ying; Song, Xu
2018-02-01
To address the problem of extracting line segments from an image, a line segment detection method based on graph search is proposed. After obtaining the edge detection result of the image, candidate straight line segments are obtained in four directions. The adjacency relationships among the candidate segments are described by a graph model, on which the depth-first search algorithm is employed to determine how many adjacent line segments need to be merged. Finally, the least squares method is used to fit the detected straight lines. Comparative experimental results indicate that the proposed algorithm achieves better results than the line segment detector (LSD).
NASA Astrophysics Data System (ADS)
Sur, Chiranjib; Shukla, Anupam
2018-03-01
The Bacteria Foraging Optimisation Algorithm is a collective behaviour-based meta-heuristic search that depends on the social influence of the bacteria co-agents in the search space of the problem. The algorithm faces considerable hindrance in its application to discrete and graph-based problems because of its biased mathematical modelling and dynamic structure. This has been the key motivation for introducing a discrete form, the Discrete Bacteria Foraging Optimisation (DBFO) Algorithm, for discrete problems, which in real life outnumber the continuous-domain problems represented by mathematical and numerical equations. In this work, we mainly simulate a graph-based multi-objective road optimisation problem and discuss the prospect of using DBFO in other similar optimisation problems and graph-based problems. The various solution representations that DBFO can handle are also discussed. The implications and dynamics of the various parameters used in DBFO are illustrated from the point of view of the problems, and they combine both exploration and exploitation. The results of DBFO are compared with those of the Ant Colony Optimisation and Intelligent Water Drops algorithms. An important feature of DBFO is that the bacteria agents do not depend on local heuristic information but estimate new exploration schemes from previous experience and covered-path analysis. This makes the algorithm better at combination generation for graph-based problems and for NP-hard problems.
Combining watershed and graph cuts methods to segment organs at risk in radiotherapy
NASA Astrophysics Data System (ADS)
Dolz, Jose; Kirisli, Hortense A.; Viard, Romain; Massoptier, Laurent
2014-03-01
Computer-aided segmentation of anatomical structures in medical images is a valuable tool for efficient radiation therapy planning (RTP). As delineation errors highly affect the radiation oncology treatment, it is crucial to delineate geometric structures accurately. In this paper, a semi-automatic segmentation approach for computed tomography (CT) images, based on watershed and graph-cuts methods, is presented. The watershed pre-segmentation groups small areas of similar intensities in homogeneous labels, which are subsequently used as input for the graph-cuts algorithm. This methodology does not require prior knowledge of the structure to be segmented; even so, it performs well with complex shapes and low intensity. The presented method also allows the user to add foreground and background strokes in any of the three standard orthogonal views - axial, sagittal or coronal - making the interaction with the algorithm easy and fast. Hence, the segmentation information is propagated within the whole volume, providing a spatially coherent result. The proposed algorithm has been evaluated using 9 CT volumes, by comparing its segmentation performance over several organs - lungs, liver, spleen, heart and aorta - to that of manual delineation by experts. A Dice coefficient higher than 0.89 was achieved in every case. This demonstrates that the proposed approach works well for all the anatomical structures analyzed. Due to the quality of the results, the introduction of the proposed approach in the RTP process will be a helpful tool for organs at risk (OARs) segmentation.
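For readers unfamiliar with the evaluation metric, the following is a minimal Python sketch of the standard Dice overlap, 2|A ∩ B| / (|A| + |B|), computed on sets of voxel indices. The set-based representation and the function name are assumptions made for this illustration; the abstract does not describe how the authors implemented the measure.

```python
def dice_coefficient(segmentation, reference):
    """Dice overlap between two segmentations given as collections of voxel indices."""
    a, b = set(segmentation), set(reference)
    if not a and not b:
        # Convention assumed here: two empty segmentations agree perfectly.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))
```

A value of 1.0 means the automatic and manual delineations coincide exactly, and 0.0 means they share no voxels.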
Statistically significant relational data mining
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann
This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
Graphical Language for Data Processing
NASA Technical Reports Server (NTRS)
Alphonso, Keith
2011-01-01
A graphical language for processing data allows processing elements to be connected with virtual wires that represent data flows between processing modules. The processing of complex data, such as lidar data, requires many different algorithms to be applied. The purpose of this innovation is to automate the processing of complex data, such as LIDAR, without the need for complex scripting and programming languages. The system consists of a set of user-interface components that allow the user to drag and drop various algorithmic and processing components onto a process graph. By working graphically, the user can completely visualize the process flow and create complex diagrams. This innovation supports the nesting of graphs, such that a graph can be included in another graph as a single step for processing. In addition to the user interface components, the system includes a set of .NET classes that represent the graph internally. These classes provide the internal system representation of the graphical user interface. The system includes a graph execution component that reads the internal representation of the graph (as described above) and executes that graph. The execution of the graph follows the interpreted model of execution in that each node is traversed and executed from the original internal representation. In addition, there are components that allow external code elements, such as algorithms, to be easily integrated into the system, thus making the system infinitely expandable.
Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection
Liu, Wenfen
2017-01-01
Constrained spectral clustering (CSC) can greatly improve clustering accuracy by incorporating constraint information into spectral clustering, and it has therefore received wide academic attention. In this paper, we propose a fast CSC algorithm that encodes landmark-based graph construction into a new CSC model and applies random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm yields asymptotically similar results as its model size increases; compared with the most efficient CSC algorithm known, the new algorithm runs faster and is suitable for a wider range of data sets. A scalable semisupervised cluster ensemble algorithm is also proposed by combining our fast CSC algorithm with dimensionality reduction by random projection in the process of spectral ensemble clustering. We demonstrate through theoretical analysis and empirical results that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, the approximate preservation of clustering accuracy under random projection, proved for the consensus clustering stage, also holds for weighted k-means clustering and thus gives a theoretical guarantee for this special kind of k-means clustering, in which each point has its own weight. PMID:29312447
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Ediger, David; Jiang, Karl
2009-05-29
We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
Survey of gene splicing algorithms based on reads.
Si, Xiuhua; Wang, Qian; Zhang, Lei; Wu, Ruo; Ma, Jiquan
2017-11-02
Gene splicing is the process of assembling a large number of unordered short sequence fragments to the original genome sequence as accurately as possible. Several popular splicing algorithms based on reads are reviewed in this article, including reference genome algorithms and de novo splicing algorithms (Greedy-extension, Overlap-Layout-Consensus graph, De Bruijn graph). We also discuss a new splicing method based on the MapReduce strategy and Hadoop. By comparing these algorithms, some conclusions are drawn and some suggestions on gene splicing research are made.
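To make the De Bruijn graph approach mentioned above concrete, here is a minimal Python sketch of the usual construction: every k-mer in a read becomes an edge from its (k-1)-length prefix to its (k-1)-length suffix. The function name and the dictionary-of-lists representation are choices made for this illustration, not taken from any assembler reviewed in the article; error correction and reverse complements are ignored.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # One edge per k-mer occurrence: prefix -> suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

Assembly then amounts to finding a walk that uses the edges of this graph, which is why repeated k-mers (parallel edges) make the problem ambiguous.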
Lifted worm algorithm for the Ising model
NASA Astrophysics Data System (ADS)
Elçi, Eren Metin; Grimm, Jens; Ding, Lijie; Nasrawi, Abrahim; Garoni, Timothy M.; Deng, Youjin
2018-04-01
We design an irreversible worm algorithm for the zero-field ferromagnetic Ising model by using the lifting technique. We study the dynamic critical behavior of an energylike observable on both the complete graph and toroidal grids, and compare our findings with reversible algorithms such as the Prokof'ev-Svistunov worm algorithm. Our results show that the lifted worm algorithm improves the dynamic exponent of the energylike observable on the complete graph and leads to a significant constant improvement on toroidal grids.
Approximate inference on planar graphs using loop calculus and belief propagation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chertkov, Michael; Gomez, Vicenc; Kappen, Hilbert
We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006b) allows one to express the exact partition function Z of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze in detail both the loop series and the Pfaffian series for models with binary variables and pairwise interactions, and show that the first term of the Pfaffian series can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference.
Optimal processor assignment for pipeline computations
NASA Technical Reports Server (NTRS)
Nicol, David M.; Simha, Rahul; Choudhury, Alok N.; Narahari, Bhagirath
1991-01-01
The availability of large-scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual response times for different processor sizes, find an assignment of processors to tasks. Two objectives are of interest: minimal response time given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p-processor system and a series-parallel precedence graph with n constituent tasks, an O(np^2) algorithm is provided that finds the optimal assignment for the response time optimization problem; the assignment optimizing the constrained throughput is found in O(np^2 log p) time. Special cases of linear, independent, and tree graphs are also considered.
A Hybrid Shared-Memory Parallel Max-Tree Algorithm for Extreme Dynamic-Range Images.
Moschini, Ugo; Meijster, Arnold; Wilkinson, Michael H F
2018-03-01
Max-trees, or component trees, are graph structures that represent the connected components of an image in a hierarchical way. Nowadays, many application fields rely on images with high-dynamic range or floating point values. Efficient sequential algorithms exist to build trees and compute attributes for images of any bit depth. However, we show that the current parallel algorithms perform poorly already with integers at bit depths higher than 16 bits per pixel. We propose a parallel method combining the two worlds of flooding and merging max-tree algorithms. First, a pilot max-tree of a quantized version of the image is built in parallel using a flooding method. Later, this structure is used in a parallel leaf-to-root approach to compute efficiently the final max-tree and to drive the merging of the sub-trees computed by the threads. We present an analysis of the performance both on simulated and actual 2D images and 3D volumes. Execution times are about better than the fastest sequential algorithm and speed-up goes up to on 64 threads.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wylie, Brian Neil; Moreland, Kenneth D.
Graphs are a vital way of organizing data with complex correlations. A good visualization of a graph can fundamentally change human understanding of the data. Consequently, there is a rich body of work on graph visualization. Although there are many techniques that are effective on small to medium sized graphs (tens of thousands of nodes), there is a void in the research for visualizing massive graphs containing millions of nodes. Sandia is one of the few entities in the world that has the means and motivation to handle data on such a massive scale. For example, homeland security generates graphs from prolific media sources such as television, telephone, and the Internet. The purpose of this project is to provide the groundwork for visualizing such massive graphs. The research provides for two major feature gaps: a parallel, interactive visualization framework and scalable algorithms to make the framework usable to a practical application. Both the frameworks and algorithms are designed to run on distributed parallel computers, which are already available at Sandia. Some features are integrated into the ThreatView™ application and future work will integrate further parallel algorithms.
The combination of direct and paired link graphs can boost repetitive genome assembly
Shi, Wenyu; Ji, Peifeng
2017-01-01
Currently, most paired-link-based scaffolding algorithms intrinsically mask the sequences between two linked contigs and bypass their direct link information embedded in the original de Bruijn assembly graph. This disadvantage substantially complicates the scaffolding process and makes it impossible to resolve the assembly of repetitive contigs. Here we present a novel algorithm, inGAP-sf, for effectively generating high-quality and continuous scaffolds. inGAP-sf achieves this by using a new strategy based on the combination of direct link and paired link graphs, in which the direct link is used to increase graph connectivity and to decrease graph complexity, and the paired link is employed to supervise the traversing process on the direct link graph. This advantage greatly facilitates the assembly of short-repeat enriched regions. Moreover, a new comprehensive decision model is developed to eliminate the noise routes that accompany the introduced direct link. Through extensive evaluations on both simulated and real datasets, we demonstrated that inGAP-sf outperforms most of the genome scaffolding algorithms by generating more accurate and continuous assembly, especially for short repetitive regions. PMID:27924003
Typical performance of approximation algorithms for NP-hard problems
NASA Astrophysics Data System (ADS)
Takabe, Satoshi; Hukushima, Koji
2016-11-01
Typical performance of approximation algorithms is studied for randomized minimum vertex cover problems. A wide class of random graph ensembles characterized by an arbitrary degree distribution is discussed with the presentation of a theoretical framework. Herein, three approximation algorithms are examined: linear-programming relaxation, loopy-belief propagation, and the leaf-removal algorithm. The former two algorithms are analyzed using a statistical-mechanical technique, whereas the average-case analysis of the last one is conducted using the generating function method. These algorithms have a threshold in the typical performance with increasing average degree of the random graph, below which they find true optimal solutions with high probability. Our study reveals that there exist only three cases, determined by the order of the typical performance thresholds. In addition, we provide some conditions for classification of the graph ensembles and demonstrate explicitly some examples for the difference in thresholds.
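Of the three algorithms examined above, leaf removal is the simplest to state: repeatedly take a degree-one vertex, put its neighbor into the cover, and delete both. The Python sketch below follows that textbook description; the adjacency-dictionary input and the returned "core edge count" are conventions assumed for this example, not details given in the abstract.

```python
def leaf_removal(adjacency):
    """Greedy leaf removal for vertex cover.

    Returns (cover, core_edges). If core_edges is 0, every edge was
    covered by the procedure; otherwise a leafless core remains.
    """
    adj = {v: set(ns) for v, ns in adjacency.items()}  # work on a copy
    cover = set()
    leaves = [v for v, ns in adj.items() if len(ns) == 1]
    while leaves:
        leaf = leaves.pop()
        if len(adj.get(leaf, ())) != 1:
            continue  # stale entry: vertex was removed or lost its edge
        (parent,) = adj[leaf]
        cover.add(parent)
        # Delete the parent and all of its edges; this may create new leaves.
        for u in adj.pop(parent):
            adj[u].discard(parent)
            if len(adj[u]) == 1:
                leaves.append(u)
    core_edges = sum(len(ns) for ns in adj.values()) // 2
    return cover, core_edges
```

On a graph where the procedure empties the edge set, the cover it returns is a minimum vertex cover; the threshold behavior discussed in the abstract concerns when a non-empty core appears as the average degree grows.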
Exploring variation-aware contig graphs for (comparative) metagenomics using MaryGold
Nijkamp, Jurgen F.; Pop, Mihai; Reinders, Marcel J. T.; de Ridder, Dick
2013-01-01
Motivation: Although many tools are available to study variation and its impact in single genomes, there is a lack of algorithms for finding such variation in metagenomes. This hampers the interpretation of metagenomics sequencing datasets, which are increasingly acquired in research on the (human) microbiome, in environmental studies and in the study of processes in the production of foods and beverages. Existing algorithms often depend on the use of reference genomes, which pose a problem when a metagenome of a priori unknown strain composition is studied. In this article, we develop a method to perform reference-free detection and visual exploration of genomic variation, both within a single metagenome and between metagenomes. Results: We present the MaryGold algorithm and its implementation, which efficiently detects bubble structures in contig graphs using graph decomposition. These bubbles represent variable genomic regions in closely related strains in metagenomic samples. The variation found is presented in a condensed Circos-based visualization, which allows for easy exploration and interpretation of the found variation. We validated the algorithm on two simulated datasets containing three and seven Escherichia coli genomes, respectively, and showed that finding allelic variation in these genomes improves assemblies. Additionally, we applied MaryGold to publicly available real metagenomic datasets, enabling us to find within-sample genomic variation in the metagenomes of a kimchi fermentation process, the microbiome of a premature infant and in microbial communities living on acid mine drainage. Moreover, we used MaryGold for between-sample variation detection and exploration by comparing sequencing data sampled at different time points for both of these datasets. Availability: MaryGold has been written in C++ and Python and can be downloaded from http://bioinformatics.tudelft.nl/software Contact: d.deridder@tudelft.nl PMID:24058058
Information-optimal genome assembly via sparse read-overlap graphs.
Shomorony, Ilan; Kim, Samuel H; Courtade, Thomas A; Tse, David N C
2016-09-01
In the context of third-generation long-read sequencing technologies, read-overlap-based approaches are expected to play a central role in the assembly step. A fundamental challenge in assembling from a read-overlap graph is that the true sequence corresponds to a Hamiltonian path on the graph, and, under most formulations, the assembly problem becomes NP-hard, restricting practical approaches to heuristics. In this work, we avoid this seemingly fundamental barrier by first setting the computational complexity issue aside, and seeking an algorithm that targets information limits. In particular, we consider a basic feasibility question: when does the set of reads contain enough information to allow unambiguous reconstruction of the true sequence? Based on insights from this information feasibility question, we present an algorithm (the Not-So-Greedy algorithm) to construct a sparse read-overlap graph. Unlike most other assembly algorithms, Not-So-Greedy comes with a performance guarantee: whenever information feasibility conditions are satisfied, the algorithm reduces the assembly problem to an Eulerian path problem on the resulting graph, which can thus be solved in linear time. In practice, this theoretical guarantee translates into assemblies of higher quality. Evaluations on both simulated reads from real genomes and a PacBio Escherichia coli K12 dataset demonstrate that Not-So-Greedy compares favorably with standard string graph approaches in terms of accuracy of the resulting read-overlap graph and contig N50. Availability: github.com/samhykim/nsg Contact: courtade@eecs.berkeley.edu or dntse@stanford.edu Supplementary data are available at Bioinformatics online.
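The linear-time claim above rests on the fact that an Eulerian path, unlike a Hamiltonian one, can be found by Hierholzer's algorithm in time proportional to the number of edges. The Python sketch below shows that generic algorithm on a directed multigraph; it is not the Not-So-Greedy code, and it assumes the caller already knows that an Eulerian path exists and which vertex it starts from.

```python
def eulerian_path(edges, start):
    """Hierholzer's algorithm on a directed multigraph given as (u, v) pairs.

    Assumes an Eulerian path beginning at `start` exists.
    """
    out = {}
    for u, v in edges:
        out.setdefault(u, []).append(v)
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if out.get(u):
            # Follow any unused outgoing edge (each edge is popped once).
            stack.append(out[u].pop())
        else:
            # Dead end: this vertex is final in the remaining walk.
            path.append(stack.pop())
    return path[::-1]
```

Each edge is pushed and popped exactly once, which gives the O(|E|) running time.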
Convergence of the Graph Allen-Cahn Scheme
NASA Astrophysics Data System (ADS)
Luo, Xiyang; Bertozzi, Andrea L.
2017-05-01
The graph Laplacian and the graph cut problem are closely related to Markov random fields, and have many applications in clustering and image segmentation. The diffuse interface model is widely used for modeling in material science, and can also be used as a proxy to total variation minimization. In Bertozzi and Flenner (Multiscale Model Simul 10(3):1090-1118, 2012), an algorithm was developed to generalize the diffuse interface model to graphs to solve the graph cut problem. This work analyzes the conditions for the graph diffuse interface algorithm to converge. Using techniques from numerical PDE and convex optimization, monotonicity in function value and convergence under an a posteriori condition are shown for a class of schemes under a graph-independent stepsize condition. We also generalize our results to incorporate spectral truncation, a common technique used to save computation cost, and also to the case of multiclass classification. Various numerical experiments are done to compare theoretical results with practical performance.
A graph decomposition-based approach for water distribution network optimization
NASA Astrophysics Data System (ADS)
Zheng, Feifei; Simpson, Angus R.; Zecchin, Aaron C.; Deuerlein, Jochen W.
2013-04-01
A novel optimization approach for water distribution network design is proposed in this paper. Using graph theory algorithms, a full water network is first decomposed into different subnetworks based on the connectivity of the network's components. The original whole network is simplified to a directed augmented tree, in which the subnetworks are substituted by augmented nodes and directed links are created to connect them. Differential evolution (DE) is then employed to optimize each subnetwork based on the sequence specified by the assigned directed links in the augmented tree. Rather than optimizing the original network as a whole, the subnetworks are sequentially optimized by the DE algorithm. A solution choice table is established for each subnetwork (except for the subnetwork that includes a supply node) and the optimal solution of the original whole network is finally obtained by use of the solution choice tables. Furthermore, a preconditioning algorithm is applied to the subnetworks to produce an approximately optimal solution for the original whole network. This solution specifies promising regions for the final optimization algorithm to further optimize the subnetworks. Five water network case studies are used to demonstrate the effectiveness of the proposed optimization method. A standard DE algorithm (SDE) and a genetic algorithm (GA) are applied to each case study without network decomposition to enable a comparison with the proposed method. The results show that the proposed method consistently outperforms the SDE and GA (both with tuned parameters) in terms of both the solution quality and efficiency.
Craig, Hugh; Berretta, Regina; Moscato, Pablo
2016-01-01
In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays. PMID:27571416
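The dissimilarity measure named above has a compact definition: the Jensen-Shannon distance is the square root of the Jensen-Shannon divergence, which averages the Kullback-Leibler divergences of two distributions from their midpoint. A minimal Python sketch follows, assuming the inputs are already normalized word-frequency vectors of equal length; the base-2 logarithm, which bounds the distance by 1, is a choice made for this example.

```python
import math

def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (base-2 logs) between two probability vectors."""
    m = [(a + b) / 2 for a, b in zip(p, q)]  # midpoint distribution

    def kl(x, y):
        # Kullback-Leibler divergence; zero-probability terms contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(divergence)
```

Pairwise distances computed this way can serve as edge weights when building a proximity graph over the documents.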
Contact Graph Routing Enhancements Developed in ION for DTN
NASA Technical Reports Server (NTRS)
Segui, John S.; Burleigh, Scott
2013-01-01
The Interplanetary Overlay Network (ION) software suite is an open-source, flight-ready implementation of networking protocols including the Delay/Disruption Tolerant Networking (DTN) Bundle Protocol (BP), the CCSDS (Consultative Committee for Space Data Systems) File Delivery Protocol (CFDP), and many others including the Contact Graph Routing (CGR) DTN routing system. While DTN offers the capability to tolerate disruption and long signal propagation delays in transmission, without an appropriate routing protocol, no data can be delivered. CGR was built for space exploration networks with scheduled communication opportunities (typically based on trajectories and orbits), represented as a contact graph. Since CGR uses knowledge of future connectivity, the contact graph can grow rather large, and so efficient processing is desired. These enhancements allow CGR to scale to predicted NASA space network complexities and beyond. This software improves upon CGR by adopting an earliest-arrival-time cost metric and using the Dijkstra path selection algorithm. Moving to Dijkstra path selection also enables construction of an earliest-arrival-time tree for multicast routing. The enhancements have been rolled into ION 3.0 available on sourceforge.net.
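The idea of Dijkstra path selection with an earliest-arrival-time metric can be sketched in a few lines of Python. This is a simplified illustration, not ION's implementation: the contact tuple layout (sender, receiver, start, end, delay), the single fixed delay per contact, and the function name are all assumptions made for the example, and real CGR also accounts for data volume, rates, and other constraints.

```python
import heapq

def earliest_arrival(contacts, source, t0):
    """Earliest arrival time at each reachable node, starting at `source` at time t0.

    contacts: iterable of (sender, receiver, start, end, delay) tuples
    (hypothetical layout for this sketch).
    """
    arrival = {source: t0}
    heap = [(t0, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if t > arrival.get(node, float("inf")):
            continue  # outdated queue entry
        for sender, receiver, start, end, delay in contacts:
            if sender != node or t > end:
                continue  # wrong sender, or the contact has already closed
            # Wait for the contact to open if needed, then add the delay.
            t_arr = max(t, start) + delay
            if t_arr < arrival.get(receiver, float("inf")):
                arrival[receiver] = t_arr
                heapq.heappush(heap, (t_arr, receiver))
    return arrival
```

Recording the predecessor contact whenever `arrival[receiver]` improves would turn the result into the earliest-arrival-time tree mentioned in the abstract.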
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kou, Qiang; Wu, Si; Tolić, Nikola
Motivation: Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a “bird’s eye view” of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem. Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry data sets showed that TopMG outperformed existing methods in identifying complex proteoforms.
Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)
DOE Office of Scientific and Technical Information (OSTI.GOV)
The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.
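The simplest instance of the approach described above is a graph that is itself a tree, where dynamic programming over the tree solves maximum weighted independent set exactly. The Python sketch below shows that special case only; it is not INDDGO code, and general graphs require the dynamic program to run over the bags of a tree decomposition rather than over single vertices.

```python
def max_weight_independent_set(tree, weights, root):
    """Maximum total weight of an independent set in a tree.

    tree: adjacency dict {vertex: list of neighbors}; weights: {vertex: weight}.
    """
    # Iterative DFS to get a parent map and a parent-before-child order.
    parent, order, stack = {root: None}, [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in tree[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)
    include, exclude = {}, {}
    # Process children before parents.
    for v in reversed(order):
        include[v] = weights[v]  # v is taken, so its children must be skipped
        exclude[v] = 0           # v is skipped, so each child is free to choose
        for u in tree[v]:
            if u != parent[v]:
                include[v] += exclude[u]
                exclude[v] += max(include[u], exclude[u])
    return max(include[root], exclude[root])
```

The two tables per vertex play the role that per-bag tables play in the general tree-decomposition algorithm, where the table size grows exponentially with the width of the decomposition.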
Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deveci, Mehmet; Trott, Christian Robert; Rajamanickam, Sivasankaran
Sparse matrix-matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian Robert
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
Lombaert, Herve; Grady, Leo; Polimeni, Jonathan R.; Cheriet, Farida
2013-01-01
Existing methods for surface matching are limited by the trade-off between precision and computational efficiency. Here we present an improved algorithm for dense vertex-to-vertex correspondence that uses direct matching of features defined on a surface and improves it by using spectral correspondence as a regularization. This algorithm has the speed of both feature matching and spectral matching while exhibiting greatly improved precision (distance errors of 1.4%). The method, FOCUSR, incorporates implicitly such additional features to calculate the correspondence and relies on the smoothness of the lowest-frequency harmonics of a graph Laplacian to spatially regularize the features. In its simplest form, FOCUSR is an improved spectral correspondence method that nonrigidly deforms spectral embeddings. We provide here a full realization of spectral correspondence where virtually any feature can be used as additional information using weights on graph edges, but also on graph nodes and as extra embedded coordinates. As an example, the full power of FOCUSR is demonstrated in a real case scenario with the challenging task of brain surface matching across several individuals. Our results show that combining features and regularizing them in a spectral embedding greatly improves the matching precision (to a sub-millimeter level) while performing at much greater speed than existing methods. PMID:23868776
Zhang, Feng; Liao, Xiangke; Peng, Shaoliang; Cui, Yingbo; Wang, Bingqiang; Zhu, Xiaoqian; Liu, Jie
2016-06-01
The de novo assembly of DNA sequences is increasingly important for biological research in the genomic era. More than a decade after the Human Genome Project, some challenges still exist and new solutions are being explored to improve de novo assembly of genomes. String graph assembler (SGA), based on string graph theory, is a new method/tool developed to address the challenges. In this paper, based on an in-depth analysis of SGA, we prove that SGA-based sequence de novo assembly is an NP-complete problem. According to our analysis, SGA outperforms other similar methods/tools in memory consumption, but costs much more time, of which 60-70% is spent on the index construction. Based on this analysis, we introduce a hybrid parallel optimization algorithm and implement this algorithm in the TianHe-2's parallel framework. Simulations are performed with different datasets. For data of small size the optimized solution is 3.06 times faster than before, and for data of middle size it is 1.60 times faster. The results demonstrate an evident performance improvement, with linear scalability for parallel FM-index construction. These results thus contribute significantly to improving the efficiency of de novo assembly of DNA sequences.
Man-Made Object Extraction from Remote Sensing Imagery by Graph-Based Manifold Ranking
NASA Astrophysics Data System (ADS)
He, Y.; Wang, X.; Hu, X. Y.; Liu, S. H.
2018-04-01
The automatic extraction of man-made objects from remote sensing imagery is useful in many applications. This paper proposes an algorithm for extracting man-made objects automatically by integrating a graph model with the manifold ranking algorithm. Initially, we estimate a priori value of the man-made objects with the use of symmetric and contrast features. The graph model is established to represent the spatial relationships among pre-segmented superpixels, which are used as the graph nodes. Multiple characteristics, namely colour, texture and main direction, are used to compute the weights of the adjacent nodes. Manifold ranking effectively explores the relationships among all the nodes in the feature space as well as initial query assignment; thus, it is applied to generate a ranking map, which indicates the scores of the man-made objects. The man-made objects are then segmented on the basis of the ranking map. Two typical segmentation algorithms are compared with the proposed algorithm. Experimental results show that the proposed algorithm can extract man-made objects with high recognition rate and low omission rate.
Distributed-Memory Breadth-First Search on Massive Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buluc, Aydin; Beamer, Scott; Madduri, Kamesh
This chapter studies the problem of traversing large graphs using the breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm as well as the recently discovered direction optimizing algorithm. We analyze the performance and scalability trade-offs in using different local data structures such as CSR and DCSC, enabling in-node multithreading, and graph decompositions such as 1D and 2D decomposition.
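The level-synchronous top-down algorithm mentioned above can be sketched on a single node as follows. This is a serial illustration over a CSR adjacency structure, not the distributed implementation from the chapter; the function name and argument layout are assumptions for this example.

```python
def bfs_levels_csr(indptr, indices, source):
    """Level-synchronous top-down BFS; returns the BFS level of every vertex.

    The neighbours of vertex u are indices[indptr[u]:indptr[u + 1]].
    Unreachable vertices keep level -1.
    """
    n = len(indptr) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # Top-down step: every frontier vertex scans its outgoing edges.
        for u in frontier:
            for p in range(indptr[u], indptr[u + 1]):
                v = indices[p]
                if level[v] == -1:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier  # one synchronization point per level
    return level
```

In a distributed 1D decomposition each process would own a block of vertices and exchange the newly discovered frontier at the end of every level; the direction-optimizing variant additionally switches to a bottom-up scan when the frontier becomes large.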
Patterns and Practices for Future Architectures
2014-08-01
14. SUBJECT TERMS computing architecture, graph algorithms, high-performance computing, big data, GPU 15. NUMBER OF PAGES 44 16. PRICE CODE 17...at Vertex 1 6 Figure 4: Data Structures Created by Kernel 1 of Single CPU, List Implementation Using the Graph in the Example from Section 1.2 9...Figure 5: Kernel 2 of Graph500 BFS Reference Implementation: Single CPU, List 10 Figure 6: Data Structures for Sequential CSR Algorithm 12 Figure 7
Modeling flow and transport in fracture networks using graphs
NASA Astrophysics Data System (ADS)
Karra, S.; O'Malley, D.; Hyman, J. D.; Viswanathan, H. S.; Srinivasan, G.
2018-03-01
Fractures form the main pathways for flow in the subsurface within low-permeability rock. For this reason, accurately predicting flow and transport in fractured systems is vital for improving the performance of subsurface applications. Fracture sizes in these systems can range from millimeters to kilometers. Although modeling flow and transport using the discrete fracture network (DFN) approach is known to be more accurate due to incorporation of the detailed fracture network structure over continuum-based methods, capturing the flow and transport in such a wide range of scales is still computationally intractable. Furthermore, if one has to quantify uncertainty, hundreds of realizations of these DFN models have to be run. To reduce the computational burden, we solve flow and transport on a graph representation of a DFN. We study the accuracy of the graph approach by comparing breakthrough times and tracer particle statistical data between the graph-based and the high-fidelity DFN approaches, for fracture networks with varying number of fractures and degree of heterogeneity. Due to our recent developments in capabilities to perform DFN high-fidelity simulations on fracture networks with large number of fractures, we are in a unique position to perform such a comparison. We show that the graph approach shows a consistent bias with up to an order of magnitude slower breakthrough when compared to the DFN approach. We show that this is due to graph algorithm's underprediction of the pressure gradients across intersections on a given fracture, leading to slower tracer particle speeds between intersections and longer travel times. We present a bias correction methodology to the graph algorithm that reduces the discrepancy between the DFN and graph predictions. We show that with this bias correction, the graph algorithm predictions significantly improve and the results are very accurate. 
The good accuracy and the low computational cost, with run times O(10^{4}) times lower than the DFN, make the graph algorithm an ideal technique to incorporate in uncertainty quantification methods.
Modeling flow and transport in fracture networks using graphs.
Karra, S; O'Malley, D; Hyman, J D; Viswanathan, H S; Srinivasan, G
2018-03-01
Fractures form the main pathways for flow in the subsurface within low-permeability rock. For this reason, accurately predicting flow and transport in fractured systems is vital for improving the performance of subsurface applications. Fracture sizes in these systems can range from millimeters to kilometers. Although modeling flow and transport using the discrete fracture network (DFN) approach is known to be more accurate due to incorporation of the detailed fracture network structure over continuum-based methods, capturing the flow and transport in such a wide range of scales is still computationally intractable. Furthermore, if one has to quantify uncertainty, hundreds of realizations of these DFN models have to be run. To reduce the computational burden, we solve flow and transport on a graph representation of a DFN. We study the accuracy of the graph approach by comparing breakthrough times and tracer particle statistical data between the graph-based and the high-fidelity DFN approaches, for fracture networks with varying number of fractures and degree of heterogeneity. Due to our recent developments in capabilities to perform DFN high-fidelity simulations on fracture networks with large number of fractures, we are in a unique position to perform such a comparison. We show that the graph approach shows a consistent bias with up to an order of magnitude slower breakthrough when compared to the DFN approach. We show that this is due to graph algorithm's underprediction of the pressure gradients across intersections on a given fracture, leading to slower tracer particle speeds between intersections and longer travel times. We present a bias correction methodology to the graph algorithm that reduces the discrepancy between the DFN and graph predictions. We show that with this bias correction, the graph algorithm predictions significantly improve and the results are very accurate. 
The good accuracy and the low computational cost, with run times O(10^{4}) times lower than the DFN, make the graph algorithm an ideal technique to incorporate in uncertainty quantification methods.
Modeling flow and transport in fracture networks using graphs
Karra, S.; O'Malley, D.; Hyman, J. D.; ...
2018-03-09
Fractures form the main pathways for flow in the subsurface within low-permeability rock. For this reason, accurately predicting flow and transport in fractured systems is vital for improving the performance of subsurface applications. Fracture sizes in these systems can range from millimeters to kilometers. Although modeling flow and transport using the discrete fracture network (DFN) approach is known to be more accurate due to incorporation of the detailed fracture network structure over continuum-based methods, capturing the flow and transport in such a wide range of scales is still computationally intractable. Furthermore, if one has to quantify uncertainty, hundreds of realizations of these DFN models have to be run. To reduce the computational burden, we solve flow and transport on a graph representation of a DFN. We study the accuracy of the graph approach by comparing breakthrough times and tracer particle statistical data between the graph-based and the high-fidelity DFN approaches, for fracture networks with varying number of fractures and degree of heterogeneity. Due to our recent developments in capabilities to perform DFN high-fidelity simulations on fracture networks with large number of fractures, we are in a unique position to perform such a comparison. We show that the graph approach shows a consistent bias with up to an order of magnitude slower breakthrough when compared to the DFN approach. We show that this is due to graph algorithm's underprediction of the pressure gradients across intersections on a given fracture, leading to slower tracer particle speeds between intersections and longer travel times. We present a bias correction methodology to the graph algorithm that reduces the discrepancy between the DFN and graph predictions. We show that with this bias correction, the graph algorithm predictions significantly improve and the results are very accurate. 
In conclusion, the good accuracy and the low computational cost, with run times O(10^{4}) times lower than the DFN, make the graph algorithm an ideal technique to incorporate in uncertainty quantification methods.
Modeling flow and transport in fracture networks using graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karra, S.; O'Malley, D.; Hyman, J. D.
Fractures form the main pathways for flow in the subsurface within low-permeability rock. For this reason, accurately predicting flow and transport in fractured systems is vital for improving the performance of subsurface applications. Fracture sizes in these systems can range from millimeters to kilometers. Although modeling flow and transport using the discrete fracture network (DFN) approach is known to be more accurate due to incorporation of the detailed fracture network structure over continuum-based methods, capturing the flow and transport in such a wide range of scales is still computationally intractable. Furthermore, if one has to quantify uncertainty, hundreds of realizations of these DFN models have to be run. To reduce the computational burden, we solve flow and transport on a graph representation of a DFN. We study the accuracy of the graph approach by comparing breakthrough times and tracer particle statistical data between the graph-based and the high-fidelity DFN approaches, for fracture networks with varying number of fractures and degree of heterogeneity. Due to our recent developments in capabilities to perform DFN high-fidelity simulations on fracture networks with large number of fractures, we are in a unique position to perform such a comparison. We show that the graph approach shows a consistent bias with up to an order of magnitude slower breakthrough when compared to the DFN approach. We show that this is due to graph algorithm's underprediction of the pressure gradients across intersections on a given fracture, leading to slower tracer particle speeds between intersections and longer travel times. We present a bias correction methodology to the graph algorithm that reduces the discrepancy between the DFN and graph predictions. We show that with this bias correction, the graph algorithm predictions significantly improve and the results are very accurate. 
In conclusion, the good accuracy and the low computational cost, with run times O(10^{4}) times lower than the DFN, make the graph algorithm an ideal technique to incorporate in uncertainty quantification methods.
Object segmentation using graph cuts and active contours in a pyramidal framework
NASA Astrophysics Data System (ADS)
Subudhi, Priyambada; Mukhopadhyay, Susanta
2018-03-01
Graph cuts and active contours are two very popular interactive object segmentation techniques in the field of computer vision and image processing. However, both these approaches have their own well-known limitations. Graph cut methods perform efficiently, giving globally optimal segmentation results for smaller images. However, for larger images, huge graphs need to be constructed, which not only takes an unacceptable amount of memory but also greatly increases the time required for segmentation. On the other hand, in the case of active contours, initial contour selection plays an important role in the accuracy of the segmentation, so a proper selection of the initial contour may improve the complexity as well as the accuracy of the result. In this paper, we combine these two approaches to overcome their above-mentioned drawbacks and develop a fast technique for object segmentation. We use a pyramidal framework and apply the mincut/maxflow algorithm on the lowest resolution image with the least number of seed points possible, which is very fast due to the smaller size of the image. Then, the obtained segmentation contour is super-sampled and used as the initial contour for the next higher resolution image. As the initial contour is very close to the actual contour, fewer iterations are required for the convergence of the contour. The process is repeated for all the higher resolution images, and experimental results show that our approach is faster as well as more memory efficient compared to either graph cut or active contour segmentation alone.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wayne F. Boyer; Gurdeep S. Hura
2005-09-01
The problem of obtaining an optimal matching and scheduling of interdependent tasks in distributed heterogeneous computing (DHC) environments is well known to be an NP-hard problem. In a DHC system, task execution time is dependent on the machine to which it is assigned and task precedence constraints are represented by a directed acyclic graph. Recent research in evolutionary techniques has shown that genetic algorithms usually obtain more efficient schedules than other known algorithms. We propose a non-evolutionary random scheduling (RS) algorithm for efficient matching and scheduling of inter-dependent tasks in a DHC system. RS is a succession of randomized task orderings and a heuristic mapping from task order to schedule. Randomized task ordering is effectively a topological sort where the outcome may be any possible task order for which the task precedence constraints are maintained. A detailed comparison to existing evolutionary techniques (GA and PSGA) shows the proposed algorithm is less complex than evolutionary techniques, computes schedules in less time, requires less memory and fewer tuning parameters. Simulation results show that the average schedules produced by RS are approximately as efficient as PSGA schedules for all cases studied and clearly more efficient than PSGA for certain cases. The standard formulation for the scheduling problem addressed in this paper is Rm|prec|Cmax.
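The randomized task ordering step can be sketched as a topological sort that picks uniformly among the currently ready tasks. This is an illustrative reading of the abstract, not the authors' RS implementation; the function name, the edge-list input, and the optional `rng` parameter are assumptions, and the heuristic mapping from task order to schedule is not shown.

```python
import random


def random_topological_order(n, edges, rng=None):
    """Return a random task order that respects every precedence edge (u, v).

    Tasks are numbered 0..n-1 and (u, v) means u must run before v.
    """
    if rng is None:
        rng = random.Random()
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    ready = [t for t in range(n) if indeg[t] == 0]
    order = []
    while ready:
        # Pick any ready task at random; swap-with-last keeps removal O(1).
        i = rng.randrange(len(ready))
        ready[i], ready[-1] = ready[-1], ready[i]
        t = ready.pop()
        order.append(t)
        for v in succ[t]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != n:
        raise ValueError("precedence graph contains a cycle")
    return order
```

Every valid task order has nonzero probability under this scheme, although the orders are not necessarily sampled uniformly. Passing a seeded `random.Random` makes a run reproducible.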
A SAT Based Effective Algorithm for the Directed Hamiltonian Cycle Problem
NASA Astrophysics Data System (ADS)
Jäger, Gerold; Zhang, Weixiong
The Hamiltonian cycle problem (HCP) is an important combinatorial problem with applications in many areas. While thorough theoretical and experimental analyses have been made on the HCP in undirected graphs, little is known for the HCP in directed graphs (DHCP). The contribution of this work is an effective algorithm for the DHCP. Our algorithm explores and exploits the close relationship between the DHCP and the Assignment Problem (AP) and utilizes a technique based on Boolean satisfiability (SAT). By combining effective algorithms for the AP and SAT, our algorithm significantly outperforms previous exact DHCP algorithms including an algorithm based on the award-winning Concorde TSP algorithm.
Graph run-length matrices for histopathological image segmentation.
Tosun, Akif Burak; Gunduz-Demir, Cigdem
2011-03-01
The histopathological examination of tissue specimens is essential for cancer diagnosis and grading. However, this examination is subject to a considerable amount of observer variability as it mainly relies on visual interpretation of pathologists. To alleviate this problem, it is very important to develop computational quantitative tools, for which image segmentation constitutes the core step. In this paper, we introduce an effective and robust algorithm for the segmentation of histopathological tissue images. This algorithm incorporates the background knowledge of the tissue organization into segmentation. For this purpose, it quantifies spatial relations of cytological tissue components by constructing a graph and uses this graph to define new texture features for image segmentation. This new texture definition makes use of the idea of gray-level run-length matrices. However, it considers the runs of cytological components on a graph to form a matrix, instead of considering the runs of pixel intensities. Working with colon tissue images, our experiments demonstrate that the texture features extracted from "graph run-length matrices" lead to high segmentation accuracies, also providing a reasonable number of segmented regions. Compared with four other segmentation algorithms, the results show that the proposed algorithm is more effective in histopathological image segmentation.
Minimum nonuniform graph partitioning with unrelated weights
NASA Astrophysics Data System (ADS)
Makarychev, K. S.; Makarychev, Yu S.
2017-12-01
We give a bi-criteria approximation algorithm for the Minimum Nonuniform Graph Partitioning problem, recently introduced by Krauthgamer, Naor, Schwartz and Talwar. In this problem, we are given a graph $G=(V,E)$ and $k$ numbers $\rho_1,\dots,\rho_k$. The goal is to partition $V$ into $k$ disjoint sets (bins) $P_1,\dots,P_k$ satisfying $|P_i| \le \rho_i |V|$ for all $i$, so as to minimize the number of edges cut by the partition. Our bi-criteria algorithm gives an $O(\sqrt{\log |V| \log k})$ approximation for the objective function in general graphs and an $O(1)$ approximation in graphs excluding a fixed minor. The approximate solution satisfies the relaxed capacity constraints $|P_i| \le (5+\varepsilon)\rho_i |V|$. This algorithm is an improvement upon the $O(\log |V|)$-approximation algorithm by Krauthgamer, Naor, Schwartz and Talwar. We extend our results to the case of 'unrelated weights' and to the case of 'unrelated d-dimensional weights'. A preliminary version of this work was presented at the 41st International Colloquium on Automata, Languages and Programming (ICALP 2014). Bibliography: 7 titles.
On Parallel Push-Relabel based Algorithms for Bipartite Maximum Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Langguth, Johannes; Azad, Md Ariful; Halappanavar, Mahantesh
2014-07-01
We study multithreaded push-relabel based algorithms for computing maximum cardinality matching in bipartite graphs. Matching is a fundamental combinatorial (graph) problem with applications in a wide variety of problems in science and engineering. We are motivated by its use in the context of sparse linear solvers for computing maximum transversal of a matrix. We implement and test our algorithms on several multi-socket multicore systems and compare their performance to state-of-the-art augmenting path-based serial and parallel algorithms using a test set comprised of a wide range of real-world instances. Building on several heuristics for enhancing performance, we demonstrate good scaling for the parallel push-relabel algorithm. We show that it is comparable to the best augmenting path-based algorithms for bipartite matching. To the best of our knowledge, this is the first extensive study of multithreaded push-relabel based algorithms. In addition to a direct impact on the applications using matching, the proposed algorithmic techniques can be extended to preflow-push based algorithms for computing maximum flow in graphs.
Edge Pushing is Equivalent to Vertex Elimination for Computing Hessians
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Mu; Pothen, Alex; Hovland, Paul
We prove the equivalence of two different Hessian evaluation algorithms in AD. The first is the Edge Pushing algorithm of Gower and Mello, which may be viewed as a second order Reverse mode algorithm for computing the Hessian. In earlier work, we have derived the Edge Pushing algorithm by exploiting a Reverse mode invariant based on the concept of live variables in compiler theory. The second algorithm is based on eliminating vertices in a computational graph of the gradient, in which intermediate variables are successively eliminated from the graph, and the weights of the edges are updated suitably. We prove that if the vertices are eliminated in a reverse topological order while preserving symmetry in the computational graph of the gradient, then the Vertex Elimination algorithm and the Edge Pushing algorithm perform identical computations. In this sense, the two algorithms are equivalent. This insight that unifies two seemingly disparate approaches to Hessian computations could lead to improved algorithms and implementations for computing Hessians. Read More: http://epubs.siam.org/doi/10.1137/1.9781611974690.ch11
NASA Astrophysics Data System (ADS)
Biazzo, Indaco; Braunstein, Alfredo; Zecchina, Riccardo
2012-08-01
We study the behavior of an algorithm derived from the cavity method for the prize-collecting Steiner tree (PCST) problem on graphs. The algorithm is based on the zero temperature limit of the cavity equations and as such is formally simple (a fixed point equation resolved by iteration) and distributed (parallelizable). We provide a detailed comparison with state-of-the-art algorithms on a wide range of existing benchmarks, networks, and random graphs. Specifically, we consider an enhanced derivative of the Goemans-Williamson heuristics and the dhea solver, a branch and cut integer linear programming based approach. The comparison shows that the cavity algorithm outperforms the two algorithms in most large instances both in running time and quality of the solution. Finally we prove a few optimality properties of the solutions provided by our algorithm, including optimality under the two postprocessing procedures defined in the Goemans-Williamson derivative and global optimality in some limit cases.
Multimodal Logistics Network Design over Planning Horizon through a Hybrid Meta-Heuristic Approach
NASA Astrophysics Data System (ADS)
Shimizu, Yoshiaki; Yamazaki, Yoshihiro; Wada, Takeshi
Logistics is increasingly acknowledged as a key issue in supply chain management for improving business efficiency under global competition and diversified customer demands. This study aims to improve the quality of strategic decision making associated with the dynamic nature of logistics network optimization. In particular, recognizing the importance of multimodal logistics over multiple terms, we have extended a previous approach termed hybrid tabu search (HybTS). The aim is to make strategic planning more concrete so that the strategic plan can be linked to operational decision making. The idea is an extension of HybTS that solves a dynamic mixed integer programming problem. It is a two-level iterative method composed of a sophisticated tabu search for the location problem at the upper level and a graph algorithm for the route selection at the lower level. To remain efficient while coping with the resulting extremely large-scale problem, we devised a systematic procedure that transforms the original linear program at the lower level into a minimum cost flow problem solvable by the graph algorithm. Numerical experiments verified that the proposed method outperformed commercial software. The results indicate that the proposed approach can make conventional strategic decision making much more practical and is promising for real-world applications.
Graph Drawing Aesthetics-Created by Users, Not Algorithms.
Purchase, H C; Pilcher, C; Plimmer, B
2012-01-01
Prior empirical work on layout aesthetics for graph drawing algorithms has concentrated on the interpretation of existing graph drawings. We report on experiments which focus on the creation and layout of graph drawings: participants were asked to draw graphs based on adjacency lists, and to lay them out "nicely." Two interaction methods were used for creating the drawings: a sketch interface which allows for easy, natural hand movements, and a formal point-and-click interface similar to a typical graph editing system. We find, in common with many other studies, that removing edge crossings is the most significant aesthetic, but also discover that aligning nodes and edges to an underlying grid is important. We observe that the aesthetics favored by participants during creation of a graph drawing are often not evident in the final product and that the participants did not make a clear distinction between the processes of creation and layout. Our results suggest that graph drawing systems should integrate automatic layout with the user's manual editing process, and provide facilities to support grid-based graph creation.
Toward the optimization of normalized graph Laplacian.
Xie, Bo; Wang, Meng; Tao, Dacheng
2011-04-01
Normalized graph Laplacian has been widely used in many practical machine learning algorithms, e.g., spectral clustering and semisupervised learning. However, all of them use the Euclidean distance to construct the graph Laplacian, which does not necessarily reflect the inherent distribution of the data. In this brief, we propose a method to directly optimize the normalized graph Laplacian by using pairwise constraints. The learned graph is consistent with equivalence and nonequivalence pairwise relationships, and thus it can better represent similarity between samples. Meanwhile, our approach, unlike metric learning, automatically determines the scale factor during the optimization. The learned normalized Laplacian matrix can be directly applied in spectral clustering and semisupervised learning algorithms. Comprehensive experiments demonstrate the effectiveness of the proposed approach.
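For reference, the standard symmetric normalized graph Laplacian that such methods start from is L = I - D^{-1/2} W D^{-1/2}, where W is the symmetric edge-weight matrix and D the diagonal degree matrix. The sketch below only computes this baseline from a given W; it does not implement the pairwise-constraint optimization proposed in the brief. The function name is an assumption, and treating an isolated vertex as having a diagonal entry of 1 is one convention among several.

```python
import math


def normalized_laplacian(w):
    """Return L = I - D^{-1/2} W D^{-1/2} for a symmetric weight matrix w.

    w is a list of lists; isolated vertices (degree 0) get L[i][i] = 1 here,
    which is an assumed convention.
    """
    n = len(w)
    deg = [sum(row) for row in w]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [
        [(1.0 if i == j else 0.0) - inv_sqrt[i] * w[i][j] * inv_sqrt[j]
         for j in range(n)]
        for i in range(n)
    ]
```

In spectral clustering, the eigenvectors of this matrix with the smallest eigenvalues are used as the embedding, so changing W (for example by learning it from pairwise constraints) changes the clustering.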
Robust Algorithms for on Minor-Free Graphs Based on the Sherali-Adams Hierarchy
NASA Astrophysics Data System (ADS)
Magen, Avner; Moharrami, Mohammad
This work provides a Linear Programming-based Polynomial Time Approximation Scheme (PTAS) for two classical NP-hard problems on graphs when the input graph is guaranteed to be planar, or more generally Minor Free. The algorithm applies a sufficiently large number (some function of when approximation is required) of rounds of the so-called Sherali-Adams Lift-and-Project system. needed to obtain a -approximation, where f is some function that depends only on the graph that should be avoided as a minor. The problems we discuss are the well-studied problems, the and problems. A curious fact we expose is that in the world of minor-free graphs, the is harder in some sense than the.
Global Binary Optimization on Graphs for Classification of High Dimensional Data
2014-09-01
Buades et al. in [10] introduce a new non-local means algorithm for image denoising and compare it to some of the best methods. In [28], Grady de...scribes a random walk algorithm for image segmentation using the solution to a Dirichlet problem. Elmoataz et al. present generalizations of the...graph Laplacian [19] for image denoising and manifold smoothing. Couprie et al. in [16] propose a parameterized graph-based energy function that unifies
A software tool for dataflow graph scheduling
NASA Technical Reports Server (NTRS)
Jones, Robert L., III
1994-01-01
A graph-theoretic design process and software tool is presented for selecting a multiprocessing scheduling solution for a class of computational problems. The problems of interest are those that can be described using a dataflow graph and are intended to be executed repetitively on multiple processors. The dataflow paradigm is very useful in exposing the parallelism inherent in algorithms. It provides a graphical and mathematical model which describes a partial ordering of algorithm tasks based on data precedence.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coram, Jamie L.; Morrow, James D.; Perkins, David Nikolaus
2015-09-01
This document describes the PANTHER R&D Application, a proof-of-concept user interface application developed under the PANTHER Grand Challenge LDRD. The purpose of the application is to explore interaction models for graph analytics, drive algorithmic improvements from an end-user point of view, and support demonstration of PANTHER technologies to potential customers. The R&D Application implements a graph-centric interaction model that exposes analysts to the algorithms contained within the GeoGraphy graph analytics library. Users define geospatial-temporal semantic graph queries by constructing search templates based on nodes, edges, and the constraints among them. Users then analyze the results of the queries using both geo-spatial and temporal visualizations. Development of this application has made user experience an explicit driver for project and algorithmic level decisions that will affect how analysts one day make use of PANTHER technologies.
NASA Technical Reports Server (NTRS)
Mielke, Roland V. (Inventor); Stoughton, John W. (Inventor)
1990-01-01
Computationally complex primitive operations of an algorithm are executed concurrently in a plurality of functional units under the control of an assignment manager. The algorithm is preferably defined as a computationally marked graph containing data status edges (paths) corresponding to each of the data flow edges. The assignment manager assigns primitive operations to the functional units and monitors completion of the primitive operations to determine data availability using the computational marked graph of the algorithm. All data accessing of the primitive operations is performed by the functional units independently of the assignment manager.
An Improved Heuristic Method for Subgraph Isomorphism Problem
NASA Astrophysics Data System (ADS)
Xiang, Yingzhuo; Han, Jiesi; Xu, Haijiang; Guo, Xin
2017-09-01
This paper focuses on the subgraph isomorphism (SI) problem. We present an improved genetic algorithm, a heuristic method to search for the optimal solution. The contribution of this paper is that we design a dedicated crossover algorithm and a new fitness function to measure the evolution process. Experiments show that our improved genetic algorithm performs better than other heuristic methods. For a large graph, such as a subgraph of 40 nodes, our algorithm outperforms the traditional tree search algorithms. We find that the performance of our improved genetic algorithm does not decrease as the number of nodes in prototype graphs increases.
New Graph Models and Algorithms for Detecting Salient Structures from Cluttered Images
2010-02-24
Development of graph models and algorithms to detect boundaries that show certain levels of symmetry, an important geometric property of many...Bookstein. Morphometric tools for landmark data. Cambridge University Press, 1991. [8] F. L. Bookstein. Principal warps: Thin-plate splines and the
Irreversibility of financial time series: A graph-theoretical approach
NASA Astrophysics Data System (ADS)
Flanagan, Ryan; Lacasa, Lucas
2016-04-01
The relation between time series irreversibility and entropy production has been recently investigated in thermodynamic systems operating away from equilibrium. In this work we explore this concept in the context of financial time series. We make use of visibility algorithms to quantify, in graph-theoretical terms, time irreversibility of 35 financial indices evolving over the period 1998-2012. We show that this metric is complementary to standard measures based on volatility and exploit it to both classify periods of financial stress and to rank companies accordingly. We then validate this approach by finding that a projection in principal components space of financial years, based on time irreversibility features, clusters together periods of financial stress from stable periods. Relations between irreversibility, efficiency and predictability are briefly discussed.
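The abstract refers to "visibility algorithms" without spelling them out. As a minimal sketch, assuming the natural visibility criterion (two samples are linked when every sample between them lies strictly below the straight line joining them), a series can be mapped to a graph as follows. The function names are illustrative and are not taken from the paper; directing each edge forward in time gives the in- and out-degree sequences on which a time-irreversibility measure could be built.

```python
def natural_visibility_edges(series):
    """Return the edges (a, b), a < b, of the natural visibility graph."""
    n = len(series)
    edges = []
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            # a and b "see" each other if every intermediate sample lies
            # strictly below the line through (a, ya) and (b, yb).
            if all(series[c] < yb + (ya - yb) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                edges.append((a, b))
    return edges


def in_out_degrees(n, edges):
    """Direct every edge forward in time and count in/out degrees per node."""
    in_deg = [0] * n
    out_deg = [0] * n
    for a, b in edges:  # a < b, so the edge points from the past to the future
        out_deg[a] += 1
        in_deg[b] += 1
    return in_deg, out_deg


edges = natural_visibility_edges([3, 1, 2])
print(edges)
print(in_out_degrees(3, edges))
```

This brute-force version checks every pair and is only meant for short series; comparing the in-degree and out-degree distributions is one hedged way to read "irreversibility" off the resulting graph.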
2009-01-01
Background Marginal posterior genotype probabilities need to be computed for genetic analyses such as genetic counseling in humans and selective breeding in animal and plant species. Methods In this paper, we describe a peeling based, deterministic, exact algorithm to compute efficiently genotype probabilities for every member of a pedigree with loops without recourse to junction-tree methods from graph theory. The efficiency in computing the likelihood by peeling comes from storing intermediate results in multidimensional tables called cutsets. Computing marginal genotype probabilities for individual i requires recomputing the likelihood for each of the possible genotypes of individual i. This can be done efficiently by storing intermediate results in two types of cutsets called anterior and posterior cutsets and reusing these intermediate results to compute the likelihood. Examples A small example is used to illustrate the theoretical concepts discussed in this paper, and marginal genotype probabilities are computed at a monogenic disease locus for every member in a real cattle pedigree. PMID:19958551
Anomaly detection in hyperspectral imagery: statistics vs. graph-based algorithms
NASA Astrophysics Data System (ADS)
Berkson, Emily E.; Messinger, David W.
2016-05-01
Anomaly detection (AD) algorithms are frequently applied to hyperspectral imagery, but different algorithms produce different outlier results depending on the image scene content and the assumed background model. This work provides the first comparison of anomaly score distributions between common statistics-based anomaly detection algorithms (RX and subspace-RX) and the graph-based Topological Anomaly Detector (TAD). Anomaly scores in statistical AD algorithms should theoretically approximate a chi-squared distribution; however, this is rarely the case with real hyperspectral imagery. The expected distribution of scores found with graph-based methods remains unclear. We also look for general trends in algorithm performance with varied scene content. Three separate scenes were extracted from the hyperspectral MegaScene image taken over downtown Rochester, NY with the VIS-NIR-SWIR ProSpecTIR instrument. In order of most to least cluttered, we study an urban, suburban, and rural scene. The three AD algorithms were applied to each scene, and the distributions of the most anomalous 5% of pixels were compared. We find that subspace-RX performs better than RX, because the data becomes more normal when the highest variance principal components are removed. We also see that compared to statistical detectors, anomalies detected by TAD are easier to separate from the background. Due to their different underlying assumptions, the statistical and graph-based algorithms highlighted different anomalies within the urban scene. These results will lead to a deeper understanding of these algorithms and their applicability across different types of imagery.
Multiple graph regularized protein domain ranking.
Wang, Jim Jing-Yan; Bensmail, Halima; Gao, Xin
2012-11-19
Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.
Multiple graph regularized protein domain ranking
2012-01-01
Background Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. Results To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. Conclusion The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications. PMID:23157331
Unsupervised Metric Fusion Over Multiview Data by Graph Random Walk-Based Cross-View Diffusion.
Wang, Yang; Zhang, Wenjie; Wu, Lin; Lin, Xuemin; Zhao, Xiang
2017-01-01
Learning an ideal metric is crucial to many tasks in computer vision. Diverse feature representations may address this problem from different aspects, since visual data objects described by multiple features can be decomposed into multiple views that often provide complementary information. In this paper, we propose a cross-view fusion algorithm that leads to a similarity metric for multiview data by systematically fusing multiple similarity measures. Unlike existing paradigms, we focus on learning a distance measure by exploiting a graph structure of data samples, where an input similarity matrix can be improved through a propagation of graph random walk. In particular, we construct multiple graphs with each one corresponding to an individual view, and a cross-view fusion approach based on graph random walk is presented to derive an optimal distance measure by fusing multiple metrics. Our method is scalable to a large amount of data by enforcing sparsity through an anchor graph representation. To adaptively control the effects of different views, we dynamically learn view-specific coefficients, which are leveraged into graph random walk to balance multiviews. However, such a strategy may lead to an over-smooth similarity metric where affinities between dissimilar samples may be enlarged by excessively conducting cross-view fusion. Thus, we devise a heuristic approach to control the iteration number in the fusion process in order to avoid over-smoothness. Extensive experiments conducted on real-world data sets validate the effectiveness and efficiency of our approach.
Granular Flow Graph, Adaptive Rule Generation and Tracking.
Pal, Sankar Kumar; Chakraborty, Debarati Bhunia
2017-12-01
A new method of adaptive rule generation in a granular computing framework is described, based on a rough rule base and a granular flow graph, and applied to video tracking. In the process, several new concepts and operations are introduced, and methodologies formulated with superior performance. The flow graph enables the definition of an intelligent technique for rule base adaptation where its characteristics in mapping the relevance of attributes and rules in a decision-making system are exploited. Two new features, namely, expected flow graph and mutual dependency between flow graphs, are defined to make the flow graph applicable in the tasks of both training and validation. All these techniques are performed at the neighborhood granular level. A way of forming spatio-temporal 3-D granules of arbitrary shape and size is introduced. The rough flow graph-based adaptive granular rule-based system, thus produced for unsupervised video tracking, is capable of handling the uncertainties and incompleteness in frames, overcomes the incompleteness in information that arises without initial manual interactions, and provides superior performance with a gain in computation time. The cases of partial overlapping and detecting the unpredictable changes are handled efficiently. It is shown that the neighborhood granulation provides a balanced tradeoff between speed and accuracy as compared to pixel level computation. The quantitative indices used for evaluating the performance of tracking do not require any information on ground truth as in the other methods. Superiority of the algorithm to nonadaptive and other recent ones is demonstrated extensively.
Coverability graphs for a class of synchronously executed unbounded Petri net
NASA Technical Reports Server (NTRS)
Stotts, P. David; Pratt, Terrence W.
1990-01-01
After detailing a variant of the concurrent-execution rule for firing of maximal subsets, in which the simultaneous firing of conflicting transitions is prohibited, an algorithm is constructed for generating the coverability graph of a net executed under this synchronous firing rule. The omega insertion criteria in the algorithm are shown to be valid for any net on which the algorithm terminates. It is accordingly shown that the set of nets on which the algorithm terminates includes the 'conflict-free' class.
CQPSO scheduling algorithm for heterogeneous multi-core DAG task model
NASA Astrophysics Data System (ADS)
Zhai, Wenzheng; Hu, Yue-Li; Ran, Feng
2017-07-01
Efficient task scheduling is critical to achieving high performance in a heterogeneous multi-core computing environment. The paper focuses on the heterogeneous multi-core directed acyclic graph (DAG) task model and proposes a novel task scheduling method based on an improved chaotic quantum-behaved particle swarm optimization (CQPSO) algorithm. A task priority scheduling list is built, and the processor with the minimum cumulative earliest finish time (EFT) is chosen as the target of the first task assignment. The task precedence relationships are satisfied and the total execution time of all tasks is minimized. The experimental results show that the proposed algorithm has strong optimization ability, is simple and feasible, converges quickly, and can be applied to task scheduling optimization in other heterogeneous and distributed environments.
Quantum speedup in solving the maximal-clique problem
NASA Astrophysics Data System (ADS)
Chang, Weng-Long; Yu, Qi; Li, Zhaokai; Chen, Jiahui; Peng, Xinhua; Feng, Mang
2018-03-01
The maximal-clique problem, to find the maximally sized clique in a given graph, is classically an NP-complete computational problem, which has potential applications ranging from electrical engineering, computational chemistry, and bioinformatics to social networks. Here we develop a quantum algorithm to solve the maximal-clique problem for any graph G with n vertices with quadratic speedup over its classical counterparts, where the time and spatial complexities are reduced to, respectively, $O(\sqrt{2^{n}})$ and $O(n^{2})$. With respect to oracle-related quantum algorithms for the NP-complete problems, we identify our algorithm as optimal. To justify the feasibility of the proposed quantum algorithm, we successfully solve a typical clique problem for a graph G with two vertices and one edge by carrying out a nuclear magnetic resonance experiment involving four qubits.
Fitchi: haplotype genealogy graphs based on the Fitch algorithm.
Matschiner, Michael
2016-04-15
In population genetics and phylogeography, haplotype genealogy graphs are important tools for the visualization of population structure based on sequence data. In this type of graph, node sizes are often drawn in proportion to haplotype frequencies and edge lengths represent the minimum number of mutations separating adjacent nodes. I here present Fitchi, a new program that produces publication-ready haplotype genealogy graphs based on the Fitch algorithm. Availability: http://www.evoinformatics.eu/fitchi.htm. Contact: michaelmatschiner@mac.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Hu, Weiming; Gao, Jin; Xing, Junliang; Zhang, Chao; Maybank, Stephen
2017-01-01
An appearance model adaptable to changes in object appearance is critical in visual object tracking. In this paper, we treat an image patch as a two-order tensor which preserves the original image structure. We design two graphs for characterizing the intrinsic local geometrical structure of the tensor samples of the object and the background. Graph embedding is used to reduce the dimensions of the tensors while preserving the structure of the graphs. Then, a discriminant embedding space is constructed. We prove two propositions for finding the transformation matrices which are used to map the original tensor samples to the tensor-based graph embedding space. In order to encode more discriminant information in the embedding space, we propose a transfer-learning-based semi-supervised strategy to iteratively adjust the embedding space into which discriminative information obtained from earlier times is transferred. We apply the proposed semi-supervised tensor-based graph embedding learning algorithm to visual tracking. The new tracking algorithm captures an object's appearance characteristics during tracking and uses a particle filter to estimate the optimal object state. Experimental results on the CVPR 2013 benchmark dataset demonstrate the effectiveness of the proposed tracking algorithm.
Overview and extensions of a system for routing directed graphs on SIMD architectures
NASA Technical Reports Server (NTRS)
Tomboulian, Sherryl
1988-01-01
Many problems can be described in terms of directed graphs that contain a large number of vertices where simple computations occur using data from adjacent vertices. A method is given for parallelizing such problems on an SIMD machine model that uses only nearest neighbor connections for communication, and has no facility for local indirect addressing. Each vertex of the graph will be assigned to a processor in the machine. Rules for a labeling are introduced that support the use of a simple algorithm for movement of data along the edges of the graph. Additional algorithms are defined for addition and deletion of edges. Modifying or adding a new edge takes the same time as parallel traversal. This combination of architecture and algorithms defines a system that is relatively simple to build and can do fast graph processing. All edges can be traversed in parallel in time O(T), where T is empirically proportional to the average path length in the embedding times the average degree of the graph. Additionally, researchers present an extension to the above method which allows for enhanced performance by allowing some broadcasting capabilities.
Adaptive random walks on the class of Web graphs
NASA Astrophysics Data System (ADS)
Tadić, B.
2001-09-01
We study random walk with adaptive move strategies on a class of directed graphs with variable wiring diagram. The graphs are grown from the evolution rules compatible with the dynamics of the world-wide Web [B. Tadić, Physica A 293, 273 (2001)], and are characterized by a pair of power-law distributions of out- and in-degree for each value of the parameter β, which measures the degree of rewiring in the graph. The walker adapts its move strategy according to locally available information both on out-degree of the visited node and in-degree of target node. A standard random walk, on the other hand, uses the out-degree only. We compute the distribution of connected subgraphs visited by an ensemble of walkers, the average access time and survival probability of the walks. We discuss these properties of the walk dynamics relative to the changes in the global graph structure when the control parameter β is varied. For β≥ 3, corresponding to the world-wide Web, the access time of the walk to a given level of hierarchy on the graph is much shorter compared to the standard random walk on the same graph. By reducing the amount of rewiring towards rigidity limit β↦βc≲ 0.1, corresponding to the range of naturally occurring biochemical networks, the survival probability of adaptive and standard random walk become increasingly similar. The adaptive random walk can be used as an efficient message-passing algorithm on this class of graphs for large degree of rewiring.
Non-Convex Sparse and Low-Rank Based Robust Subspace Segmentation for Data Mining.
Cheng, Wenlong; Zhao, Mingbo; Xiong, Naixue; Chui, Kwok Tai
2017-07-15
Parsimony, including sparsity and low-rank, has shown great importance for data mining in social networks, particularly in tasks such as segmentation and recognition. Traditionally, such modeling approaches rely on an iterative algorithm that minimizes an objective function with convex ℓ₁-norm or nuclear norm constraints. However, the results obtained by convex optimization are usually suboptimal to solutions of the original sparse or low-rank problems. In this paper, a novel robust subspace segmentation algorithm has been proposed by integrating ℓp-norm and Schatten p-norm constraints. Our so-obtained affinity graph can better capture local geometrical structure and the global information of the data. As a consequence, our algorithm is more generative, discriminative and robust. An efficient linearized alternating direction method is derived to realize our model. Extensive segmentation experiments are conducted on public datasets. The proposed algorithm is revealed to be more effective and robust compared to five existing algorithms.
A New Augmentation Based Algorithm for Extracting Maximal Chordal Subgraphs.
Bhowmick, Sanjukta; Chen, Tzu-Yi; Halappanavar, Mahantesh
2015-02-01
A graph is chordal if every cycle of length greater than three contains an edge between non-adjacent vertices. Chordal graphs are of interest both theoretically, since they admit polynomial time solutions to a range of NP-hard graph problems, and practically, since they arise in many applications including sparse linear algebra, computer vision, and computational biology. A maximal chordal subgraph is a chordal subgraph that is not a proper subgraph of any other chordal subgraph. Existing algorithms for computing maximal chordal subgraphs depend on dynamically ordering the vertices, which is an inherently sequential process and therefore limits the algorithms' parallelizability. In this paper we explore techniques to develop a scalable parallel algorithm for extracting a maximal chordal subgraph. We demonstrate that an earlier attempt at developing a parallel algorithm may induce a non-optimal vertex ordering and is therefore not guaranteed to terminate with a maximal chordal subgraph. We then give a new algorithm that first computes and then repeatedly augments a spanning chordal subgraph. After proving that the algorithm terminates with a maximal chordal subgraph, we then demonstrate that this algorithm is more amenable to parallelization and that the parallel version also terminates with a maximal chordal subgraph. That said, the complexity of the new algorithm is higher than that of the previous parallel algorithm, although the earlier algorithm computes a chordal subgraph which is not guaranteed to be maximal. We experimented with our augmentation-based algorithm on both synthetic and real-world graphs. We provide scalability results and also explore the effect of different choices for the initial spanning chordal subgraph on both the running time and on the number of edges in the maximal chordal subgraph.
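The chordality definition at the start of this abstract can be checked directly. The sketch below is not the augmentation-based algorithm of the paper; it is a plain chordality test, assuming the standard maximum cardinality search characterization: visit vertices in order of how many visited neighbours they have, and require that the already-visited neighbours of each vertex are pairwise adjacent. The input format (a dict of neighbour sets) and the function name are illustrative choices.

```python
def is_chordal(adj):
    """adj maps each vertex to the set of its neighbours (undirected graph)."""
    weight = {v: 0 for v in adj}  # number of already-visited neighbours
    seen = set()
    while len(seen) < len(adj):
        # Maximum cardinality search: pick the unvisited vertex with the
        # most visited neighbours (ties broken by iteration order).
        v = max((u for u in adj if u not in seen), key=lambda u: weight[u])
        earlier = [u for u in adj[v] if u in seen]
        # In a chordal graph the visited neighbours of v form a clique.
        for i, x in enumerate(earlier):
            for y in earlier[i + 1:]:
                if y not in adj[x]:
                    return False
        seen.add(v)
        for u in adj[v]:
            if u not in seen:
                weight[u] += 1
    return True


square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
square_with_chord = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(is_chordal(square), is_chordal(square_with_chord))
```

A test like this could serve as the acceptance check when deciding whether adding an edge keeps a subgraph chordal, although a practical implementation would avoid re-running the full search for every candidate edge.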
A local search for a graph clustering problem
NASA Astrophysics Data System (ADS)
Navrotskaya, Anna; Il'ev, Victor
2016-10-01
In clustering problems one has to partition a given set of objects (a data set) into subsets (called clusters), taking into consideration only the similarity of the objects. One of the most visual formalizations of clustering is graph clustering, that is, grouping the vertices of a graph into clusters taking into consideration the edge structure of the graph, whose vertices are objects and whose edges represent similarities between the objects. In the graph k-clustering problem the number of clusters does not exceed k and the goal is to minimize the number of edges between clusters plus the number of missing edges within clusters. This problem is NP-hard for any k ≥ 2. We propose a polynomial time (2k-1)-approximation algorithm for graph k-clustering. Then we apply a local search procedure to the feasible solution found by this algorithm and conduct an experimental study of the resulting heuristics.
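The objective described above (edges between clusters plus missing edges within clusters) is easy to state in code. The following is a minimal sketch, not the authors' approximation algorithm: `clustering_cost` evaluates the objective, and `local_search` is a hypothetical single-vertex-move procedure, assumed here only to illustrate what a local search over at most k clusters might look like.

```python
from itertools import combinations


def clustering_cost(vertices, edges, cluster_of):
    """Count cut edges plus missing edges inside clusters."""
    edge_set = {frozenset(e) for e in edges}
    cost = 0
    for u, v in combinations(vertices, 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        linked = frozenset((u, v)) in edge_set
        # A pair is penalised when it is linked across clusters
        # or unlinked inside a cluster.
        if same_cluster != linked:
            cost += 1
    return cost


def local_search(vertices, edges, cluster_of, k):
    """Move single vertices between clusters 0..k-1 while the cost drops."""
    cluster_of = dict(cluster_of)
    best = clustering_cost(vertices, edges, cluster_of)
    improved = True
    while improved:
        improved = False
        for v in vertices:
            for c in range(k):
                old = cluster_of[v]
                if c == old:
                    continue
                cluster_of[v] = c
                cost = clustering_cost(vertices, edges, cluster_of)
                if cost < best:
                    best, improved = cost, True
                else:
                    cluster_of[v] = old  # undo a non-improving move
    return cluster_of, best


# Two triangles joined by a single bridge edge (2, 3).
vertices = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
start = {v: 0 for v in vertices}
print(clustering_cost(vertices, edges, start))
print(local_search(vertices, edges, start, k=2))
```

Each accepted move strictly lowers the cost, so the loop terminates; recomputing the full cost after every move keeps the sketch short at the expense of speed.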
A system for routing arbitrary directed graphs on SIMD architectures
NASA Technical Reports Server (NTRS)
Tomboulian, Sherryl
1987-01-01
There are many problems which can be described in terms of directed graphs that contain a large number of vertices where simple computations occur using data from connecting vertices. A method is given for parallelizing such problems on an SIMD machine model that is bit-serial and uses only nearest neighbor connections for communication. Each vertex of the graph will be assigned to a processor in the machine. Algorithms are given that will be used to implement movement of data along the arcs of the graph. This architecture and algorithms define a system that is relatively simple to build and can do graph processing. All arcs can be traversed in parallel in time O(T), where T is empirically proportional to the diameter of the interconnection network times the average degree of the graph. Modifying or adding a new arc takes the same time as parallel traversal.
Efficient sequential and parallel algorithms for record linkage.
Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar
2014-01-01
Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Our sequential and parallel algorithms have been tested on a real dataset of 1,083,878 records and synthetic datasets ranging in size from 50,000 to 9,000,000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm.
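One idea named in this abstract, forming a graph that links similar records and finding its connected components, can be sketched with a union-find structure. This is an illustration under assumptions rather than the authors' implementation: records are identified by integer indices, and the list of `links` is assumed to come from some earlier similarity test (for instance an edit-distance threshold) that is not shown here.

```python
def connected_components(n, links):
    """Group record indices 0..n-1 into components induced by the links."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Records 0, 1, 2 are pairwise chained as similar; so are 3 and 4.
print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))
```

Each resulting group would then be treated as one linked entity; note that chaining through intermediate records can merge records that are not directly similar, which is a property of the connected-components view itself.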
Huang, Xiaoke; Zhao, Ye; Yang, Jing; Zhang, Chong; Ma, Chao; Ye, Xinyue
2016-01-01
We propose TrajGraph, a new visual analytics method, for studying urban mobility patterns by integrating graph modeling and visual analysis with taxi trajectory data. A special graph is created to store and manifest real traffic information recorded by taxi trajectories over city streets. It conveys urban transportation dynamics which can be discovered by applying graph analysis algorithms. To support interactive, multiscale visual analytics, a graph partitioning algorithm is applied to create region-level graphs which have smaller size than the original street-level graph. Graph centralities, including PageRank and betweenness, are computed to characterize the time-varying importance of different urban regions. The centralities are visualized by three coordinated views including a node-link graph view, a map view and a temporal information view. Users can interactively examine the importance of streets to discover and assess city traffic patterns. We have implemented a fully working prototype of this approach and evaluated it using massive taxi trajectories of Shenzhen, China. TrajGraph's capability in revealing the importance of city streets was evaluated by comparing the calculated centralities with the subjective evaluations from a group of drivers in Shenzhen. Feedback from a domain expert was collected. The effectiveness of the visual interface was evaluated through a formal user study. We also present several examples and a case study to demonstrate the usefulness of TrajGraph in urban transportation analysis.
Development of antibiotic regimens using graph based evolutionary algorithms.
Corns, Steven M; Ashlock, Daniel A; Bryden, Kenneth M
2013-12-01
This paper examines the use of evolutionary algorithms in the development of antibiotic regimens given to production animals. A model is constructed that combines the lifespan of the animal and the bacteria living in the animal's gastro-intestinal tract from the early finishing stage until the animal reaches market weight. This model is used as the fitness evaluation for a set of graph based evolutionary algorithms to assess the impact of diversity control on the evolving antibiotic regimens. The graph based evolutionary algorithms have two objectives: to find an antibiotic treatment regimen that maintains the weight gain and health benefits of antibiotic use and to reduce the risk of spreading antibiotic resistant bacteria. This study examines different regimens of tylosin phosphate use on bacteria populations divided into Gram positive and Gram negative types, with a focus on Campylobacter spp. Treatment regimens were found that provided decreased antibiotic resistance relative to conventional methods while providing nearly the same benefits as conventional antibiotic regimes. By using a graph to control the information flow in the evolutionary algorithm, a variety of solutions along the Pareto front can be found automatically for this and other multi-objective problems. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms
Meng, Zhaoyi; Koniges, Alice; He, Yun Helen; ...
2016-09-21
In this paper, we investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification algorithms in serial mode. The methods leverage the Nystrom extension to calculate eigenvalue/eigenvectors of the graph Laplacian and this is a self-contained module that can be used in conjunction with other graph-Laplacian based methods such as spectral clustering. We use performance tools to collect the hotspots and memory access of the serial codes and use OpenMP as the parallelization language to parallelize the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations and detail the performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel’s Knights Corner and Landing processors. We show both performance improvement and strong scaling behavior. Finally, a large number of optimization techniques and analyses are necessary before the algorithm reaches almost ideal scaling.
A Multilevel Gamma-Clustering Layout Algorithm for Visualization of Biological Networks
Hruz, Tomas; Lucas, Christoph; Laule, Oliver; Zimmermann, Philip
2013-01-01
Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. PMID:23864855
Watanabe, Takanori; Kessler, Daniel; Scott, Clayton; Angstadt, Michael; Sripada, Chandra
2014-01-01
Substantial evidence indicates that major psychiatric disorders are associated with distributed neural dysconnectivity, leading to strong interest in using neuroimaging methods to accurately predict disorder status. In this work, we are specifically interested in a multivariate approach that uses features derived from whole-brain resting state functional connectomes. However, functional connectomes reside in a high dimensional space, which complicates model interpretation and introduces numerous statistical and computational challenges. Traditional feature selection techniques are used to reduce data dimensionality, but are blind to the spatial structure of the connectomes. We propose a regularization framework where the 6-D structure of the functional connectome (defined by pairs of points in 3-D space) is explicitly taken into account via the fused Lasso or the GraphNet regularizer. Our method only restricts the loss function to be convex and margin-based, allowing non-differentiable loss functions such as the hinge-loss to be used. Using the fused Lasso or GraphNet regularizer with the hinge-loss leads to a structured sparse support vector machine (SVM) with embedded feature selection. We introduce a novel efficient optimization algorithm based on the augmented Lagrangian and the classical alternating direction method, which can solve both fused Lasso and GraphNet regularized SVM with very little modification. We also demonstrate that the inner subproblems of the algorithm can be solved efficiently in analytic form by coupling the variable splitting strategy with a data augmentation scheme. Experiments on simulated data and resting state scans from a large schizophrenia dataset show that our proposed approach can identify predictive regions that are spatially contiguous in the 6-D “connectome space,” offering an additional layer of interpretability that could provide new insights about various disease processes. PMID:24704268
NASA Astrophysics Data System (ADS)
Albirri, E. R.; Sugeng, K. A.; Aldila, D.
2018-04-01
In the modern world, advances in transportation technology and highway construction have connected almost every city and made places around the world easier to visit. A set of connected cities can be represented by a graph. Graph clustering is one way to answer questions about data represented by a graph, and several graph clustering methods exist for specific problems. One of them is the Highly Connected Subgraphs (HCS) method. HCS identifies clusters based on the connectivity k(G) of a graph G: if k(G) > n/2, where n is the number of vertices in G, then G is called a highly connected subgraph, or cluster. This research combines a literature review with a simulation program. We modified the HCS algorithm to use weighted graphs. The modification is located in the Process Phase, which cuts the connected graph G into two subgraphs H and \bar{H}. We implemented the program in Octave-401 and applied it to the flight route map of an airline in Indonesia.
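The highly-connected test in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: it reads "connectivity" as edge connectivity, computes the global minimum cut with the Stoer-Wagner procedure (swapped in here; the abstract does not say which cut routine the authors use), and uses unit edge weights. The function names are hypothetical, and the authors' weighted modification of the Process Phase is not reproduced.

```python
def min_cut_weight(adj):
    """Global minimum cut weight of an undirected graph (Stoer-Wagner).

    adj: dict u -> dict v -> weight, symmetric, no self-loops.
    Assumes at least two vertices; returns 0 for a disconnected graph.
    """
    g = {u: dict(nb) for u, nb in adj.items()}  # work on a copy
    best = float("inf")
    while len(g) > 1:
        # One "minimum cut phase": grow a set from an arbitrary start vertex,
        # always adding the vertex most tightly connected to the set.
        start = next(iter(g))
        added = [start]
        w = {v: g[start].get(v, 0) for v in g if v != start}
        while w:
            z = max(w, key=w.get)
            cut_of_phase = w.pop(z)
            added.append(z)
            for v, wt in g[z].items():
                if v in w:
                    w[v] += wt
        s, t = added[-2], added[-1]
        best = min(best, cut_of_phase)
        # Merge the last vertex t into s before the next phase.
        for v, wt in g[t].items():
            if v != s:
                g[s][v] = g[s].get(v, 0) + wt
                g[v][s] = g[v].get(s, 0) + wt
            del g[v][t]
        del g[t]
    return best


def is_highly_connected(adj):
    """HCS criterion with unit weights: minimum cut size k(G) > n / 2."""
    return min_cut_weight(adj) > len(adj) / 2


# Toy graphs (hypothetical data): a complete graph on 4 vertices and a 3-vertex path.
k4 = {u: {v: 1 for v in range(4) if v != u} for u in range(4)}
path = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1}}
print(is_highly_connected(k4), is_highly_connected(path))
```

In a full HCS run, a graph that fails the test would be split along its minimum cut into H and \bar{H} and each side processed recursively; that recursion is omitted here.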
Minimal-memory realization of pearl-necklace encoders of general quantum convolutional codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Houshmand, Monireh; Hosseini-Khayat, Saied
2011-02-15
Quantum convolutional codes, like their classical counterparts, promise to offer higher error correction performance than block codes of equivalent encoding complexity, and are expected to find important applications in reliable quantum communication where a continuous stream of qubits is transmitted. Grassl and Roetteler devised an algorithm to encode a quantum convolutional code with a "pearl-necklace" encoder. Despite their algorithm's theoretical significance as a neat way of representing quantum convolutional codes, it is not well suited to practical realization. In fact, there is no straightforward way to implement any given pearl-necklace structure. This paper closes the gap between theoretical representation and practical implementation. In our previous work, we presented an efficient algorithm to find a minimal-memory realization of a pearl-necklace encoder for Calderbank-Shor-Steane (CSS) convolutional codes. This work is an extension of our previous work and presents an algorithm for turning a pearl-necklace encoder for a general (non-CSS) quantum convolutional code into a realizable quantum convolutional encoder. We show that a minimal-memory realization depends on the commutativity relations between the gate strings in the pearl-necklace encoder. We find a realization by means of a weighted graph which details the noncommutative paths through the pearl necklace. The weight of the longest path in this graph is equal to the minimal amount of memory needed to implement the encoder. The algorithm has a polynomial-time complexity in the number of gate strings in the pearl-necklace encoder.
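The final step described above, reading the memory requirement off the longest path in a weighted graph, can be illustrated with a short sketch. It assumes the graph is acyclic with non-negative weights and is given as a plain edge list; how the paper builds that graph from commutativity relations is not reproduced, and the function name and toy edges are hypothetical.

```python
from collections import defaultdict, deque


def longest_path_weight(edges):
    """Weight of the heaviest path in a weighted directed acyclic graph.

    edges: iterable of (u, v, w) triples with w >= 0 (assumed acyclic).
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))

    # Process vertices in topological order (Kahn's algorithm), keeping
    # for each vertex the weight of the heaviest path that ends there.
    queue = deque(n for n in nodes if indeg[n] == 0)
    best = {n: 0 for n in nodes}
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v, w in succ[u]:
            best[v] = max(best[v], best[u] + w)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle")
    return max(best.values(), default=0)


# Hypothetical toy graph: the heaviest path is a -> b -> c -> d.
toy = [("a", "b", 2), ("b", "c", 3), ("a", "c", 4), ("c", "d", 1)]
print(longest_path_weight(toy))
```

Because each edge is relaxed once, the running time is linear in the size of the graph, which is consistent with the polynomial-time claim in the abstract.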
Danaci, Hasan Fehmi; Cetin-Atalay, Rengul; Atalay, Volkan
2018-03-26
Visualizing large-scale data produced by the high throughput experiments as a biological graph leads to better understanding and analysis. This study describes a customized force-directed layout algorithm, EClerize, for biological graphs that represent pathways in which the nodes are associated with Enzyme Commission (EC) attributes. The nodes with the same EC class numbers are treated as members of the same cluster. Positions of nodes are then determined based on both the biological similarity and the connection structure. EClerize minimizes the intra-cluster distance, that is the distance between the nodes of the same EC cluster and maximizes the inter-cluster distance, that is the distance between two distinct EC clusters. EClerize is tested on a number of biological pathways and the improvement brought in is presented with respect to the original algorithm. EClerize is available as a plug-in to cytoscape ( http://apps.cytoscape.org/apps/eclerize ).
Enhancing Community Detection By Affinity-based Edge Weighting Scheme
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, Andy; Sanders, Geoffrey; Henson, Van
Community detection refers to an important graph analytics problem of finding a set of densely-connected subgraphs in a graph and has gained a great deal of interest recently. The performance of current community detection algorithms is limited by an inherent constraint of unweighted graphs that offer very little information on their internal community structures. In this paper, we propose a new scheme to address this issue that weights the edges in a given graph based on recently proposed vertex affinity. The vertex affinity quantifies the proximity between two vertices in terms of their clustering strength, and therefore, it is ideal for graph analytics applications such as community detection. We also demonstrate that the affinity-based edge weighting scheme can improve the performance of community detection algorithms significantly.
Patch-based iterative conditional geostatistical simulation using graph cuts
NASA Astrophysics Data System (ADS)
Li, Xue; Mariethoz, Gregoire; Lu, DeTang; Linde, Niklas
2016-08-01
Training image-based geostatistical methods are increasingly popular in groundwater hydrology even if existing algorithms present limitations that often make real-world applications difficult. These limitations include a computational cost that can be prohibitive for high-resolution 3-D applications, the presence of visual artifacts in the model realizations, and a low variability between model realizations due to the limited pool of patterns available in a finite-size training image. In this paper, we address these issues by proposing an iterative patch-based algorithm which adapts a graph cuts methodology that is widely used in computer graphics. Our adapted graph cuts method optimally cuts patches of pixel values borrowed from the training image and assembles them successively, each time accounting for the information of previously stitched patches. The initial simulation result might display artifacts, which are identified as regions of high cost. These artifacts are reduced by iteratively placing new patches in high-cost regions. In contrast to most patch-based algorithms, the proposed scheme can also efficiently address point conditioning. An advantage of the method is that the cut process results in the creation of new patterns that are not present in the training image, thereby increasing pattern variability. To quantify this effect, a new measure of variability is developed, the merging index, which quantifies the pattern variability in the realizations with respect to the training image. A series of sensitivity analyses demonstrates the stability of the proposed graph cuts approach, which produces satisfying simulations for a wide range of parameter values. Applications to 2-D and 3-D cases are compared to state-of-the-art multiple-point methods. The results show that the proposed approach obtains significant speedups and increases variability between realizations. Connectivity functions applied to 2-D models and transport simulations in 3-D models are used to demonstrate that pattern continuity is preserved.
Structure-guided Protein Transition Modeling with a Probabilistic Roadmap Algorithm.
Maximova, Tatiana; Plaku, Erion; Shehu, Amarda
2016-07-07
Proteins are macromolecules in perpetual motion, switching between structural states to modulate their function. A detailed characterization of the precise yet complex relationship between protein structure, dynamics, and function requires elucidating transitions between functionally-relevant states. Doing so challenges both wet and dry laboratories, as protein dynamics involves disparate temporal scales. In this paper we present a novel, sampling-based algorithm to compute transition paths. The algorithm exploits two main ideas. First, it leverages known structures to initialize its search and define a reduced conformation space for rapid sampling. This is key to addressing the insufficient sampling issue suffered by sampling-based algorithms. Second, the algorithm embeds samples in a nearest-neighbor graph where transition paths can be efficiently computed via queries. The algorithm adapts the probabilistic roadmap framework that is popular in robot motion planning. In addition to efficiently computing lowest-cost paths between any given structures, the algorithm allows investigating hypotheses regarding the order of experimentally-known structures in a transition event. This novel contribution is likely to open up new avenues of research. Detailed analysis is presented on multiple-basin proteins of relevance to human disease. Multiscaling and the AMBER ff14SB force field are used to obtain energetically-credible paths at atomistic detail.
Efficient Generation of Dancing Animation Synchronizing with Music Based on Meta Motion Graphs
NASA Astrophysics Data System (ADS)
Xu, Jianfeng; Takagi, Koichi; Sakazawa, Shigeyuki
This paper presents a system for automatic generation of dancing animation that is synchronized with a piece of music by re-using motion capture data. Basically, the dancing motion is synthesized according to the rhythm and intensity features of music. For this purpose, we propose a novel meta motion graph structure to embed the necessary features including both rhythm and intensity, which is constructed on the motion capture database beforehand. In this paper, we consider two scenarios for non-streaming music and streaming music, where global search and local search are required respectively. In the case of the former, once a piece of music is input, an efficient dynamic programming algorithm can be employed to globally search for the best path in the meta motion graph, where an objective function is properly designed by measuring the quality of beat synchronization, intensity matching, and motion smoothness. In the case of the latter, the input music is stored in a buffer in a streaming mode, then an efficient search method is presented for a certain amount of music data (called a segment) in the buffer with the same objective function, resulting in a segment-based search approach. For streaming applications, we define an additional property in the above meta motion graph to deal with the unpredictable future music, which guarantees that there is some motion to match the unknown remaining music. A user study with a total of 60 subjects demonstrates that our system outperforms the state-of-the-art techniques in both scenarios. Furthermore, our system improves the synthesis speed greatly (maximal speedup is more than 500 times), which is essential for mobile applications. We have implemented our system on commercially available smart phones and confirmed that it works well on these mobile phones.
Distributed Sensing and Processing: A Graphical Model Approach
2005-11-30
that Ramanujan graph topologies maximize the convergence rate of distributed detection consensus algorithms, improving over three orders of...small world type network designs. 14. SUBJECT TERMS Ramanujan graphs, sensor network topology, sensor network...that Ramanujan graphs, for which there are explicit algebraic constructions, have large eigenratios, converging much faster than structured graphs
Scarselli, Franco; Tsoi, Ah Chung; Hagenbuchner, Markus; Noi, Lucia Di
2013-12-01
This paper proposes the combination of two state-of-the-art algorithms for processing graph input data, viz., the probabilistic mapping graph self organizing map, an unsupervised learning approach, and the graph neural network, a supervised learning approach. We organize these two algorithms in a cascade architecture containing a probabilistic mapping graph self organizing map, and a graph neural network. We show that this combined approach helps us to limit the long-term dependency problem that exists when training the graph neural network resulting in an overall improvement in performance. This is demonstrated in an application to a benchmark problem requiring the detection of spam in a relatively large set of web sites. It is found that the proposed method produces results which reach the state of the art when compared with some of the best results obtained by others using quite different approaches. A particular strength of our method is its applicability towards any input domain which can be represented as a graph. Copyright © 2013 Elsevier Ltd. All rights reserved.
A path following algorithm for the graph matching problem.
Zaslavskiy, Mikhail; Bach, Francis; Vert, Jean-Philippe
2009-12-01
We propose a convex-concave programming approach for the labeled weighted graph matching problem. The convex-concave programming formulation is obtained by rewriting the weighted graph matching problem as a least-squares problem on the set of permutation matrices and relaxing it to two different optimization problems: a quadratic convex and a quadratic concave optimization problem on the set of doubly stochastic matrices. The concave relaxation has the same global minimum as the initial graph matching problem, but the search for its global minimum is also a hard combinatorial problem. We, therefore, construct an approximation of the concave problem solution by following a solution path of a convex-concave problem obtained by linear interpolation of the convex and concave formulations, starting from the convex relaxation. This method makes it easy to integrate the information on graph label similarities into the optimization problem, and therefore, to perform labeled weighted graph matching. The algorithm is compared with some of the best performing graph matching methods on four data sets: simulated graphs, QAPLib, retina vessel images, and handwritten Chinese characters. In all cases, the results are competitive with the state of the art.
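The linear interpolation described above can be written compactly. The symbols here are assumptions introduced for illustration, since the abstract names none: F_0 denotes the convex relaxation, F_1 the concave relaxation, P a doubly stochastic matrix in the set D, and λ the interpolation parameter.

```latex
F_\lambda(P) = (1 - \lambda)\, F_0(P) + \lambda\, F_1(P),
\qquad P \in \mathcal{D}, \quad \lambda \in [0, 1]
```

Under this notation, the path starts at the minimizer of the convex problem (λ = 0) and tracks a local minimizer of F_λ as λ is increased to 1, where the objective coincides with the concave relaxation.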
Partitioning Rectangular and Structurally Nonsymmetric Sparse Matrices for Parallel Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
B. Hendrickson; T.G. Kolda
1998-09-01
A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix-transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitioning bipartite graphs. We then introduce several algorithms for this partitioning problem and compare their performance on a set of test matrices.
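One common way to picture the reduction to bipartite graph partitioning is the row/column model sketched below: each row and each column of the matrix becomes a vertex, and each nonzero entry becomes an edge between its row vertex and its column vertex. The abstract does not spell out its construction, so this model, the function names, and the toy data are assumptions for illustration only.

```python
def bipartite_graph(nonzeros):
    """Bipartite graph of a sparse matrix.

    nonzeros: iterable of (row, col) index pairs of the nonzero entries.
    Returns (row_adj, col_adj): row -> set of columns, column -> set of rows.
    """
    row_adj, col_adj = {}, {}
    for i, j in nonzeros:
        row_adj.setdefault(i, set()).add(j)
        col_adj.setdefault(j, set()).add(i)
    return row_adj, col_adj


def cut_edges(nonzeros, row_part, col_part):
    """Count nonzeros whose row and column are assigned to different parts.

    row_part / col_part: dicts mapping a row / column index to a part id.
    Each such nonzero stands for a vector entry that must be communicated.
    """
    return sum(1 for i, j in nonzeros if row_part[i] != col_part[j])


# Hypothetical 3 x 4 matrix pattern split over two processors (parts 0 and 1).
pattern = [(0, 0), (0, 2), (1, 1), (2, 1), (2, 3)]
rows, cols = bipartite_graph(pattern)
row_part = {0: 0, 1: 1, 2: 1}
col_part = {0: 0, 1: 1, 2: 1, 3: 0}
print(cut_edges(pattern, row_part, col_part))
```

Keeping separate vertex sets for rows and columns is what lets the model handle rectangular and structurally nonsymmetric matrices, since rows and columns can be assigned to processors independently.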
X-Graphs: Language and Algorithms for Heterogeneous Graph Streams
2017-09-01
INTRODUCTION 1 3 METHODS, ASSUMPTIONS, AND PROCEDURES 2 Software Abstractions for Graph Analytic Applications 2 High performance Platforms for Graph Processing...data is stored in a distributed file system. 3 METHODS, ASSUMPTIONS, AND PROCEDURES Software Abstractions for Graph Analytic Applications To...implementations of novel methods for networks analysis: several methods for detection of overlapping communities, personalized PageRank, node embeddings into a d
Time reversibility from visibility graphs of nonstationary processes
NASA Astrophysics Data System (ADS)
Lacasa, Lucas; Flanagan, Ryan
2015-08-01
Visibility algorithms are a family of methods to map time series into networks, with the aim of describing the structure of time series and their underlying dynamical properties in graph-theoretical terms. Here we explore some properties of both natural and horizontal visibility graphs associated to several nonstationary processes, and we pay particular attention to their capacity to assess time irreversibility. Nonstationary signals are (infinitely) irreversible by definition (independently of whether the process is Markovian or producing entropy at a positive rate), and thus the link between entropy production and time series irreversibility has only been explored in nonequilibrium stationary states. Here we show that the visibility formalism naturally induces a new working definition of time irreversibility, which allows us to quantify several degrees of irreversibility for stationary and nonstationary series, yielding finite values that can be used to efficiently assess the presence of memory and off-equilibrium dynamics in nonstationary processes without the need to differentiate or detrend them. We provide rigorous results complemented by extensive numerical simulations on several classes of stochastic processes.
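The two mappings mentioned above can be sketched directly from their standard definitions, which the abstract does not restate. In the natural visibility graph, samples i and j are linked when every intermediate sample lies strictly below the straight line joining them; in the horizontal variant, every intermediate sample must lie strictly below both endpoints. The function names are hypothetical, and the irreversibility measure itself is not reproduced here.

```python
def natural_visibility_edges(x):
    """Edges (i, j), i < j, of the natural visibility graph of series x."""
    n = len(x)
    edges = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Every intermediate point must sit strictly below the chord i-j.
            if all(
                x[k] < x[j] + (x[i] - x[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            ):
                edges.append((i, j))
    return edges


def horizontal_visibility_edges(x):
    """Edges (i, j), i < j, of the horizontal visibility graph of series x."""
    n = len(x)
    edges = []
    for i in range(n - 1):
        top = float("-inf")  # largest value strictly between i and j
        for j in range(i + 1, n):
            if top < min(x[i], x[j]):
                edges.append((i, j))
            top = max(top, x[j])
            if top >= x[i]:
                break  # nothing further to the right can be seen from i
    return edges


# Hypothetical toy series.
series = [3, 1, 2, 4]
nvg = natural_visibility_edges(series)
hvg = horizontal_visibility_edges(series)
print(nvg)
print(hvg)
```

Orienting each edge from the earlier to the later sample yields in-degree and out-degree sequences; comparing their distributions is the usual route from these graphs to a time-irreversibility estimate.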
Auction dynamics: A volume constrained MBO scheme
NASA Astrophysics Data System (ADS)
Jacobs, Matt; Merkurjev, Ekaterina; Esedoǧlu, Selim
2018-02-01
We show how auction algorithms, originally developed for the assignment problem, can be utilized in Merriman, Bence, and Osher's threshold dynamics scheme to simulate multi-phase motion by mean curvature in the presence of equality and inequality volume constraints on the individual phases. The resulting algorithms are highly efficient and robust, and can be used in simulations ranging from minimal partition problems in Euclidean space to semi-supervised machine learning via clustering on graphs. In the case of the latter application, numerous experimental results on benchmark machine learning datasets show that our approach exceeds the performance of current state-of-the-art methods, while requiring a fraction of the computation time.
Resource utilization model for the algorithm to architecture mapping model
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Patel, Rakesh R.
1993-01-01
The analytical model for resource utilization and the variable node time and conditional node model for the enhanced ATAMM model for a real-time data flow architecture are presented in this research. The Algorithm To Architecture Mapping Model, ATAMM, is a Petri net based graph theoretic model developed at Old Dominion University, and is capable of modeling the execution of large-grained algorithms on a real-time data flow architecture. Using the resource utilization model, the resource envelope may be obtained directly from a given graph and, consequently, the maximum number of required resources may be evaluated. The node timing diagram for one iteration period may be obtained using the analytical resource envelope. The variable node time model, which describes the change in resource requirement for the execution of an algorithm under node time variation, is useful to expand the applicability of the ATAMM model to heterogeneous architectures. The model also describes a method of detecting the presence of resource limited mode and its subsequent prevention. Graphs with conditional nodes are shown to be reduced to equivalent graphs with time varying nodes and, subsequently, may be analyzed using the variable node time model to determine resource requirements. Case studies are performed on three graphs for the illustration of applicability of the analytical theories.
An effective trust-based recommendation method using a novel graph clustering algorithm
NASA Astrophysics Data System (ADS)
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
NASA Astrophysics Data System (ADS)
Cruz-Roa, Angel; Xu, Jun; Madabhushi, Anant
2015-01-01
Nuclear architecture or the spatial arrangement of individual cancer nuclei on histopathology images has been shown to be associated with different grades and differential risk for a number of solid tumors such as breast, prostate, and oropharyngeal. Graph-based representations of individual nuclei (nuclei representing the graph nodes) allow for mining of quantitative metrics to describe tumor morphology. These graph features can be broadly categorized into global and local depending on the type of graph construction method. While a number of local graph (e.g. Cell Cluster Graphs) and global graph (e.g. Voronoi, Delaunay Triangulation, Minimum Spanning Tree) features have been shown to be associated with cancer grade, risk, and outcome for different cancer types, the sensitivity of the preceding segmentation algorithms in identifying individual nuclei can have a significant bearing on the discriminability of the resultant features. This therefore begs the question as to which features, while being discriminative of cancer grade and aggressiveness, are also the most resilient to the segmentation errors. These properties are particularly desirable in the context of digital pathology images, where the method of slide preparation, staining, and type of nuclear segmentation algorithm employed can all dramatically affect the quality of the nuclear graphs and corresponding features. In this paper we evaluated the trade-off between discriminability and stability of both global and local graph-based features in conjunction with a few different segmentation algorithms and in the context of two different histopathology image datasets of breast cancer from whole-slide images (WSI) and tissue microarrays (TMA). Specifically in this paper we investigate a few different performance measures including stability, discriminability and stability vs discriminability trade-off, all of which are based on p-values from the Kruskal-Wallis one-way analysis of variance for local and global graph features. Apart from identifying the set of local and global features that satisfied the trade-off between stability and discriminability, our most interesting finding was that a simple segmentation method was sufficient to identify the most discriminant features for invasive tumour detection in TMAs, whereas for tumour grading in WSI, the graph-based features were more sensitive to the accuracy of the segmentation algorithm employed.
A new augmentation based algorithm for extracting maximal chordal subgraphs
Bhowmick, Sanjukta; Chen, Tzu-Yi; Halappanavar, Mahantesh
2014-10-18
A graph is chordal if every cycle of length greater than three contains an edge between non-adjacent vertices. Chordal graphs are of interest both theoretically, since they admit polynomial time solutions to a range of NP-hard graph problems, and practically, since they arise in many applications including sparse linear algebra, computer vision, and computational biology. A maximal chordal subgraph is a chordal subgraph that is not a proper subgraph of any other chordal subgraph. Existing algorithms for computing maximal chordal subgraphs depend on dynamically ordering the vertices, which is an inherently sequential process and therefore limits the algorithms’ parallelizability. In our paper we explore techniques to develop a scalable parallel algorithm for extracting a maximal chordal subgraph. We demonstrate that an earlier attempt at developing a parallel algorithm may induce a non-optimal vertex ordering and is therefore not guaranteed to terminate with a maximal chordal subgraph. We then give a new algorithm that first computes and then repeatedly augments a spanning chordal subgraph. After proving that the algorithm terminates with a maximal chordal subgraph, we then demonstrate that this algorithm is more amenable to parallelization and that the parallel version also terminates with a maximal chordal subgraph. That said, the complexity of the new algorithm is higher than that of the previous parallel algorithm, although the earlier algorithm computes a chordal subgraph which is not guaranteed to be maximal. Finally, we experimented with our augmentation-based algorithm on both synthetic and real-world graphs. We provide scalability results and also explore the effect of different choices for the initial spanning chordal subgraph on both the running time and on the number of edges in the maximal chordal subgraph.
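As background for the chordality property this abstract relies on, the sketch below tests whether a small graph is chordal using maximum cardinality search followed by a clique check on each vertex's earlier-visited neighbours. This is a standard sequential test, shown only to make the definition concrete; it is not the augmentation-based algorithm of the paper, and the function name and toy graphs are hypothetical.

```python
from itertools import combinations


def is_chordal(adj):
    """Chordality test for an undirected graph.

    adj: dict vertex -> set of neighbours (symmetric, no self-loops).
    """
    # Maximum cardinality search: repeatedly visit the unvisited vertex
    # with the most already-visited neighbours.
    weight = {v: 0 for v in adj}
    visited, seen = [], set()
    while len(visited) < len(adj):
        v = max((u for u in adj if u not in seen), key=lambda u: weight[u])
        visited.append(v)
        seen.add(v)
        for u in adj[v]:
            if u not in seen:
                weight[u] += 1

    # The graph is chordal exactly when, for every vertex, the neighbours
    # visited before it are pairwise adjacent (simple quadratic check).
    position = {v: i for i, v in enumerate(visited)}
    for v in visited:
        earlier = [u for u in adj[v] if position[u] < position[v]]
        if any(b not in adj[a] for a, b in combinations(earlier, 2)):
            return False
    return True


# Hypothetical toy graphs: a 4-cycle, and the same cycle with the chord 0-2.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
with_chord = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(is_chordal(cycle4), is_chordal(with_chord))
```

The search visits vertices one at a time in a data-dependent order, which illustrates why ordering-based methods are hard to parallelize.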
Dynamic multicast routing scheme in WDM optical network
NASA Astrophysics Data System (ADS)
Zhu, Yonghua; Dong, Zhiling; Yao, Hong; Yang, Jianyong; Liu, Yibin
2007-11-01
In the information era, the Internet and World Wide Web services have developed rapidly, so ever wider bandwidth is required at ever lower cost, and service demands have diversified. Data, images, video, and other special transmission demands present both challenges and opportunities for service providers. At the same time, electrical equipment is approaching its limits. Optical communication based on wavelength division multiplexing (WDM) and optical cross-connects (OXCs) therefore shows great potential for building optical networks, thanks to its technical advantages and multi-wavelength characteristics. In this paper, we propose a multi-layered graph model with inter-layer paths that solves the multicast routing and wavelength assignment (RWA) problems simultaneously by employing an efficient graph-theoretic formulation. We also propose an efficient dynamic multicast algorithm, the Distributed Message Copying Multicast (DMCM) mechanism. With the proposed scheme, a multicast tree with minimum hops can be constructed dynamically.
NASA Astrophysics Data System (ADS)
Bogiatzis, P.; Ishii, M.; Davis, T. A.
2016-12-01
Seismic tomography inverse problems are among the largest high-dimensional parameter estimation tasks in Earth science. We show how combinatorics and graph theory can be used to analyze the structure of such problems, and to effectively decompose them into smaller ones that can be solved efficiently by means of the least squares method. In combination with recent high performance direct sparse algorithms, this reduction in dimensionality allows for an efficient computation of the model resolution and covariance matrices using limited resources. Furthermore, we show that a new sparse singular value decomposition method can be used to obtain the complete spectrum of the singular values. This procedure provides the means for more objective regularization and further dimensionality reduction of the problem. We apply this methodology to a moderate size, non-linear seismic tomography problem to image the structure of the crust and the upper mantle beneath Japan using local deep earthquakes recorded by the High Sensitivity Seismograph Network stations.
Zhang, Pan; Moore, Cristopher
2014-01-01
Modularity is a popular measure of community structure. However, maximizing the modularity can lead to many competing partitions, with almost the same modularity, that are poorly correlated with each other. It can also produce illusory "communities" in random graphs where none exist. We address this problem by using the modularity as a Hamiltonian at finite temperature and using an efficient belief propagation algorithm to obtain the consensus of many partitions with high modularity, rather than looking for a single partition that maximizes it. We show analytically and numerically that the proposed algorithm works all of the way down to the detectability transition in networks generated by the stochastic block model. It also performs well on real-world networks, revealing large communities in some networks where previous work has claimed no communities exist. Finally we show that by applying our algorithm recursively, subdividing communities until no statistically significant subcommunities can be found, we can detect hierarchical structure in real-world networks more efficiently than previous methods. PMID:25489096
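For reference, the quantity that plays the role of the energy above can be computed in a few lines. The sketch uses the standard definition of modularity for an unweighted, undirected graph: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. The function name and toy graph are hypothetical, and the belief propagation algorithm itself is not reproduced.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to a community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # number of edges inside each community
    degree_sum = {}  # total degree of the vertices in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {v: "A" for v in range(6)}
print(modularity(toy_edges, split), modularity(toy_edges, merged))
```

Placing every vertex in one community always gives Q = 0, which is the baseline against which high-modularity partitions are judged.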
Huang, Shuai; Li, Jing; Ye, Jieping; Fleisher, Adam; Chen, Kewei; Wu, Teresa; Reiman, Eric
2013-06-01
Structure learning of Bayesian Networks (BNs) is an important topic in machine learning. Driven by modern applications in genetics and brain sciences, accurate and efficient learning of large-scale BN structures from high-dimensional data becomes a challenging problem. To tackle this challenge, we propose a Sparse Bayesian Network (SBN) structure learning algorithm that employs a novel formulation involving one L1-norm penalty term to impose sparsity and another penalty term to ensure that the learned BN is a Directed Acyclic Graph--a required property of BNs. Through both theoretical analysis and extensive experiments on 11 moderate and large benchmark networks with various sample sizes, we show that SBN leads to improved learning accuracy, scalability, and efficiency as compared with 10 existing popular BN learning algorithms. We apply SBN to a real-world application of brain connectivity modeling for Alzheimer's disease (AD) and reveal findings that could lead to advancements in AD research.
Huang, Shuai; Li, Jing; Ye, Jieping; Fleisher, Adam; Chen, Kewei; Wu, Teresa; Reiman, Eric
2014-01-01
Structure learning of Bayesian Networks (BNs) is an important topic in machine learning. Driven by modern applications in genetics and brain sciences, accurate and efficient learning of large-scale BN structures from high-dimensional data becomes a challenging problem. To tackle this challenge, we propose a Sparse Bayesian Network (SBN) structure learning algorithm that employs a novel formulation involving one L1-norm penalty term to impose sparsity and another penalty term to ensure that the learned BN is a Directed Acyclic Graph (DAG)—a required property of BNs. Through both theoretical analysis and extensive experiments on 11 moderate and large benchmark networks with various sample sizes, we show that SBN leads to improved learning accuracy, scalability, and efficiency as compared with 10 existing popular BN learning algorithms. We apply SBN to a real-world application of brain connectivity modeling for Alzheimer’s disease (AD) and reveal findings that could lead to advancements in AD research. PMID:22665720
EvoGraph: On-The-Fly Efficient Mining of Evolving Graphs on GPU
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sengupta, Dipanjan; Song, Shuaiwen
With the prevalence of the World Wide Web and social networks, there has been a growing interest in high performance analytics for constantly-evolving dynamic graphs. Modern GPUs provide a massive amount of parallelism for efficient graph processing, but challenges remain due to their lack of support for the near real-time streaming nature of dynamic graphs. Specifically, due to the current high volume and velocity of graph data combined with the complexity of user queries, traditional processing methods that first store the updates and then repeatedly run static graph analytics on a sequence of versions or snapshots are deemed undesirable and computationally infeasible on GPUs. We present EvoGraph, a highly efficient and scalable GPU-based dynamic graph analytics framework.
Energy Minimization of Discrete Protein Titration State Models Using Graph Theory.
Purvine, Emilie; Monson, Kyle; Jurrus, Elizabeth; Star, Keith; Baker, Nathan A
2016-08-25
There are several applications in computational biophysics that require the optimization of discrete interacting states, for example, amino acid titration states, ligand oxidation states, or discrete rotamer angles. Such optimization can be very time-consuming as it scales exponentially in the number of sites to be optimized. In this paper, we describe a new polynomial time algorithm for optimization of discrete states in macromolecular systems. This algorithm was adapted from image processing and uses techniques from discrete mathematics and graph theory to restate the optimization problem in terms of "maximum flow-minimum cut" graph analysis. The interaction energy graph, a graph in which vertices (amino acids) and edges (interactions) are weighted with their respective energies, is transformed into a flow network in which the value of the minimum cut in the network equals the minimum free energy of the protein and the cut itself encodes the state that achieves the minimum free energy. Because of its deterministic nature and polynomial time performance, this algorithm has the potential to allow for the ionization state of larger proteins to be discovered.
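The idea of reading a minimum energy off a minimum cut can be illustrated on a much simpler model than the one in the paper. The sketch below assumes a two-state model with non-negative per-site costs and a non-negative penalty whenever two coupled sites disagree; this is a generic textbook construction, not the authors' transformation, and all names (`min_cut`, `minimize_energy`, `energy`) and numbers are hypothetical.

```python
from collections import deque

def min_cut(n, cap, s, t):
    """Edmonds-Karp max flow on a capacity matrix; returns (value, source side)."""
    res = [row[:] for row in cap]  # residual capacities
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:  # breadth-first search for a path
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path left: nodes reached from s form the source side.
            return flow, {v for v in range(n) if parent[v] != -1}
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:  # push flow along the path
            res[parent[v]][v] -= bottleneck
            res[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

def energy(unary, pair, x):
    """Energy of assignment x: per-site costs plus disagreement penalties."""
    total = sum(unary[i][x[i]] for i in range(len(unary)))
    return total + sum(w for (i, j), w in pair.items() if x[i] != x[j])

def minimize_energy(unary, pair):
    """unary[i] = (cost of state 0, cost of state 1); pair[(i, j)] = penalty."""
    k = len(unary)
    s, t = k, k + 1
    cap = [[0] * (k + 2) for _ in range(k + 2)]
    for i, (c0, c1) in enumerate(unary):
        cap[s][i] = c1  # cut when site i ends on the sink side (state 1)
        cap[i][t] = c0  # cut when site i stays on the source side (state 0)
    for (i, j), w in pair.items():
        cap[i][j] += w  # cut when i and j end on different sides
        cap[j][i] += w
    value, source_side = min_cut(k + 2, cap, s, t)
    return value, [0 if i in source_side else 1 for i in range(k)]

# Hypothetical three-site example.
unary = [(1, 4), (3, 2), (5, 1)]
pair = {(0, 1): 2, (1, 2): 1}
best_value, best_states = minimize_energy(unary, pair)
```

In this construction the value of the cut equals the energy of the assignment it encodes, so the minimum cut yields both the minimum energy and a state that attains it, mirroring the correspondence described in the abstract.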
Energy Minimization of Discrete Protein Titration State Models Using Graph Theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Purvine, Emilie AH; Monson, Kyle E.; Jurrus, Elizabeth R.
There are several applications in computational biophysics which require the optimization of discrete interacting states; e.g., amino acid titration states, ligand oxidation states, or discrete rotamer angles. Such optimization can be very time-consuming as it scales exponentially in the number of sites to be optimized. In this paper, we describe a new polynomial-time algorithm for optimization of discrete states in macromolecular systems. This algorithm was adapted from image processing and uses techniques from discrete mathematics and graph theory to restate the optimization problem in terms of maximum flow-minimum cut graph analysis. The interaction energy graph, a graph in which vertices (amino acids) and edges (interactions) are weighted with their respective energies, is transformed into a flow network in which the value of the minimum cut in the network equals the minimum free energy of the protein, and the cut itself encodes the state that achieves the minimum free energy. Because of its deterministic nature and polynomial-time performance, this algorithm has the potential to allow for the ionization state of larger proteins to be discovered.
Energy Minimization of Discrete Protein Titration State Models Using Graph Theory
Purvine, Emilie; Monson, Kyle; Jurrus, Elizabeth; Star, Keith; Baker, Nathan A.
2016-01-01
There are several applications in computational biophysics which require the optimization of discrete interacting states; e.g., amino acid titration states, ligand oxidation states, or discrete rotamer angles. Such optimization can be very time-consuming as it scales exponentially in the number of sites to be optimized. In this paper, we describe a new polynomial-time algorithm for optimization of discrete states in macromolecular systems. This algorithm was adapted from image processing and uses techniques from discrete mathematics and graph theory to restate the optimization problem in terms of “maximum flow-minimum cut” graph analysis. The interaction energy graph, a graph in which vertices (amino acids) and edges (interactions) are weighted with their respective energies, is transformed into a flow network in which the value of the minimum cut in the network equals the minimum free energy of the protein, and the cut itself encodes the state that achieves the minimum free energy. Because of its deterministic nature and polynomial-time performance, this algorithm has the potential to allow for the ionization state of larger proteins to be discovered. PMID:27089174
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hagberg, Aric; Swart, Pieter; S Chult, Daniel
NetworkX is a Python language package for exploration and analysis of networks and network algorithms. The core package provides data structures for representing many types of networks, or graphs, including simple graphs, directed graphs, and graphs with parallel edges and self loops. The nodes in NetworkX graphs can be any (hashable) Python object and edges can contain arbitrary data; this flexibility makes NetworkX ideal for representing networks found in many different scientific fields. In addition to the basic data structures, many graph algorithms are implemented for calculating network properties and structure measures: shortest paths, betweenness centrality, clustering, degree distribution, and many more. NetworkX can read and write various graph formats for easy exchange with existing data, and provides generators for many classic graphs and popular graph models, such as the Erdős-Rényi, Small World, and Barabási-Albert models. The ease-of-use and flexibility of the Python programming language together with connection to the SciPy tools make NetworkX a powerful tool for scientific computations. We discuss some of our recent work studying synchronization of coupled oscillators to demonstrate how NetworkX enables research in the field of computational networks.
Multi-A Graph Patrolling and Partitioning
NASA Astrophysics Data System (ADS)
Elor, Y.; Bruckstein, A. M.
2012-12-01
We introduce a novel multi-agent patrolling algorithm inspired by the behavior of gas-filled balloons. Very low capability ant-like agents are considered with the task of patrolling an unknown area modeled as a graph. While executing the proposed algorithm, the agents dynamically partition the graph between them using simple local interactions, every agent assuming the responsibility for patrolling his subgraph. Balanced graph partition is an emergent behavior due to the local interactions between the agents in the swarm. Extensive simulations on various graphs (environments) showed that the average time to reach a balanced partition is linear in the graph size. The simulations yielded a convincing argument for conjecturing that if the graph being patrolled contains a balanced partition, the agents will find it. However, we could not prove this. Nevertheless, we have proved that if a balanced partition is reached, the maximum time lag between two successive visits to any vertex using the proposed strategy is at most twice the optimal, so the patrol quality is at least half the optimal. In the case of weighted graphs, the patrol quality is at least $\frac{1}{2}\,\frac{l_{\min}}{l_{\max}}$ of the optimal, where $l_{\max}$ ($l_{\min}$) is the longest (shortest) edge in the graph.
Comparing Algorithms for Graph Isomorphism Using Discrete- and Continuous-Time Quantum Random Walks
Rudinger, Kenneth; Gamble, John King; Bach, Eric; ...
2013-07-01
Berry and Wang [Phys. Rev. A 83, 042317 (2011)] show numerically that a discrete-time quantum random walk of two noninteracting particles is able to distinguish some non-isomorphic strongly regular graphs from the same family. Here we analytically demonstrate how it is possible for these walks to distinguish such graphs, while continuous-time quantum walks of two noninteracting particles cannot. We show analytically and numerically that even single-particle discrete-time quantum random walks can distinguish some strongly regular graphs, though not as many as two-particle noninteracting discrete-time walks. Additionally, we demonstrate how, given the same quantum random walk, subtle differences in the graph certificate construction algorithm can nontrivially impact the walk's distinguishing power. We also show that no continuous-time walk of a fixed number of particles can distinguish all strongly regular graphs when used in conjunction with any of the graph certificates we consider. We extend this constraint to discrete-time walks of fixed numbers of noninteracting particles for one kind of graph certificate; it remains an open question as to whether or not this constraint applies to the other graph certificates we consider.
Compression of next-generation sequencing reads aided by highly efficient de novo assembly
Jones, Daniel C.; Ruzzo, Walter L.; Peng, Xinxia
2012-01-01
We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dramatically reduce the memory required by traditional de Bruijn graph assemblers, allowing millions of reads to be assembled very efficiently. Read sequences are then stored as positions within the assembled contigs. This is combined with statistical compression of read identifiers, quality scores, alignment information and sequences, effectively collapsing very large data sets to <15% of their original size with no loss of information. Availability: Quip is freely available under the 3-clause BSD license from http://cs.washington.edu/homes/dcjones/quip. PMID:22904078
Parallel processing optimization strategy based on MapReduce model in cloud storage environment
NASA Astrophysics Data System (ADS)
Cui, Jianming; Liu, Jiayi; Li, Qiuyan
2017-05-01
Currently, many documents in cloud storage are packaged only after all of their packets have been received. When data is transferred from the local transmitter to the server in this way, packing and unpacking consume a lot of time, and the transmission efficiency is low as well. A new parallel processing algorithm is proposed to optimize the transmission mode. Based on how the machine graph model operates, MPI technology is used to execute the Mapper and Reducer mechanisms in parallel. Simulation experiments on the Hadoop cloud computing platform indicate that this algorithm not only accelerates the file transfer rate but also shortens the waiting time of the Reducer mechanism. It breaks through traditional sequential transmission constraints and reduces the storage coupling to improve the transmission efficiency.
Tensor Spectral Clustering for Partitioning Higher-order Network Structures.
Benson, Austin R; Gleich, David F; Leskovec, Jure
2015-01-01
Spectral graph theory-based methods represent an important class of tools for studying the structure of networks. Spectral methods are based on a first-order Markov chain derived from a random walk on the graph and thus they cannot take advantage of important higher-order network substructures such as triangles, cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm that allows for modeling higher-order network structures in a graph partitioning framework. Our TSC algorithm allows the user to specify which higher-order network structures (cycles, feed-forward loops, etc.) should be preserved by the network clustering. Higher-order network structures of interest are represented using a tensor, which we then partition by developing a multilinear spectral method. Our framework can be applied to discovering layered flows in networks as well as graph anomaly detection, which we illustrate on synthetic networks. In directed networks, a higher-order structure of particular interest is the directed 3-cycle, which captures feedback loops in networks. We demonstrate that our TSC algorithm produces large partitions that cut fewer directed 3-cycles than standard spectral clustering algorithms.
Tensor Spectral Clustering for Partitioning Higher-order Network Structures
Benson, Austin R.; Gleich, David F.; Leskovec, Jure
2016-01-01
Spectral graph theory-based methods represent an important class of tools for studying the structure of networks. Spectral methods are based on a first-order Markov chain derived from a random walk on the graph and thus they cannot take advantage of important higher-order network substructures such as triangles, cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm that allows for modeling higher-order network structures in a graph partitioning framework. Our TSC algorithm allows the user to specify which higher-order network structures (cycles, feed-forward loops, etc.) should be preserved by the network clustering. Higher-order network structures of interest are represented using a tensor, which we then partition by developing a multilinear spectral method. Our framework can be applied to discovering layered flows in networks as well as graph anomaly detection, which we illustrate on synthetic networks. In directed networks, a higher-order structure of particular interest is the directed 3-cycle, which captures feedback loops in networks. We demonstrate that our TSC algorithm produces large partitions that cut fewer directed 3-cycles than standard spectral clustering algorithms. PMID:27812399
Pruning Rogue Taxa Improves Phylogenetic Accuracy: An Efficient Algorithm and Webservice
Aberer, Andre J.; Krompass, Denis; Stamatakis, Alexandros
2013-01-01
The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees. PMID:22962004
Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice.
Aberer, Andre J; Krompass, Denis; Stamatakis, Alexandros
2013-01-01
The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees.
A novel swarm intelligence algorithm for finding DNA motifs.
Lei, Chengwei; Ruan, Jianhua
2009-01-01
Discovering DNA motifs from co-expressed or co-regulated genes is an important step towards deciphering complex gene regulatory networks and understanding gene functions. Despite significant improvement in the last decade, it still remains one of the most challenging problems in computational molecular biology. In this work, we propose a novel motif finding algorithm that finds consensus patterns using a population-based stochastic optimisation technique called Particle Swarm Optimisation (PSO), which has been shown to be effective in optimising difficult multidimensional problems in continuous domains. We propose to use a word dissimilarity graph to remap the neighborhood structure of the solution space of DNA motifs, and propose a modification of the naive PSO algorithm to accommodate discrete variables. In order to improve efficiency, we also propose several strategies for escaping from local optima and for automatically determining the termination criteria. Experimental results on simulated challenge problems show that our method is both more efficient and more accurate than several existing algorithms. Applications to several sets of real promoter sequences also show that our approach is able to detect known transcription factor binding sites, and outperforms two of the most popular existing algorithms.
Short paths in expander graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kleinberg, J.; Rubinfeld, R.
Graph expansion has proved to be a powerful general tool for analyzing the behavior of routing algorithms and the interconnection networks on which they run. We develop new routing algorithms and structural results for bounded-degree expander graphs. Our results are unified by the fact that they are all based upon, and extend, a body of work asserting that expanders are rich in short, disjoint paths. In particular, our work has consequences for the disjoint paths problem, multicommodity flow, and graph minor containment. We show: (i) A greedy algorithm for approximating the maximum disjoint paths problem achieves a polylogarithmic approximation ratio in bounded-degree expanders. Although our algorithm is both deterministic and on-line, its performance guarantee is an improvement over previous bounds in expanders. (ii) For a multicommodity flow problem with arbitrary demands on a bounded-degree expander, there is a $(1 + \epsilon)$-optimal solution using only flow paths of polylogarithmic length. It follows that the multicommodity flow algorithm of Awerbuch and Leighton runs in nearly linear time per commodity in expanders. Our analysis is based on establishing the following: given edge weights on an expander G, one can increase some of the weights very slightly so the resulting shortest-path metric is smooth - the min-weight path between any pair of nodes uses a polylogarithmic number of edges. (iii) Every bounded-degree expander on n nodes contains every graph with $O(n/\log^{O(1)} n)$ nodes and edges as a minor.
An efficient hybrid approach for multiobjective optimization of water distribution systems
NASA Astrophysics Data System (ADS)
Zheng, Feifei; Simpson, Angus R.; Zecchin, Aaron C.
2014-05-01
An efficient hybrid approach for the design of water distribution systems (WDSs) with multiple objectives is described in this paper. The objectives are the minimization of the network cost and maximization of the network resilience. A self-adaptive multiobjective differential evolution (SAMODE) algorithm has been developed, in which control parameters are automatically adapted by means of evolution instead of the presetting of fine-tuned parameter values. In the proposed method, a graph algorithm is first used to decompose a looped WDS into a shortest-distance tree (T) or forest, and chords (Ω). The original two-objective optimization problem is then approximated by a series of single-objective optimization problems of the T to be solved by nonlinear programming (NLP), thereby providing an approximate Pareto optimal front for the original whole network. Finally, the solutions at the approximate front are used to seed the SAMODE algorithm to find an improved front for the original entire network. The proposed approach is compared with two other conventional full-search optimization methods (the SAMODE algorithm and the NSGA-II) that seed the initial population with purely random solutions based on three case studies: a benchmark network and two real-world networks with multiple demand loading cases. Results show that (i) the proposed NLP-SAMODE method consistently generates better-quality Pareto fronts than the full-search methods with significantly improved efficiency; and (ii) the proposed SAMODE algorithm (no parameter tuning) exhibits better performance than the NSGA-II with calibrated parameter values in efficiently offering optimal fronts.
- and Graph-Based Point Cloud Segmentation of 3d Scenes Using Perceptual Grouping Laws
NASA Astrophysics Data System (ADS)
Xu, Y.; Hoegner, L.; Tuttas, S.; Stilla, U.
2017-05-01
Segmentation is the fundamental step for recognizing and extracting objects from point clouds of 3D scenes. In this paper, we present a strategy for point cloud segmentation using voxel structure and graph-based clustering with perceptual grouping laws, which allows a learning-free and completely automatic but parametric solution for segmenting 3D point clouds. Specifically, two segmentation methods utilizing voxel and supervoxel structures are reported and tested. The voxel-based data structure can increase the efficiency and robustness of the segmentation process, suppressing the negative effect of noise, outliers, and uneven point densities. The clustering of voxels and supervoxels is carried out using graph theory on the basis of local contextual information, whereas conventional clustering algorithms commonly use merely pairwise information. By the use of perceptual laws, our method conducts the segmentation in a purely geometric way, avoiding the use of RGB color and intensity information, so that it can be applied to more general applications. Experiments using different datasets have demonstrated that our proposed methods can achieve good results, especially for complex scenes and nonplanar surfaces of objects. Quantitative comparisons between our methods and other representative segmentation methods also confirm the effectiveness and efficiency of our proposals.
Guturu, Parthasarathy; Dantu, Ram
2008-06-01
Many graph- and set-theoretic problems, because of their tremendous application potential and theoretical appeal, have been well investigated by researchers in complexity theory and were found to be NP-hard. Since the combinatorial complexity of these problems does not permit exhaustive searches for optimal solutions, only near-optimal solutions can be explored using either various problem-specific heuristic strategies or metaheuristic global-optimization methods, such as simulated annealing, genetic algorithms, etc. In this paper, we propose a unified evolutionary algorithm (EA) for the problems of maximum clique finding, maximum independent set, minimum vertex cover, subgraph and double subgraph isomorphism, set packing, set partitioning, and set cover. In the proposed approach, we first map these problems onto the maximum clique-finding problem (MCP), which is later solved using an evolutionary strategy. The proposed impatient EA with probabilistic tabu search (IEA-PTS) for the MCP integrates the best features of earlier successful approaches with a number of new heuristics that we developed to yield a performance that advances the state of the art in EAs for the exploration of the maximum cliques in a graph. Results of experimentation with the 37 DIMACS benchmark graphs and comparative analyses with six state-of-the-art algorithms, including two from the smaller EA community and four from the larger metaheuristics community, indicate that the IEA-PTS outperforms the EAs with respect to a Pareto-lexicographic ranking criterion and offers competitive performance on some graph instances when individually compared to the other heuristic algorithms. It has also successfully set a new benchmark on one graph instance. On another benchmark suite called Benchmarks with Hidden Optimal Solutions, IEA-PTS ranks second, after a very recent algorithm called COVER, among its peers that have experimented with this suite.
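Two of the mappings onto maximum clique mentioned above have well-known textbook forms: a maximum independent set of a graph is a maximum clique of its complement, and the remaining vertices form a minimum vertex cover. The Python sketch below shows only these two standard reductions and may differ from the mappings used in the paper; it substitutes a brute-force clique search for the IEA-PTS solver, so it is suitable for tiny graphs only, and all function names are hypothetical.

```python
from itertools import combinations

def max_clique(nodes, edges):
    """Brute-force maximum clique (exponential time, illustration only)."""
    present = {frozenset(e) for e in edges}
    for size in range(len(nodes), 0, -1):
        for cand in combinations(nodes, size):
            if all(frozenset(p) in present for p in combinations(cand, 2)):
                return set(cand)
    return set()

def complement(nodes, edges):
    """Edges of the complement graph."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(nodes, 2)
            if frozenset((u, v)) not in present]

def max_independent_set(nodes, edges):
    # Independent set in G  <=>  clique in the complement of G.
    return max_clique(nodes, complement(nodes, edges))

def min_vertex_cover(nodes, edges):
    # The vertices outside a maximum independent set cover every edge.
    return set(nodes) - max_independent_set(nodes, edges)

# Hypothetical example: the path 0-1-2-3-4.
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
independent = max_independent_set(nodes, edges)
cover = min_vertex_cover(nodes, edges)
```

Because both reductions only transform the input graph or the output set, any maximum clique solver, heuristic or exact, can be swapped in for `max_clique`.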
Detecting independent and recurrent copy number aberrations using interval graphs.
Wu, Hsin-Ta; Hajirasouliha, Iman; Raphael, Benjamin J
2014-06-15
Somatic copy number aberrations (SCNAs) are frequent in cancer genomes, but many of these are random, passenger events. A common strategy to distinguish functional aberrations from passengers is to identify those aberrations that are recurrent across multiple samples. However, the extensive variability in the length and position of SCNAs makes the problem of identifying recurrent aberrations notoriously difficult. We introduce a combinatorial approach to the problem of identifying independent and recurrent SCNAs, focusing on the key challenge of separating the overlaps in aberrations across individuals into independent events. We derive independent and recurrent SCNAs as maximal cliques in an interval graph constructed from overlaps between aberrations. We efficiently enumerate all such cliques, and derive a dynamic programming algorithm to find an optimal selection of non-overlapping cliques, resulting in a very fast algorithm, which we call RAIG (Recurrent Aberrations from Interval Graphs). We show that RAIG outperforms other methods on simulated data and also performs well on data from three cancer types from The Cancer Genome Atlas (TCGA). In contrast to existing approaches that employ various heuristics to select independent aberrations, RAIG optimizes a well-defined objective function. We show that this allows RAIG to identify rare aberrations that are likely functional, but are obscured by overlaps with larger passenger aberrations. http://compbio.cs.brown.edu/software. © The Author 2014. Published by Oxford University Press.
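The clique enumeration step is efficient because interval graphs have few maximal cliques and a single left-to-right sweep finds them all. The sketch below is a generic sweep over closed intervals, assumed here for illustration; it is not taken from the RAIG software, it omits the dynamic programming selection step, and the function name and coordinates are hypothetical.

```python
def maximal_cliques(intervals):
    """Maximal cliques of the interval graph of closed intervals.

    intervals: list of (start, end) pairs.
    Returns a list of sets of interval indices, in left-to-right order.
    """
    events = []
    for idx, (start, end) in enumerate(intervals):
        events.append((start, 0, idx))  # 0 sorts starts before ends at ties
        events.append((end, 1, idx))
    events.sort()

    active, cliques, grew = set(), [], False
    for _, kind, idx in events:
        if kind == 0:
            active.add(idx)
            grew = True
        else:
            if grew:
                # The active set was just extended and is about to shrink,
                # so no other interval can be added to it: it is maximal.
                cliques.append(set(active))
                grew = False
            active.remove(idx)
    return cliques

# Hypothetical aberration segments on one chromosome arm.
segments = [(1, 5), (2, 6), (4, 8), (7, 9), (10, 12)]
cliques = maximal_cliques(segments)
```

Each reported clique is a set of segments that share a common position, which is the sense in which overlapping aberrations are grouped into candidate recurrent events.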
ERIC Educational Resources Information Center
Gültepe, Nejla
2016-01-01
Graphing subjects in chemistry has been used to provide alternatives to verbal and algorithmic descriptions of a subject by handing students another way of improving their manipulation of concepts. Teachers should therefore know the level of students' graphing skills. Studies have identified that students have difficulty making connections with…
NASA Astrophysics Data System (ADS)
Mandrà, Salvatore; Giacomo Guerreschi, Gian; Aspuru-Guzik, Alán
2016-07-01
We present an exact quantum algorithm for solving the Exact Satisfiability problem, which belongs to the important NP-complete complexity class. The algorithm is based on an intuitive approach that can be divided into two parts: the first step consists in the identification and efficient characterization of a restricted subspace that contains all the valid assignments of the Exact Satisfiability; while the second part performs a quantum search in such restricted subspace. The quantum algorithm can be used either to find a valid assignment (or to certify that no solution exists) or to count the total number of valid assignments. The query complexities for the worst case are respectively bounded by $O(\sqrt{2^{n-M'}})$ and $O(2^{n-M'})$, where $n$ is the number of variables and $M'$ the number of linearly independent clauses. Remarkably, the proposed quantum algorithm is faster than any known exact classical algorithm for solving dense formulas of Exact Satisfiability. As a concrete application, we provide the worst-case complexity for the Hamiltonian cycle problem obtained after mapping it to a suitable Occupation problem. Specifically, we show that the time complexity for the proposed quantum algorithm is bounded by $O(2^{n/4})$ for 3-regular undirected graphs, where $n$ is the number of nodes. The same worst-case complexity holds for (3,3)-regular bipartite graphs. As a reference, the current best classical algorithm has a (worst-case) running time bounded by $O(2^{31n/96})$. Finally, when compared to heuristic techniques for Exact Satisfiability problems, the proposed quantum algorithm is faster than the classical WalkSAT and Adiabatic Quantum Optimization for random instances with a density of constraints close to the satisfiability threshold, the regime in which instances are typically the hardest to solve.
The proposed quantum algorithm can be straightforwardly extended to the generalized version of the Exact Satisfiability known as Occupation problem. The general version of the algorithm is presented and analyzed.
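To make the gap between the two Hamiltonian-cycle bounds quoted in this abstract concrete, the exponents can be put over a common denominator. This compares only the stated worst-case bounds; constants and polynomial factors hidden by the O-notation are ignored.

```latex
\[
2^{n/4} = 2^{24n/96},
\qquad
\frac{2^{31n/96}}{2^{24n/96}} = 2^{7n/96}.
\]
```

For example, at $n = 96$ nodes the classical bound exceeds the quantum bound by a factor of $2^{7} = 128$.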
Regularized Embedded Multiple Kernel Dimensionality Reduction for Mine Signal Processing.
Li, Shuang; Liu, Bing; Zhang, Chen
2016-01-01
Traditional multiple kernel dimensionality reduction models are generally based on graph embedding and the manifold assumption. But such an assumption might be invalid for some high-dimensional or sparse data due to the curse of dimensionality, which has a negative influence on the performance of multiple kernel learning. In addition, some models might be ill-posed if the rank of the matrices in their objective functions is not high enough. To address these issues, we extend the traditional graph embedding framework and propose a novel regularized embedded multiple kernel dimensionality reduction method. Different from the conventional convex relaxation technique, the proposed algorithm directly takes advantage of a binary search and an alternative optimization scheme to obtain optimal solutions efficiently. The experimental results demonstrate the effectiveness of the proposed method for supervised, unsupervised, and semisupervised scenarios.
A Scalable Nonuniform Pointer Analysis for Embedded Program
NASA Technical Reports Server (NTRS)
Venet, Arnaud
2004-01-01
In this paper we present a scalable pointer analysis for embedded applications that is able to distinguish between instances of recursively defined data structures and elements of arrays. The main contribution consists of an efficient yet precise algorithm that can handle multithreaded programs. We first perform an inexpensive flow-sensitive analysis of each function in the program that generates semantic equations describing the effect of the function on the memory graph. These equations bear numerical constraints that describe nonuniform points-to relationships. We then iteratively solve these equations in order to obtain an abstract storage graph that describes the shape of data structures at every point of the program for all possible thread interleavings. We bring experimental evidence that this approach is tractable and precise for real-size embedded applications.
Towards a Multiscale Approach to Cybersecurity Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogan, Emilie A.; Hui, Peter SY; Choudhury, Sutanay
2013-11-12
We propose a multiscale approach to modeling cyber networks, with the goal of capturing a view of the network and overall situational awareness with respect to a few key properties (connectivity, distance, and centrality) for a system under an active attack. We focus on theoretical and algorithmic foundations of multiscale graphs, coming from an algorithmic perspective, with the goal of modeling cyber system defense as a specific use case scenario. We first define a notion of multiscale graphs, in contrast with their well-studied single-scale counterparts. We develop multiscale analogs of paths and distance metrics. As a simple, motivating example of a common metric, we present a multiscale analog of the all-pairs shortest-path problem, along with a multiscale analog of a well-known algorithm which solves it. From a cyber defense perspective, this metric might be used to model the distance from an attacker's position in the network to a sensitive machine. In addition, we investigate probabilistic models of connectivity. These models exploit the hierarchy to quantify the likelihood that sensitive targets might be reachable from compromised nodes. We believe that our novel multiscale approach to modeling cyber-physical systems will advance several aspects of cyber defense, specifically allowing for a more efficient and agile approach to defending these systems.
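For orientation, the single-scale all-pairs shortest-path problem mentioned in the abstract can be sketched as follows. The abstract does not name the "well-known algorithm" it generalizes; Floyd-Warshall is used here purely as an assumption, and the four-node network, its edge weights, and the function name are hypothetical. The multiscale analog itself is not reproduced.

```python
from math import inf

def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on a directed weighted graph with nodes 0..n-1."""
    dist = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    # Allow each node k in turn to act as an intermediate hop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical toy network: node 0 is the attacker's foothold,
# node 3 is a sensitive machine.
hops = all_pairs_shortest_paths(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 2, 5)])
```

Here `hops[0][3]` plays the role of the attacker-to-target distance described in the abstract.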
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elmagarmid, A.K.
The availability of distributed data bases is directly affected by the timely detection and resolution of deadlocks. Consequently, mechanisms are needed to make deadlock detection algorithms resilient to failures. Presented first is a centralized algorithm that allows transactions to have multiple requests outstanding. Next, a new distributed deadlock detection algorithm (DDDA) is presented, using a global detector (GD) to detect global deadlocks and local detectors (LDs) to detect local deadlocks. This algorithm essentially identifies transaction-resource interactions that may cause global (multisite) deadlocks. Third, a deadlock detection algorithm utilizing a transaction-wait-for (TWF) graph is presented. It is a fully disjoint algorithm that allows multiple outstanding requests. The proposed algorithm can achieve improved overall performance by using multiple disjoint controllers coupled with the two-phase property while maintaining the simplicity of centralized schemes. Fourth, an algorithm that combines deadlock detection and avoidance is given. This algorithm uses concurrent transaction controllers and resource coordinators to achieve maximum distribution. The language of CSP is used to describe this algorithm. Finally, two efficient deadlock resolution protocols are given along with some guidelines to be used in choosing a transaction for abortion.
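The core idea behind a transaction-wait-for graph is that a deadlock corresponds to a cycle. A minimal centralized sketch of that check is given below; it is not the distributed algorithm of the abstract, and the function name and the dictionary representation of the graph are assumptions made for illustration. A transaction may wait for several others, which models multiple outstanding requests.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle in the wait-for graph, or None.

    wait_for maps each transaction to the transactions it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    nodes = set(wait_for) | {t for ts in wait_for.values() for t in ts}
    colour = {t: WHITE for t in nodes}
    stack = []

    def visit(t):
        colour[t] = GREY
        stack.append(t)
        for nxt in wait_for.get(t, ()):
            if colour[nxt] == GREY:
                # Back edge: the path from nxt to t closes a cycle.
                return stack[stack.index(nxt):]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        colour[t] = BLACK
        return None

    for t in sorted(nodes):
        if colour[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None
```

A resolution protocol would then pick one transaction from the returned cycle to abort; the selection guidelines mentioned in the abstract are not modeled here.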
An improved hierarchical A * algorithm in the optimization of parking lots
NASA Astrophysics Data System (ADS)
Wang, Yong; Wu, Junjuan; Wang, Ying
2017-08-01
In parking-path optimization for parking lots, the traditional evaluation index takes the shortest distance as the best index and does not consider the actual road conditions. Introducing a more practical evaluation index can not only simplify the hardware design of the boot system but also reduce the software overhead. First, we establish the RPCDV mathematical model of the parking lot network graph, and all nodes in the network are divided into two layers, which are constructed using different evaluation functions based on the improved hierarchical A* algorithm; this improves both the efficiency and the precision of the time-optimal path search. The final results show that, for different road-section attribute parameters, the algorithm always finds the optimal path faster.
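As background for the search step, a plain (non-hierarchical) A* search can be sketched as below. The grid layout, the unit step cost, and the Manhattan-distance heuristic are assumptions chosen for illustration; the two-layer node structure and the evaluation functions of the paper are not reproduced.

```python
import heapq

def a_star(grid, start, goal):
    """Length of the shortest 4-connected path on a grid; cells equal to 1 are blocked."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance never overestimates the remaining cost on this grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best = {start: 0}
    frontier = [(h(start), 0, start)]  # entries are (g + h, g, cell)
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None  # goal unreachable
```

Replacing the unit step cost with an estimated travel time per road section would turn this into a time-optimal search of the kind the abstract describes.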
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Youngblood, John N.; Saha, Aindam
1987-01-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel systems to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carroll, C.C.; Youngblood, J.N.; Saha, A.
1987-12-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel systems to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
Efficient Maximum Likelihood Estimation for Pedigree Data with the Sum-Product Algorithm.
Engelhardt, Alexander; Rieger, Anna; Tresch, Achim; Mansmann, Ulrich
2016-01-01
We analyze data sets consisting of pedigrees with age at onset of colorectal cancer (CRC) as phenotype. The occurrence of familial clusters of CRC suggests the existence of a latent, inheritable risk factor. We aimed to compute the probability of a family possessing this risk factor as well as the hazard rate increase for these risk factor carriers. Due to the inheritability of this risk factor, the estimation necessitates a costly marginalization of the likelihood. We propose an improved EM algorithm by applying factor graphs and the sum-product algorithm in the E-step. This reduces the computational complexity from exponential to linear in the number of family members. Our algorithm is as precise as a direct likelihood maximization in a simulation study and a real family study on CRC risk. For 250 simulated families of size 19 and 21, the runtime of our algorithm is faster by a factor of 4 and 29, respectively. On the largest family (23 members) in the real data, our algorithm is 6 times faster. We introduce a flexible and runtime-efficient tool for statistical inference in biomedical event data with latent variables that opens the door for advanced analyses of pedigree data. © 2017 S. Karger AG, Basel.
NASA Astrophysics Data System (ADS)
Li, Xiuming; Sun, Mei; Gao, Cuixia; Han, Dun; Wang, Minggang
2018-02-01
This paper presents the parametric modified limited penetrable visibility graph (PMLPVG) algorithm for constructing complex networks from time series. We modify the penetrable visibility criterion of limited penetrable visibility graph (LPVG) in order to improve the rationality of the original penetrable visibility and preserve the dynamic characteristics of the time series. The addition of view angle provides a new approach to characterize the dynamic structure of the time series that is invisible in the previous algorithm. The reliability of the PMLPVG algorithm is verified by applying it to three types of artificial data as well as the actual data of natural gas prices in different regions. The empirical results indicate that PMLPVG algorithm can distinguish the different time series from each other. Meanwhile, the analysis results of natural gas prices data using PMLPVG are consistent with the detrended fluctuation analysis (DFA). The results imply that the PMLPVG algorithm may be a reasonable and significant tool for identifying various time series in different fields.
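The limited penetrable visibility criterion that the PMLPVG builds on can be sketched as follows, assuming the usual definition: two samples are linked when at most `limit` intermediate samples reach or cross the straight line between them, so `limit=0` gives the ordinary visibility graph. The function name and the toy series are hypothetical, and the view-angle parameter introduced by the paper is not modeled.

```python
def limited_penetrable_visibility_graph(series, limit=0):
    """Edges (i, j), i < j, whose line of sight is blocked by at most `limit` samples."""
    edges = set()
    n = len(series)
    for i in range(n):
        for j in range(i + 1, n):
            blockers = 0
            for k in range(i + 1, j):
                # Height at time k of the straight line joining samples i and j.
                line = series[j] + (series[i] - series[j]) * (j - k) / (j - i)
                if series[k] >= line:
                    blockers += 1
            if blockers <= limit:
                edges.add((i, j))
    return edges

# Hypothetical toy series alternating between low and high values.
toy = [1, 3, 1, 3, 1]
vg = limited_penetrable_visibility_graph(toy, limit=0)
lpvg = limited_penetrable_visibility_graph(toy, limit=1)
```

Raising `limit` only ever adds edges, which is why the penetrable variant yields a denser network for the same series.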
Ring system-based chemical graph generation for de novo molecular design
NASA Astrophysics Data System (ADS)
Miyao, Tomoyuki; Kaneko, Hiromasa; Funatsu, Kimito
2016-05-01
Generating chemical graphs in silico by combining building blocks is important and fundamental in virtual combinatorial chemistry. A premise in this area is that generated structures should be irredundant as well as exhaustive. In this study, we develop structure generation algorithms regarding combining ring systems as well as atom fragments. The proposed algorithms consist of three parts. First, chemical structures are generated through a canonical construction path. During structure generation, ring systems can be treated as reduced graphs having fewer vertices than those in the original ones. Second, diversified structures are generated by a simple rule-based generation algorithm. Third, the number of structures to be generated can be estimated with adequate accuracy without actual exhaustive generation. The proposed algorithms were implemented in structure generator Molgilla. As a practical application, Molgilla generated chemical structures mimicking rosiglitazone in terms of a two dimensional pharmacophore pattern. The strength of the algorithms lies in simplicity and flexibility. Therefore, they may be applied to various computer programs regarding structure generation by combining building blocks.
A Partitioning Algorithm for Block-Diagonal Matrices With Overlap
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guy Antoine Atenekeng Kahou; Laura Grigori; Masha Sosonkina
2008-02-02
We present a graph partitioning algorithm that aims at partitioning a sparse matrix into a block-diagonal form, such that any two consecutive blocks overlap. We denote this form of the matrix as the overlapped block-diagonal matrix. The partitioned matrix is suitable for applying the explicit formulation of Multiplicative Schwarz preconditioner (EFMS) described in [3]. The graph partitioning algorithm partitions the graph of the input matrix into K partitions, such that every partition $\Omega_i$ has at most two neighbors $\Omega_{i-1}$ and $\Omega_{i+1}$. First, an ordering algorithm, such as the reverse Cuthill-McKee algorithm, that reduces the matrix profile is performed. An initial overlapped block-diagonal partition is obtained from the profile of the matrix. An iterative strategy is then used to further refine the partitioning by allowing nodes to be transferred between neighboring partitions. Experiments are performed on matrices arising from real-world applications to show the feasibility and usefulness of this approach.
EmptyHeaded: A Relational Engine for Graph Processing
Aberger, Christopher R.; Tu, Susan; Olukotun, Kunle; Ré, Christopher
2016-01-01
There are two types of high-performance graph processing engines: low- and high-level engines. Low-level engines (Galois, PowerGraph, Snap) provide optimized data structures and computation models but require users to write low-level imperative code, hence ensuring that efficiency is the burden of the user. In high-level engines, users write in query languages like datalog (SociaLite) or SQL (Grail). High-level engines are easier to use but are orders of magnitude slower than the low-level graph engines. We present EmptyHeaded, a high-level engine that supports a rich datalog-like query language and achieves performance comparable to that of low-level engines. At the core of EmptyHeaded’s design is a new class of join algorithms that satisfy strong theoretical guarantees but have thus far not achieved performance comparable to that of specialized graph processing engines. To achieve high performance, EmptyHeaded introduces a new join engine architecture, including a novel query optimizer and data layouts that leverage single-instruction multiple data (SIMD) parallelism. With this architecture, EmptyHeaded outperforms high-level approaches by up to three orders of magnitude on graph pattern queries, PageRank, and Single-Source Shortest Paths (SSSP) and is an order of magnitude faster than many low-level baselines. We validate that EmptyHeaded competes with the best-of-breed low-level engine (Galois), achieving comparable performance on PageRank and at most 3× worse performance on SSSP. PMID:28077912
Extracting Knowledge from Graph Data in Adversarial Settings
NASA Astrophysics Data System (ADS)
Skillicorn, David
Graph data captures connections and relationships among individuals, and between individuals and objects, places, and times. Because many of the properties of graphs are emergent, they are resistant to manipulation by adversaries. This robustness comes at the expense of more-complex analysis algorithms. We describe several approaches to analysing graph data, illustrating with examples from the relationships within al Qaeda.
Two Improved Algorithms for Envelope and Wavefront Reduction
NASA Technical Reports Server (NTRS)
Kumfert, Gary; Pothen, Alex
1997-01-01
Two algorithms for reordering sparse, symmetric matrices or undirected graphs to reduce envelope and wavefront are considered. The first is a combinatorial algorithm introduced by Sloan and further developed by Duff, Reid, and Scott; we describe enhancements to the Sloan algorithm that improve its quality and reduce its run time. Our test problems fall into two classes with differing asymptotic behavior of their envelope parameters as a function of the weights in the Sloan algorithm. We describe an efficient $O(n \log n + m)$ time implementation of the Sloan algorithm, where $n$ is the number of rows (vertices), and $m$ is the number of nonzeros (edges). On a collection of test problems, the improved Sloan algorithm required, on average, only twice the time required by the simpler Reverse Cuthill-McKee algorithm while improving the mean square wavefront by a factor of three. The second algorithm is a hybrid that combines a spectral algorithm for envelope and wavefront reduction with a refinement step that uses a modified Sloan algorithm. The hybrid algorithm reduces the envelope size and mean square wavefront obtained from the Sloan algorithm at the cost of greater running times. We illustrate how these reductions translate into tangible benefits for frontal Cholesky factorization and incomplete factorization preconditioning.
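To illustrate what "reducing the envelope" means, the sketch below computes the envelope size of a symmetric sparsity pattern under a given ordering and applies a simplified Reverse Cuthill-McKee ordering, the baseline named in the abstract. It is not the Sloan or hybrid algorithm. The start-vertex rule (minimum degree rather than a pseudo-peripheral vertex), the function names, and the five-vertex example are assumptions.

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """Breadth-first ordering that visits low-degree neighbours first, then reversed."""
    order, seen = [], set()
    for root in sorted(adj, key=lambda v: (len(adj[v]), v)):
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda u: (len(adj[u]), u)):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return order[::-1]

def envelope_size(adj, order):
    """Sum over rows of the distance from the diagonal back to the first nonzero."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(pos[v] - min([pos[v]] + [pos[w] for w in adj[v]]) for v in adj)

# Hypothetical example: a path graph 0-3-1-4-2 whose labels are scrambled.
path = {0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4]}
before = envelope_size(path, [0, 1, 2, 3, 4])
after = envelope_size(path, reverse_cuthill_mckee(path))
```

On this example the reordering follows the path from one end to the other, so every row's first nonzero sits next to the diagonal.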
A Decentralized Eigenvalue Computation Method for Spectrum Sensing Based on Average Consensus
NASA Astrophysics Data System (ADS)
Mohammadi, Jafar; Limmer, Steffen; Stańczak, Sławomir
2016-07-01
This paper considers eigenvalue estimation for the decentralized inference problem for spectrum sensing. We propose a decentralized eigenvalue computation algorithm based on the power method, which is referred to as the generalized power method (GPM); it is capable of estimating the eigenvalues of a given covariance matrix under certain conditions. Furthermore, we have developed a decentralized implementation of GPM by splitting the iterative operations into local and global computation tasks. The global tasks require data exchange to be performed among the nodes. For this task, we apply an average consensus algorithm to efficiently perform the global computations. As a special case, we consider a structured graph that is a tree with clusters of nodes at its leaves. For an accelerated distributed implementation, we propose to use computation over multiple access channel (CoMAC) as a building block of the algorithm. Numerical simulations are provided to illustrate the performance of the two algorithms.
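The two building blocks named in the abstract, power iteration and average consensus, can each be sketched in a few lines. This is a centralized illustration under stated assumptions, not the GPM itself: the function names, the fixed iteration counts, and the consensus step size are hypothetical, and the way the paper splits the iteration into local and global tasks is not reproduced.

```python
def power_method(matrix, iterations=200):
    """Estimate the dominant eigenvalue of a symmetric matrix by power iteration."""
    n = len(matrix)
    vec = [1.0] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            return 0.0
        vec = [x / norm for x in nxt]
        # Rayleigh quotient of the normalized iterate.
        mv = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(vec[i] * mv[i] for i in range(n))
    return eigenvalue

def average_consensus(values, neighbours, step, rounds=200):
    """Each node repeatedly moves toward its neighbours; all values approach the mean."""
    x = list(values)
    for _ in range(rounds):
        x = [x[i] + step * sum(x[j] - x[i] for j in neighbours[i])
             for i in range(len(x))]
    return x
```

In a decentralized setting, quantities such as the norm in `power_method` are global sums, which is where a consensus routine of this kind would be used; the step size should stay below the reciprocal of the largest node degree for the iteration to converge on a connected graph.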
NASA Astrophysics Data System (ADS)
Li, Chen; Lu, Zhiqiang; Han, Xiaole; Zhang, Yuejun; Wang, Li
2016-03-01
The integrated scheduling of container handling systems aims to optimize the coordination and overall utilization of all handling equipment, so as to minimize the makespan of a given set of container tasks. A modified disjunctive graph is proposed and a mixed 0-1 programming model is formulated. A heuristic algorithm is presented, in which the original problem is divided into two subproblems. In the first subproblem, contiguous bay crane operations are applied to obtain a good quay crane schedule. In the second subproblem, proper internal truck and yard crane schedules are generated to match the given quay crane schedule. Furthermore, a genetic algorithm based on the heuristic algorithm is developed to search for better solutions. The computational results show that the proposed algorithm can efficiently find high-quality solutions. They also indicate the effectiveness of simultaneous loading and discharging operations compared with separate ones.
Yanagisawa, Keisuke; Komine, Shunta; Kubota, Rikuto; Ohue, Masahito; Akiyama, Yutaka
2018-06-01
The need to accelerate large-scale protein-ligand docking in virtual screening against a huge compound database led researchers to propose a strategy that entails memorizing the evaluation result of the partial structure of a compound and reusing it to evaluate other compounds. However, the previous method required frequent disk accesses, resulting in insufficient acceleration. Thus, more efficient memory usage can be expected to lead to further acceleration, and optimal memory usage could be achieved by solving the minimum cost flow problem. In this research, we propose a fast algorithm for the minimum cost flow problem utilizing the characteristics of the graph generated for this problem as constraints. The proposed algorithm, which optimized memory usage, was approximately seven times faster compared to existing minimum cost flow algorithms. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
Efficient sequential and parallel algorithms for record linkage
Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar
2014-01-01
Background and objective Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Methods Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Results Our sequential and parallel algorithms have been tested on a real dataset of 1 083 878 records and synthetic datasets ranging in size from 50 000 to 9 000 000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). Conclusions We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm. PMID:24154837
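The two ideas highlighted in the Methods (dropping identical records first, then linking similar records and taking connected components) can be sketched as below. This is a quadratic-time illustration, not the authors' algorithm: `sorted(set(...))` stands in for the radix-sort step, the edit-distance threshold of 1 is an arbitrary assumption, and all function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def link_records(records, threshold=1):
    """Group records into connected components of the 'similar records' graph."""
    unique = sorted(set(records))  # identical records are removed up front
    parent = list(range(len(unique)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            if edit_distance(unique[i], unique[j]) <= threshold:
                parent[find(i)] = find(j)  # add an edge: merge the two components

    clusters = {}
    for i, rec in enumerate(unique):
        clusters.setdefault(find(i), []).append(rec)
    return sorted(sorted(c) for c in clusters.values())
```

Note that "smith" and "smythe" end up in the same cluster through "smyth" even though they differ by two edits, which is the effect of taking connected components rather than comparing pairs in isolation.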
Multiple sclerosis lesion segmentation using an automatic multimodal graph cuts.
García-Lorenzo, Daniel; Lecoeur, Jeremy; Arnold, Douglas L; Collins, D Louis; Barillot, Christian
2009-01-01
Graph Cuts have been shown to be a powerful interactive segmentation technique in several medical domains. We propose to automate the Graph Cuts in order to automatically segment Multiple Sclerosis (MS) lesions in MRI. We replace the manual interaction with a robust EM-based approach in order to discriminate between MS lesions and the Normal Appearing Brain Tissues (NABT). Evaluation is performed in synthetic and real images showing good agreement between the automatic segmentation and the target segmentation. We compare our algorithm with the state-of-the-art techniques and with several manual segmentations. An advantage of our algorithm over previously published ones is the possibility to semi-automatically improve the segmentation due to the Graph Cuts interactive feature.
Formal language constrained path problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, C.; Jacob, R.; Marathe, M.
1997-07-08
In many path finding problems arising in practice, certain patterns of edge/vertex labels in the labeled graph being traversed are allowed/preferred, while others are disallowed. Motivated by such applications as intermodal transportation planning, the authors investigate the complexity of finding feasible paths in a labeled network, where the mode choice for each traveler is specified by a formal language. The main contributions of this paper include the following: (1) the authors show that the problem of finding a shortest path between a source and destination for a traveler whose mode choice is specified as a context free language is solvable efficiently in polynomial time; when the mode choice is specified as a regular language, they provide algorithms with improved space and time bounds; (2) in contrast, they show that the problem of finding simple paths between a source and a given destination is NP-hard, even when restricted to very simple regular expressions and/or very simple graphs; (3) for the class of treewidth bounded graphs, they show that (i) the problem of finding a regular language constrained simple path between source and a destination is solvable in polynomial time and (ii) the extension to finding context free language constrained simple paths is NP-complete. Several extensions of these results are presented in the context of finding shortest paths with additional constraints. These results significantly extend the results in [MW95]. As a corollary of the results, they obtain a polynomial time algorithm for the BEST k-SIMILAR PATH problem studied in [SJB97]. The previous best algorithm was given by [SJB97] and takes exponential time in the worst case.
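For the regular-language case, one standard way to obtain a polynomial-time algorithm is to run Dijkstra's algorithm on the product of the labeled graph with a deterministic finite automaton for the language. The sketch below follows that textbook construction and is not claimed to match the authors' bounds; the function name, the tuple formats, and the commuting example are hypothetical. The path found may revisit vertices, in line with the abstract's point that requiring simple paths makes the problem NP-hard.

```python
import heapq

def constrained_shortest_path(edges, dfa, start_state, accepting, source, target):
    """Cheapest source-target walk whose label sequence is accepted by the DFA.

    edges: list of (u, v, label, weight); dfa: dict mapping (state, label) -> state.
    """
    out = {}
    for u, v, label, w in edges:
        out.setdefault(u, []).append((v, label, w))
    best = {(source, start_state): 0}
    heap = [(0, source, start_state)]
    while heap:
        d, u, q = heapq.heappop(heap)
        if d > best.get((u, q), float("inf")):
            continue  # stale queue entry
        if u == target and q in accepting:
            return d
        for v, label, w in out.get(u, ()):
            nq = dfa.get((q, label))
            if nq is None:
                continue  # this label is not allowed in the current automaton state
            nd = d + w
            if nd < best.get((v, nq), float("inf")):
                best[(v, nq)] = nd
                heapq.heappush(heap, (nd, v, nq))
    return None

# Hypothetical commute: walking is allowed only before boarding a bus (walk* bus*).
trip = [("home", "stop1", "walk", 5), ("stop1", "stop2", "bus", 2),
        ("stop2", "office", "walk", 3), ("stop2", "office", "bus", 10),
        ("home", "office", "walk", 30)]
walk_then_bus = {("W", "walk"): "W", ("W", "bus"): "B", ("B", "bus"): "B"}
any_mode = {("A", "walk"): "A", ("A", "bus"): "A"}
```

With the `walk_then_bus` automaton the cheapest unconstrained route is rejected because it walks after riding the bus, so the search falls back to the next cheapest accepted route.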
NASA Astrophysics Data System (ADS)
Mali, P.; Manna, S. K.; Mukhopadhyay, A.; Haldar, P. K.; Singh, G.
2018-03-01
Multiparticle emission data in nucleus-nucleus collisions are studied in a graph theoretical approach. The sandbox algorithm used to analyze complex networks is employed to characterize the multifractal properties of the visibility graphs associated with the pseudorapidity distribution of charged particles produced in high-energy heavy-ion collisions. Experimental data on $^{28}$Si+Ag/Br interaction at laboratory energy $E_{lab} = 14.5A$ GeV, and $^{16}$O+Ag/Br and $^{32}$S+Ag/Br interactions both at $E_{lab} = 200A$ GeV, are used in this analysis. We observe a scale-free nature of the degree distributions of the visibility and horizontal visibility graphs associated with the event-wise pseudorapidity distributions. Equivalent event samples simulated by ultra-relativistic quantum molecular dynamics produce degree distributions that are almost identical to those of the respective experiment. However, the multifractal variables obtained by using the sandbox algorithm for the experiment differ to some extent from the respective simulated results.
NASA Astrophysics Data System (ADS)
Chen, Jung-Chieh
This paper presents a low complexity algorithmic framework for finding a broadcasting schedule in a low-altitude satellite system, i.e., the satellite broadcast scheduling (SBS) problem, based on the recent modeling and computational methodology of factor graphs. Inspired by the huge success of the low density parity check (LDPC) codes in the field of error control coding, in this paper, we transform the SBS problem into an LDPC-like problem through a factor graph instead of using the conventional neural network approaches to solve the SBS problem. Based on a factor graph framework, the soft-information, describing the probability that each satellite will broadcast information to a terminal at a specific time slot, is exchanged among the local processing in the proposed framework via the sum-product algorithm to iteratively optimize the satellite broadcasting schedule. Numerical results show that the proposed approach can not only obtain the optimal solution but also enjoys low complexity suitable for integrated-circuit implementation.
An efficient algorithm for pairwise local alignment of protein interaction networks
Chen, Wenbin; Schmidt, Matthew; Tian, Wenhong; ...
2015-04-01
Recently, researchers seeking to understand, modify, and create beneficial traits in organisms have looked for evolutionarily conserved patterns of protein interactions. Their conservation likely means that the proteins of these conserved functional modules are important to the trait's expression. In this paper, we formulate the problem of identifying these conserved patterns as a graph optimization problem, and develop a fast heuristic algorithm for this problem. We compare the performance of our network alignment algorithm to that of the MaWISh algorithm [Koyuturk M, Kim Y, Topkara U, Subramaniam S, Szpankowski W, Grama A, Pairwise alignment of protein interaction networks, J Comput Biol 13(2): 182-199, 2006.], which bases its search algorithm on a related decision problem formulation. We find that our algorithm discovers conserved modules with a larger number of proteins in an order of magnitude less time. In conclusion, the protein sets found by our algorithm correspond to known conserved functional modules at comparable precision and recall rates as those produced by the MaWISh algorithm.
Reactome graph database: Efficient access to complex pathway data
Korninger, Florian; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902
Reactome graph database: Efficient access to complex pathway data.
Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
Applied Graph-Mining Algorithms to Study Biomolecular Interaction Networks
2014-01-01
Protein-protein interaction (PPI) networks carry vital information on the organization of molecular interactions in cellular systems. The identification of functionally relevant modules in PPI networks is one of the most important applications of biological network analysis. Computational analysis is becoming an indispensable tool to understand large-scale biomolecular interaction networks. Several types of computational methods have been developed and employed for the analysis of PPI networks. Of these computational methods, graph comparison and module detection are the two most commonly used strategies. This review summarizes current literature on graph kernel and graph alignment methods for graph comparison strategies, as well as module detection approaches including seed-and-extend, hierarchical clustering, optimization-based, probabilistic, and frequent subgraph methods. Herein, we provide a comprehensive review of the major algorithms employed under each theme, including our recently published frequent subgraph method, for detecting functional modules commonly shared across multiple cancer PPI networks. PMID:24800226
Fault-tolerant dynamic task graph scheduling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kurt, Mehmet C.; Krishnamoorthy, Sriram; Agrawal, Kunal
2014-11-16
In this paper, we present an approach to fault tolerant execution of dynamic task graphs scheduled using work stealing. In particular, we focus on selective and localized recovery of tasks in the presence of soft faults. We elicit from the user the basic task graph structure in terms of successor and predecessor relationships. The work stealing-based algorithm to schedule such a task graph is augmented to enable recovery when the data and meta-data associated with a task get corrupted. We use this redundancy, and the knowledge of the task graph structure, to selectively recover from faults with low space and time overheads. We show that the fault tolerant design retains the essential properties of the underlying work stealing-based task scheduling algorithm, and that the fault tolerant execution is asymptotically optimal when task re-execution is taken into account. Experimental evaluation demonstrates the low cost of recovery under various fault scenarios.
Graph-drawing algorithms geometries versus molecular mechanics in fullerenes
NASA Astrophysics Data System (ADS)
Kaufman, M.; Pisanski, T.; Lukman, D.; Borštnik, B.; Graovac, A.
1996-09-01
The algorithms of Kamada-Kawai (KK) and Fruchterman-Reingold (FR) have been recently generalized (Pisanski et al., Croat. Chem. Acta 68 (1995) 283) in order to draw molecular graphs in three-dimensional space. The quality of KK and FR geometries is studied here by comparing them with the molecular mechanics (MM) and the adjacency matrix eigenvectors (AME) algorithm geometries. In order to compare different layouts of the same molecule, an appropriate method has been developed. Its application to a series of experimentally detected fullerenes indicates that the KK, FR and AME algorithms are able to reproduce plausible molecular geometries.
Finding Maximum Cliques on the D-Wave Quantum Annealer
Chapuis, Guillaume; Djidjev, Hristo; Hahn, Georg; ...
2018-05-03
This work assesses the performance of the D-Wave 2X (DW) quantum annealer for finding a maximum clique in a graph, one of the most fundamental and important NP-hard problems. Because the size of the largest graphs DW can directly solve is quite small (usually around 45 vertices), we also consider decomposition algorithms intended for larger graphs and analyze their performance. For smaller graphs that fit DW, we provide formulations of the maximum clique problem as a quadratic unconstrained binary optimization (QUBO) problem, which is one of the two input types (together with the Ising model) acceptable by the machine, and compare several quantum implementations to current classical algorithms such as simulated annealing, Gurobi, and third-party clique finding heuristics. We further estimate the contributions of the quantum phase of the quantum annealer and the classical post-processing phase typically used to enhance each solution returned by DW. We demonstrate that on random graphs that fit DW, no quantum speedup can be observed compared with the classical algorithms. On the other hand, for instances specifically designed to fit well the DW qubit interconnection network, we observe substantial speed-ups in computing time over classical approaches.
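As an illustration of what a QUBO formulation of maximum clique can look like, the sketch below uses a common textbook encoding: reward every selected vertex with -1 and penalise every selected non-adjacent pair. This is not necessarily the formulation used in the paper; the function names, the penalty value of 2, and the brute-force solver standing in for the annealer are all assumptions made for the example.

```python
from itertools import combinations, product


def max_clique_qubo(n, edges, penalty=2):
    """Return QUBO coefficients Q[(i, j)] for maximum clique on n vertices.

    Assumed encoding (not taken from the paper): minimise
    -sum_i x_i + penalty * sum over non-adjacent pairs of x_i * x_j.
    """
    edge_set = {frozenset(e) for e in edges}
    Q = {(i, i): -1 for i in range(n)}  # reward each selected vertex
    for i, j in combinations(range(n), 2):
        if frozenset((i, j)) not in edge_set:
            Q[(i, j)] = penalty  # penalise selecting a non-adjacent pair
    return Q


def qubo_energy(Q, x):
    """Evaluate the QUBO objective for a 0/1 assignment x."""
    return sum(c * x[i] * x[j] for (i, j), c in Q.items())


def brute_force_minimum(Q, n):
    """Exhaustive search over all 2**n assignments (tiny graphs only)."""
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))


# Small example graph: a triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
Q = max_clique_qubo(5, edges)
best = brute_force_minimum(Q, 5)
clique = [v for v, bit in enumerate(best) if bit]
```

Because the penalty exceeds the per-vertex reward, dropping one endpoint of any violated pair never raises the energy, so the minimum energy equals minus the clique number.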
Finding Maximum Cliques on the D-Wave Quantum Annealer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chapuis, Guillaume; Djidjev, Hristo; Hahn, Georg
This work assesses the performance of the D-Wave 2X (DW) quantum annealer for finding a maximum clique in a graph, one of the most fundamental and important NP-hard problems. Because the size of the largest graphs DW can directly solve is quite small (usually around 45 vertices), we also consider decomposition algorithms intended for larger graphs and analyze their performance. For smaller graphs that fit DW, we provide formulations of the maximum clique problem as a quadratic unconstrained binary optimization (QUBO) problem, which is one of the two input types (together with the Ising model) acceptable by the machine, and compare several quantum implementations to current classical algorithms such as simulated annealing, Gurobi, and third-party clique finding heuristics. We further estimate the contributions of the quantum phase of the quantum annealer and the classical post-processing phase typically used to enhance each solution returned by DW. We demonstrate that on random graphs that fit DW, no quantum speedup can be observed compared with the classical algorithms. On the other hand, for instances specifically designed to fit well the DW qubit interconnection network, we observe substantial speed-ups in computing time over classical approaches.
Coloring geographical threshold graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bradonjic, Milan; Percus, Allon; Muller, Tobias
We propose a coloring algorithm for sparse random graphs generated by the geographical threshold graph (GTG) model, a generalization of random geometric graphs (RGG). In a GTG, nodes are distributed in a Euclidean space, and edges are assigned according to a threshold function involving the distance between nodes as well as randomly chosen node weights. The motivation for analyzing this model is that many real networks (e.g., wireless networks, the Internet, etc.) need to be studied by using a 'richer' stochastic model (which in this case includes both a distance between nodes and weights on the nodes). Here, we analyze the GTG coloring algorithm together with the graph's clique number, showing formally that in spite of the differences in structure between GTG and RGG, the asymptotic behavior of the chromatic number is identical: $\chi \ln\ln n / \ln n = 1 + o(1)$. Finally, we consider the leading corrections to this expression, again using the coloring algorithm and clique number to provide bounds on the chromatic number. We show that the gap between the lower and upper bound is within $C \ln n / (\ln\ln n)^2$, and specify the constant C.
Hybrid services efficient provisioning over the network coding-enabled elastic optical networks
NASA Astrophysics Data System (ADS)
Wang, Xin; Gu, Rentao; Ji, Yuefeng; Kavehrad, Mohsen
2017-03-01
As a variety of services have emerged, hybrid services have become more common in real optical networks. Although the elastic spectrum resource optimizations over the elastic optical networks (EONs) have been widely investigated, little research has been carried out on the hybrid services of the routing and spectrum allocation (RSA), especially over the network coding-enabled EON. We investigated the RSA for the unicast service and network coding-based multicast service over the network coding-enabled EON with the constraints of time delay and transmission distance. To address this issue, a mathematical model was built to minimize the total spectrum consumption for the hybrid services over the network coding-enabled EON under the constraints of time delay and transmission distance. The model guarantees different routing constraints for different types of services. The immediate nodes over the network coding-enabled EON are assumed to be capable of encoding the flows for different kinds of information. We proposed an efficient heuristic algorithm of the network coding-based adaptive routing and layered graph-based spectrum allocation algorithm (NCAR-LGSA). From the simulation results, NCAR-LGSA shows highly efficient performances in terms of the spectrum resources utilization under different network scenarios compared with the benchmark algorithms.
Claw-Free Maximal Planar Graphs
1989-01-01
1976, 212-223. [10] M.D. Plummer, On n-extendable graphs, Discrete Math. 31, 1980, 201-210. [11] ___, A theorem on matchings in the plane, Graph Theory...in Memory of G.A. Dirac, Ann. Discrete Math. 41, North-Holland, Amsterdam, 1989, 347-354. [12] N. Sbihi, Algorithme de recherche d'un stable de...cardinalité maximum dans un graphe sans étoile, Discrete Math. 29, 1980, 53-76. [13] D. Sumner, On Tutte's factorization theorem, Graphs and Combinatorics
Efficient computation paths for the systematic analysis of sensitivities
NASA Astrophysics Data System (ADS)
Greppi, Paolo; Arato, Elisabetta
2013-01-01
A systematic sensitivity analysis requires computing the model on all points of a multi-dimensional grid covering the domain of interest, defined by the ranges of variability of the inputs. The issues to efficiently perform such analyses on algebraic models are handling solution failures within and close to the feasible region and minimizing the total iteration count. Scanning the domain in the obvious order is sub-optimal in terms of total iterations and is likely to cause many solution failures. The problem of choosing a better order can be translated geometrically into finding Hamiltonian paths on certain grid graphs. This work proposes two paths, one based on a mixed-radix Gray code and the other, a quasi-spiral path, produced by a novel heuristic algorithm. Some simple, easy-to-visualize examples are presented, followed by performance results for the quasi-spiral algorithm and the practical application of the different paths in a process simulation tool.
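A reflected mixed-radix Gray code is one way to obtain a Hamiltonian path on a grid graph: consecutive grid points differ by one step along a single axis, so each model solve can start from a neighbouring solution. The sketch below is a generic recursive construction, not the authors' implementation; the function name and the example grid sizes are assumptions.

```python
def mixed_radix_gray(radices):
    """Yield every point of a grid in reflected mixed-radix Gray-code order.

    radices[k] is the number of grid levels along axis k. Successive points
    differ by exactly 1 in exactly one coordinate.
    """
    if not radices:
        yield ()
        return
    first, rest = radices[0], radices[1:]
    tail = list(mixed_radix_gray(rest))
    for digit in range(first):
        # Traverse the sub-path backwards on odd digits so that the end of
        # one block is adjacent to the start of the next.
        block = tail if digit % 2 == 0 else reversed(tail)
        for point in block:
            yield (digit,) + point


# Example: a 3 x 2 x 4 grid of input levels (sizes chosen arbitrarily).
path = list(mixed_radix_gray([3, 2, 4]))
```

By contrast, the plain lexicographic scan jumps back to the start of the inner axes whenever an outer index increments, which is the kind of discontinuity the abstract describes as a likely source of solution failures.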
Exact analytical solution of irreversible binary dynamics on networks.
Laurence, Edward; Young, Jean-Gabriel; Melnik, Sergey; Dubé, Louis J
2018-03-01
In binary cascade dynamics, the nodes of a graph are in one of two possible states (inactive, active), and nodes in the inactive state make an irreversible transition to the active state, as soon as their precursors satisfy a predetermined condition. We introduce a set of recursive equations to compute the probability of reaching any final state, given an initial state, and a specification of the transition probability function of each node. Because the naive recursive approach for solving these equations takes factorial time in the number of nodes, we also introduce an accelerated algorithm, built around a breadth-first search procedure. This algorithm solves the equations as efficiently as possible in exponential time.
Exact analytical solution of irreversible binary dynamics on networks
NASA Astrophysics Data System (ADS)
Laurence, Edward; Young, Jean-Gabriel; Melnik, Sergey; Dubé, Louis J.
2018-03-01
In binary cascade dynamics, the nodes of a graph are in one of two possible states (inactive, active), and nodes in the inactive state make an irreversible transition to the active state, as soon as their precursors satisfy a predetermined condition. We introduce a set of recursive equations to compute the probability of reaching any final state, given an initial state, and a specification of the transition probability function of each node. Because the naive recursive approach for solving these equations takes factorial time in the number of nodes, we also introduce an accelerated algorithm, built around a breadth-first search procedure. This algorithm solves the equations as efficiently as possible in exponential time.
Graph Theory Roots of Spatial Operators for Kinematics and Dynamics
NASA Technical Reports Server (NTRS)
Jain, Abhinandan
2011-01-01
Spatial operators have been used to analyze the dynamics of robotic multibody systems and to develop novel computational dynamics algorithms. Mass matrix factorization, inversion, diagonalization, and linearization are among several new insights obtained using such operators. While initially developed for serial rigid body manipulators, the spatial operators and the related mathematical analysis have been shown to extend very broadly, including to tree and closed topology systems, to systems with flexible joints, links, etc. This work uses concepts from graph theory to explore the mathematical foundations of spatial operators. The goal is to study and characterize the properties of the spatial operators at an abstract level so that they can be applied to a broader range of dynamics problems. The rich mathematical properties of the kinematics and dynamics of robotic multibody systems have been an area of strong research interest for several decades. These properties are important to understand the inherent physical behavior of systems, for stability and control analysis, for the development of computational algorithms, and for the development of faithful models. Recurring patterns in spatial operators lead one to ask the more abstract question about the properties and characteristics of spatial operators that make them so broadly applicable. The idea is to step back from the specific application systems, and understand more deeply the generic requirements and properties of spatial operators, so that the insights and techniques are readily available across different kinematics and dynamics problems. In this work, techniques from graph theory were used to explore the abstract basis for the spatial operators. The close relationship between the mathematical properties of adjacency matrices for graphs and those of spatial operators and their kernels was established.
The connections hold across very basic requirements on the system topology, the nature of the component bodies, the indexing schemes, etc. The relationship of the underlying structure is intimately connected with efficient, recursive computational algorithms. The results provide the foundational groundwork for a much broader look at the key problems in kinematics and dynamics. The properties of general graphs and trees of nodes and edges were examined, as well as the properties of adjacency matrices that are used to describe graph connectivity. The nilpotency property of such matrices for directed trees was reviewed, and the adjacency matrices were generalized to the notion of block weighted adjacency matrices that support block matrix elements. This leads us to the development of the notion of Spatial Kernel Operator (SKO) kernels. These kernels provide the basis for the development of SKO resolvent operators.
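The nilpotency property mentioned above can be checked numerically on a small example. The sketch below uses a plain 0/1 adjacency matrix of a hypothetical five-node directed tree (not the block-weighted SKO kernels of the report) and shows that its powers vanish, so the finite sum I + A + A^2 + ... equals the inverse of (I - A), a scalar analogue of a resolvent operator. The tree and all helper names are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """Return the n x n identity matrix."""
    return [[int(i == j) for j in range(n)] for i in range(n)]


# Hypothetical directed tree: node 0 is the root, parent[c] is the parent of c.
parent = {1: 0, 2: 0, 3: 1, 4: 1}
n = 5
A = [[0] * n for _ in range(n)]
for child, par in parent.items():
    A[par][child] = 1  # directed edge parent -> child

# Accumulate I + A + A^2 + ... + A^n and remember each power of A.
power, total, powers = identity(n), identity(n), []
for _ in range(n):
    power = matmul(power, A)
    powers.append(power)
    total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]

# (I - A) times the finite sum should reproduce the identity matrix.
i_minus_a = [[int(i == j) - A[i][j] for j in range(n)] for i in range(n)]
check = matmul(i_minus_a, total)
```

Here `total[i][j]` is 1 exactly when node j is node i itself or one of its descendants, which hints at why such sums lend themselves to recursive tip-to-base and base-to-tip sweeps.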
Fast Decentralized Averaging via Multi-scale Gossip
NASA Astrophysics Data System (ADS)
Tsianos, Konstantinos I.; Rabbat, Michael G.
We are interested in the problem of computing the average consensus in a distributed fashion on random geometric graphs. We describe a new algorithm called Multi-scale Gossip which employs a hierarchical decomposition of the graph to partition the computation into tractable sub-problems. Using only pairwise messages of fixed size that travel at most O(n^{1/3}) hops, our algorithm is robust and has communication cost of O(n log log n log ε^{-1}) transmissions, which is order-optimal up to the logarithmic factor in n. Simulated experiments verify the good expected performance on graphs of many thousands of nodes.
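For orientation, the sketch below shows plain randomized pairwise gossip, the baseline idea that multi-scale schemes build on: a random edge is chosen and its two endpoints replace their values with the pair average. It is not the Multi-scale Gossip algorithm itself (there is no hierarchical decomposition), and the ring topology, round count, and function name are assumptions.

```python
import random


def pairwise_gossip(values, edges, rounds, seed=0):
    """Run randomized pairwise averaging and return the final node values.

    Each round picks one edge uniformly at random; both endpoints then hold
    the mean of their two values, which preserves the global sum.
    """
    rng = random.Random(seed)
    x = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        x[i] = x[j] = (x[i] + x[j]) / 2
    return x


# Assumed toy network: a ring of 8 nodes with initial values 0..7.
n = 8
edges = [(i, (i + 1) % n) for i in range(n)]
initial = [float(i) for i in range(n)]
final = pairwise_gossip(initial, edges, rounds=2000)
target = sum(initial) / n
```

On poorly connected graphs this baseline needs many exchanges to mix information across the network, which is the cost that hierarchical approaches aim to reduce.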
Iterative cross section sequence graph for handwritten character segmentation.
Dawoud, Amer
2007-08-01
The iterative cross section sequence graph (ICSSG) is an algorithm for handwritten character segmentation. It expands the cross section sequence graph concept by applying it iteratively at equally spaced thresholds. The iterative thresholding reduces the effect of information loss associated with image binarization. ICSSG preserves the characters' skeletal structure by preventing the interference of pixels that causes flooding of adjacent characters' segments. Improving the structural quality of the characters' skeleton facilitates better feature extraction and classification, which improves the overall performance of optical character recognition (OCR). Experimental results showed significant improvements in OCR recognition rates compared to other well-established segmentation algorithms.
A graph-Laplacian-based feature extraction algorithm for neural spike sorting.
Ghanbari, Yasser; Spence, Larry; Papamichalis, Panos
2009-01-01
Analysis of extracellular neural spike recordings is highly dependent upon the accuracy of neural waveform classification, commonly referred to as spike sorting. Feature extraction is an important stage of this process because it can limit the quality of clustering which is performed in the feature space. This paper proposes a new feature extraction method (which we call Graph Laplacian Features, GLF) based on minimizing the graph Laplacian and maximizing the weighted variance. The algorithm is compared with Principal Components Analysis (PCA, the most commonly-used feature extraction method) using simulated neural data. The results show that the proposed algorithm produces more compact and well-separated clusters compared to PCA. As an added benefit, tentative cluster centers are output which can be used to initialize a subsequent clustering stage.
Multi-INT Complex Event Processing using Approximate, Incremental Graph Pattern Search
2012-06-01
graph pattern search and SPARQL queries. Total execution time for 10 executions each of 5 random pattern searches in synthetic data sets... [Figure: Initial Performance Comparisons; time (secs) versus number of RDF triples (1000, 10000, 100000) for the graph pattern algorithm and SPARQL queries.]
Diagnostic and Remedial Learning Strategy Based on Conceptual Graphs
ERIC Educational Resources Information Center
Jong, BinShyan; Lin, TsongWuu; Wu, YuLung; Chan, Teyi
2004-01-01
Numerous scholars have applied conceptual graphs for explanatory purposes. This study devised the Remedial-Instruction Decisive path (RID path) algorithm for diagnosing individual student learning situation. This study focuses on conceptual graphs. According to the concepts learned by students and the weight values of relations among these…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vydyanathan, Naga; Krishnamoorthy, Sriram; Sabin, Gerald M.
2009-08-01
Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application-tasks with dependences. These applications exhibit both task- and data-parallelism, and combining these two (also called mixed-parallelism), has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task- and data-parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks and the inter-task data communication volumes. A locality conscious scheduling strategy is used to improve inter-task data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications as well as synthetic graphs shows that our algorithm consistently generates schedules with lower makespan as compared to CPR and CPA, two previously proposed scheduling algorithms. Our algorithm also produces schedules that have lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.
Wu, Guangsheng; Liu, Juan; Wang, Caihua
2017-12-28
Prediction of drug-disease interactions is promising for either drug repositioning or disease treatment fields. The discovery of novel drug-disease interactions, on the one hand, can help to find novel indications for the approved drugs; on the other hand, it can provide new therapeutic approaches for the diseases. Recently, computational methods for finding drug-disease interactions have attracted considerable attention because of their far higher efficiency and lower cost than the traditional wet experiment methods. However, they still face several challenges, such as the organization of the heterogeneous data, the performance of the model, and so on. In this work, we propose to hierarchically integrate the heterogeneous data into three layers. The drug-drug and disease-disease similarities are first calculated separately in each layer, and then the similarities from three layers are linearly fused into comprehensive drug similarities and disease similarities, which can then be used to measure the similarities between two drug-disease pairs. We construct a novel weighted drug-disease pair network, where a node is a drug-disease pair with known or unknown treatment relation, and an edge represents the node-node relation, which is weighted with the similarity score between two pairs. Since similar drug-disease pairs are supposed to show similar treatment patterns, we can find the optimal graph cut of the network. The drug-disease pair with unknown relation can then be considered to have similar treatment relation with that within the same cut. Therefore, we develop a semi-supervised graph cut algorithm, SSGC, to find the optimal graph cut, based on which we can identify the potential drug-disease treatment interactions. By comparing with three representative network-based methods, SSGC achieves the highest performances, in terms of both AUC score and the identification rates of true drug-disease pairs.
The experiments with different integration strategies also demonstrate that considering several sources of data can improve the performances of the predictors. In further case studies on four diseases, the top-ranked drug-disease associations have been confirmed by KEGG, the CTD database and the literature, illustrating the usefulness of SSGC. The proposed comprehensive similarity scores from multi-views and multiple layers and the graph-cut based algorithm can greatly improve the prediction performances of drug-disease associations.
Energy-efficient algorithm for broadcasting in ad hoc wireless sensor networks.
Xiong, Naixue; Huang, Xingbo; Cheng, Hongju; Wan, Zheng
2013-04-12
Broadcasting is a common and basic operation used to support various network protocols in wireless networks. Achieving energy-efficient broadcasting is especially important for ad hoc wireless sensor networks because sensors are generally powered by batteries with limited lifetimes. Energy consumption for broadcast operations can be reduced by minimizing the number of relay nodes based on the observation that data transmission processes consume more energy than data reception processes in the sensor nodes, and how to improve the network lifetime is always an interesting issue in sensor network research. The minimum-energy broadcast problem is then equivalent to the problem of finding the minimum Connected Dominating Set (CDS) for a connected graph, which has been proved NP-complete. In this paper, we introduce an Efficient Minimum CDS algorithm (EMCDS) with the help of a proposed ordered sequence list. EMCDS does not concern itself with node energy, and broadcast operations might fail if relay nodes are out of energy. Next, we propose a Minimum Energy-consumption Broadcast Scheme (MEBS) with a modified version of EMCDS, aimed at providing an efficient scheduling scheme with maximized network lifetime. The simulation results show that the proposed EMCDS algorithm can find a smaller CDS than related works, and that MEBS can help to increase the network lifetime by efficiently balancing energy among nodes in the networks.
Double-tick realization of binary control program
NASA Astrophysics Data System (ADS)
Kobylecki, Michał; Kania, Dariusz
2016-12-01
This paper presents a procedure for the bit-level hardware implementation of control algorithms compatible with the IEC 61131-3 standard. The described transformation, based on set calculus and graphs, allows translation of the original form of the control program into a form fully consistent with the original, giving an architecture represented by two ticks. The proposed method enables efficient implementation of bit control in an FPGA using the standardized programming language LD.
Root Gravitropism: Quantification, Challenges, and Solutions.
Muller, Lukas; Bennett, Malcolm J; French, Andy; Wells, Darren M; Swarup, Ranjan
2018-01-01
Better understanding of root traits such as root angle and root gravitropism will be crucial for development of crops with improved resource use efficiency. This chapter describes a high-throughput, automated image analysis method to trace Arabidopsis (Arabidopsis thaliana) seedling roots grown on agar plates. The method combines a "particle-filtering algorithm with a graph-based method" to trace the center line of a root and can be adopted for the analysis of several root parameters such as length, curvature, and stimulus from original root traces.
Efficient Estimation of Mutual Information for Strongly Dependent Variables
2015-05-11
the two possibilities: for a fixed dimension d and nearest neighbor parameter k, we find a constant α_{k,d}, such that if V̄(i)/V(i) < α_{k,d}, then...also compare the results to several baseline estimators: KSG (Kraskov et al., 2004), generalized nearest neighbor graph (GNN) (Pál et al., 2010...Amaury Lendasse, and Francesco Corona. A boundary corrected expansion of the moments of nearest neighbor distributions. Random Struct. Algorithms
Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks
NASA Astrophysics Data System (ADS)
Esfandiar, Pooya; Bonchi, Francesco; Gleich, David F.; Greif, Chen; Lakshmanan, Laks V. S.; On, Byung-Won
Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.
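As background for the Katz score, the sketch below evaluates its defining series, the sum over path lengths l ≥ 1 of alpha^l times A^l, by dense truncation on a tiny graph. This is deliberately the naive all-pairs computation, not the Lanczos/quadrature or top-k methods of the paper; the function name, the three-node path graph, alpha = 0.25, and the 50-term cutoff are assumptions. The series converges only when alpha is below the reciprocal of the largest eigenvalue of A.

```python
def katz_scores(adj, alpha, terms=50):
    """Approximate all-pairs Katz scores by truncating sum_{l>=1} alpha^l A^l."""
    n = len(adj)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0
    scores = [[0.0] * n for _ in range(n)]
    weight = 1.0
    for _ in range(terms):
        # Advance to the next power of A and the next power of alpha.
        power = [[sum(power[i][k] * adj[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
        weight *= alpha
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
    return scores


# Assumed example: the path graph 0 - 1 - 2 (largest eigenvalue sqrt(2)).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
K = katz_scores(adj, alpha=0.25)
```

The result can be checked against the closed form K = (I - alpha A)^{-1} - I. The dense approach costs a full matrix product per term, which illustrates why single-pair and top-k estimates are attractive on large networks.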
Fast katz and commuters : efficient estimation of social relatedness in large networks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
On, Byung-Won; Lakshmanan, Laks V. S.; Greif, Chen
Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.
Quantum many-body effects in x-ray spectra efficiently computed using a basic graph algorithm
NASA Astrophysics Data System (ADS)
Liang, Yufeng; Prendergast, David
2018-05-01
The growing interest in using x-ray spectroscopy for refined materials characterization calls for an accurate electronic-structure theory to interpret the x-ray near-edge fine structure. In this work, we propose an efficient and unified framework to describe all the many-electron processes in a Fermi liquid after a sudden perturbation (such as a core hole). This problem has been visited by the Mahan-Nozières-De Dominicis (MND) theory, but it is intractable to implement various Feynman diagrams within first-principles calculations. Here, we adopt a nondiagrammatic approach and treat all the many-electron processes in the MND theory on an equal footing. Starting from a recently introduced determinant formalism [Phys. Rev. Lett. 118, 096402 (2017), 10.1103/PhysRevLett.118.096402], we exploit the linear dependence of determinants describing different final states involved in the spectral calculations. An elementary graph algorithm, breadth-first search, can be used to quickly identify the important determinants for shaping the spectrum, which avoids the need to evaluate a great number of vanishingly small terms. This search algorithm is performed over the tree-structure of the many-body expansion, which mimics a path-finding process. We demonstrate that the determinantal approach is computationally inexpensive even for obtaining x-ray spectra of extended systems. Using Kohn-Sham orbitals from two self-consistent fields (ground and core-excited state) as input for constructing the determinants, the calculated x-ray spectra for a number of transition metal oxides are in good agreement with experiments. Many-electron aspects beyond the Bethe-Salpeter equation, as captured by this approach, are also discussed, such as shakeup excitations and many-body wave function overlap considered in Anderson's orthogonality catastrophe.
Extended phase graphs with anisotropic diffusion
NASA Astrophysics Data System (ADS)
Weigel, M.; Schwenk, S.; Kiselev, V. G.; Scheffler, K.; Hennig, J.
2010-08-01
The extended phase graph (EPG) calculus gives an elegant pictorial description of magnetization response in multi-pulse MR sequences. The use of the EPG calculus enables a high computational efficiency for the quantitation of echo intensities even for complex sequences with multiple refocusing pulses with arbitrary flip angles. In this work, the EPG concept dealing with RF pulses with arbitrary flip angles and phases is extended to account for anisotropic diffusion in the presence of arbitrary varying gradients. The diffusion effect can be expressed by specific diffusion weightings of individual magnetization pathways. This can be represented as an action of a linear operator on the magnetization state. The algorithm allows easy integration of diffusion anisotropy effects. The formalism is validated on known examples from literature and used to calculate the effective diffusion weighting in multi-echo sequences with arbitrary refocusing flip angles.
Simulator for heterogeneous dataflow architectures
NASA Technical Reports Server (NTRS)
Malekpour, Mahyar R.
1993-01-01
A new simulator is developed to simulate the execution of an algorithm graph in accordance with the Algorithm to Architecture Mapping Model (ATAMM) rules. ATAMM is a Petri Net model which describes the periodic execution of large-grained, data-independent dataflow graphs and which provides predictable steady state time-optimized performance. This simulator extends the ATAMM simulation capability from a heterogenous set of resources, or functional units, to a more general heterogenous architecture. Simulation test cases show that the simulator accurately executes the ATAMM rules for both a heterogenous architecture and a homogenous architecture, which is the special case for only one processor type. The simulator forms one tool in an ATAMM Integrated Environment which contains other tools for graph entry, graph modification for performance optimization, and playback of simulations for analysis.
The Roadmaker's algorithm for the discrete pulse transform.
Laurie, Dirk P
2011-02-01
The discrete pulse transform (DPT) is a decomposition of an observed signal into a sum of pulses, i.e., signals that are constant on a connected set and zero elsewhere. Originally developed for 1-D signal processing, the DPT has recently been generalized to more dimensions. Applications in image processing are currently being investigated. The time required to compute the DPT as originally defined via the successive application of LULU operators (members of a class of minimax filters studied by Rohwer) has been a severe drawback to its applicability. This paper introduces a fast method for obtaining such a decomposition, called the Roadmaker's algorithm because it involves filling pits and razing bumps. It acts selectively only on those features actually present in the signal, flattening them in order of increasing size by subtracting an appropriate positive or negative pulse, which is then appended to the decomposition. The implementation described here covers 1-D signal processing as well as 2-D and 3-D image processing in a single framework. This is achieved by considering the signal or image as a function defined on a graph, with the geometry specified by the edges of the graph. Whenever a feature is flattened, nodes in the graph are merged, until eventually only one node remains. At that stage, a new set of edges for the same nodes as the graph, forming a tree structure, defines the obtained decomposition. The Roadmaker's algorithm is shown to be equivalent to the DPT in the sense of obtaining the same decomposition. However, its simpler operators are not in general equivalent to the LULU operators in situations where those operators are not applied successively. A by-product of the Roadmaker's algorithm is that it yields a proof of the so-called Highlight Conjecture, stated as an open problem in 2006.
We pay particular attention to algorithmic details and complexity, including a demonstration that in the 1-D case, and also in the case of a complete graph, the Roadmaker's algorithm has optimal complexity: it runs in time O(m), where m is the number of arcs in the graph.
Eigenvector synchronization, graph rigidity and the molecule problem
Cucuringu, Mihai; Singer, Amit; Cowburn, David
2013-01-01
The graph realization problem has received a great deal of attention in recent years, due to its importance in applications such as wireless sensor networks and structural biology. In this paper, we extend the previous work and propose the 3D-As-Synchronized-As-Possible (3D-ASAP) algorithm, for the graph realization problem in ℝ3, given a sparse and noisy set of distance measurements. 3D-ASAP is a divide and conquer, non-incremental and non-iterative algorithm, which integrates local distance information into a global structure determination. Our approach starts with identifying, for every node, a subgraph of its 1-hop neighborhood graph, which can be accurately embedded in its own coordinate system. In the noise-free case, the computed coordinates of the sensors in each patch must agree with their global positioning up to some unknown rigid motion, that is, up to translation, rotation and possibly reflection. In other words, to every patch, there corresponds an element of the Euclidean group, Euc(3), of rigid transformations in ℝ3, and the goal was to estimate the group elements that will properly align all the patches in a globally consistent way. Furthermore, 3D-ASAP successfully incorporates information specific to the molecule problem in structural biology, in particular information on known substructures and their orientation. In addition, we also propose 3D-spectral-partitioning (SP)-ASAP, a faster version of 3D-ASAP, which uses a spectral partitioning algorithm as a pre-processing step for dividing the initial graph into smaller subgraphs. Our extensive numerical simulations show that 3D-ASAP and 3D-SP-ASAP are very robust to high levels of noise in the measured distances and to sparse connectivity in the measurement graph, and compare favorably with similar state-of-the-art localization algorithms. PMID:24432187
Scalable Parallel Density-based Clustering and Applications
NASA Astrophysics Data System (ADS)
Patwary, Mostofa Ali
2014-04-01
Recently, density-based clustering algorithms (DBSCAN and OPTICS) have received significant attention from the scientific community due to their unique capability of discovering arbitrarily shaped clusters and eliminating noise data. These algorithms have several applications that require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms is extremely challenging, as they exhibit an inherently sequential data access order and an unbalanced workload, resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups of up to 27.5 on 40 cores on shared memory architecture and speedups of up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving this scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.
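The link between DBSCAN and connected components mentioned in this abstract can be illustrated with a small sequential sketch. This is not the authors' parallel algorithm: the function name `dbscan_union_find`, the brute-force neighbourhood search, and the rule for attaching border points are assumptions made for illustration only.

```python
from math import dist

def dbscan_union_find(points, eps, min_pts):
    """Label each point with a cluster id, or -1 for noise.

    Illustrative sketch: core points are merged with a disjoint-set
    (union-find) structure, so clusters are connected components of
    the "core points within eps of each other" graph.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Brute-force eps-neighbourhoods; each one includes the point itself.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbours]

    # Core points within eps of each other end up in one component.
    for i in range(n):
        if core[i]:
            for j in neighbours[i]:
                if core[j]:
                    union(i, j)

    # Number the components, then attach border points to a nearby core point.
    labels = [-1] * n
    roots = {}
    for i in range(n):
        if core[i]:
            labels[i] = roots.setdefault(find(i), len(roots))
    for i in range(n):
        if not core[i]:
            for j in neighbours[i]:
                if core[j]:
                    labels[i] = labels[j]
                    break
    return labels
```

Because the union operations can be applied in any order, this formulation removes the sequential visiting order of classical DBSCAN, which is the property a parallel version would rely on.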
Quantum red-green-blue image steganography
NASA Astrophysics Data System (ADS)
Heidari, Shahrokh; Pourarian, Mohammad Rasoul; Gheibi, Reza; Naseri, Mosayeb; Houshmand, Monireh
One of the most widely considered topics in the field of quantum information processing is quantum data hiding, including quantum steganography and quantum watermarking. This field is an efficient tool for protecting any kind of digital data. In this paper, three quantum color image steganography algorithms based on the Least Significant Bit (LSB) are investigated. The first algorithm employs only one of the image's channels to cover secret data. The second procedure is based on the LSB XORing technique, and the last algorithm utilizes two channels to cover the color image for hiding secret quantum data. The performances of the proposed schemes are analyzed using software simulations in the MATLAB environment. The analysis of PSNR, BER and histogram graphs indicates that the presented schemes exhibit acceptable performance, and theoretical analysis demonstrates that the network complexity of the approaches scales quadratically.
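For readers unfamiliar with LSB embedding, the following is a minimal classical sketch of the idea on a single 8-bit channel. It is only an analogue: the paper's schemes operate on quantum image representations, and the function names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(channel, bits):
    """Return a copy of `channel` (8-bit values) with `bits` written
    into the least significant bit of the first len(bits) values."""
    if len(bits) > len(channel):
        raise ValueError("message is longer than the cover channel")
    stego = list(channel)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it
    return stego

def extract_lsb(channel, n_bits):
    """Read back the first n_bits least significant bits."""
    return [value & 1 for value in channel[:n_bits]]
```

Each cover value changes by at most 1, which is why LSB schemes typically report high PSNR between the cover and stego images.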
Multiclass Data Segmentation Using Diffuse Interface Methods on Graphs
2014-01-01
interactive image segmentation using the solution to a combinatorial Dirichlet problem. Elmoataz et al. have developed generalizations of the graph...Laplacian [25] for image denoising and manifold smoothing. Couprie et al. in [18] define a conveniently parameterized graph-based energy function that...over to the discrete graph representation. For general data segmentation, Bresson et al. in [8], present rigorous convergence results for two algorithms
Spatial Search by Quantum Walk is Optimal for Almost all Graphs.
Chakraborty, Shantanav; Novo, Leonardo; Ambainis, Andris; Omar, Yasser
2016-03-11
The problem of finding a marked node in a graph can be solved by the spatial search algorithm based on continuous-time quantum walks (CTQW). However, this algorithm is known to run in optimal time only for a handful of graphs. In this work, we prove that for Erdös-Renyi random graphs, i.e., graphs of n vertices where each edge exists with probability p, search by CTQW is almost surely optimal as long as p≥log^{3/2}(n)/n. Consequently, we show that quantum spatial search is in fact optimal for almost all graphs, meaning that the fraction of graphs of n vertices for which this optimality holds tends to one in the asymptotic limit. We obtain this result by proving that search is optimal on graphs where the ratio between the second largest and the largest eigenvalue is bounded by a constant smaller than 1. Finally, we show that we can extend our results on search to establish high fidelity quantum communication between two arbitrary nodes of a random network of interacting qubits, namely, to perform quantum state transfer, as well as entanglement generation. Our work shows that quantum information tasks typically designed for structured systems retain performance in very disordered structures.
Adaptation of Chain Event Graphs for use with Case-Control Studies in Epidemiology.
Keeble, Claire; Thwaites, Peter Adam; Barber, Stuart; Law, Graham Richard; Baxter, Paul David
2017-09-26
Case-control studies are used in epidemiology to try to uncover the causes of diseases, but are a retrospective study design known to suffer from non-participation and recall bias, which may explain their decreased popularity in recent years. Traditional analyses report usually only the odds ratio for given exposures and the binary disease status. Chain event graphs are a graphical representation of a statistical model derived from event trees which have been developed in artificial intelligence and statistics, and only recently introduced to the epidemiology literature. They are a modern Bayesian technique which enable prior knowledge to be incorporated into the data analysis using the agglomerative hierarchical clustering algorithm, used to form a suitable chain event graph. Additionally, they can account for missing data and be used to explore missingness mechanisms. Here we adapt the chain event graph framework to suit scenarios often encountered in case-control studies, to strengthen this study design which is time and financially efficient. We demonstrate eight adaptations to the graphs, which consist of two suitable for full case-control study analysis, four which can be used in interim analyses to explore biases, and two which aim to improve the ease and accuracy of analyses. The adaptations are illustrated with complete, reproducible, fully-interpreted examples, including the event tree and chain event graph. Chain event graphs are used here for the first time to summarise non-participation, data collection techniques, data reliability, and disease severity in case-control studies. We demonstrate how these features of a case-control study can be incorporated into the analysis to provide further insight, which can help to identify potential biases and lead to more accurate study results.
Visual Odometry Based on Structural Matching of Local Invariant Features Using Stereo Camera Sensor
Núñez, Pedro; Vázquez-Martín, Ricardo; Bandera, Antonio
2011-01-01
This paper describes a novel sensor system to estimate the motion of a stereo camera. Local invariant image features are matched between pairs of frames and linked into image trajectories at video rate, providing the so-called visual odometry, i.e., motion estimates from visual input alone. Our proposal conducts two matching sessions: the first one between sets of features associated with the images of the stereo pairs and the second one between sets of features associated with consecutive frames. With respect to previously proposed approaches, the main novelty of this proposal is that both matching sessions are conducted by means of a fast matching algorithm which combines absolute and relative feature constraints. Finding the largest-valued set of mutually consistent matches is equivalent to finding the maximum-weighted clique on a graph. The stereo matching allows the scene view to be represented as a graph which emerges from the features of the accepted clique. On the other hand, the frame-to-frame matching defines a graph whose vertices are features in 3D space. The efficiency of the approach is increased by minimizing the geometric and algebraic errors to estimate the final displacement of the stereo camera between consecutive acquired frames. The proposed approach has been tested for mobile robotics navigation purposes in real environments and using different features. Experimental results demonstrate the performance of the proposal, which could be applied in both industrial and service robot fields. PMID:22164016
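The reduction from "largest-valued set of mutually consistent matches" to a maximum-weight clique can be made concrete with a brute-force sketch. It is exponential in the number of candidate matches and is not the fast matching algorithm of the paper; the function name and the data layout (`weights`, `compatible`) are assumptions for illustration.

```python
from itertools import combinations

def max_weight_clique(weights, compatible):
    """Exhaustive maximum-weight clique search.

    weights:    dict mapping each candidate match to its (positive) score
    compatible: set of frozenset({m1, m2}) pairs that are mutually consistent
    Returns (set_of_matches, total_weight).
    """
    nodes = list(weights)
    best, best_weight = (), 0.0
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            # A clique requires every pair in the subset to be compatible.
            if all(frozenset(pair) in compatible
                   for pair in combinations(subset, 2)):
                total = sum(weights[m] for m in subset)
                if total > best_weight:
                    best, best_weight = subset, total
    return set(best), best_weight
```

In this picture each graph vertex is one candidate feature match, and an edge means the two matches satisfy the relative constraints at the same time.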
Attributed relational graphs for cell nucleus segmentation in fluorescence microscopy images.
Arslan, Salim; Ersahin, Tulin; Cetin-Atalay, Rengul; Gunduz-Demir, Cigdem
2013-06-01
More rapid and accurate high-throughput screening in molecular cellular biology research has become possible with the development of automated microscopy imaging, for which cell nucleus segmentation commonly constitutes the core step. Although several promising methods exist for segmenting the nuclei of monolayer isolated and less-confluent cells, it still remains an open problem to segment the nuclei of more-confluent cells, which tend to grow in overlayers. To address this problem, we propose a new model-based nucleus segmentation algorithm. This algorithm models how a human locates a nucleus by identifying the nucleus boundaries and piecing them together. In this algorithm, we define four types of primitives to represent nucleus boundaries at different orientations and construct an attributed relational graph on the primitives to represent their spatial relations. Then, we reduce the nucleus identification problem to finding predefined structural patterns in the constructed graph and also use the primitives in region growing to delineate the nucleus borders. Working with fluorescence microscopy images, our experiments demonstrate that the proposed algorithm identifies nuclei better than previous nucleus segmentation algorithms.
Finding Hierarchical and Overlapping Dense Subgraphs using Nucleus Decompositions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seshadhri, Comandur; Pinar, Ali; Sariyuce, Ahmet Erdem
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasiclique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the "true optimum", but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine a hierarchical structure among them. Current dense subgraph finding algorithms usually optimize some objective, and only find a few such subgraphs without providing any hierarchy. It is also not clear how to account for overlaps in dense substructures. We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which allows for discovery of overlapping dense subgraphs. With the right parameters, the nuclear decomposition generalizes the classic notions of k-cores and k-trusses. We give provable efficient algorithms for nuclear decompositions, and empirically evaluate their behavior in a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures, and outputs dense subgraphs of higher quality than other state-of-the-art solutions. Our algorithm can process graphs with tens of millions of edges in less than an hour.
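Since the abstract names k-cores as a special case of the decomposition, a sketch of the classic k-core peeling procedure may help fix ideas. This is the textbook peeling scheme written in a simple quadratic form, not the nucleus decomposition algorithm itself; the function name `core_numbers` is an assumption.

```python
def core_numbers(adj):
    """Core number of every node of an undirected simple graph.

    adj: dict mapping node -> set of neighbouring nodes.
    Repeatedly removes a node of minimum remaining degree; the running
    maximum of those degrees is the core number of the removed node.
    """
    degree = {v: len(nb) for v, nb in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1  # u lost one remaining neighbour
    return core
```

Nodes with core number at least k form the k-core, and the cores are nested by containment, which is the simplest instance of the hierarchy the abstract describes.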
Attribute-based Decision Graphs: A framework for multiclass data classification.
Bertini, João Roberto; Nicoletti, Maria do Carmo; Zhao, Liang
2017-01-01
Graph-based algorithms have been successfully applied in machine learning and data mining tasks. A simple but widely used approach to building graphs from vector-based data is to consider each data instance as a vertex and connect pairs of vertices using a similarity measure. Although this abstraction presents some advantages, such as arbitrary shape representation of the original data, it still has some drawbacks: for example, it depends on the choice of a pre-defined distance metric and is biased by the local information among data instances. Aiming at exploring alternative ways to build graphs from data, this paper proposes an algorithm for constructing a new type of graph, called Attribute-based Decision Graph (AbDG). Given a vector-based data set, an AbDG is built by partitioning each data attribute range into disjoint intervals and representing each interval as a vertex. The edges are then established between vertices from different attributes according to a pre-defined pattern. Classification is performed through a matching process among the attribute values of the new instance and the AbDG. Moreover, AbDG provides an inner mechanism to handle missing attribute values, which contributes to expanding its applicability. Results of classification tasks have shown that AbDG is a competitive approach when compared to well-known multiclass algorithms. The main contribution of the proposed framework is the combination of the advantages of attribute-based and graph-based techniques to perform robust pattern matching data classification, while permitting the analysis of the input data considering only a subset of its attributes. Copyright © 2016 Elsevier Ltd. All rights reserved.
Graph Laplacian Regularization for Image Denoising: Analysis in the Continuous Domain.
Pang, Jiahao; Cheung, Gene
2017-04-01
Inverse imaging problems are inherently underdetermined, and hence, it is important to employ appropriate image priors for regularization. One recent popular prior-the graph Laplacian regularizer-assumes that the target pixel patch is smooth with respect to an appropriately chosen graph. However, the mechanisms and implications of imposing the graph Laplacian regularizer on the original inverse problem are not well understood. To address this problem, in this paper, we interpret neighborhood graphs of pixel patches as discrete counterparts of Riemannian manifolds and perform analysis in the continuous domain, providing insights into several fundamental aspects of graph Laplacian regularization for image denoising. Specifically, we first show the convergence of the graph Laplacian regularizer to a continuous-domain functional, integrating a norm measured in a locally adaptive metric space. Focusing on image denoising, we derive an optimal metric space assuming non-local self-similarity of pixel patches, leading to an optimal graph Laplacian regularizer for denoising in the discrete domain. We then interpret graph Laplacian regularization as an anisotropic diffusion scheme to explain its behavior during iterations, e.g., its tendency to promote piecewise smooth signals under certain settings. To verify our analysis, an iterative image denoising algorithm is developed. Experimental results show that our algorithm performs competitively with state-of-the-art denoising methods, such as BM3D for natural images, and outperforms them significantly for piecewise smooth images.
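For orientation, the graph Laplacian regularizer referred to in this abstract is usually written as the quadratic form below. The notation is assumed rather than taken from the paper: x is the vectorized pixel patch, y the noisy observation, W = (w_ij) a symmetric non-negative edge-weight matrix, D the diagonal degree matrix, and μ > 0 a regularization weight.

```latex
% Assumed notation (not from the abstract): x, y, W, D, \mu as described above.
L = D - W, \qquad
x^{\top} L x \;=\; \frac{1}{2} \sum_{i,j} w_{ij}\,(x_i - x_j)^2 ,
\qquad
\hat{x} \;=\; \arg\min_{x}\; \|y - x\|_2^2 \;+\; \mu\, x^{\top} L x .
```

Under these assumptions the minimizer satisfies the linear system (I + μL) x̂ = y, so the regularizer penalizes differences between strongly connected pixels and leaves weakly connected ones free to differ.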
GraphStore: A Distributed Graph Storage System for Big Data Networks
ERIC Educational Resources Information Center
Martha, VenkataSwamy
2013-01-01
Networks, such as social networks, are a universal solution for modeling complex problems in real time, especially in the Big Data community. While previous studies have attempted to enhance network processing algorithms, none have paved a path for the development of a persistent storage system. The proposed solution, GraphStore, provides an…
Exact numerical calculation of fixation probability and time on graphs.
Hindersin, Laura; Möller, Marius; Traulsen, Arne; Bauer, Benedikt
2016-12-01
The Moran process on graphs is a popular model to study the dynamics of evolution in a spatially structured population. Exact analytical solutions for the fixation probability and time of a new mutant have been found for only a few classes of graphs so far. Simulations are time-expensive and many realizations are necessary, as the variance of the fixation times is high. We present an algorithm that numerically computes these quantities for arbitrary small graphs by an approach based on the transition matrix. The advantage over simulations is that the calculation has to be executed only once. Building the transition matrix is automated by our algorithm. This enables a fast and interactive study of different graph structures and their effect on fixation probability and time. We provide a fast implementation in C with this note (Hindersin et al., 2016). Our code is very flexible, as it can handle two different update mechanisms (Birth-death or death-Birth), as well as arbitrary directed or undirected graphs. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
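The transition-matrix approach can be sketched for very small graphs as follows. This is not the authors' C implementation: it assumes Birth-death updating with mutant fitness `r` and resident fitness 1, a strongly connected graph in which every node has at least one out-neighbour, and it uses exact rational arithmetic; the names `fixation_probabilities` and `solve` are hypothetical.

```python
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination over Fractions for a non-singular system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(row for row in range(col, n) if M[row][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for row in range(n):
            if row != col and M[row][col] != 0:
                f = M[row][col]
                M[row] = [x - f * y for x, y in zip(M[row], M[col])]
    return [M[i][n] for i in range(n)]

def fixation_probabilities(adj, r):
    """Fixation probability of the mutant type from every state.

    adj: list where adj[i] lists the out-neighbours of node i
    r:   mutant fitness as a Fraction
    A state is a bitmask whose set bits mark the mutant nodes.
    """
    n = len(adj)
    full = (1 << n) - 1
    size = full + 1
    A = [[Fraction(0)] * size for _ in range(size)]
    b = [Fraction(0)] * size
    for s in range(size):
        A[s][s] = Fraction(1)
        if s == 0:
            continue            # all residents: absorbing, probability 0
        if s == full:
            b[s] = Fraction(1)  # all mutants: absorbing, probability 1
            continue
        mutants = bin(s).count("1")
        total = r * mutants + (n - mutants)
        for i in range(n):
            is_mutant = (s >> i) & 1
            fit = r if is_mutant else Fraction(1)
            for j in adj[i]:
                # Node i reproduces, its offspring replaces out-neighbour j.
                p = fit / total / len(adj[i])
                t = (s | (1 << j)) if is_mutant else (s & ~(1 << j))
                A[s][t] -= p
    return solve(A, b)
```

The state space has 2^n entries, so a direct solve like this is only practical for small graphs, which matches the scope stated in the abstract.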
A Novel Coarsening Method for Scalable and Efficient Mesh Generation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, A; Hysom, D; Gunney, B
2010-12-02
In this paper, we propose a novel mesh coarsening method called the brick coarsening method. The proposed method can be used in conjunction with any graph partitioner and scales to very large meshes. This method reduces the problem space by decomposing the original mesh into fixed-size blocks of nodes called bricks, layered in a similar way to conventional brick laying, and then assigning each node of the original mesh to the appropriate brick. Our experiments indicate that the proposed method scales to very large meshes while allowing a simple RCB partitioner to produce higher-quality partitions with significantly fewer edge cuts. Our results further indicate that the proposed brick-coarsening method allows more complicated partitioners like PT-Scotch to scale to very large problem sizes while still maintaining good partitioning performance with a relatively good edge-cut metric. Graph partitioning is an important problem that has many scientific and engineering applications in such areas as VLSI design, scientific computing, and resource management. Given a graph G = (V,E), where V is the set of vertices and E is the set of edges, the (k-way) graph partitioning problem is to partition the vertices of the graph (V) into k disjoint groups such that each group contains a roughly equal number of vertices and the number of edges connecting vertices in different groups is minimized. Graph partitioning plays a key role in large scientific computing, especially in mesh-based computations, as it is used as a tool to minimize the volume of communication and to ensure well-balanced load across computing nodes. The impact of graph partitioning on the reduction of communication can be easily seen, for example, in different iterative methods to solve a sparse system of linear equations.
Here, a graph partitioning technique is applied to the matrix, which is basically a graph in which each edge is a non-zero entry in the matrix, to allocate groups of vertices to processors in such a way that many of matrix-vector multiplication can be performed locally on each processor and hence to minimize communication. Furthermore, a good graph partitioning scheme ensures the equal amount of computation performed on each processor. Graph partitioning is a well known NP-complete problem, and thus the most commonly used graph partitioning algorithms employ some forms of heuristics. These algorithms vary in terms of their complexity, partition generation time, and the quality of partitions, and they tend to trade off these factors. A significant challenge we are currently facing at the Lawrence Livermore National Laboratory is how to partition very large meshes on massive-size distributed memory machines like IBM BlueGene/P, where scalability becomes a big issue. For example, we have found that the ParMetis, a very popular graph partitioning tool, can only scale to 16K processors. An ideal graph partitioning method on such an environment should be fast and scale to very large meshes, while producing high quality partitions. This is an extremely challenging task, as to scale to that level, the partitioning algorithm should be simple and be able to produce partitions that minimize inter-processor communications and balance the load imposed on the processors. Our goals in this work are two-fold: (1) To develop a new scalable graph partitioning method with good load balancing and communication reduction capability. (2) To study the performance of the proposed partitioning method on very large parallel machines using actual data sets and compare the performance to that of existing methods. The proposed method achieves the desired scalability by reducing the mesh size. 
For this, it coarsens an input mesh into a smaller mesh by coalescing the vertices and edges of the original mesh into a set of mega-vertices and mega-edges. A new coarsening method called the brick algorithm is developed in this research. In the brick algorithm, the zones in a given mesh are first grouped into fixed-size blocks called bricks. These bricks are then laid in a way similar to the conventional brick laying technique, which reduces the number of neighboring blocks each block needs to communicate with. Contributions of this research are as follows: (1) We have developed a novel method that scales to a really large problem size while producing high quality mesh partitions; (2) We measured the performance and scalability of the proposed method on a machine of massive size using a set of actual large complex data sets, where we have scaled to a mesh with 110 million zones using our method. To the best of our knowledge, this is the largest complex mesh that a partitioning method has been successfully applied to; and (3) We have shown that the proposed method can reduce the number of edge cuts by as much as 65%.
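The two quantities in the k-way partitioning definition above, the edge cut and the load balance, can be computed with a few lines. This sketch only evaluates a given partition; it does not implement brick coarsening, and the function names `edge_cut` and `imbalance` are assumptions.

```python
from collections import Counter

def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different groups.

    edges: iterable of (u, v) pairs
    part:  dict mapping each vertex to its group id
    """
    return sum(1 for u, v in edges if part[u] != part[v])

def imbalance(part, k):
    """Largest group size divided by the ideal size |V| / k (1.0 is perfect)."""
    sizes = Counter(part.values())
    return max(sizes.values()) / (len(part) / k)
```

For example, on a 2x3 grid mesh, splitting by rows cuts 3 edges, while an alternating (checkerboard) assignment with the same balance cuts all 7.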
Kou, Qiang; Wu, Si; Tolic, Nikola; Paša-Tolic, Ljiljana; Liu, Yunlong; Liu, Xiaowen
2017-05-01
Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a 'bird's eye view' of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem. We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry datasets showed that TopMG outperformed existing methods in identifying complex proteoforms. http://proteomics.informatics.iupui.edu/software/topmg/. xwliu@iupui.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Listing triangles in expected linear time on a class of power law graphs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nordman, Daniel J.; Wilson, Alyson G.; Phillips, Cynthia Ann
Enumerating triangles (3-cycles) in graphs is a kernel operation for social network analysis. For example, many community detection methods depend upon finding common neighbors of two related entities. We consider Cohen's simple and elegant solution for listing triangles: give each node a 'bucket.' Place each edge into the bucket of its endpoint of lowest degree, breaking ties consistently. Each node then checks each pair of edges in its bucket, testing for the adjacency that would complete that triangle. Cohen presents an informal argument that his algorithm should run well on real graphs. We formalize this argument by providing an analysis for the expected running time on a class of random graphs, including power law graphs. We consider a rigorously defined method for generating a random simple graph, the erased configuration model (ECM). In the ECM each node draws a degree independently from a marginal degree distribution, endpoints pair randomly, and we erase self loops and multiedges. If the marginal degree distribution has a finite second moment, it follows immediately that Cohen's algorithm runs in expected linear time. Furthermore, it can still run in expected linear time even when the degree distribution has such a heavy tail that the second moment is not finite. We prove that Cohen's algorithm runs in expected linear time when the marginal degree distribution has finite 4/3 moment and no vertex has degree larger than √n. In fact we give the precise asymptotic value of the expected number of edge pairs per bucket. A finite 4/3 moment is required; if it is unbounded, then so is the number of pairs. The marginal degree distribution of a power law graph has bounded 4/3 moment when its exponent α is more than 7/3. Thus for this class of power law graphs, with degree at most √n, Cohen's algorithm runs in expected linear time.
This is precisely the value of α for which the clustering coefficient tends to zero asymptotically, and it is in the range that is relevant for the degree distribution of the World-Wide Web.
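The bucket scheme described in this abstract translates almost directly into code. The sketch below is a plain sequential rendering under stated assumptions: the input is a simple graph given as a list of distinct undirected edges, node labels are mutually comparable, and degree ties are broken by label; the function name `list_triangles` is hypothetical.

```python
from itertools import combinations

def list_triangles(edges):
    """List each triangle of a simple undirected graph exactly once."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def rank(v):
        # Order nodes by degree, breaking ties consistently by label.
        return (len(adj[v]), v)

    # Each edge goes into the bucket of its lower-ranked endpoint.
    buckets = {v: [] for v in adj}
    for u, v in edges:
        low, high = (u, v) if rank(u) < rank(v) else (v, u)
        buckets[low].append(high)

    # Each node tests every pair of edges in its own bucket for closure.
    triangles = []
    for v, others in buckets.items():
        for a, b in combinations(others, 2):
            if b in adj[a]:
                triangles.append(tuple(sorted((v, a, b))))
    return triangles
```

Only the lowest-ranked corner of a triangle holds two of its edges in the same bucket, so every triangle is reported once, and high-degree nodes receive few edges, which keeps the number of pairs to test small.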
Efficient Synthesis of Graph Methods: a Dynamically Scheduled Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Minutoli, Marco; Castellana, Vito G.; Tumeo, Antonino
RDF databases naturally map to a graph representation and employ languages, such as SPARQL, that implement queries as graph pattern matching routines. Graph methods exhibit an irregular behavior: they present unpredictable, fine-grained data accesses, and are synchronization intensive. Graph data structures expose large amounts of dynamic parallelism, but are difficult to partition without generating load unbalance. In this paper, we present a novel architecture to improve the synthesis of graph methods. Our design addresses the issues of these algorithms with two components: a Dynamic Task Scheduler (DTS), which reduces load unbalance and maximizes resource utilization, and a Hierarchical Memory Interface controller (HMI), which provides support for concurrent memory operations on multi-ported/multi-banked shared memories. We evaluate our approach by generating the accelerators for a set of SPARQL queries from the Lehigh University Benchmark (LUBM). We first analyze the load unbalance of these queries, showing that execution time among tasks can differ even by orders of magnitude. We then synthesize the queries and compare the performance of the resulting accelerators against the current state of the art. Experimental results show that our solution provides a speedup over the serial implementation close to the theoretical maximum and a speedup up to 3.45 over a baseline parallel implementation. We conclude our study by exploring the design space to achieve maximum memory channels utilization. The best design used at least three of the four memory channels for more than 90% of the execution time.
Design tool for multiprocessor scheduling and evaluation of iterative dataflow algorithms
NASA Technical Reports Server (NTRS)
Jones, Robert L., III
1995-01-01
A graph-theoretic design process and software tool is defined for selecting a multiprocessing scheduling solution for a class of computational problems. The problems of interest are those that can be described with a dataflow graph and are intended to be executed repetitively on a set of identical processors. Typical applications include signal processing and control law problems. Graph-search algorithms and analysis techniques are introduced and shown to effectively determine performance bounds, scheduling constraints, and resource requirements. The software tool applies the design process to a given problem and includes performance optimization through the inclusion of additional precedence constraints among the schedulable tasks.
Information Dynamics in Networks: Models and Algorithms
2016-09-13
Twitter; we investigated how to detect spam accounts on Facebook and other social networks by graph analytics; and finally we investigated how to design...networks. We investigated the appropriateness of existing mathematical models for explaining the structure of retweet cascades on Twitter; we investigated...A Note on Modeling Retweet Cascades on Twitter, Workshop on Algorithms and Models for the Web Graph. 09-DEC-15
Task scheduling in dataflow computer architectures
NASA Technical Reports Server (NTRS)
Katsinis, Constantine
1994-01-01
Dataflow computers provide a platform for the solution of a large class of computational problems, which includes digital signal processing and image processing. Many typical applications are represented by a set of tasks which can be repetitively executed in parallel as specified by an associated dataflow graph. Research in this area aims to model these architectures, develop scheduling procedures, and predict the transient and steady state performance. Researchers at NASA have created a model and developed associated software tools which are capable of analyzing a dataflow graph and predicting its runtime performance under various resource and timing constraints. These models and tools were extended and used in this work. Experiments using these tools revealed certain properties of such graphs that require further study. Specifically, the transient behavior at the beginning of the execution of a graph can have a significant effect on the steady state performance. Transformation and retiming of the application algorithm and its initial conditions can produce a different transient behavior and consequently different steady state performance. The effect of such transformations on the resource requirements or under resource constraints requires extensive study. Task scheduling to obtain maximum performance (based on user-defined criteria), or to satisfy a set of resource constraints, can also be significantly affected by a transformation of the application algorithm. Since task scheduling is performed by heuristic algorithms, further research is needed to determine if new scheduling heuristics can be developed that can exploit such transformations. This work has provided the initial development for further long-term research efforts. A simulation tool was completed to provide insight into the transient and steady state execution of a dataflow graph. 
A set of scheduling algorithms was completed which can operate in conjunction with the modeling and performance tools previously developed. Initial studies on the performance of these algorithms were done to examine the effects of application algorithm transformations as measured by such quantities as number of processors, time between outputs, time between input and output, communication time, and memory size.
Domain-wall excitations in the two-dimensional Ising spin glass
NASA Astrophysics Data System (ADS)
Khoshbakht, Hamid; Weigel, Martin
2018-02-01
The Ising spin glass in two dimensions exhibits rich behavior with subtle differences in the scaling for different coupling distributions. We use recently developed mappings to graph-theoretic problems together with highly efficient implementations of combinatorial optimization algorithms to determine exact ground states for systems on square lattices with up to 10 000 × 10 000 spins. While these mappings only work for planar graphs, for example for systems with periodic boundary conditions in at most one direction, we suggest here an iterative windowing technique that allows one to determine ground states for fully periodic samples up to sizes similar to those for the open-periodic case. Based on these techniques, a large number of disorder samples are used together with a careful finite-size scaling analysis to determine the stiffness exponents and domain-wall fractal dimensions with unprecedented accuracy, our best estimates being θ = -0.2793(3) and d_f = 1.27319(9) for Gaussian couplings. For bimodal disorder, a new uniform sampling algorithm allows us to study the domain-wall fractal dimension, finding d_f = 1.279(2). Additionally, we investigate the distributions of ground-state energies, of domain-wall energies, and of domain-wall lengths.
Metabolomic Analysis and Visualization Engine for LC–MS Data
Melamud, Eugene; Vastag, Livia; Rabinowitz, Joshua D.
2017-01-01
Metabolomic analysis by liquid chromatography–high-resolution mass spectrometry results in data sets with thousands of features arising from metabolites, fragments, isotopes, and adducts. Here we describe a software package, Metabolomic Analysis and Visualization ENgine (MAVEN), designed for efficient interactive analysis of LC–MS data, including in the presence of isotope labeling. The software contains tools for all aspects of the data analysis process, from feature extraction to pathway-based graphical data display. To facilitate data validation, a machine learning algorithm automatically assesses peak quality. Users interact with raw data primarily in the form of extracted ion chromatograms, which are displayed with overlaid circles indicating peak quality, and bar graphs of peak intensities for both unlabeled and isotope-labeled metabolite forms. Click-based navigation leads to additional information, such as raw data for specific isotopic forms or for metabolites changing significantly between conditions. Fast data processing algorithms result in nearly delay-free browsing. Drop-down menus provide tools for the overlay of data onto pathway maps. These tools enable animating series of pathway graphs, e.g., to show propagation of labeled forms through a metabolic network. MAVEN is released under an open source license at http://maven.princeton.edu. PMID:21049934
Basin Hopping Graph: a computational framework to characterize RNA folding landscapes
Kucharík, Marcel; Hofacker, Ivo L.; Stadler, Peter F.; Qin, Jing
2014-01-01
Motivation: RNA folding is a complicated kinetic process. The minimum free energy structure provides only a static view of the most stable conformational state of the system. It is insufficient to give detailed insights into the dynamic behavior of RNAs. A sufficiently sophisticated analysis of the folding free energy landscape, however, can provide the relevant information. Results: We introduce the Basin Hopping Graph (BHG) as a novel coarse-grained model of folding landscapes. Each vertex of the BHG is a local minimum, which represents the corresponding basin in the landscape. Its edges connect basins when the direct transitions between them are ‘energetically favorable’. Edge weights encode the corresponding saddle heights and thus measure the difficulties of these favorable transitions. BHGs can be approximated accurately and efficiently for RNA molecules well beyond the length range accessible to enumerative algorithms. Availability and implementation: The algorithms described here are implemented in C++ as standalone programs. The source code and supplemental material can be freely downloaded from http://www.tbi.univie.ac.at/bhg.html. Contact: qin@bioinf.uni-leipzig.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24648041
Graph theory applied to the analysis of motor activity in patients with schizophrenia and depression
Fasmer, Erlend Eindride; Berle, Jan Øystein; Oedegaard, Ketil J.; Hauge, Erik R.
2018-01-01
Depression and schizophrenia are defined only by their clinical features, and diagnostic separation between them can be difficult. Disturbances in motor activity pattern are central features of both types of disorders. We introduce a new method to analyze time series, called the similarity graph algorithm. Time series of motor activity, obtained from actigraph registrations over 12 days in depressed and schizophrenic patients, were mapped into a graph and we then applied techniques from graph theory to characterize these time series, primarily looking for changes in complexity. The most marked finding was that depressed patients were found to be significantly different from both controls and schizophrenic patients, with evidence of less regularity of the time series, when analyzing the recordings with one hour intervals. These findings support the contention that there are important differences in control systems regulating motor behavior in patients with depression and schizophrenia. The similarity graph algorithm we have described can easily be applied to the study of other types of time series. PMID:29668743
On the Parameterized Complexity of Some Optimization Problems Related to Multiple-Interval Graphs
NASA Astrophysics Data System (ADS)
Jiang, Minghui
We show that for any constant t ≥ 2, k-Independent Set and k-Dominating Set in t-track interval graphs are W[1]-hard. This settles an open question recently raised by Fellows, Hermelin, Rosamond, and Vialette. We also give an FPT algorithm for k-Clique in t-interval graphs, parameterized by both k and t, with running time max{t^O(k), 2^O(k log k)} · poly(n), where n is the number of vertices in the graph. This slightly improves the previous FPT algorithm by Fellows, Hermelin, Rosamond, and Vialette. Finally, we use the W[1]-hardness of k-Independent Set in t-track interval graphs to obtain the first parameterized intractability result for a recent bioinformatics problem called Maximal Strip Recovery (MSR). We show that MSR-d is W[1]-hard for any constant d ≥ 4 when the parameter is either the total length of the strips, or the total number of adjacencies in the strips, or the number of strips in the optimal solution.
A graph-based network-vulnerability analysis system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swiler, L.P.; Phillips, C.; Gaylor, T.
1998-05-03
This paper presents a graph-based approach to network vulnerability analysis. The method is flexible, allowing analysis of attacks from both outside and inside the network. It can analyze risks to a specific network asset, or examine the universe of possible consequences following a successful attack. The analysis system requires as input a database of common attacks, broken into atomic steps, specific network configuration and topology information, and an attacker profile. The attack information is matched with the network configuration information and an attacker profile to create a superset attack graph. Nodes identify a stage of attack, for example the class of machines the attacker has accessed and the user privilege level he or she has compromised. The arcs in the attack graph represent attacks or stages of attacks. By assigning probabilities of success on the arcs or costs representing level of effort for the attacker, various graph algorithms such as shortest path algorithms can identify the attack paths with the highest probability of success.
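The last step of the abstract above (finding the attack path with the highest probability of success using a shortest-path algorithm) can be illustrated with a minimal sketch. It assumes that arc probabilities are independent and lie in (0, 1], so that maximizing their product is the same as minimizing the sum of -log(p); the function name, the dictionary layout, and the node names in the usage note are hypothetical and are not taken from the report.

```python
import heapq
import math


def most_likely_attack_path(arcs, start, goal):
    """Return (path, probability) for the most probable start-to-goal path.

    arcs maps a node to a list of (next_node, success_probability) pairs.
    Assumption: probabilities are independent and in (0, 1].
    """
    # Dijkstra on weights -log(p): the smallest sum is the largest product.
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, p in arcs.get(node, []):
            if p <= 0.0:
                continue  # an impossible step can never be on the best path
            new_cost = cost - math.log(p)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if goal not in best:
        return None, 0.0
    # Walk the predecessor links back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-best[goal])
```

For a toy graph such as `{"outside": [("dmz_user", 0.9), ("lan_user", 0.2)], "dmz_user": [("lan_user", 0.5), ("db_root", 0.1)], "lan_user": [("db_root", 0.6)]}`, the three-step route through `dmz_user` and `lan_user` wins because 0.9 × 0.5 × 0.6 exceeds the product along either shorter route.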
A graph-based network-vulnerability analysis system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swiler, L.P.; Phillips, C.; Gaylor, T.
1998-01-01
This report presents a graph-based approach to network vulnerability analysis. The method is flexible, allowing analysis of attacks from both outside and inside the network. It can analyze risks to a specific network asset, or examine the universe of possible consequences following a successful attack. The analysis system requires as input a database of common attacks, broken into atomic steps, specific network configuration and topology information, and an attacker profile. The attack information is matched with the network configuration information and an attacker profile to create a superset attack graph. Nodes identify a stage of attack, for example the class of machines the attacker has accessed and the user privilege level he or she has compromised. The arcs in the attack graph represent attacks or stages of attacks. By assigning probabilities of success on the arcs or costs representing level-of-effort for the attacker, various graph algorithms such as shortest-path algorithms can identify the attack paths with the highest probability of success.
A Set of Handwriting Features for Use in Automated Writer Identification.
Miller, John J; Patterson, Robert Bradley; Gantz, Donald T; Saunders, Christopher P; Walch, Mark A; Buscaglia, JoAnn
2017-05-01
A writer's biometric identity can be characterized through the distribution of physical feature measurements ("writer's profile"); a graph-based system that facilitates the quantification of these features is described. To accomplish this quantification, handwriting is segmented into basic graphical forms ("graphemes"), which are "skeletonized" to yield the graphical topology of the handwritten segment. The graph-based matching algorithm compares the graphemes first by their graphical topology and then by their geometric features. Graphs derived from known writers can be compared against graphs extracted from unknown writings. The process is computationally intensive and relies heavily upon statistical pattern recognition algorithms. This article focuses on the quantification of these physical features and the construction of the associated pattern recognition methods for using the features to discriminate among writers. The graph-based system described in this article has been implemented in a highly accurate and approximately language-independent biometric recognition system of writers of cursive documents. © 2017 American Academy of Forensic Sciences.
Li, Bing; Yuan, Chunfeng; Xiong, Weihua; Hu, Weiming; Peng, Houwen; Ding, Xinmiao; Maybank, Steve
2017-12-01
In multi-instance learning (MIL), the relations among instances in a bag convey important contextual information in many applications. Previous studies on MIL either ignore such relations or simply model them with a fixed graph structure so that the overall performance inevitably degrades in complex environments. To address this problem, this paper proposes a novel multi-view multi-instance learning algorithm (M²IL) that combines multiple context structures in a bag into a unified framework. The novel aspects are: (i) we propose a sparse -graph model that can generate different graphs with different parameters to represent various context relations in a bag, (ii) we propose a multi-view joint sparse representation that integrates these graphs into a unified framework for bag classification, and (iii) we propose a multi-view dictionary learning algorithm to obtain a multi-view graph dictionary that considers cues from all views simultaneously to improve the discrimination of the M²IL. Experiments and analyses in many practical applications prove the effectiveness of the M²IL.
Fasmer, Erlend Eindride; Fasmer, Ole Bernt; Berle, Jan Øystein; Oedegaard, Ketil J; Hauge, Erik R
2018-01-01
Depression and schizophrenia are defined only by their clinical features, and diagnostic separation between them can be difficult. Disturbances in motor activity pattern are central features of both types of disorders. We introduce a new method to analyze time series, called the similarity graph algorithm. Time series of motor activity, obtained from actigraph registrations over 12 days in depressed and schizophrenic patients, were mapped into a graph and we then applied techniques from graph theory to characterize these time series, primarily looking for changes in complexity. The most marked finding was that depressed patients were found to be significantly different from both controls and schizophrenic patients, with evidence of less regularity of the time series, when analyzing the recordings with one hour intervals. These findings support the contention that there are important differences in control systems regulating motor behavior in patients with depression and schizophrenia. The similarity graph algorithm we have described can easily be applied to the study of other types of time series.
A distributed geo-routing algorithm for wireless sensor networks.
Joshi, Gyanendra Prasad; Kim, Sung Won
2009-01-01
Geographic wireless sensor networks use position information for greedy routing. Greedy routing works well in dense networks, whereas in sparse networks it may fail and require a recovery algorithm. Recovery algorithms help the packet to get out of the communication void. However, these algorithms are generally costly for resource-constrained position-based wireless sensor networks (WSNs). In this paper, we propose a void avoidance algorithm (VAA), a novel idea based on upgrading virtual distance. VAA allows wireless sensor nodes to remove all stuck nodes by transforming the routing graph and forwarding packets using only greedy routing. In VAA, the stuck node upgrades distance unless it finds a next hop node that is closer to the destination than it is. VAA guarantees packet delivery if there is a topologically valid path. Further, it is completely distributed, responds immediately to node failure or topology changes, and does not require planarization of the network. NS-2 is used to evaluate the performance and correctness of VAA, and we compare its performance to other protocols. Simulations show that our proposed algorithm consumes less energy, finds efficient paths, and incurs substantially less control overhead.
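To make the failure mode that motivates VAA concrete, the following minimal sketch implements plain greedy geographic forwarding only; it is not the VAA algorithm, whose distance-upgrade rule is not specified in enough detail in the abstract. The function name and the data layout (a position table plus a neighbor table) are assumptions made for illustration.

```python
import math


def greedy_route(positions, neighbors, source, dest):
    """Forward greedily toward dest; return (path, delivered).

    positions maps node -> (x, y); neighbors maps node -> list of nodes in
    radio range. Assumption: every node knows the destination's position.
    """
    path = [source]
    current = source
    while current != dest:
        here = math.dist(positions[current], positions[dest])
        # Only neighbors strictly closer to the destination are eligible.
        closer = [n for n in neighbors[current]
                  if math.dist(positions[n], positions[dest]) < here]
        if not closer:
            # Communication void: the packet is stuck at a local minimum.
            return path, False
        current = min(closer,
                      key=lambda n: math.dist(positions[n], positions[dest]))
        path.append(current)
    return path, True
```

Because each hop strictly reduces the distance to the destination, the loop always terminates; the `False` result marks exactly the stuck-node situation that a recovery or void-avoidance scheme has to handle.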
Dynamic graph of an oxy-fuel combustion system using autocatalytic set model
NASA Astrophysics Data System (ADS)
Harish, Noor Ainy; Bakar, Sumarni Abu
2017-08-01
The evaporation process is one of the main processes, besides the combustion process, in an oxy-combustion boiler system. An Autocatalytic Set (ACS) model has been successfully applied in developing a graphical representation of the chemical reactions that occur in the evaporation process in the system. Seventeen variables identified in the process are represented as nodes, and the catalytic relationships are represented as edges in the graph. In addition, this paper further investigates the graph dynamics of the ACS. Using the Dynamic Autocatalytic Set Graph Algorithm (DAGA), the adjacency matrix for each of the graphs and its relation to the Perron-Frobenius theorem are investigated. The dynamic graph obtained is examined further, and the connection of the graph to fuzzy graph Type 1 is established.
Graph pyramids for protein function prediction
2015-01-01
Background Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Methods Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Results Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522
Graph pyramids for protein function prediction.
Sandhan, Tushar; Yoo, Youngjun; Choi, Jin; Kim, Sun
2015-01-01
Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data.
2012-01-01
Background Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved to numerical systems. Characteristically, CGR coordinates of substrings sharing an L-long suffix will be located within 2^-L distance of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations. Results The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show to have a constant time solution with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm. Conclusions The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistic (performance) and analytical (lack of unifying mathematical framework). CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems. PMID:22551152
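The suffix property stated in the abstract above can be checked with a short sketch. It assumes the conventional unit-square CGR for DNA (each step moves halfway toward the corner of the current base, starting from the center) and measures distance per coordinate; the corner assignment, the starting point, and the function names are assumptions for illustration rather than details taken from the report.

```python
# Assumed corner assignment for the four bases on the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_point(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate reached after reading seq left to right."""
    x, y = start
    for base in seq:
        cx, cy = CORNERS[base]
        # Each symbol halves the distance to its corner, so every step
        # shrinks the influence of all earlier symbols by a factor of 2.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
    return x, y


def max_coordinate_gap(p, q):
    """Largest per-coordinate difference between two CGR points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))
```

Two sequences with different prefixes but the same 7-symbol suffix, for example `"ACGTTGCA" + "GATTACA"` and `"TTTT" + "GATTACA"`, end up at points whose coordinates differ by at most 2^-7, because the shared suffix applies the same seven halving steps to both starting positions.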
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Overview Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain.
We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters. PMID:27391786
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.
Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
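Two of the stand-alone quality metrics named in the abstract above, modularity and conductance, can be computed directly from an edge list. The sketch below uses the standard textbook definitions for an undirected, unweighted graph without self-loops; the function names and input layout are assumptions, and the study itself may use different variants (for example, averaging conductance over clusters).

```python
def modularity(edges, communities):
    """Newman modularity: sum over communities of e_c/m - (d_c/(2m))^2.

    edges is a list of undirected (u, v) pairs; communities is a list of
    disjoint node collections that together cover every node.
    """
    m = len(edges)
    label = {v: i for i, c in enumerate(communities) for v in c}
    internal = [0] * len(communities)    # edges with both ends inside c
    degree_sum = [0] * len(communities)  # total degree of nodes in c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
               for i in range(len(communities)))


def conductance(edges, subset):
    """Cut edges divided by the smaller of the two side volumes."""
    subset = set(subset)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        for w in (u, v):
            if w in subset:
                vol_in += 1
            else:
                vol_out += 1
        if (u in subset) != (v in subset):
            cut += 1
    return cut / min(vol_in, vol_out)
```

On two triangles joined by a single bridge edge, splitting at the bridge gives a modularity of 5/14 and a conductance of 1/7 for either triangle: high modularity and low conductance both point to the same well-separated clusters in this easy case, which is not guaranteed in general.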
Graph-based biomedical text summarization: An itemset mining and sentence clustering approach.
Nasr Azadani, Mozhgan; Ghadiri, Nasser; Davoodijam, Ensieh
2018-06-12
Automatic text summarization offers an efficient solution to access the ever-growing amounts of both scientific and clinical literature in the biomedical domain by summarizing the source documents while maintaining their most informative contents. In this paper, we propose a novel graph-based summarization method that takes advantage of domain-specific knowledge and a well-established data mining technique called frequent itemset mining. Our summarizer exploits the Unified Medical Language System (UMLS) to construct a concept-based model of the source document and map the document to the concepts. Then, it discovers frequent itemsets to take the correlations among multiple concepts into account. The method uses these correlations to propose a similarity function, based on which a representative graph is constructed. The summarizer then employs a minimum spanning tree based clustering algorithm to discover various subthemes of the document. Eventually, it generates the final summary by selecting the most informative and relevant sentences from all subthemes within the text. We perform an automatic evaluation over a large number of summaries using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results demonstrate that the proposed summarization system outperforms various baselines and benchmark approaches. This research suggests that incorporating domain-specific knowledge and frequent itemset mining better equips the summarization system to measure the informativeness of sentences. Moreover, clustering the graph nodes (sentences) can enable the summarizer to target different main subthemes of a source document efficiently. The evaluation results show that the proposed approach can significantly improve the performance of summarization systems in the biomedical domain. Copyright © 2018. Published by Elsevier Inc.
Efficient dynamic graph construction for inductive semi-supervised learning.
Dornaika, F; Dahbi, R; Bosaghzadeh, A; Ruichek, Y
2017-10-01
Most graph construction techniques assume a transductive setting in which the whole data collection is available at construction time. Addressing graph construction for the inductive setting, in which data arrive sequentially, has received much less attention. For inductive settings, constructing the graph from scratch can be very time consuming. This paper introduces a generic framework that is able to make any graph construction method incremental. This framework yields an efficient and dynamic graph construction method that adds new samples (labeled or unlabeled) to a previously constructed graph. As a case study, we use the recently proposed Two Phase Weighted Regularized Least Square (TPWRLS) graph construction method. The paper has two main contributions. First, we use the TPWRLS coding scheme to represent new sample(s) with respect to an existing database. The representative coefficients are then used to update the graph affinity matrix. The proposed method not only appends the new samples to the graph but also updates the whole graph structure by discovering which nodes are affected by the introduction of new samples and by updating their edge weights. The second contribution of the article is the application of the proposed framework to the problem of graph-based label propagation using multiple observations for vision-based recognition tasks. Experiments on several image databases show that, without any significant loss in the accuracy of the final classification, the proposed dynamic graph construction is more efficient than the batch graph construction. Copyright © 2017 Elsevier Ltd. All rights reserved.
A Constant-Factor Approximation Algorithm for the Link Building Problem
NASA Astrophysics Data System (ADS)
Olsen, Martin; Viglas, Anastasios; Zvedeniouk, Ilia
In this work we consider the problem of maximizing the PageRank of a given target node in a graph by adding k new links. We consider the case in which the new links must point to the given target node (backlinks). Previous work [7] shows that this problem has no fully polynomial time approximation scheme unless P = NP. We present a polynomial time algorithm yielding a PageRank value within a constant factor of the optimal. We also consider the naive algorithm, in which we choose backlinks from nodes with high PageRank values compared to their outdegree, and show that the naive algorithm performs much worse on certain graphs than the constant factor approximation scheme.
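The naive baseline mentioned above can be sketched as follows. This is not the constant-factor algorithm from the paper, and the abstract does not give the exact scoring rule; the ratio rank / (outdegree + 1) used here is an assumption, as are the function names, the damping factor of 0.85, and the dictionary-of-lists graph representation (every node must appear as a key).

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank; graph maps each node to a list of out-neighbours."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


def naive_backlinks(graph, target, k, damping=0.85):
    """Pick k backlink sources with high PageRank relative to their outdegree.

    Assumed scoring: rank / (outdegree + 1), i.e. the share of rank the source
    would pass to the target once the new link exists.
    """
    rank = pagerank(graph, damping)
    candidates = [v for v in graph if v != target and target not in graph[v]]
    candidates.sort(key=lambda v: rank[v] / (len(graph[v]) + 1), reverse=True)
    return candidates[:k]


def add_backlinks(graph, sources, target):
    """Return a copy of graph with a link from every source to target."""
    new_graph = {v: list(ws) for v, ws in graph.items()}
    for v in sources:
        new_graph[v].append(target)
    return new_graph
```

A typical use is to call `naive_backlinks`, apply the result with `add_backlinks`, and compare the target's PageRank before and after. As the abstract notes, this heuristic can perform much worse than the constant factor approximation on certain graphs.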
Local Higher-Order Graph Clustering
Yin, Hao; Benson, Austin R.; Leskovec, Jure; Gleich, David F.
2018-01-01
Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks. Here we introduce a new class of local graph clustering methods that address these issues by incorporating higher-order network information captured by small subgraphs, also called network motifs. We develop the Motif-based Approximate Personalized PageRank (MAPPR) algorithm that finds clusters containing a seed node with minimal motif conductance, a generalization of the conductance metric for network motifs. We generalize existing theory to prove the fast running time (independent of the size of the graph) and obtain theoretical guarantees on the cluster quality (in terms of motif conductance). We also develop a theory of node neighborhoods for finding sets that have small motif conductance, and apply these results to the case of finding good seed nodes to use as input to the MAPPR algorithm. Experimental validation on community detection tasks in both synthetic and real-world networks shows that our new framework MAPPR outperforms the current edge-based personalized PageRank methodology. PMID:29770258
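The quality measure named above can be made concrete with a small sketch. It computes ordinary edge conductance and, as one example of a motif, triangle conductance on an undirected graph. The triangle definition used here (a triangle is cut when its nodes fall on both sides, and motif volume counts triangle endpoints inside the set) is the usual one from the motif conductance literature, but the abstract does not spell it out, so treat it as an assumption. The sketch only scores a given cluster and does not implement MAPPR itself.

```python
from itertools import combinations


def edge_conductance(adj, cluster):
    """Cut edges divided by the smaller of the two side volumes (sum of degrees).

    adj maps each node to the set of its neighbours in an undirected graph.
    Assumes both sides have non-zero volume.
    """
    cluster = set(cluster)
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    vol_in = sum(len(adj[u]) for u in cluster)
    vol_out = sum(len(adj[u]) for u in adj if u not in cluster)
    return cut / min(vol_in, vol_out)


def triangle_conductance(adj, cluster):
    """Motif conductance with the triangle as the motif.

    Brute-force enumeration over node triples, so only suitable for small graphs.
    Assumes both sides contain at least one triangle endpoint.
    """
    cluster = set(cluster)
    triangles = [
        (a, b, c)
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    ]
    # A triangle is cut when it has nodes on both sides of the partition.
    cut = sum(1 for t in triangles if 0 < sum(v in cluster for v in t) < 3)
    vol_in = sum(sum(v in cluster for v in t) for t in triangles)
    vol_out = 3 * len(triangles) - vol_in
    return cut / min(vol_in, vol_out)
```

On two triangles joined by a single bridge edge, taking one triangle as the cluster cuts one edge but no triangle, so the motif-based score rates that cluster as perfectly separated while the edge-based score does not.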
Decentralized Control of Scheduling in Distributed Systems.
1983-03-18
the job scheduling algorithm adapts to the changing busyness of the various hosts in the system. The environment in which the job scheduling entities...resources and processes that constitute the node and a set of interfaces for accessing these processes and resources. The structure of a node could change ...parallel. Chang [CHNG82] has also described some algorithms for detecting properties of general graphs by traversing paths in a graph in parallel. One of
Salient object detection: manifold-based similarity adaptation approach
NASA Astrophysics Data System (ADS)
Zhou, Jingbo; Ren, Yongfeng; Yan, Yunyang; Gao, Shangbing
2014-11-01
A saliency detection algorithm based on manifold-based similarity adaptation is proposed. The proposed algorithm is divided into three steps. First, we segment an input image into superpixels, which are represented as the nodes in a graph. Second, we introduce a new similarity measurement: the weight matrix of the graph, which indicates the similarities between the nodes, is computed with a similarity-based method that also captures the manifold structure of the image patches, so that the graph edges are determined in a data-adaptive manner in terms of both similarity and manifold structure. Third, we use a local reconstruction method as a diffusion method to obtain the saliency maps. The objective function in the proposed method is based on local reconstruction, with which the estimated weights capture the manifold structure. Experiments on four benchmark databases demonstrate the accuracy and robustness of the proposed method.
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Chen, C. L.
1989-01-01
Two efficient mapping algorithms are presented for scheduling the robot inverse dynamics computation, which consists of m computational modules with precedence relationships, on a multiprocessor system of p identical processors with processor and communication costs, so as to achieve minimum computation time. An objective function is defined in terms of the sum of the processor finishing time and the interprocessor communication time. Minimax optimization is performed on the objective function to obtain the best mapping. This mapping problem can be formulated as a combination of the graph partitioning and scheduling problems, both of which are known to be NP-complete. Thus, to speed up the search for a solution, two heuristic algorithms are proposed to obtain fast but suboptimal mapping solutions. The first algorithm uses the level and the communication intensity of the task modules to construct an ordered priority list of ready modules, and the module assignment is performed by a weighted bipartite matching algorithm. For a near-optimal mapping solution, the problem can be solved by the heuristic algorithm with simulated annealing. These proposed optimization algorithms can solve various large-scale problems within a reasonable time. Computer simulations were performed to evaluate and verify the performance and the validity of the proposed mapping algorithms. Finally, experiments for computing the inverse dynamics of a six-jointed PUMA-like manipulator based on the Newton-Euler dynamic equations were implemented on an NCUBE/ten hypercube computer to verify the proposed mapping algorithms. Computer simulation and experimental results are compared and discussed.
Graph Kernels for Molecular Similarity.
Rupp, Matthias; Schneider, Gisbert
2010-04-12
Molecular similarity measures are important for many cheminformatics applications like ligand-based virtual screening and quantitative structure-property relationships. Graph kernels are formal similarity measures defined directly on graphs, such as the (annotated) molecular structure graph. Graph kernels are positive semi-definite functions, i.e., they correspond to inner products. This property makes them suitable for use with kernel-based machine learning algorithms such as support vector machines and Gaussian processes. We review the major types of kernels between graphs (based on random walks, subgraphs, and optimal assignments, respectively), and discuss their advantages, limitations, and successful applications in cheminformatics. Copyright © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
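As a rough illustration of the random-walk family of kernels mentioned above, the sketch below counts pairs of matching walks in two unlabeled graphs and sums them with a geometrically decaying weight, truncated at a maximum walk length. The function name, the adjacency-list representation, and the parameters `decay` and `max_length` are assumptions for illustration. The kernels reviewed in the article also compare atom and bond annotations, which this sketch ignores.

```python
def random_walk_kernel(adj_a, adj_b, decay=0.1, max_length=50):
    """Truncated geometric random-walk kernel between two unlabeled graphs.

    adj_a and adj_b are adjacency lists: adj[i] lists the neighbours of node i.
    The value is the sum over walk lengths n of decay**n times the number of
    length-n walk pairs, computed on the direct product graph. decay must be
    small enough for the geometric series to converge.
    """
    pairs = [(i, j) for i in range(len(adj_a)) for j in range(len(adj_b))]
    # walks[(i, j)] = number of simultaneous walks of the current length ending at (i, j)
    walks = {p: 1.0 for p in pairs}
    total, weight = 0.0, 1.0
    for _ in range(max_length + 1):
        total += weight * sum(walks.values())
        # Take one step in both graphs at once (an edge of the product graph).
        walks = {
            (i, j): sum(walks[(k, l)] for k in adj_a[i] for l in adj_b[j])
            for (i, j) in pairs
        }
        weight *= decay
    return total
```

Because walks are counted symmetrically in both graphs, swapping the two arguments gives the same value up to floating-point rounding, which is one of the properties a kernel must have.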
Fully Decomposable Split Graphs
NASA Astrophysics Data System (ADS)
Broersma, Hajo; Kratsch, Dieter; Woeginger, Gerhard J.
We discuss various questions around partitioning a split graph into connected parts. Our main result is a polynomial time algorithm that decides whether a given split graph is fully decomposable, i.e., whether it can be partitioned into connected parts of order $\alpha_1, \alpha_2, \ldots, \alpha_k$ for every $\alpha_1, \alpha_2, \ldots, \alpha_k$ summing up to the order of the graph. In contrast, we show that the decision problem whether a given split graph can be partitioned into connected parts of order $\alpha_1, \alpha_2, \ldots, \alpha_k$ for a given partition $\alpha_1, \alpha_2, \ldots, \alpha_k$ of the order of the graph, is NP-hard.
Evolutionary dynamics on any population structure
NASA Astrophysics Data System (ADS)
Allen, Benjamin; Lippner, Gabor; Chen, Yu-Ting; Fotouhi, Babak; Momeni, Naghmeh; Yau, Shing-Tung; Nowak, Martin A.
2017-03-01
Evolution occurs in populations of reproducing individuals. The structure of a population can affect which traits evolve. Understanding evolutionary game dynamics in structured populations remains difficult. Mathematical results are known for special structures in which all individuals have the same number of neighbours. The general case, in which the number of neighbours can vary, has remained open. For arbitrary selection intensity, the problem is in a computational complexity class that suggests there is no efficient algorithm. Whether a simple solution for weak selection exists has remained unanswered. Here we provide a solution for weak selection that applies to any graph or network. Our method relies on calculating the coalescence times of random walks. We evaluate large numbers of diverse population structures for their propensity to favour cooperation. We study how small changes in population structure—graph surgery—affect evolutionary outcomes. We find that cooperation flourishes most in societies that are based on strong pairwise ties.