Sample records for algorithm identifies community

  1. SA-SOM algorithm for detecting communities in complex networks

    NASA Astrophysics Data System (ADS)

    Chen, Luogeng; Wang, Yanran; Huang, Xiaoming; Hu, Mengyu; Hu, Fang

    2017-10-01

    Community detection is currently a topic of intense research. Based on the self-organizing map (SOM) algorithm, this paper introduces the idea of self-adaptation (SA), by which the number of communities can be identified automatically, and proposes SA-SOM, a novel algorithm for detecting communities in complex networks. Several representative real-world networks and a set of computer-generated LFR-benchmark networks are used to verify the accuracy and efficiency of the algorithm. The experimental findings demonstrate that the algorithm identifies communities automatically, accurately and efficiently. Furthermore, it also achieves higher values of modularity, NMI and density than the SOM algorithm does.

  2. LPA-CBD: an improved label propagation algorithm based on community belonging degree for community detection

    NASA Astrophysics Data System (ADS)

    Gui, Chun; Zhang, Ruisheng; Zhao, Zhili; Wei, Jiaxuan; Hu, Rongjing

    To deal with the stochasticity of center-node selection and the instability of community detection in the label propagation algorithm, this paper proposes an improved label propagation algorithm, named label propagation algorithm based on community belonging degree (LPA-CBD), that employs community belonging degree to determine the number and centers of communities. The general process of LPA-CBD is that initial communities are identified by the nodes with maximum degree and are then optimized or expanded using community belonging degree. After the rough community structure of the network is obtained, the remaining nodes are labeled using the label propagation algorithm. Experimental results on 10 real-world networks and three synthetic networks show that LPA-CBD achieves a reasonable community number, better accuracy and higher modularity than four other prominent algorithms. Moreover, the proposed algorithm not only has lower complexity and higher community detection quality, but also improves the stability of the original label propagation algorithm.
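
    The label propagation step that LPA-CBD builds on is simple enough to sketch in code. Below is a minimal sketch of plain label propagation with networkx, not the community-belonging-degree initialization that distinguishes LPA-CBD; the toy graph and helper name are illustrative only.

```python
import random
from collections import Counter

import networkx as nx

def label_propagation(G, max_iter=100, seed=0):
    """Minimal asynchronous label propagation: every node repeatedly adopts
    the label that is most common among its neighbors."""
    rng = random.Random(seed)
    labels = {v: v for v in G}          # each node starts in its own community
    nodes = list(G)
    for _ in range(max_iter):
        rng.shuffle(nodes)              # random visiting order each sweep
        changed = False
        for v in nodes:
            if not G[v]:
                continue
            counts = Counter(labels[u] for u in G[v])
            best = max(counts.values())
            # break ties randomly among the most frequent neighbor labels
            new = rng.choice([l for l, c in counts.items() if c == best])
            if new != labels[v]:
                labels[v], changed = new, True
        if not changed:
            break
    return labels

G = nx.karate_club_graph()
labels = label_propagation(G)
print(len(set(labels.values())), "communities found")
```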

  3. Multi-Objective Community Detection Based on Memetic Algorithm

    PubMed Central

    2015-01-01

    Community detection has drawn a lot of attention because it provides invaluable help in understanding the function and visualizing the structure of networks. Since single-objective optimization methods have intrinsic drawbacks in identifying multiple significant community structures, some methods formulate community detection as a multi-objective problem and adopt population-based evolutionary algorithms to obtain multiple community structures. Evolutionary algorithms have strong global search ability but have difficulty locating local optima efficiently. In this study, to identify multiple significant community structures more effectively, a multi-objective memetic algorithm for community detection is proposed that combines a multi-objective evolutionary algorithm with a local search procedure. The local search procedure is designed by addressing three issues. First, nondominated solutions generated by evolutionary operations and solutions in the dominant population are set as initial individuals for the local search procedure. Then, a new direction vector, named the pseudonormal vector, is proposed to integrate the two objective functions into a single fitness function. Finally, a network-specific local search strategy based on the label propagation rule is extended to search for local optimal solutions efficiently. Extensive experiments on both artificial and real-world networks evaluate the proposed method from three aspects. First, experiments on the influence of the local search procedure demonstrate that it speeds up convergence to better partitions and makes the algorithm more stable. Second, comparisons with a set of classic community detection methods illustrate that the proposed method finds single partitions effectively. Finally, the method is applied to identify hierarchical structures of networks, which are beneficial for analyzing networks at multiple resolution levels. PMID:25932646

  4. Multi-objective community detection based on memetic algorithm.

    PubMed

    Wu, Peng; Pan, Li

    2015-01-01

    Community detection has drawn a lot of attention because it provides invaluable help in understanding the function and visualizing the structure of networks. Since single-objective optimization methods have intrinsic drawbacks in identifying multiple significant community structures, some methods formulate community detection as a multi-objective problem and adopt population-based evolutionary algorithms to obtain multiple community structures. Evolutionary algorithms have strong global search ability but have difficulty locating local optima efficiently. In this study, to identify multiple significant community structures more effectively, a multi-objective memetic algorithm for community detection is proposed that combines a multi-objective evolutionary algorithm with a local search procedure. The local search procedure is designed by addressing three issues. First, nondominated solutions generated by evolutionary operations and solutions in the dominant population are set as initial individuals for the local search procedure. Then, a new direction vector, named the pseudonormal vector, is proposed to integrate the two objective functions into a single fitness function. Finally, a network-specific local search strategy based on the label propagation rule is extended to search for local optimal solutions efficiently. Extensive experiments on both artificial and real-world networks evaluate the proposed method from three aspects. First, experiments on the influence of the local search procedure demonstrate that it speeds up convergence to better partitions and makes the algorithm more stable. Second, comparisons with a set of classic community detection methods illustrate that the proposed method finds single partitions effectively. Finally, the method is applied to identify hierarchical structures of networks, which are beneficial for analyzing networks at multiple resolution levels.

  5. Community detection using preference networks

    NASA Astrophysics Data System (ADS)

    Tasgin, Mursel; Bingol, Haluk O.

    2018-04-01

    Community detection is the task of identifying clusters or groups of nodes in a network such that nodes within the same group are more densely connected to each other than to nodes in different groups. It has practical uses in identifying similar functions or roles of nodes in many biological, social and computer networks. With the availability of very large networks in recent years, the performance and scalability of community detection algorithms have become crucial: if the time complexity of an algorithm is high, it cannot run on large networks. In this paper, we propose a new community detection algorithm that takes a local approach and is able to run on large networks. The method is simple and effective: given a network, the algorithm constructs a preference network of nodes in which each node has a single outgoing edge pointing to the node it prefers to be in the same community with. In such a preference network, each connected component is a community. Selection of the preferred node is performed using similarity-based metrics that can be calculated within the 1-neighborhood of a node; we use two alternatives, namely the number of common neighbors between the selecting node and its neighbors, and the spread capability of neighbors around the selecting node, calculated with the gossip algorithm of Lind et al. Our algorithm is tested on both computer-generated LFR networks and real-life networks with ground-truth community structure. It identifies communities accurately and quickly, and it is local, scalable and suitable for distributed execution on large networks.
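
    The preference-network construction is concrete enough to sketch directly: each node points to the neighbor with which it shares the most common neighbors, and connected components of the resulting one-edge-per-node graph are read off as communities. The sketch below uses only the common-neighbor metric (not the gossip-based spread capability) and collects components on an undirected version of the preference graph; it illustrates the construction rather than the authors' implementation.

```python
import networkx as nx

def preference_communities(G):
    """Each node selects one preferred neighbor (the one sharing the most
    common neighbors); connected components of the preference graph are
    returned as communities."""
    pref = nx.Graph()
    pref.add_nodes_from(G)
    for v in G:
        neighbors = list(G[v])
        if not neighbors:
            continue
        # preferred node = neighbor with the largest common-neighbor count
        preferred = max(neighbors, key=lambda u: len(set(G[v]) & set(G[u])))
        pref.add_edge(v, preferred)
    return list(nx.connected_components(pref))

G = nx.karate_club_graph()
for community in preference_communities(G):
    print(sorted(community))
```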

  6. Network Community Detection based on the Physarum-inspired Computational Framework.

    PubMed

    Gao, Chao; Liang, Mingxin; Li, Xianghua; Zhang, Zili; Wang, Zhen; Zhou, Zhili

    2016-12-13

    Community detection is a crucial and essential problem in the structural analysis of complex networks, which can help us understand and predict their characteristics and functions. Many methods, ranging from optimization-based to heuristic-based algorithms, have been proposed for solving this problem. Due to the inherent complexity of identifying network structure, how to design an effective algorithm with higher accuracy and lower computational cost remains an open problem. Inspired by the computational capability and positive feedback mechanism in the foraging process of Physarum, a large amoeba-like cell consisting of a dendritic network of tube-like pseudopodia, a general Physarum-based computational framework for community detection is proposed in this paper. Based on the proposed framework, inter-community edges can be distinguished from intra-community edges, and the positive feedback of the solving process can be further enhanced; these properties are used to improve the efficiency of original optimization-based and heuristic-based community detection algorithms, respectively. Several typical algorithms (e.g., genetic algorithm, ant colony optimization algorithm, and Markov clustering algorithm) and real-world datasets are used to evaluate the efficiency of the proposed computational framework. Experiments show that the algorithms optimized by the Physarum-inspired computational framework perform better than the original ones in terms of accuracy and computational cost. Moreover, a computational complexity analysis verifies the scalability of the framework.

  7. A spectral method to detect community structure based on distance modularity matrix

    NASA Astrophysics Data System (ADS)

    Yang, Jin-Xuan; Zhang, Xiao-Dong

    2017-08-01

    Many social and biological networks contain community structure, and how to identify this community structure in complex networks has become a hot issue. In this paper, an algorithm to detect the community structure of networks is proposed using the spectrum of a distance modularity matrix. The proposed algorithm focuses on the distances between vertices within communities, rather than on the most weakly connected vertex pairs or the number of edges between communities. The experimental results show that our method identifies community structure more effectively for a variety of real-world and computer-generated networks, at the cost of slightly more computation time.
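
    To illustrate the general family of spectral methods, the sketch below bisects a network using the leading eigenvector of Newman's standard modularity matrix B = A - k k^T / (2m). The paper's distance modularity matrix modifies this construction, so the code is a generic stand-in rather than the proposed algorithm.

```python
import numpy as np
import networkx as nx

def spectral_bisection_by_modularity(G):
    """Split G into two groups using the sign pattern of the leading
    eigenvector of the modularity matrix B = A - k k^T / (2m)."""
    nodes = list(G)
    A = nx.to_numpy_array(G, nodelist=nodes)
    k = A.sum(axis=1)                      # degree vector
    B = A - np.outer(k, k) / k.sum()       # modularity matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    leading = eigvecs[:, np.argmax(eigvals)]
    group1 = {n for n, x in zip(nodes, leading) if x >= 0}
    return group1, set(nodes) - group1

G = nx.karate_club_graph()
g1, g2 = spectral_bisection_by_modularity(G)
print(sorted(g1))
print(sorted(g2))
```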

  8. Distributed learning automata-based algorithm for community detection in complex networks

    NASA Astrophysics Data System (ADS)

    Khomami, Mohammad Mehdi Daliri; Rezvanian, Alireza; Meybodi, Mohammad Reza

    2016-03-01

    Community structure is an important and universal topological property of many complex networks, such as social and information networks. Detecting the communities of a network is a significant technique for understanding its structure and function. In this paper, we propose an algorithm based on distributed learning automata for community detection (DLACD) in complex networks. In the proposed algorithm, each vertex of the network is equipped with a learning automaton. Through cooperation among the network of learning automata and updates to each automaton's action probabilities, the algorithm iteratively tries to identify high-density local communities. The performance of the proposed algorithm is investigated through a number of simulations on popular synthetic and real networks. Experimental comparisons with popular community detection algorithms such as Walktrap, Danon greedy optimization, fuzzy community detection, multi-resolution community detection and label propagation demonstrate the superiority of DLACD in terms of modularity, NMI, performance, min-max cut and coverage.

  9. Detecting community structure via the maximal sub-graphs and belonging degrees in complex networks

    NASA Astrophysics Data System (ADS)

    Cui, Yaozu; Wang, Xingyuan; Eustace, Justine

    2014-12-01

    Community structure is a common phenomenon in complex networks, and it has been shown that communities in complex networks often overlap each other. In this paper we therefore propose a new algorithm to detect overlapping community structure in complex networks. To identify the overlapping community structure, our algorithm first extracts maximal fully connected sub-graphs from the original network. Two maximal sub-graphs sharing key pair-vertices can then be merged into a new, larger sub-graph using belonging degree functions. Furthermore, we extend the modularity function to evaluate the proposed algorithm. In addition, overlapping nodes between communities are found successfully. Finally, we report a comparison of the modularity and computational complexity of the proposed algorithm with those of several existing algorithms. The experimental results show that the proposed algorithm gives satisfactory results.

  10. A framework for detecting communities of unbalanced sizes in networks

    NASA Astrophysics Data System (ADS)

    Žalik, Krista Rizman; Žalik, Borut

    2018-01-01

    Community detection in large networks has been a focus of recent research in many fields, including biology, physics, the social sciences, and computer science. Most community detection methods partition the entire network into communities, groups of nodes that have many connections within the community and few connections to other communities, and do not identify the different roles that nodes can play in communities. We propose a community detection model that integrates several different measures and can quickly identify communities of different sizes and densities. We use node degree centrality, strong similarity to one node of a community, maximal similarity of a node to a community, compactness of communities and separation between communities. Each measure has its own strengths and weaknesses, so combining different measures benefits from the strengths of each and avoids the problems encountered when using any individual measure. We present a fast local expansion algorithm for uncovering communities of different sizes and densities that reveals rich information about input networks. Experimental results show that the proposed algorithm is at least as effective as other community detection algorithms on both real-world and synthetic networks while requiring less time.

  11. Billing code algorithms to identify cases of peripheral artery disease from administrative data

    PubMed Central

    Fan, Jin; Arruda-Olson, Adelaide M; Leibson, Cynthia L; Smith, Carin; Liu, Guanghui; Bailey, Kent R; Kullo, Iftikhar J

    2013-01-01

    Objective To construct and validate billing code algorithms for identifying patients with peripheral arterial disease (PAD). Methods We extracted all encounters and line item details including PAD-related billing codes at Mayo Clinic Rochester, Minnesota, between July 1, 1997 and June 30, 2008; 22 712 patients evaluated in the vascular laboratory were divided into training and validation sets. Multiple logistic regression analysis was used to create an integer code score from the training dataset, and this was tested in the validation set. We applied a model-based code algorithm to patients evaluated in the vascular laboratory and compared this with a simpler algorithm (presence of at least one of the ICD-9 PAD codes 440.20–440.29). We also applied both algorithms to a community-based sample (n=4420), followed by a manual review. Results The logistic regression model performed well in both training and validation datasets (c statistic=0.91). In patients evaluated in the vascular laboratory, the model-based code algorithm provided better negative predictive value. The simpler algorithm was reasonably accurate for identification of PAD status, with lesser sensitivity and greater specificity. In the community-based sample, the sensitivity (38.7% vs 68.0%) of the simpler algorithm was much lower, whereas the specificity (92.0% vs 87.6%) was higher than the model-based algorithm. Conclusions A model-based billing code algorithm had reasonable accuracy in identifying PAD cases from the community, and in patients referred to the non-invasive vascular laboratory. The simpler algorithm had reasonable accuracy for identification of PAD in patients referred to the vascular laboratory but was significantly less sensitive in a community-based sample. PMID:24166724
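
    The "simpler algorithm" in the abstract is essentially a code lookup, which is easy to illustrate. The sketch below flags a patient as PAD-positive if any billed ICD-9 code falls in 440.20-440.29; the patient records and field layout are hypothetical, and the model-based integer score derived from logistic regression would replace this rule in the full algorithm.

```python
# Hypothetical patient records: patient id -> list of billed ICD-9 codes.
patients = {
    "pt-001": ["440.21", "401.9"],
    "pt-002": ["250.00", "414.01"],
    "pt-003": ["440.29", "440.22"],
}

def simple_pad_algorithm(icd9_codes):
    """Flag PAD if at least one ICD-9 code lies in 440.20-440.29
    (all such codes share the prefix '440.2')."""
    return any(code.startswith("440.2") for code in icd9_codes)

for pid, codes in patients.items():
    print(pid, "PAD" if simple_pad_algorithm(codes) else "no PAD")
```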

  12. A community detection algorithm using network topologies and rule-based hierarchical arc-merging strategies

    PubMed Central

    2017-01-01

    The authors use four criteria to examine a novel community detection algorithm: (a) effectiveness in terms of producing high values of normalized mutual information (NMI) and modularity, using well-known social networks for testing; (b) the ability to mitigate resolution limit problems, examined using NMI values and synthetic networks; (c) correctness, meaning the ability to identify useful community structure, in terms of NMI values on Lancichinetti-Fortunato-Radicchi (LFR) benchmark networks; and (d) scalability, or the ability to produce comparable modularity values with fast execution times when working with large-scale real-world networks. In addition to describing a simple hierarchical arc-merging (HAM) algorithm that uses network topology information, we introduce rule-based arc-merging strategies for identifying community structures. Five well-studied social network datasets and eight sets of LFR benchmark networks were employed to validate correctness against ground-truth communities, eight large-scale real-world complex networks were used to measure efficiency, and two synthetic networks were used to determine susceptibility to two resolution limit problems. Our experimental results indicate that the proposed HAM algorithm exhibits satisfactory performance and efficiency, that HAM-identified and ground-truth communities are comparable on both social and LFR benchmark networks, and that the algorithm mitigates resolution limit problems. PMID:29121100

  13. Maximal Neighbor Similarity Reveals Real Communities in Networks

    PubMed Central

    Žalik, Krista Rizman

    2015-01-01

    An important problem in the analysis of network data is the detection of groups of densely interconnected nodes, also called modules or communities. Community structure reveals the functions and organization of networks. Currently used algorithms for community detection in large-scale real-world networks are computationally expensive, require a priori information such as the number or sizes of communities, or fail to give the same partition over multiple runs. In this paper we investigate a simple and fast algorithm that uses the network structure alone and requires neither the optimization of a pre-defined objective function nor information about the number of communities. We propose a bottom-up community detection algorithm that finds real communities by starting from communities consisting of adjacent pairs of nodes and their maximally similar neighbors. We show that the overall advantages of the proposed algorithm over other community detection algorithms are its simple nature, low computational cost and very high accuracy in detecting communities of different sizes, including in networks with blurred modularity structure consisting of poorly separated communities. All communities identified by the proposed method for the Facebook network and the E. coli transcriptional regulatory network have strong structural and functional coherence. PMID:26680448

  14. Identification of overlapping communities and their hierarchy by locally calculating community-changing resolution levels

    NASA Astrophysics Data System (ADS)

    Havemann, Frank; Heinz, Michael; Struck, Alexander; Gläser, Jochen

    2011-01-01

    We propose a new local, deterministic and parameter-free algorithm that detects fuzzy and crisp overlapping communities in a weighted network and simultaneously reveals their hierarchy. Using a local fitness function, the algorithm greedily expands natural communities of seeds until the whole graph is covered. The hierarchy of communities is obtained analytically by calculating resolution levels at which communities grow rather than numerically by testing different resolution levels. This analytic procedure is not only more exact than its numerical alternatives such as LFM and GCE but also much faster. Critical resolution levels can be identified by searching for intervals in which large changes of the resolution do not lead to growth of communities. We tested our algorithm on benchmark graphs and on a network of 492 papers in information science. Combined with a specific post-processing, the algorithm gives much more precise results on LFR benchmarks with high overlap compared to other algorithms and performs very similarly to GCE.
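
    The greedy expansion step the abstract refers to can be illustrated with the local fitness function used by LFM, f(C) = k_in / (k_in + k_out)^alpha, where alpha plays the role of the resolution level. This is a generic sketch of growing a natural community from a seed, not the analytic computation of resolution levels that distinguishes the proposed algorithm.

```python
import networkx as nx

def local_fitness(G, community, alpha=1.0):
    """f(C) = k_in / (k_in + k_out)^alpha for a node set C."""
    k_in = k_out = 0
    for v in community:
        for u in G[v]:
            if u in community:
                k_in += 1          # internal edges counted from both endpoints
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0

def expand_seed(G, seed, alpha=1.0):
    """Greedily add the neighbor that most improves the local fitness."""
    community = {seed}
    while True:
        frontier = {u for v in community for u in G[v]} - community
        current = local_fitness(G, community, alpha)
        best, best_gain = None, 0.0
        for u in frontier:
            gain = local_fitness(G, community | {u}, alpha) - current
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:
            return community
        community.add(best)

G = nx.karate_club_graph()
print(sorted(expand_seed(G, seed=33)))
```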

  15. Constant Communities in Complex Networks

    NASA Astrophysics Data System (ADS)

    Chakraborty, Tanmoy; Srinivasan, Sriram; Ganguly, Niloy; Bhowmick, Sanjukta; Mukherjee, Animesh

    2013-05-01

    Identifying community structure is a fundamental problem in network analysis. Most community detection algorithms are based on optimizing a combinatorial parameter, for example modularity. This optimization is generally NP-hard, so merely changing the vertex order can alter the assignment of vertices to communities. However, there has been little study of how vertex ordering influences the results of community detection algorithms. Here we identify and study the properties of invariant groups of vertices (constant communities) whose assignment to communities is, quite remarkably, not affected by vertex ordering. The percentage of constant communities can vary across different applications, and based on empirical results we propose metrics to evaluate these communities. Using constant communities as a pre-processing step, one can significantly reduce the variation of the results. Finally, we present a case study on a phoneme network and illustrate that constant communities, quite strikingly, form the core functional units of the larger communities.
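
    The effect studied here is easy to reproduce with a stochastic modularity-based method: run it several times with different seeds (which perturbs the processing order) and keep only groups of vertices that are co-assigned in every run. A minimal sketch, assuming a recent networkx that provides louvain_communities:

```python
import networkx as nx

def constant_groups(G, runs=20):
    """Vertices with identical community assignments across all runs
    form constant groups."""
    labelings = []
    for s in range(runs):
        parts = nx.community.louvain_communities(G, seed=s)
        labels = {v: i for i, comm in enumerate(parts) for v in comm}
        labelings.append(labels)
    # two nodes share a signature iff they are together in every run
    signature = {v: tuple(l[v] for l in labelings) for v in G}
    groups = {}
    for v, sig in signature.items():
        groups.setdefault(sig, set()).add(v)
    return list(groups.values())

G = nx.karate_club_graph()
for g in constant_groups(G):
    print(sorted(g))
```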

  16. Fragmenting networks by targeting collective influencers at a mesoscopic level.

    PubMed

    Kobayashi, Teruyoshi; Masuda, Naoki

    2016-11-25

    A practical approach to protecting networks against epidemic processes such as spreading of infectious diseases, malware, and harmful viral information is to remove some influential nodes beforehand to fragment the network into small components. Because determining the optimal order to remove nodes is a computationally hard problem, various approximate algorithms have been proposed to efficiently fragment networks by sequential node removal. Morone and Makse proposed an algorithm employing the non-backtracking matrix of given networks, which outperforms various existing algorithms. In fact, many empirical networks have community structure, compromising the assumption of local tree-like structure on which the original algorithm is based. We develop an immunization algorithm by synergistically combining the Morone-Makse algorithm and coarse graining of the network in which we regard a community as a supernode. In this way, we aim to identify nodes that connect different communities at a reasonable computational cost. The proposed algorithm works more efficiently than the Morone-Makse and other algorithms on networks with community structure.
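
    The coarse-graining step, in which each detected community becomes a supernode, can be sketched with networkx's quotient_graph; the influence ranking of the Morone-Makse procedure is not reproduced here, and louvain_communities stands in for whatever community detector is actually used.

```python
import networkx as nx

def coarse_grain_by_communities(G):
    """Contract every detected community into a single supernode."""
    communities = nx.community.louvain_communities(G, seed=0)
    # quotient_graph merges each community block into one node; the edges
    # of the coarse graph are the connections between communities
    coarse = nx.quotient_graph(G, communities, relabel=True)
    return communities, coarse

G = nx.karate_club_graph()
communities, coarse = coarse_grain_by_communities(G)
print(len(G), "nodes ->", coarse.number_of_nodes(), "supernodes,",
      coarse.number_of_edges(), "inter-community edges")
```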

  17. Fragmenting networks by targeting collective influencers at a mesoscopic level

    NASA Astrophysics Data System (ADS)

    Kobayashi, Teruyoshi; Masuda, Naoki

    2016-11-01

    A practical approach to protecting networks against epidemic processes such as spreading of infectious diseases, malware, and harmful viral information is to remove some influential nodes beforehand to fragment the network into small components. Because determining the optimal order to remove nodes is a computationally hard problem, various approximate algorithms have been proposed to efficiently fragment networks by sequential node removal. Morone and Makse proposed an algorithm employing the non-backtracking matrix of given networks, which outperforms various existing algorithms. In fact, many empirical networks have community structure, compromising the assumption of local tree-like structure on which the original algorithm is based. We develop an immunization algorithm by synergistically combining the Morone-Makse algorithm and coarse graining of the network in which we regard a community as a supernode. In this way, we aim to identify nodes that connect different communities at a reasonable computational cost. The proposed algorithm works more efficiently than the Morone-Makse and other algorithms on networks with community structure.

  18. Fragmenting networks by targeting collective influencers at a mesoscopic level

    PubMed Central

    Kobayashi, Teruyoshi; Masuda, Naoki

    2016-01-01

    A practical approach to protecting networks against epidemic processes such as spreading of infectious diseases, malware, and harmful viral information is to remove some influential nodes beforehand to fragment the network into small components. Because determining the optimal order to remove nodes is a computationally hard problem, various approximate algorithms have been proposed to efficiently fragment networks by sequential node removal. Morone and Makse proposed an algorithm employing the non-backtracking matrix of given networks, which outperforms various existing algorithms. In fact, many empirical networks have community structure, compromising the assumption of local tree-like structure on which the original algorithm is based. We develop an immunization algorithm by synergistically combining the Morone-Makse algorithm and coarse graining of the network in which we regard a community as a supernode. In this way, we aim to identify nodes that connect different communities at a reasonable computational cost. The proposed algorithm works more efficiently than the Morone-Makse and other algorithms on networks with community structure. PMID:27886251

  19. An ant colony based algorithm for overlapping community detection in complex networks

    NASA Astrophysics Data System (ADS)

    Zhou, Xu; Liu, Yanheng; Zhang, Jindong; Liu, Tuming; Zhang, Di

    2015-06-01

    Community detection is of great importance for understanding the structure and function of networks. Overlap is a significant feature of many networks, overlapping community detection has attracted increasing attention, and many algorithms have been presented to detect overlapping communities. In this paper, we present an ant-colony-based overlapping community detection algorithm that consists of ant location initialization, ant movement and post-processing phases. An initialization strategy is designed to identify the initial locations of ants and to initialize the label list stored in each node. During the movement phase, all ants move according to the transition probability matrix, and a new heuristic information computation approach is defined to measure the similarity between two nodes. Every node maintains a label list through the cooperation of the ants until a termination criterion is reached. A post-processing phase is executed on the label lists to obtain the final overlapping community structure. We illustrate the capability of our algorithm with experiments on both synthetic and real-world networks. The results demonstrate that our algorithm performs better at finding overlapping communities and overlapping nodes in synthetic and real-world datasets than state-of-the-art algorithms.

  20. An algorithm for designing minimal microbial communities with desired metabolic capacities

    PubMed Central

    Eng, Alexander; Borenstein, Elhanan

    2016-01-01

    Motivation: Recent efforts to manipulate various microbial communities, such as fecal microbiota transplant and bioreactor systems’ optimization, suggest a promising route for microbial community engineering with numerous medical, environmental and industrial applications. However, such applications are currently restricted in scale and often rely on mimicking or enhancing natural communities, calling for the development of tools for designing synthetic communities with specific, tailored, desired metabolic capacities. Results: Here, we present a first step toward this goal, introducing a novel algorithm for identifying minimal sets of microbial species that collectively provide the enzymatic capacity required to synthesize a set of desired target product metabolites from a predefined set of available substrates. Our method integrates a graph theoretic representation of network flow with the set cover problem in an integer linear programming (ILP) framework to simultaneously identify possible metabolic paths from substrates to products while minimizing the number of species required to catalyze these metabolic reactions. We apply our algorithm to successfully identify minimal communities both in a set of simple toy problems and in more complex, realistic settings, and to investigate metabolic capacities in the gut microbiome. Our framework adds to the growing toolset for supporting informed microbial community engineering and for ultimately realizing the full potential of such engineering efforts. Availability and implementation: The algorithm source code, compilation, usage instructions and examples are available under a non-commercial research use only license at https://github.com/borenstein-lab/CoMiDA. Contact: elbo@uw.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153571
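
    The core combinatorial task, picking the fewest species whose combined enzymes cover the reactions needed for the target products, is a set-cover problem. The sketch below solves a toy instance exactly by brute force over increasing subset sizes; CoMiDA instead couples set cover with a flow-based metabolic-path formulation in an ILP, and the species and reaction identifiers here are made up.

```python
from itertools import combinations

# Hypothetical species -> set of reactions each species can catalyze.
species = {
    "sp_A": {"r1", "r2"},
    "sp_B": {"r2", "r3"},
    "sp_C": {"r3", "r4", "r5"},
    "sp_D": {"r1", "r5"},
}
required = {"r1", "r2", "r3", "r5"}   # reactions needed for the targets

def minimal_community(species, required):
    """Smallest set of species whose combined reactions cover `required`."""
    names = list(species)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(species[s] for s in combo))
            if required <= covered:
                return set(combo)
    return None   # the required reactions cannot be covered at all

print(minimal_community(species, required))   # e.g. {'sp_A', 'sp_C'}
```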

  1. A similarity based agglomerative clustering algorithm in networks

    NASA Astrophysics Data System (ADS)

    Liu, Zhiyuan; Wang, Xiujuan; Ma, Yinghong

    2018-04-01

    The detection of clusters is beneficial for understanding the organization and function of networks. Clusters, or communities, are groups of nodes that are densely interconnected but only sparsely linked to other clusters. To identify communities, an efficient and effective agglomerative community detection algorithm based on node similarity is proposed. The proposed method initially calculates the similarity between each pair of nodes and forms pre-partitions according to the principle that each node is in the same community as its most similar neighbor. Each pre-partition is then checked against a community criterion; pre-partitions that do not satisfy it are merged with the partition to which they have the greatest attraction, until no further changes occur. To measure the attraction of a partition, we propose an attraction index based on the importance of the linked nodes in the network. The proposed method can therefore better exploit node properties and network structure. To test the performance of our algorithm, both synthetic and empirical networks of different scales are used. Simulation results show that the proposed algorithm obtains superior clustering results compared with six other widely used community detection algorithms.

  2. Optimizing Algorithm Choice for Metaproteomics: Comparing X!Tandem and Proteome Discoverer for Soil Proteomes

    NASA Astrophysics Data System (ADS)

    Diaz, K. S.; Kim, E. H.; Jones, R. M.; de Leon, K. C.; Woodcroft, B. J.; Tyson, G. W.; Rich, V. I.

    2014-12-01

    The growing field of metaproteomics links microbial communities to their expressed functions by using mass spectrometry methods to characterize community proteins. Comparison of mass spectrometry protein search algorithms and their biases is crucial for maximizing the quality and amount of protein identifications in mass spectral data. Available algorithms employ different approaches when mapping mass spectra to peptides against a database. We compared mass spectra from four microbial proteomes derived from high-organic-content soils searched with two search algorithms: 1) Sequest HT as packaged within Proteome Discoverer (v.1.4) and 2) X!Tandem as packaged in TransProteomicPipeline (v.4.7.1). Searches used matched metagenomes, and results were filtered to allow identification of high-probability proteins. There was little overlap in proteins identified by both algorithms, on average just ~24% of the total. However, when adjusted for spectral abundance, the overlap improved to ~70%. Proteome Discoverer generally outperformed X!Tandem, identifying an average of 12.5% more proteins, with X!Tandem identifying more proteins only in the first two proteomes. For spectrally adjusted results, the algorithms were similar, with X!Tandem marginally outperforming Proteome Discoverer by an average of ~4%. We then assessed differences in heat shock protein (HSP) identification by the two algorithms by BLASTing identified proteins against the Heat Shock Protein Information Resource, because HSP hits typically account for the majority of the signal in proteomes, due to extraction protocols. Total HSP identifications for the 4 proteomes were ~15%, ~11%, ~17%, and ~19%, with ~14% for total HSPs with redundancies removed. Of the ~15% average of proteins from the 4 proteomes identified as HSPs, ~10% of proteins and spectra were identified by both algorithms. On average, Proteome Discoverer identified ~9% more HSPs than X!Tandem.

  3. A Shadowing Problem in the Detection of Overlapping Communities: Lifting the Resolution Limit through a Cascading Procedure

    PubMed Central

    Young, Jean-Gabriel; Allard, Antoine; Hébert-Dufresne, Laurent; Dubé, Louis J.

    2015-01-01

    Community detection is the process of assigning nodes and links to significant communities (e.g. clusters or functional modules), and its development has led to a better understanding of complex networks. When applied to sizable networks, we argue, most detection algorithms correctly identify prominent communities but fail to do so across multiple scales. As a result, a significant fraction of the network is left uncharted. We show that this problem stems from larger or denser communities overshadowing smaller or sparser ones, and that this effect accounts for most of the undetected communities and unassigned links. We propose a generic cascading approach to community detection that circumvents the problem. Using real and artificial network datasets with three widely used community detection algorithms, we show how a simple cascading procedure allows the detection of the missing communities. This work highlights a new detection limit of community structure, and we hope that our approach can inspire better community detection algorithms. PMID:26461919
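
    The cascading idea itself is generic: detect communities, strip the internal edges of the communities already found so that they no longer overshadow anything, and run detection again on what remains. A schematic sketch, with networkx's louvain_communities standing in for the three detectors used in the paper:

```python
import networkx as nx

def cascading_detection(G, rounds=3, min_size=3):
    """Repeatedly detect communities, then remove their internal edges so
    smaller structures left in the 'shadow' can surface on the next pass."""
    work = G.copy()
    found = []
    for _ in range(rounds):
        if work.number_of_edges() == 0:
            break
        for comm in nx.community.louvain_communities(work, seed=0):
            if len(comm) >= min_size:
                found.append(set(comm))
                # drop intra-community edges before the next cascade pass
                internal = list(work.subgraph(comm).edges())
                work.remove_edges_from(internal)
    return found

G = nx.karate_club_graph()
for comm in cascading_detection(G):
    print(sorted(comm))
```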

  4. Uncovering the community structure in signed social networks based on greedy optimization

    NASA Astrophysics Data System (ADS)

    Chen, Yan; Yan, Jiaqi; Yang, Yu; Chen, Junhua

    2017-05-01

    Signed relationships have recently been adopted in the modeling of many complicated systems, and the relations among the entities in such systems are complicated and multifarious. These relationships cannot be represented by positive links alone, and signed networks have become increasingly common in the study of social networks, in which community structure is significant. In this paper, to identify communities in signed networks, we develop a new greedy algorithm that takes both the signs and the density of links into account. The main ideas of the algorithm are an initialization procedure based on signed modularity and the corresponding update rules. In particular, we employ the “Asymmetric and Constrained Belief Evolution” procedure to determine the optimal number of communities. The experimental results show that the algorithm performs well; more specifically, the proposed algorithm is very efficient for medium-sized networks, both dense and sparse.

  5. A novel community detection method in bipartite networks

    NASA Astrophysics Data System (ADS)

    Zhou, Cangqi; Feng, Liang; Zhao, Qianchuan

    2018-02-01

    Community structure is a common and important feature of many complex networks, including bipartite networks, which serve as a standard model for many empirical networks comprising two types of nodes. In this paper, we propose a two-stage method for detecting community structure in bipartite networks. First, we extend the widely used Louvain algorithm to bipartite networks. The effectiveness and efficiency of the Louvain algorithm have been proved by many applications, yet a Louvain-like algorithm specifically adapted to bipartite networks has been lacking. Based on bipartite modularity, a measure that extends unipartite modularity and quantifies the strength of partitions in bipartite networks, we fill this gap by developing the Bi-Louvain algorithm, which iteratively groups the nodes in each part in turn. On bipartite networks this algorithm often produces a balanced network structure with equal numbers of the two types of nodes. Second, for the balanced network yielded by the first stage, we use an agglomerative clustering method to cluster the network further. We demonstrate that the gain in modularity of each aggregation, and the operation of joining two communities, can be calculated compactly by matrix operations for all pairs of communities simultaneously. Finally, a complete hierarchical community structure is unfolded. We apply our method to two benchmark data sets and a large-scale data set from an e-commerce company, showing that it effectively identifies community structure in bipartite networks.

  6. Information dynamics algorithm for detecting communities in networks

    NASA Astrophysics Data System (ADS)

    Massaro, Emanuele; Bagnoli, Franco; Guazzini, Andrea; Lió, Pietro

    2012-11-01

    The problem of community detection is relevant in many scientific disciplines, from social science to statistical physics. Given the impact of community detection in many areas, such as psychology and the social sciences, we address the issue of modifying existing well-performing algorithms by incorporating elements of the domain application field, i.e. making them domain-inspired. We focus on a psychology- and social-network-inspired approach that may be useful for further strengthening the link between social network studies and the mathematics of community detection. Here we introduce a community detection algorithm derived from van Dongen's Markov Cluster algorithm (MCL) [4] by considering the nodes of a network as agents capable of taking decisions. In this framework we introduce a memory factor to mimic a typical human behavior, namely the oblivion effect. The method is based on information diffusion and includes a non-linear processing phase. We test our method on two classical community benchmarks and on computer-generated networks with known community structure. Our approach has three important features: the capacity to detect overlapping communities, the capability to identify communities from an individual point of view, and fine tuning of the community detectability with respect to prior knowledge of the data. Finally, we discuss how to use a Shannon entropy measure for parameter estimation in complex networks.
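
    The plain MCL iteration that this method starts from alternates expansion (a matrix power) with inflation (an elementwise power followed by column normalization) until the flow matrix converges; clusters are then read from the nonzero rows. The memory factor and the information-diffusion processing described in the abstract are not included in this sketch.

```python
import numpy as np
import networkx as nx

def mcl(G, expansion=2, inflation=2.0, iterations=50, tol=1e-6):
    """Plain Markov Cluster (MCL) iteration on the adjacency matrix of G."""
    A = nx.to_numpy_array(G) + np.eye(len(G))     # add self-loops
    M = A / A.sum(axis=0)                         # column-stochastic matrix
    for _ in range(iterations):
        M_old = M
        M = np.linalg.matrix_power(M, expansion)  # expansion
        M = M ** inflation                        # inflation
        M = M / M.sum(axis=0)                     # renormalize columns
        if np.abs(M - M_old).max() < tol:
            break
    # rows with remaining mass belong to attractor nodes; their nonzero
    # columns are the members of the corresponding cluster
    nodes = list(G)
    clusters = []
    for i in range(len(nodes)):
        members = {nodes[j] for j in np.nonzero(M[i] > 1e-5)[0]}
        if members and members not in clusters:
            clusters.append(members)
    return clusters

G = nx.karate_club_graph()
for c in mcl(G):
    print(sorted(c))
```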

  7. Improving resolution of dynamic communities in human brain networks through targeted node removal

    PubMed Central

    Turner, Benjamin O.; Miller, Michael B.; Carlson, Jean M.

    2017-01-01

    Current approaches to dynamic community detection in complex networks can fail to identify multi-scale community structure, or to resolve key features of community dynamics. We propose a targeted node removal technique to improve the resolution of community detection. Using synthetic oscillator networks with well-defined “ground truth” communities, we quantify the community detection performance of a common modularity maximization algorithm. We show that the performance of the algorithm on communities of a given size deteriorates when these communities are embedded in multi-scale networks with communities of different sizes, compared to the performance in a single-scale network. We demonstrate that targeted node removal during community detection improves performance on multi-scale networks, particularly when removing the most functionally cohesive nodes. Applying this approach to network neuroscience, we compare dynamic functional brain networks derived from fMRI data taken during both repetitive single-task and varied multi-task experiments. After the removal of regions in visual cortex, the most coherent functional brain area during the tasks, community detection is better able to resolve known functional brain systems into communities. In addition, node removal enables the algorithm to distinguish clear differences in brain network dynamics between these experiments, revealing task-switching behavior that was not identified with the visual regions present in the network. These results indicate that targeted node removal can improve spatial and temporal resolution in community detection, and they demonstrate a promising approach for comparison of network dynamics between neuroscientific data sets with different resolution parameters. PMID:29261662

  8. Using standard clinical assessments for home care to identify vulnerable populations before, during, and after disasters.

    PubMed

    van Solm, Alexandra I T; Hirdes, John P; Eckel, Leslie A; Heckman, George A; Bigelow, Philip L

    Several studies have shown the increased vulnerability of and disproportionate mortality rate among frail community-dwelling older adults as a result of emergencies and disasters. This article will discuss the applicability of the Vulnerable Persons at Risk (VPR) and VPR Plus decision support algorithms designed based on the Resident Assessment Instrument-Home Care (RAI-HC) to identify the most vulnerable community-dwelling (older) adults. A sample was taken from the Ontario RAI-HC database by selecting unique home care clients with assessments closest to December 31, 2014 (N = 275,797). Statistical methods used include cross tabulation, bivariate logistic regression as well as Kaplan-Meier survival plotting and Cox proportional hazards ratios calculations. The VPR and VPR Plus algorithms, were highly predictive of mortality, long-term care admission and hospitalization in ordinary circumstances. This provides a good indication of the strength of the algorithms in identifying vulnerable persons at times of emergencies. Access to real-time person-level information of persons with functional care needs is a vital enabler for emergency responders in prioritizing and allocating resources during a disaster, and has great utility for emergency planning and recovery efforts. The development of valid and reliable algorithms supports the rapid identification and response to vulnerable community-dwelling persons for all phases of emergency management.

  9. Fast detection of the fuzzy communities based on leader-driven algorithm

    NASA Astrophysics Data System (ADS)

    Fang, Changjian; Mu, Dejun; Deng, Zhenghong; Hu, Jun; Yi, Chen-He

    2018-03-01

    In this paper, we present the leader-driven algorithm (LDA) for learning community structure in networks. The algorithm allows one to find overlapping clusters in a network, an important aspect of real networks, especially social networks. The algorithm requires no input parameters and learns the number of clusters naturally from the network. It accomplishes this using leadership centrality in a clever manner. It identifies local minima of leadership centrality as followers which belong only to one cluster, and the remaining nodes are leaders which connect clusters. In this way, the number of clusters can be learned using only the network structure. The LDA is also an extremely fast algorithm, having runtime linear in the network size. Thus, this algorithm can be used to efficiently cluster extremely large networks.

  10. Overlapping communities from dense disjoint and high total degree clusters

    NASA Astrophysics Data System (ADS)

    Zhang, Hongli; Gao, Yang; Zhang, Yue

    2018-04-01

    Community plays an important role in sociology and biology, and especially in computer science, where systems are often represented as networks; community detection is thus of great importance in these domains. A community is a dense subgraph of the whole graph with more links between its members than from its members to outside nodes, and nodes in the same community are likely to share common properties or play similar roles in the graph. Communities overlap when nodes belong to multiple communities. A vast variety of overlapping community detection methods have been proposed in the literature, and local expansion is one of the most successful techniques for dealing with large networks. This paper presents a density-based seeding method in which dense, disjoint local clusters are searched for and selected as seeds. The proposed method selects a seed by the total degree and density of local clusters, using only local structures of the network. Furthermore, this paper proposes a novel community refining phase that minimizes the conductance of each community, through which the quality of the identified communities is largely improved in linear time. Experimental results on synthetic networks show that the proposed seeding method outperforms other state-of-the-art seeding methods and that the proposed refining method largely enhances the quality of the identified communities. Experimental results on real graphs with ground-truth communities show that the proposed approach outperforms other state-of-the-art overlapping community detection algorithms; in particular, it is more than two orders of magnitude faster than existing global algorithms while achieving higher quality, and it obtains much more accurate community structure than current local algorithms without any prior information.
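
    The refining objective is conductance, cut(S) / min(vol(S), vol(V \ S)), which the refining phase minimizes by adjusting community boundaries. The sketch below just evaluates that quantity for a candidate community and checks it against networkx's built-in nx.conductance; the candidate node set is arbitrary.

```python
import networkx as nx

def conductance(G, community):
    """cut(S) / min(vol(S), vol(V \\ S)) for a node set S."""
    S = set(community)
    cut = sum(1 for u, v in G.edges() if (u in S) != (v in S))
    vol_S = sum(d for _, d in G.degree(S))
    vol_rest = 2 * G.number_of_edges() - vol_S
    return cut / min(vol_S, vol_rest)

G = nx.karate_club_graph()
candidate = {0, 1, 2, 3, 7, 13}
print(round(conductance(G, candidate), 3))
print(round(nx.conductance(G, candidate), 3))   # same value from networkx
```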

  11. Discrete particle swarm optimization for identifying community structures in signed social networks.

    PubMed

    Cai, Qing; Gong, Maoguo; Shen, Bo; Ma, Lijia; Jiao, Licheng

    2014-10-01

    The modern science of networks has greatly facilitated our understanding of complex systems. Community structure is believed to be one of the notable features of complex networks representing real complicated systems. Very often, uncovering community structure in networks can be regarded as an optimization problem, and thus many approaches based on evolutionary algorithms have been put forward. Particle swarm optimization (PSO) is an artificial intelligence algorithm inspired by social behavior such as bird flocking and fish schooling. PSO has proved to be an effective optimization technique; however, it was originally designed for continuous optimization, which confounds its application to discrete contexts. In this paper, a novel discrete PSO algorithm is proposed for identifying community structures in signed networks. In the proposed method, particle status is redesigned in discrete form so as to make PSO suitable for discrete scenarios, and particle updating rules are reformulated using the topology of the signed network. Extensive experiments compared with three state-of-the-art approaches on both synthetic and real-world signed networks demonstrate that the proposed method is effective and promising. Copyright © 2014 Elsevier Ltd. All rights reserved.

  12. Dynamics and control of diseases in networks with community structure.

    PubMed

    Salathé, Marcel; Jones, James H

    2010-04-08

    The dynamics of infectious diseases spread via direct person-to-person transmission (such as influenza, smallpox, HIV/AIDS, etc.) depends on the underlying host contact network. Human contact networks exhibit strong community structure. Understanding how such community structure affects epidemics may provide insights for preventing the spread of disease between communities by changing the structure of the contact network through pharmaceutical or non-pharmaceutical interventions. We use empirical and simulated networks to investigate the spread of disease in networks with community structure. We find that community structure has a major impact on disease dynamics, and we show that in networks with strong community structure, immunization interventions targeted at individuals bridging communities are more effective than those simply targeting highly connected individuals. Because the structure of relevant contact networks is generally not known, and vaccine supply is often limited, there is great need for efficient vaccination algorithms that do not require full knowledge of the network. We developed an algorithm that acts only on locally available network information and is able to quickly identify targets for successful immunization intervention. The algorithm generally outperforms existing algorithms when vaccine supply is limited, particularly in networks with strong community structure. Understanding the spread of infectious diseases and designing optimal control strategies is a major goal of public health. Social networks show marked patterns of community structure, and our results, based on empirical and simulated data, demonstrate that community structure strongly affects disease dynamics. These results have implications for the design of control strategies.

  13. Detectability Thresholds and Optimal Algorithms for Community Structure in Dynamic Networks

    NASA Astrophysics Data System (ADS)

    Ghasemian, Amir; Zhang, Pan; Clauset, Aaron; Moore, Cristopher; Peel, Leto

    2016-07-01

    The detection of communities within a dynamic network is a common means for obtaining a coarse-grained view of a complex system and for investigating its underlying processes. While a number of methods have been proposed in the machine learning and physics literature, we lack a theoretical analysis of their strengths and weaknesses, or of the ultimate limits on when communities can be detected. Here, we study the fundamental limits of detecting community structure in dynamic networks. Specifically, we analyze the limits of detectability for a dynamic stochastic block model where nodes change their community memberships over time, but where edges are generated independently at each time step. Using the cavity method, we derive a precise detectability threshold as a function of the rate of change and the strength of the communities. Below this sharp threshold, we claim that no efficient algorithm can identify the communities better than chance. We then give two algorithms that are optimal in the sense that they succeed all the way down to this threshold. The first uses belief propagation, which gives asymptotically optimal accuracy, and the second is a fast spectral clustering algorithm, based on linearizing the belief propagation equations. These results extend our understanding of the limits of community detection in an important direction, and introduce new mathematical tools for similar extensions to networks with other types of auxiliary information.

  14. Overlapping community detection based on link graph using distance dynamics

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Zhang, Jing; Cai, Li-Jun

    2018-01-01

    The distance dynamics model was recently proposed to detect disjoint communities in a complex network. To identify the overlapping structure of a network using the distance dynamics model, an overlapping community detection algorithm called L-Attractor is proposed in this paper. The process of L-Attractor consists of three phases. In the first phase, L-Attractor transforms the original graph into a link graph (a new edge graph) so that a node can have multiple distances. In the second phase, using the improved distance dynamics model, a dynamic interaction process is introduced to simulate the distance dynamics (shrink or stretch). Through this interaction process all distances converge, and the disjoint community structure of the link graph naturally manifests itself. In the third phase, a recovery method is designed to convert the disjoint community structure of the link graph into the overlapping community structure of the original graph. Extensive experiments are conducted on LFR benchmark networks as well as real-world networks. Based on the results, our algorithm demonstrates higher accuracy and quality than other state-of-the-art algorithms.
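
    The first and third phases (building the link graph and recovering overlapping node communities) can be sketched with networkx's line graph: detect disjoint communities of edges, then map each edge community back to its endpoint nodes, so nodes appearing in several edge communities become overlapping nodes. Here a stock detector replaces the distance-dynamics interaction, so this only illustrates the link-graph framing, not L-Attractor itself.

```python
import networkx as nx

def link_graph_overlapping_communities(G):
    """Detect disjoint communities of edges (nodes of the line graph) and
    map them back to overlapping communities of the original nodes."""
    L = nx.line_graph(G)                     # nodes of L are edges of G
    edge_comms = nx.community.louvain_communities(L, seed=0)
    return [{u for edge in ec for u in edge} for ec in edge_comms]

G = nx.karate_club_graph()
comms = link_graph_overlapping_communities(G)
for c in comms:
    print(sorted(c))
overlapping = {v for v in G if sum(v in c for c in comms) > 1}
print("overlapping nodes:", sorted(overlapping))
```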

  15. Decoding communities in networks

    NASA Astrophysics Data System (ADS)

    Radicchi, Filippo

    2018-02-01

    According to a recent information-theoretical proposal, the problem of defining and identifying communities in networks can be interpreted as a classical communication task over a noisy channel: memberships of nodes are information bits erased by the channel, edges and nonedges in the network are parity bits introduced by the encoder but degraded through the channel, and a community identification algorithm is a decoder. The interpretation is perfectly equivalent to the one at the basis of well-known statistical inference algorithms for community detection. The only difference in the interpretation is that a noisy channel replaces a stochastic network model. However, the different perspective gives the opportunity to take advantage of the rich set of tools of coding theory to generate novel insights on the problem of community detection. In this paper, we illustrate two main applications of standard coding-theoretical methods to community detection. First, we leverage a state-of-the-art decoding technique to generate a family of quasioptimal community detection algorithms. Second and more important, we show that the Shannon's noisy-channel coding theorem can be invoked to establish a lower bound, here named as decodability bound, for the maximum amount of noise tolerable by an ideal decoder to achieve perfect detection of communities. When computed for well-established synthetic benchmarks, the decodability bound explains accurately the performance achieved by the best community detection algorithms existing on the market, telling us that only little room for their improvement is still potentially left.

  16. Decoding communities in networks.

    PubMed

    Radicchi, Filippo

    2018-02-01

    According to a recent information-theoretical proposal, the problem of defining and identifying communities in networks can be interpreted as a classical communication task over a noisy channel: memberships of nodes are information bits erased by the channel, edges and nonedges in the network are parity bits introduced by the encoder but degraded through the channel, and a community identification algorithm is a decoder. The interpretation is perfectly equivalent to the one at the basis of well-known statistical inference algorithms for community detection. The only difference in the interpretation is that a noisy channel replaces a stochastic network model. However, the different perspective gives the opportunity to take advantage of the rich set of tools of coding theory to generate novel insights on the problem of community detection. In this paper, we illustrate two main applications of standard coding-theoretical methods to community detection. First, we leverage a state-of-the-art decoding technique to generate a family of quasioptimal community detection algorithms. Second and more important, we show that the Shannon's noisy-channel coding theorem can be invoked to establish a lower bound, here named as decodability bound, for the maximum amount of noise tolerable by an ideal decoder to achieve perfect detection of communities. When computed for well-established synthetic benchmarks, the decodability bound explains accurately the performance achieved by the best community detection algorithms existing on the market, telling us that only little room for their improvement is still potentially left.

  17. Detecting brain dynamics during resting state: a tensor based evolutionary clustering approach

    NASA Astrophysics Data System (ADS)

    Al-sharoa, Esraa; Al-khassaweneh, Mahmood; Aviyente, Selin

    2017-08-01

    The human brain is a complex network with connections across different regions. Understanding the functional connectivity (FC) of the brain is important during both resting state and task, as disruptions in connectivity patterns are indicators of different psychopathological and neurological diseases. In this work, we study the resting state functional connectivity networks (FCNs) of the brain from fMRI BOLD signals. Recent studies have shown that FCNs are dynamic even during resting state, and understanding the temporal dynamics of FCNs is important for differentiating between different conditions. Therefore, it is important to develop algorithms to track the dynamic formation and dissociation of FCNs of the brain during resting state. In this paper, we propose a two-step tensor based community detection algorithm to identify and track the brain network community structure across time. First, we introduce an information-theoretic function to reduce the dynamic FCN and identify the time points that are topologically similar so they can be combined into a tensor. These time points will be used to identify the different FC states. Second, a tensor based spectral clustering approach is developed to identify the community structure of the constructed tensors. The proposed algorithm applies Tucker decomposition to the constructed tensors and extracts the orthogonal factor matrices along the connectivity mode to determine the common subspace within each FC state. The detected community structure is summarized and described as FC states. The results illustrate the dynamic structure of resting state networks (RSNs), including the default mode network, somatomotor network, subcortical network and visual network.

  18. Network community structure and loop coefficient method

    NASA Astrophysics Data System (ADS)

    Vragović, I.; Louis, E.

    2006-07-01

    A modular structure, in which groups of tightly connected nodes can be resolved as separate entities, is a property found in many complex networks. In this paper, we propose an algorithm for identifying communities in networks. It is based on a local measure, the so-called loop coefficient, which is a generalization of the clustering coefficient. Nodes with a large loop coefficient tend to be core inner community nodes, while other vertices are usually peripheral sites at the borders of communities. Our method gives satisfactory results for both artificial and real-world graphs, provided they have a relatively pronounced modular structure. This type of algorithm could open a way of interpreting the role of nodes in communities in terms of the local loop coefficient, and could be used as a complement to other methods.
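
    As a rough illustration of how a cycle-based local coefficient can separate core and peripheral vertices, the sketch below uses networkx's square (length-4 cycle) clustering as a stand-in; the paper's loop coefficient is a different generalization of the clustering coefficient, so this is illustrative only.

      import networkx as nx

      G = nx.karate_club_graph()
      loop_like = nx.square_clustering(G)   # length-4 cycles through each node (stand-in measure)
      triangles = nx.clustering(G)          # standard triangle-based clustering coefficient

      # Nodes with large values of the cycle-based coefficient are candidate community cores.
      cores = sorted(G.nodes(), key=lambda v: loop_like[v], reverse=True)[:5]
      print("candidate core nodes:", cores)
      print("their triangle clustering:", [round(triangles[v], 2) for v in cores])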

  19. A simple scoring algorithm predicting extended-spectrum β-lactamase producers in adults with community-onset monomicrobial Enterobacteriaceae bacteremia: Matters of frequent emergency department users.

    PubMed

    Lee, Chung-Hsun; Chu, Feng-Yuan; Hsieh, Chih-Chia; Hong, Ming-Yuan; Chi, Chih-Hsien; Ko, Wen-Chien; Lee, Ching-Chi

    2017-04-01

    The incidence of community-onset bacteremia caused by extended-spectrum-β-lactamase (ESBL) producers is increasing. The adverse effects of ESBL production on patient outcome have been recognized, and this antimicrobial resistance has significant implications in the delay of appropriate therapy. However, a simple scoring algorithm that can easily, inexpensively, and accurately be applied in clinical settings has been lacking. Thus, we established a predictive scoring algorithm for identifying patients at risk of ESBL-producer infections among patients with community-onset monomicrobial Enterobacteriaceae bacteremia (CoMEB). In a retrospective cohort, multicenter study, adults with CoMEB in the emergency department (ED) were recruited during January 2008 to December 2013. ESBL producers were determined based on ESBL phenotype. Clinical information was obtained from chart records. Of the 1141 adults with CoMEB, 65 (5.7%) cases caused by ESBL producers were identified. Four independent multivariate predictors of ESBL-producer bacteremia with high odds ratios (ORs)-recent antimicrobial use (OR, 15.29), recent invasive procedures (OR, 12.33), nursing home residence (OR, 27.77), and frequent ED use (OR, 9.98)-were each assigned +1 point to obtain the CoMEB-ESBL score. Using the proposed scoring algorithm, a cut-off value of +2 yielded a high sensitivity (84.6%) and an acceptable specificity (92.5%); the area under the receiver operating characteristic curve was 0.92. In conclusion, this simple scoring algorithm can be used to identify CoMEB patients with a high risk of ESBL-producer infection. Of note, frequent ED use was demonstrated for the first time to be a crucial predictor of ESBL-producer infections. ED clinicians should consider adequate empirical therapy with coverage of these pathogens for patients with risk factors.
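
    The score itself is simple enough to state in a few lines. The sketch below follows the abstract (four binary predictors worth +1 each, with high risk flagged at a total of +2 or more); the function and variable names are illustrative.

      def comeb_esbl_score(recent_antimicrobial_use: bool,
                           recent_invasive_procedure: bool,
                           nursing_home_residence: bool,
                           frequent_ed_use: bool) -> int:
          # Each predictor present contributes +1 point to the CoMEB-ESBL score.
          return sum([recent_antimicrobial_use, recent_invasive_procedure,
                      nursing_home_residence, frequent_ed_use])

      def high_esbl_risk(score: int, cutoff: int = 2) -> bool:
          # The abstract reports a cut-off of +2 for flagging high ESBL-producer risk.
          return score >= cutoff

      print(high_esbl_risk(comeb_esbl_score(True, False, True, False)))   # True: score is 2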

  20. Detections of Propellers in Saturn's Rings using Machine Learning: Preliminary Results

    NASA Astrophysics Data System (ADS)

    Gordon, Mitchell K.; Showalter, Mark R.; Odess, Jennifer; Del Villar, Ambi; LaMora, Andy; Paik, Jin; Lakhani, Karim; Sergeev, Rinat; Erickson, Kristen; Galica, Carol; Grayzeck, Edwin; Morgan, Thomas; Knopf, William

    2015-11-01

    We report on the initial analysis of the output of a tool designed to identify persistent, non-axisymmetric features in the rings of Saturn. This project introduces a new paradigm for scientific software development. The preliminary results include what appear to be new detections of propellers in the rings of Saturn. The Planetary Data System (PDS), working with the NASA Tournament Lab (NTL), Crowd Innovation Lab at Harvard University, and the Topcoder community at Appirio, Inc., under the umbrella “Cassini Rings Challenge”, sponsored a set of competitions employing crowd sourcing and machine learning to develop a tool which could be made available to the community at large. The Challenge was tackled by running a series of separate contests to solve individual tasks prior to the major machine learning challenge. Each contest consisted of a set of requirements, a timeline, one or more prizes, and other incentives, and was posted by Appirio to the Topcoder Community. In the case of the machine learning challenge (a “Marathon Challenge” on the Topcoder platform), members competed against each other by submitting solutions that were scored in real time and posted to a public leader-board by a scoring algorithm developed by Appirio for this contest. The current version of the algorithm was run against ~30,000 of the highest resolution Cassini ISS images. That set included 668 images with a total of 786 features previously identified as propellers in the main rings. The tool identified 81% of those previously identified propellers. In a preliminary close examination of 130 detections identified by the tool, we determined that 11 were previously identified propeller detections, 5 appear to be new detections of known propellers, and 4 appear to be detections of propellers not seen previously. A total of 20 valid detections from 130 candidates implies a relatively high false positive rate, which we hope to reduce by further algorithm development. The machine learning aspect of the algorithm means that as our set of verified detections increases, so does the pool of “ground-truth” data used to train the algorithm for future use.

  1. Application of a fall screening algorithm stratified fall risk but missed preventive opportunities in community-dwelling older adults: a prospective study.

    PubMed

    Muir, Susan W; Berg, Katherine; Chesworth, Bert; Klar, Neil; Speechley, Mark

    2010-01-01

    Evaluate the ability of the American and British Geriatrics Society fall prevention guideline's screening algorithm to identify and stratify future fall risk in community-dwelling older adults. Prospective cohort of community-dwelling older adults (n = 117) aged 65 to 90 years. Fall history, balance, and gait were measured during a comprehensive geriatric assessment at baseline. Falls data were collected monthly for 1 year. The outcomes of any fall and any injurious fall were evaluated. The algorithm stratified participants into 4 hierarchical risk categories. Fall risk was 33% and 68% for the "no intervention" and "comprehensive fall evaluation required" groups, respectively. The relative risk estimate for falling comparing participants in the 2 intervention groups was 2.08 (95% CI 1.42-3.05) for any fall and 2.60 (95% CI 1.53-4.42) for any injurious fall. Prognostic accuracy values were: sensitivity of 0.50 (95% CI 0.36-0.64) and specificity of 0.82 (95% CI 0.70-0.90) for any fall; and sensitivity of 0.56 (95% CI 0.38-0.72) and specificity of 0.78 (95% CI 0.67-0.86) for any injurious fall. The algorithm was able to identify and stratify fall risk for each fall outcome, though the values of prognostic accuracy demonstrate moderate clinical utility. The recommendation of fall evaluation for individuals in the highest risk groups appears supported, though the recommendation of no intervention in the lowest risk groups may not address their needs for fall prevention interventions. Further evaluation of the algorithm is recommended to refine the identification of fall risk in community-dwelling older adults.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jason L. Wright

    Finding and identifying cryptography is a growing concern in the malware analysis community. In this paper, a heuristic method for determining the likelihood that a given function contains a cryptographic algorithm is discussed, and the results of applying this method in various environments are shown. The algorithm is based on frequency analysis of the opcodes that make up each function within a binary.
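
    A hedged sketch of an opcode-frequency heuristic of this kind is shown below: a function is scored by the fraction of its opcodes drawn from operations that tend to dominate cryptographic primitives. The opcode list and the score itself are our own illustrative assumptions, not the paper's tuned method.

      from collections import Counter

      # Bitwise/rotate/shift opcodes are over-represented in many cryptographic routines.
      CRYPTO_HINT_OPCODES = {"xor", "rol", "ror", "shl", "shr", "and", "or", "not", "add"}

      def crypto_likelihood(opcodes):
          counts = Counter(op.lower() for op in opcodes)
          total = sum(counts.values())
          return sum(counts[op] for op in CRYPTO_HINT_OPCODES) / total if total else 0.0

      sample_function = ["mov", "xor", "rol", "xor", "add", "mov", "shr", "xor", "ret"]
      print("crypto likelihood:", round(crypto_likelihood(sample_function), 2))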

  3. Identifying and characterizing key nodes among communities based on electrical-circuit networks.

    PubMed

    Zhu, Fenghui; Wang, Wenxu; Di, Zengru; Fan, Ying

    2014-01-01

    Complex networks with community structures are ubiquitous in the real world. Despite many approaches developed for detecting communities, we continue to lack tools for identifying overlapping and bridging nodes that play crucial roles in the interactions and communications among communities in complex networks. Here we develop an algorithm based on local flow conservation to effectively and efficiently identify and distinguish the two types of nodes. Our method is applicable in both undirected and directed networks without a priori knowledge of the community structure. Our method bypasses the extremely challenging problem of partitioning communities in the presence of overlapping nodes that may belong to multiple communities. Because overlapping and bridging nodes are of paramount importance in maintaining the function of many social and biological networks, our tools open new avenues towards understanding and controlling real complex networks with communities and their key nodes.

  4. Application of Recursive Partitioning to Derive and Validate a Claims-Based Algorithm for Identifying Keratinocyte Carcinoma (Nonmelanoma Skin Cancer).

    PubMed

    Chan, An-Wen; Fung, Kinwah; Tran, Jennifer M; Kitchen, Jessica; Austin, Peter C; Weinstock, Martin A; Rochon, Paula A

    2016-10-01

    Keratinocyte carcinoma (nonmelanoma skin cancer) accounts for substantial burden in terms of high incidence and health care costs but is excluded by most cancer registries in North America. Administrative health insurance claims databases offer an opportunity to identify these cancers using diagnosis and procedural codes submitted for reimbursement purposes. To apply recursive partitioning to derive and validate a claims-based algorithm for identifying keratinocyte carcinoma with high sensitivity and specificity. Retrospective study using population-based administrative databases linked to 602 371 pathology episodes from a community laboratory for adults residing in Ontario, Canada, from January 1, 1992, to December 31, 2009. The final analysis was completed in January 2016. We used recursive partitioning (classification trees) to derive an algorithm based on health insurance claims. The performance of the derived algorithm was compared with 5 prespecified algorithms and validated using an independent academic hospital clinic data set of 2082 patients seen in May and June 2011. Sensitivity, specificity, positive predictive value, and negative predictive value using the histopathological diagnosis as the criterion standard. We aimed to achieve maximal specificity, while maintaining greater than 80% sensitivity. Among 602 371 pathology episodes, 131 562 (21.8%) had a diagnosis of keratinocyte carcinoma. Our final derived algorithm outperformed the 5 simple prespecified algorithms and performed well in both community and hospital data sets in terms of sensitivity (82.6% and 84.9%, respectively), specificity (93.0% and 99.0%, respectively), positive predictive value (76.7% and 69.2%, respectively), and negative predictive value (95.0% and 99.6%, respectively). Algorithm performance did not vary substantially during the 18-year period. This algorithm offers a reliable mechanism for ascertaining keratinocyte carcinoma for epidemiological research in the absence of cancer registry data. Our findings also demonstrate the value of recursive partitioning in deriving valid claims-based algorithms.
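
    The sketch below shows the general shape of such a derivation with scikit-learn: a classification tree is fit on simulated claims features and the resulting rule is evaluated with sensitivity, specificity, PPV and NPV against a reference-standard label. The data, features and thresholds are all simulated; this is not the study's algorithm.

      import numpy as np
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.metrics import confusion_matrix

      rng = np.random.default_rng(0)
      n = 5000
      X = rng.integers(0, 2, size=(n, 3))           # e.g. diagnosis code, excision code, biopsy code
      y = (X[:, 0] & (X[:, 1] | X[:, 2])).astype(int)
      y = y ^ (rng.random(n) < 0.05)                # label noise stands in for imperfect claims

      # Recursive partitioning: a shallow classification tree derives the claims-based rule.
      tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
      tn, fp, fn, tp = confusion_matrix(y, tree.predict(X)).ravel()
      print(f"sens={tp/(tp+fn):.2f} spec={tn/(tn+fp):.2f} "
            f"ppv={tp/(tp+fp):.2f} npv={tn/(tn+fn):.2f}")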

  5. Purpose-Driven Communities in Multiplex Networks: Thresholding User-Engaged Layer Aggregation

    DTIC Science & Technology

    2016-06-01

    dark networks is a non-trivial yet useful task. Because terrorists work hard to hide their relationships/network, analysts have an incomplete picture ... them identify meaningful terrorist communities. This thesis introduces a general-purpose algorithm for community detection in multiplex dark networks ... Subject terms: aggregation, dark networks, conductance, cluster adequacy, modularity, Louvain method, shortest path interdiction.

  6. Into the Bowels of Depression: Unravelling Medical Symptoms Associated with Depression by Applying Machine-Learning Techniques to a Community Based Population Sample.

    PubMed

    Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny

    2016-01-01

    Depression is commonly comorbid with many other somatic diseases and symptoms. Identification of individuals in clusters with comorbid symptoms may reveal new pathophysiological mechanisms and treatment targets. The aim of this research was to combine machine-learning (ML) algorithms with traditional regression techniques by utilising self-reported medical symptoms to identify and describe clusters of individuals with increased rates of depression from a large cross-sectional community based population epidemiological study. A multi-staged methodology utilising ML and traditional statistical techniques was performed using the community based population National Health and Nutrition Examination Study (2009-2010) (N = 3,922). A self-organising map (SOM) ML algorithm, combined with hierarchical clustering, was performed to create participant clusters based on 68 medical symptoms. Binary logistic regression, controlling for sociodemographic confounders, was then used to identify the key clusters of participants with higher levels of depression (PHQ-9≥10, n = 377). Finally, a Multiple Additive Regression Tree boosted ML algorithm was run to identify the important medical symptoms for each key cluster within 17 broad categories: heart, liver, thyroid, respiratory, diabetes, arthritis, fractures and osteoporosis, skeletal pain, blood pressure, blood transfusion, cholesterol, vision, hearing, psoriasis, weight, bowels and urinary. Five clusters of participants, based on medical symptoms, were identified to have significantly increased rates of depression compared to the cluster with the lowest rate: odds ratios ranged from 2.24 (95% CI 1.56, 3.24) to 6.33 (95% CI 1.67, 24.02). The ML boosted regression algorithm identified three key medical condition categories as being significantly more common in these clusters: bowel, pain and urinary symptoms. Bowel-related symptoms were found to dominate the relative importance of symptoms within the five key clusters. This methodology shows promise for the identification of conditions in general populations and supports the current focus on the potential importance of bowel symptoms and the gut in mental health research.
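
    A hedged illustration of such a multi-stage pipeline is sketched below, with KMeans standing in for the self-organising map (scikit-learn has no SOM) and simulated symptom data replacing NHANES: stage 1 clusters participants on symptoms, stage 2 relates cluster membership to depression, and stage 3 ranks symptoms within the highest-risk cluster with a boosted tree.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.linear_model import LogisticRegression
      from sklearn.ensemble import GradientBoostingClassifier

      rng = np.random.default_rng(1)
      symptoms = rng.integers(0, 2, size=(1000, 20))                     # 20 self-reported symptoms
      depressed = (symptoms[:, :3].sum(axis=1) + rng.random(1000) > 2.2).astype(int)

      # Stage 1: cluster participants on their symptom profiles (SOM replaced by KMeans here).
      clusters = KMeans(n_clusters=5, n_init=10, random_state=1).fit_predict(symptoms)

      # Stage 2: which clusters carry elevated depression rates (one-hot cluster indicators)?
      onehot = np.eye(5)[clusters]
      stage2 = LogisticRegression().fit(onehot, depressed)
      print("cluster log-odds:", np.round(stage2.coef_, 2))

      # Stage 3: rank symptom importance for depression within the highest-risk cluster.
      key = int(np.argmax(stage2.coef_))
      mask = clusters == key
      stage3 = GradientBoostingClassifier(random_state=1).fit(symptoms[mask], depressed[mask])
      print("top symptoms:", np.argsort(stage3.feature_importances_)[::-1][:3])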

  7. Triggering Interventions for Influenza: The ALERT Algorithm

    PubMed Central

    Reich, Nicholas G.; Cummings, Derek A. T.; Lauer, Stephen A.; Zorn, Martha; Robinson, Christine; Nyquist, Ann-Christine; Price, Connie S.; Simberkoff, Michael; Radonovich, Lewis J.; Perl, Trish M.

    2015-01-01

    Background. Early, accurate predictions of the onset of influenza season enable targeted implementation of control efforts. Our objective was to develop a tool to assist public health practitioners, researchers, and clinicians in defining the community-level onset of seasonal influenza epidemics. Methods. Using recent surveillance data on virologically confirmed infections of influenza, we developed the Above Local Elevated Respiratory Illness Threshold (ALERT) algorithm, a method to identify the period of highest seasonal influenza activity. We used data from 2 large hospitals that serve Baltimore, Maryland and Denver, Colorado, and the surrounding geographic areas. The data used by ALERT are routinely collected surveillance data: weekly case counts of laboratory-confirmed influenza A virus. The main outcome is the percentage of prospective seasonal influenza cases identified by the ALERT algorithm. Results. When ALERT thresholds designed to capture 90% of all cases were applied prospectively to the 2011–2012 and 2012–2013 influenza seasons in both hospitals, 71%–91% of all reported cases fell within the ALERT period. Conclusions. The ALERT algorithm provides a simple, robust, and accurate metric for determining the onset of elevated influenza activity at the community level. This new algorithm provides valuable information that can impact infection prevention recommendations, public health practice, and healthcare delivery. PMID:25414260
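
    The sketch below is a toy version of this kind of trigger, under our own simplifying assumption that the onset is the first week whose laboratory-confirmed case count exceeds a threshold chosen so that the triggered period captures at least 90% of cases; the weekly counts are invented and the procedure is not the published ALERT calibration.

      weekly_cases = [0, 1, 0, 2, 3, 8, 15, 30, 42, 35, 20, 9, 4, 1, 0]

      def triggered_fraction(cases, threshold):
          """Fraction of all cases falling between the first and last week above threshold."""
          weeks = [i for i, c in enumerate(cases) if c >= threshold]
          if not weeks:
              return 0.0
          return sum(cases[weeks[0]:weeks[-1] + 1]) / sum(cases)

      # Pick the highest threshold that still captures at least 90% of cases.
      threshold = max(t for t in range(1, max(weekly_cases) + 1)
                      if triggered_fraction(weekly_cases, t) >= 0.9)
      onset_week = next(i for i, c in enumerate(weekly_cases) if c >= threshold)
      print(threshold, onset_week, round(triggered_fraction(weekly_cases, threshold), 2))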

  8. Predictors of treatment failure for non-severe childhood pneumonia in developing countries--systematic literature review and expert survey--the first step towards a community focused mHealth risk-assessment tool?

    PubMed

    McCollum, Eric D; King, Carina; Hollowell, Robert; Zhou, Janet; Colbourn, Tim; Nambiar, Bejoy; Mukanga, David; Burgess, Deborah C Hay

    2015-07-09

    Improved referral algorithms for children with non-severe pneumonia at the community level are desirable. We sought to identify predictors of oral antibiotic failure in children who fulfill the case definition of World Health Organization (WHO) non-severe pneumonia. Predictors of greatest interest were those not currently utilized in referral algorithms and feasible to obtain at the community level. We systematically reviewed prospective studies reporting independent predictors of oral antibiotic failure for children 2-59 months of age in resource-limited settings with WHO non-severe pneumonia (either fast breathing for age and/or lower chest wall indrawing without danger signs), with an emphasis on predictors not currently utilized for referral and reasonable for community health workers. We searched PubMed, Cochrane, and Embase and qualitatively analyzed publications from 1997-2014. To supplement the limited published evidence in this subject area we also surveyed respiratory experts. Nine studies met criteria, seven of which were performed in south Asia. One eligible study occurred exclusively at the community level. Overall, oral antibiotic failure rates ranged between 7.8-22.9%. Six studies found excess age-adjusted respiratory rate (either WHO-defined very fast breathing for age or 10-15 breaths/min faster than normal WHO age-adjusted thresholds) and four reported young age as predictive for oral antibiotic failure. Of the seven predictors identified by the expert panel, abnormal oxygen saturation and malnutrition were most highly favored per the panel's rankings and comments. This review identified several candidate predictors of oral antibiotic failure not currently utilized in childhood pneumonia referral algorithms; excess age-specific respiratory rate, young age, abnormal oxygen saturation, and moderate malnutrition. However, the data was limited and there are clear evidence gaps; research in rural, low-resource settings with community health workers is needed.

  9. Administrative Data Algorithms Can Describe Ambulatory Physician Utilization

    PubMed Central

    Shah, Baiju R; Hux, Janet E; Laupacis, Andreas; Zinman, Bernard; Cauch-Dudek, Karen; Booth, Gillian L

    2007-01-01

    Objective To validate algorithms using administrative data that characterize ambulatory physician care for patients with a chronic disease. Data Sources Seven-hundred and eighty-one people with diabetes were recruited mostly from community pharmacies to complete a written questionnaire about their physician utilization in 2002. These data were linked with administrative databases detailing health service utilization. Study Design An administrative data algorithm was defined that identified whether or not patients received specialist care, and it was tested for agreement with self-report. Other algorithms, which assigned each patient to a primary care and specialist physician, were tested for concordance with self-reported regular providers of care. Principal Findings The algorithm to identify whether participants received specialist care had 80.4 percent agreement with questionnaire responses (κ = 0.59). Compared with self-report, administrative data had a sensitivity of 68.9 percent and specificity 88.3 percent for identifying specialist care. The best administrative data algorithm to assign each participant's regular primary care and specialist providers was concordant with self-report in 82.6 and 78.2 percent of cases, respectively. Conclusions Administrative data algorithms can accurately match self-reported ambulatory physician utilization. PMID:17610448

  10. Bayesian Community Detection in the Space of Group-Level Functional Differences

    PubMed Central

    Venkataraman, Archana; Yang, Daniel Y.-J.; Pelphrey, Kevin A.; Duncan, James S.

    2017-01-01

    We propose a unified Bayesian framework to detect both hyper- and hypo-active communities within whole-brain fMRI data. Specifically, our model identifies dense subgraphs that exhibit population-level differences in functional synchrony between a control and clinical group. We derive a variational EM algorithm to solve for the latent posterior distributions and parameter estimates, which subsequently inform us about the afflicted network topology. We demonstrate that our method provides valuable insights into the neural mechanisms underlying social dysfunction in autism, as verified by the Neurosynth meta-analytic database. In contrast, both univariate testing and community detection via recursive edge elimination fail to identify stable functional communities associated with the disorder. PMID:26955022

  11. Bayesian Community Detection in the Space of Group-Level Functional Differences.

    PubMed

    Venkataraman, Archana; Yang, Daniel Y-J; Pelphrey, Kevin A; Duncan, James S

    2016-08-01

    We propose a unified Bayesian framework to detect both hyper- and hypo-active communities within whole-brain fMRI data. Specifically, our model identifies dense subgraphs that exhibit population-level differences in functional synchrony between a control and clinical group. We derive a variational EM algorithm to solve for the latent posterior distributions and parameter estimates, which subsequently inform us about the afflicted network topology. We demonstrate that our method provides valuable insights into the neural mechanisms underlying social dysfunction in autism, as verified by the Neurosynth meta-analytic database. In contrast, both univariate testing and community detection via recursive edge elimination fail to identify stable functional communities associated with the disorder.

  12. Evaluation of Semantic Web Technologies for Storing Computable Definitions of Electronic Health Records Phenotyping Algorithms.

    PubMed

    Papež, Václav; Denaxas, Spiros; Hemingway, Harry

    2017-01-01

    Electronic Health Records are electronic data generated during or as a byproduct of routine patient care. Structured, semi-structured and unstructured EHR offer researchers unprecedented phenotypic breadth and depth and have the potential to accelerate the development of precision medicine approaches at scale. A main EHR use-case is defining phenotyping algorithms that identify disease status, onset and severity. Phenotyping algorithms utilize diagnoses, prescriptions, laboratory tests, symptoms and other elements in order to identify patients with or without a specific trait. No common standardized, structured, computable format exists for storing phenotyping algorithms. The majority of algorithms are stored as human-readable descriptive text documents, which makes their translation to code challenging due to their inherent complexity and hinders their sharing and re-use across the community. In this paper, we evaluate two key Semantic Web Technologies, the Web Ontology Language and the Resource Description Framework, for enabling computable representations of EHR-driven phenotyping algorithms.
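
    As a hedged illustration of what an RDF representation of a phenotyping algorithm can look like, the snippet below stores a toy rule set with rdflib; the namespace, property names and the example type 2 diabetes rule are invented for illustration and are not the representation evaluated in the paper.

      from rdflib import Graph, Namespace, Literal, RDF

      PHENO = Namespace("http://example.org/phenotyping#")   # illustrative namespace
      g = Graph()
      g.bind("pheno", PHENO)

      algo = PHENO["T2DM_algorithm"]
      g.add((algo, RDF.type, PHENO.PhenotypingAlgorithm))
      g.add((algo, PHENO.targetCondition, Literal("type 2 diabetes mellitus")))

      rule = PHENO["rule_1"]
      g.add((algo, PHENO.hasRule, rule))
      g.add((rule, PHENO.requiresDiagnosisCode, Literal("ICD-10 E11")))
      g.add((rule, PHENO.requiresPrescription, Literal("metformin")))

      print(g.serialize(format="turtle"))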

  13. Development and Validation of an Algorithm to Identify Planned Readmissions From Claims Data.

    PubMed

    Horwitz, Leora I; Grady, Jacqueline N; Cohen, Dorothy B; Lin, Zhenqiu; Volpe, Mark; Ngo, Chi K; Masica, Andrew L; Long, Theodore; Wang, Jessica; Keenan, Megan; Montague, Julia; Suter, Lisa G; Ross, Joseph S; Drye, Elizabeth E; Krumholz, Harlan M; Bernheim, Susannah M

    2015-10-01

    It is desirable not to include planned readmissions in readmission measures because they represent deliberate, scheduled care. To develop an algorithm to identify planned readmissions, describe its performance characteristics, and identify improvements. Consensus-driven algorithm development and chart review validation study at 7 acute-care hospitals in 2 health systems. For development, all discharges qualifying for the publicly reported hospital-wide readmission measure. For validation, all qualifying same-hospital readmissions that were characterized by the algorithm as planned, and a random sampling of same-hospital readmissions that were characterized as unplanned. We calculated weighted sensitivity and specificity, and positive and negative predictive values of the algorithm (version 2.1), compared to gold standard chart review. In consultation with 27 experts, we developed an algorithm that characterizes 7.8% of readmissions as planned. For validation we reviewed 634 readmissions. The weighted sensitivity of the algorithm was 45.1% overall, 50.9% in large teaching centers and 40.2% in smaller community hospitals. The weighted specificity was 95.9%, positive predictive value was 51.6%, and negative predictive value was 94.7%. We identified 4 minor changes to improve algorithm performance. The revised algorithm had a weighted sensitivity 49.8% (57.1% at large hospitals), weighted specificity 96.5%, positive predictive value 58.7%, and negative predictive value 94.5%. Positive predictive value was poor for the 2 most common potentially planned procedures: diagnostic cardiac catheterization (25%) and procedures involving cardiac devices (33%). An administrative claims-based algorithm to identify planned readmissions is feasible and can facilitate public reporting of primarily unplanned readmissions. © 2015 Society of Hospital Medicine.

  14. Derivation and validation of the Personal Support Algorithm: an evidence-based framework to inform allocation of personal support services in home and community care.

    PubMed

    Sinn, Chi-Ling Joanna; Jones, Aaron; McMullan, Janet Legge; Ackerman, Nancy; Curtin-Telegdi, Nancy; Eckel, Leslie; Hirdes, John P

    2017-11-25

    Personal support services enable many individuals to stay in their homes, but there are no standard ways to classify need for functional support in home and community care settings. The goal of this project was to develop an evidence-based clinical tool to inform service planning while allowing for flexibility in care coordinator judgment in response to patient and family circumstances. The sample included 128,169 Ontario home care patients assessed in 2013 and 25,800 Ontario community support clients assessed between 2014 and 2016. Independent variables were drawn from the Resident Assessment Instrument-Home Care and interRAI Community Health Assessment that are standardised, comprehensive, and fully compatible clinical assessments. Clinical expertise and regression analyses identified candidate variables that were entered into decision tree models. The primary dependent variable was the weekly hours of personal support calculated based on the record of billed services. The Personal Support Algorithm classified need for personal support into six groups with a 32-fold difference in average billed hours of personal support services between the highest and lowest group. The algorithm explained 30.8% of the variability in billed personal support services. Care coordinators and managers reported that the guidelines based on the algorithm classification were consistent with their clinical judgment and current practice. The Personal Support Algorithm provides a structured yet flexible decision-support framework that may facilitate a more transparent and equitable approach to the allocation of personal support services.

  15. Overlapping Community Detection based on Network Decomposition

    NASA Astrophysics Data System (ADS)

    Ding, Zhuanlian; Zhang, Xingyi; Sun, Dengdi; Luo, Bin

    2016-04-01

    Community detection in complex networks has become a vital step toward understanding the structure and dynamics of networks in various fields. However, traditional node clustering and more recently proposed link clustering methods have inherent drawbacks for discovering overlapping communities. Node clustering is inadequate to capture the pervasive overlaps, while link clustering is often criticized for its high computational cost and ambiguous definition of communities. Overlapping community detection therefore remains a formidable challenge. In this work, we propose a new overlapping community detection algorithm based on network decomposition, called NDOCD. Specifically, NDOCD iteratively splits the network by removing all links in derived link communities, which are identified by utilizing a node clustering technique. The network decomposition contributes to reducing the computation time, and noise link elimination contributes to improving the quality of the obtained communities. Besides, we employ a node clustering technique rather than a link similarity measure to discover link communities, so NDOCD avoids an ambiguous definition of community and becomes less time-consuming. We test our approach on both synthetic and real-world networks. Results demonstrate the superior performance of our approach in both computation time and accuracy compared to state-of-the-art algorithms.
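
    A rough sketch of the decomposition idea (our reading of the abstract, not the NDOCD implementation) is shown below: link communities are obtained by applying a node clustering method to the line graph, the links of one community are stripped to decompose the network, and nodes that fall in several link communities are the overlapping candidates.

      import networkx as nx
      from networkx.algorithms.community import greedy_modularity_communities

      G = nx.karate_club_graph()
      L = nx.line_graph(G)                       # nodes of L are the edges of G
      link_comms = greedy_modularity_communities(L)

      # Map each link community back to the node memberships it induces.
      memberships = {}
      for cid, links in enumerate(link_comms):
          for u, v in links:
              memberships.setdefault(u, set()).add(cid)
              memberships.setdefault(v, set()).add(cid)

      overlapping = [v for v, cs in memberships.items() if len(cs) > 1]
      print("overlapping candidates:", sorted(overlapping))

      # One decomposition step: strip the links of the first community and continue on the rest.
      G.remove_edges_from(link_comms[0])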

  16. Automated identification of patients with pulmonary nodules in an integrated health system using administrative health plan data, radiology reports, and natural language processing.

    PubMed

    Danforth, Kim N; Early, Megan I; Ngan, Sharon; Kosco, Anne E; Zheng, Chengyi; Gould, Michael K

    2012-08-01

    Lung nodules are commonly encountered in clinical practice, yet little is known about their management in community settings. An automated method for identifying patients with lung nodules would greatly facilitate research in this area. Using members of a large, community-based health plan from 2006 to 2010, we developed a method to identify patients with lung nodules, by combining five diagnostic codes, four procedural codes, and a natural language processing algorithm that performed free text searches of radiology transcripts. An experienced pulmonologist reviewed a random sample of 116 radiology transcripts, providing a reference standard for the natural language processing algorithm. With the use of an automated method, we identified 7112 unique members as having one or more incident lung nodules. The mean age of the patients was 65 years (standard deviation 14 years). There were slightly more women (54%) than men, and Hispanics and non-whites comprised 45% of the lung nodule cohort. Thirty-six percent were never smokers whereas 11% were current smokers. Fourteen percent of the patients were subsequently diagnosed with lung cancer. The sensitivity and specificity of the natural language processing algorithm for identifying the presence of lung nodules were 96% and 86%, respectively, compared with clinician review. Among the true positive transcripts in the validation sample, only 35% were solitary and unaccompanied by one or more associated findings, and 56% measured 8 to 30 mm in diameter. A combination of diagnostic codes, procedural codes, and a natural language processing algorithm for free text searching of radiology reports can accurately and efficiently identify patients with incident lung nodules, many of whom are subsequently diagnosed with lung cancer.
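
    A minimal stand-in for the free-text step is sketched below: a regular-expression rule that flags transcripts mentioning a lung or pulmonary nodule while suppressing simple negations. The actual natural language processing algorithm in the study is more sophisticated; the patterns here are invented.

      import re

      NODULE = re.compile(r"\b(pulmonary|lung)\s+nodules?\b", re.IGNORECASE)
      NEGATED = re.compile(r"\bno (evidence of )?(pulmonary|lung)\s+nodules?\b", re.IGNORECASE)

      def flags_nodule(transcript: str) -> bool:
          # Flag a transcript only when a nodule mention is not a simple negation.
          return bool(NODULE.search(transcript)) and not NEGATED.search(transcript)

      print(flags_nodule("8 mm lung nodule in the right upper lobe."))        # True
      print(flags_nodule("No evidence of pulmonary nodules or masses."))      # False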

  17. RCLUS, a new program for clustering associated species: A demonstration using a Mojave Desert plant community dataset

    Treesearch

    Stewart C. Sanderson; Jeffrey E. Ott; E. Durant McArthur; Kimball T. Harper

    2006-01-01

    This paper presents a new clustering program named RCLUS that was developed for species (R-mode) analysis of plant community data. RCLUS identifies clusters of co-occurring species that meet a user-specified cutoff level of positive association with each other. The "strict affinity" clustering algorithm in RCLUS builds clusters of species whose pairwise...
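
    An illustrative sketch of this style of R-mode analysis (not the RCLUS program itself) is given below: species are linked when their pairwise association, here the phi coefficient over plot co-occurrence, exceeds a user-specified cutoff, and strict-affinity clusters are read off as cliques of the resulting graph. The data and cutoff are invented.

      import numpy as np
      import networkx as nx

      rng = np.random.default_rng(0)
      plots = rng.integers(0, 2, size=(60, 8))      # presence/absence: 60 plots x 8 species
      cutoff = 0.2

      def phi(a, b):
          return np.corrcoef(a, b)[0, 1]            # phi equals Pearson r for binary vectors

      G = nx.Graph()
      G.add_nodes_from(range(plots.shape[1]))
      for i in range(plots.shape[1]):
          for j in range(i + 1, plots.shape[1]):
              if phi(plots[:, i], plots[:, j]) > cutoff:
                  G.add_edge(i, j)

      # "Strict affinity" clusters: every pair inside a cluster is positively associated.
      print([c for c in nx.find_cliques(G) if len(c) > 1])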

  18. [Algorithms based on medico-administrative data in the field of endocrine, nutritional and metabolic diseases, especially diabetes].

    PubMed

    Fosse-Edorh, S; Rigou, A; Morin, S; Fezeu, L; Mandereau-Bruno, L; Fagot-Campagna, A

    2017-10-01

    Medico-administrative databases represent a very interesting source of information in the field of endocrine, nutritional and metabolic diseases. The objective of this article is to describe the early works of the Redsiam working group in this field. Algorithms developed in France in the field of diabetes, the treatment of dyslipidemia, precocious puberty, and bariatric surgery based on the National Inter-schema Information System on Health Insurance (SNIIRAM) data were identified and described. Three algorithms for identifying people with diabetes are available in France. These algorithms are based either on full insurance coverage for diabetes or on claims of diabetes treatments, or on the combination of these two methods associated with hospitalizations related to diabetes. Each of these algorithms has a different purpose, and the choice should depend on the goal of the study. Algorithms for identifying people treated for dyslipidemia or precocious puberty or who underwent bariatric surgery are also available. Early work from the Redsiam working group in the field of endocrine, nutritional and metabolic diseases produced an inventory of existing algorithms in France, linked with their goals, together with a presentation of their limitations and advantages, providing useful information for the scientific community. This work will continue with discussions about algorithms on the incidence of diabetes in children, thyroidectomy for thyroid nodules, hypothyroidism, hypoparathyroidism, and amyloidosis. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  19. Scalable Static and Dynamic Community Detection Using Grappolo

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Halappanavar, Mahantesh; Lu, Hao; Kalyanaraman, Anantharaman

    Graph clustering, popularly known as community detection, is a fundamental kernel for several applications of relevance to the Defense Advanced Research Projects Agency’s (DARPA) Hierarchical Identify Verify Exploit (HIVE) Program. Clusters or communities represent natural divisions within a network that are densely connected within a cluster and sparsely connected to the rest of the network. The need to compute clustering on large scale data necessitates the development of efficient algorithms that can exploit modern architectures that are fundamentally parallel in nature. However, due to their irregular and inherently sequential nature, many of the current algorithms for community detection are challenging to parallelize. In response to the HIVE Graph Challenge, we present several parallelization heuristics for fast community detection using the Louvain method as the serial template. We implement all the heuristics in a software library called Grappolo. Using the inputs from the HIVE Challenge, we demonstrate superior performance and high quality solutions based on four parallelization heuristics. We use Grappolo on static graphs as the first step towards community detection on streaming graphs.
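
    For orientation, the serial Louvain template that Grappolo parallelizes can be run directly in recent versions of networkx; the snippet below is that serial baseline on a bundled graph, not Grappolo itself.

      import networkx as nx
      from networkx.algorithms.community import louvain_communities, modularity

      G = nx.les_miserables_graph()
      communities = louvain_communities(G, weight="weight", seed=42)
      print("communities:", len(communities))
      print("modularity: %.3f" % modularity(G, communities, weight="weight"))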

  20. Leveraging disjoint communities for detecting overlapping community structure

    NASA Astrophysics Data System (ADS)

    Chakraborty, Tanmoy

    2015-05-01

    Network communities represent mesoscopic structure for understanding the organization of real-world networks, where nodes often belong to multiple communities and form overlapping community structure in the network. Due to the non-triviality of finding the exact boundary of such overlapping communities, this problem has become challenging, and therefore huge effort has been devoted to detecting overlapping communities from the network. In this paper, we present PVOC (Permanence based Vertex-replication algorithm for Overlapping Community detection), a two-stage framework to detect overlapping community structure. We build on a novel observation that the non-overlapping community structure detected by a standard disjoint community detection algorithm has high resemblance to the network's actual overlapping community structure, except for the overlapping part. Based on this observation, we posit that there is perhaps no need to build yet another overlapping community finding algorithm; instead, one can efficiently manipulate the output of any existing disjoint community finding algorithm to obtain the required overlapping structure. We propose a new post-processing technique that, combined with any existing disjoint community detection algorithm, suitably processes each vertex using a new vertex-based metric, called permanence, and thereby finds overlapping candidates with their community memberships. Experimental results on both synthetic and large real-world networks show that PVOC significantly outperforms six state-of-the-art overlapping community detection algorithms in terms of high similarity of the output with the ground-truth structure. Thus our framework not only finds meaningful overlapping communities from the network, but also allows us to put an end to the constant effort of building yet another overlapping community detection algorithm.
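
    A hedged sketch of the permanence metric is given below, following the commonly cited formulation Perm(v) = I(v) / (E_max(v) * d(v)) - (1 - c_in(v)), where I(v) is the number of neighbours inside v's community, E_max(v) the maximum number of connections to any single external community (taken as 1 when v has none), d(v) the degree, and c_in(v) the clustering coefficient among v's internal neighbours; details may differ from the paper's exact definition.

      import networkx as nx
      from collections import Counter

      def permanence(G, v, membership):
          internal = [u for u in G[v] if membership[u] == membership[v]]
          external = Counter(membership[u] for u in G[v] if membership[u] != membership[v])
          e_max = max(external.values()) if external else 1
          c_in = nx.clustering(G.subgraph(internal + [v]), v) if len(internal) > 1 else 0.0
          return len(internal) / (e_max * G.degree(v)) - (1.0 - c_in)

      G = nx.karate_club_graph()
      membership = {v: (0 if G.nodes[v]["club"] == "Mr. Hi" else 1) for v in G}
      print(round(permanence(G, 0, membership), 3))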

  1. Feasibility and cost-effectiveness of stroke prevention through community screening for atrial fibrillation using iPhone ECG in pharmacies. The SEARCH-AF study.

    PubMed

    Lowres, Nicole; Neubeck, Lis; Salkeld, Glenn; Krass, Ines; McLachlan, Andrew J; Redfern, Julie; Bennett, Alexandra A; Briffa, Tom; Bauman, Adrian; Martinez, Carlos; Wallenhorst, Christopher; Lau, Jerrett K; Brieger, David B; Sy, Raymond W; Freedman, S Ben

    2014-06-01

    Atrial fibrillation (AF) causes a third of all strokes, but often goes undetected before stroke. Identification of unknown AF in the community and subsequent anti-thrombotic treatment could reduce stroke burden. We investigated community screening for unknown AF using an iPhone electrocardiogram (iECG) in pharmacies, and determined the cost-effectiveness of this strategy. Pharmacists performed pulse palpation and iECG recordings, with cardiologist iECG over-reading. General practitioner review/12-lead ECG was facilitated for suspected new AF. An automated AF algorithm was retrospectively applied to collected iECGs. Cost-effectiveness analysis incorporated costs of iECG screening, and treatment/outcome data from a United Kingdom cohort of 5,555 patients with incidentally detected asymptomatic AF. A total of 1,000 pharmacy customers aged ≥65 years (mean 76 ± 7 years; 44% male) were screened. Newly identified AF was found in 1.5% (95% CI, 0.8-2.5%); mean age 79 ± 6 years; all had CHA2DS2-VASc score ≥2. AF prevalence was 6.7% (67/1,000). The automated iECG algorithm showed 98.5% (CI, 92-100%) sensitivity for AF detection and 91.4% (CI, 89-93%) specificity. The incremental cost-effectiveness ratio of extending iECG screening into the community, based on 55% warfarin prescription adherence, would be $AUD5,988 (€3,142; $USD4,066) per Quality Adjusted Life Year gained and $AUD30,481 (€15,993; $USD20,695) for preventing one stroke. Sensitivity analysis indicated cost-effectiveness improved with increased treatment adherence. Screening with iECG in pharmacies with an automated algorithm is both feasible and cost-effective. The high and largely preventable stroke/thromboembolism risk of those with newly identified AF highlights the likely benefits of community AF screening. Guideline recommendation of community iECG AF screening should be considered.

  2. Decomposition-Based Multiobjective Evolutionary Algorithm for Community Detection in Dynamic Social Networks

    PubMed Central

    Ma, Jingjing; Liu, Jie; Ma, Wenping; Gong, Maoguo; Jiao, Licheng

    2014-01-01

    Community structure is one of the most important properties in social networks. In dynamic networks, there are two conflicting criteria that need to be considered. One is the snapshot quality, which evaluates the quality of the community partitions at the current time step. The other is the temporal cost, which evaluates the difference between communities at different time steps. In this paper, we propose a decomposition-based multiobjective community detection algorithm to simultaneously optimize these two objectives to reveal community structure and its evolution in dynamic networks. It employs the framework of multiobjective evolutionary algorithm based on decomposition to simultaneously optimize the modularity and normalized mutual information, which quantitatively measure the quality of the community partitions and temporal cost, respectively. A local search strategy dealing with the problem-specific knowledge is incorporated to improve the effectiveness of the new algorithm. Experiments on computer-generated and real-world networks demonstrate that the proposed algorithm can not only find community structure and capture community evolution more accurately, but also be steadier than the two compared algorithms. PMID:24723806

  3. Decomposition-based multiobjective evolutionary algorithm for community detection in dynamic social networks.

    PubMed

    Ma, Jingjing; Liu, Jie; Ma, Wenping; Gong, Maoguo; Jiao, Licheng

    2014-01-01

    Community structure is one of the most important properties in social networks. In dynamic networks, there are two conflicting criteria that need to be considered. One is the snapshot quality, which evaluates the quality of the community partitions at the current time step. The other is the temporal cost, which evaluates the difference between communities at different time steps. In this paper, we propose a decomposition-based multiobjective community detection algorithm to simultaneously optimize these two objectives to reveal community structure and its evolution in dynamic networks. It employs the framework of multiobjective evolutionary algorithm based on decomposition to simultaneously optimize the modularity and normalized mutual information, which quantitatively measure the quality of the community partitions and temporal cost, respectively. A local search strategy dealing with the problem-specific knowledge is incorporated to improve the effectiveness of the new algorithm. Experiments on computer-generated and real-world networks demonstrate that the proposed algorithm can not only find community structure and capture community evolution more accurately, but also be steadier than the two compared algorithms.

  4. Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods

    NASA Astrophysics Data System (ADS)

    Ma, Xiaoke; Wang, Bingbo; Yu, Liang

    2018-01-01

    Community detection is fundamental for revealing the structure-functionality relationship in complex networks, and it involves two issues: the quantitative function for community as well as the algorithms to discover communities. Despite significant research on either of them, few attempts have been made to establish the connection between the two issues. To address this problem, a generalized quantification function is proposed for community in weighted networks, which provides a framework that unifies several well-known measures. Then, we prove that the trace optimization of the proposed measure is equivalent to the objective functions of algorithms such as nonnegative matrix factorization, kernel K-means as well as spectral clustering. It serves as the theoretical foundation for designing algorithms for community detection. On the second issue, a semi-supervised spectral clustering algorithm is developed by exploring the equivalence relation via combining nonnegative matrix factorization and spectral clustering. Different from traditional semi-supervised algorithms, the partial supervision is integrated into the objective of the spectral algorithm. Finally, through extensive experiments on both artificial and real world networks, we demonstrate that the proposed method improves the accuracy of traditional spectral algorithms in community detection.

  5. Locating Structural Centers: A Density-Based Clustering Method for Community Detection

    PubMed Central

    Liu, Gongshen; Li, Jianhua; Nees, Jan P.

    2017-01-01

    Uncovering underlying community structures in complex networks has received considerable attention because of its importance in understanding structural attributes and group characteristics of networks. The algorithmic identification of such structures is a significant challenge. Local expanding methods have proven to be efficient and effective in community detection, but most methods are sensitive to initial seeds and built-in parameters. In this paper, we present a local expansion method by density-based clustering, which aims to uncover the intrinsic network communities by locating the structural centers of communities based on a proposed structural centrality. The structural centrality takes into account the local density of nodes and the relative distance between nodes. The proposed algorithm expands a community from the structural center to the border with a single local search procedure. The local expanding procedure follows a heuristic strategy that allows it to find complete community structures. Moreover, it can identify different node roles (cores and outliers) in communities by defining a border region. The experiments involve both real-world and artificial networks and provide a comparative evaluation of the proposed method. The results of these experiments show that the proposed method is more efficient than current state-of-the-art methods while achieving comparable clustering performance.
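
    The sketch below illustrates the structural-center idea under our own simplifying assumptions: local density is approximated by degree, the relative distance delta(v) is the shortest-path distance to the nearest higher-density node, and the nodes maximising density times delta are taken as community centers before every remaining node is assigned to its closest center. The paper's definitions differ in detail; this is illustrative only.

      import networkx as nx

      G = nx.karate_club_graph()
      density = dict(G.degree())
      dist = dict(nx.all_pairs_shortest_path_length(G))

      # delta(v): distance to the nearest node of strictly higher density.
      delta = {}
      for v in G:
          higher = [u for u in G if density[u] > density[v]]
          delta[v] = min(dist[v][u] for u in higher) if higher else max(dist[v].values())

      score = {v: density[v] * delta[v] for v in G}
      centers = sorted(G, key=score.get, reverse=True)[:2]

      # Expand: assign every node to its closest structural center.
      communities = {v: min(centers, key=lambda c: dist[v][c]) for v in G}
      print("centers:", centers)
      print("community sizes:", {c: list(communities.values()).count(c) for c in centers})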

  6. Social significance of community structure: Statistical view

    NASA Astrophysics Data System (ADS)

    Li, Hui-Jia; Daniels, Jasmine J.

    2015-01-01

    Community structure analysis is a powerful tool for social networks that can simplify their topological and functional analysis considerably. However, since community detection methods have random factors and real social networks obtained from complex systems always contain error edges, evaluating the significance of a partitioned community structure is an urgent and important question. In this paper, integrating the specific characteristics of real society, we present a framework to analyze the significance of a social community. The dynamics of social interactions are modeled by identifying social leaders and corresponding hierarchical structures. Instead of a direct comparison with the average outcome of a random model, we compute the similarity of a given node with the leader by the number of common neighbors. To determine the membership vector, an efficient community detection algorithm is proposed based on the position of the nodes and their corresponding leaders. Then, using a log-likelihood score, the tightness of the community can be derived. Based on the distribution of community tightness, we establish a connection between p-value theory and network analysis, and we obtain a significance measure in statistical form. Finally, the framework is applied to both benchmark networks and real social networks. Experimental results show that our work can be used in many fields, such as determining the optimal number of communities, analyzing the social significance of a given community, comparing the performance among various algorithms, etc.

  7. Clustering algorithm for determining community structure in large networks

    NASA Astrophysics Data System (ADS)

    Pujol, Josep M.; Béjar, Javier; Delgado, Jordi

    2006-07-01

    We propose an algorithm to find the community structure in complex networks based on the combination of spectral analysis and modularity optimization. The clustering produced by our algorithm is as accurate as the best algorithms in the literature on modularity optimization; however, the main asset of the algorithm is its efficiency. The best match for our algorithm is Newman’s fast algorithm, which is the reference algorithm for clustering in large networks due to its efficiency. When both algorithms are compared, our algorithm outperforms the fast algorithm both in efficiency and in accuracy of the clustering, in terms of modularity. Thus, the results suggest that the proposed algorithm is a good choice for analyzing the community structure of medium and large networks in the range of tens to hundreds of thousands of vertices.

  8. A community detection algorithm based on structural similarity

    NASA Astrophysics Data System (ADS)

    Guo, Xuchao; Hao, Xia; Liu, Yaqiong; Zhang, Li; Wang, Lu

    2017-09-01

    In order to further improve the efficiency and accuracy of community detection, a new algorithm named SSTCA (the community detection algorithm based on structural similarity with threshold) is proposed. In this algorithm, the structural similarities are taken as the weights of edges, and a threshold k is used to remove edges whose weights are less than the threshold, improving the computational efficiency. Tests were done on Zachary’s network, the dolphins' social network and the football dataset with the proposed algorithm, and compared with the GN and SSNCA algorithms. The results show that the new algorithm is more accurate than the other algorithms on dense networks, and its operating efficiency is clearly improved.
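
    A minimal sketch of this idea, as we read the abstract (not the authors' code), is shown below: each edge is weighted by the structural similarity of its endpoints (Jaccard overlap of closed neighbourhoods), edges with weight below the threshold k are removed, and the surviving connected components are read as communities. The value of k is illustrative.

      import networkx as nx

      def structural_similarity(G, u, v):
          nu, nv = set(G[u]) | {u}, set(G[v]) | {v}
          return len(nu & nv) / len(nu | nv)

      G = nx.karate_club_graph()
      k = 0.35
      weak = [(u, v) for u, v in G.edges() if structural_similarity(G, u, v) < k]

      H = G.copy()
      H.remove_edges_from(weak)                       # drop edges below the threshold
      communities = [c for c in nx.connected_components(H) if len(c) > 1]
      print(len(communities), [sorted(c) for c in communities[:2]])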

  9. Finding The Most Important Actor in Online Crowd by Social Network Analysis

    NASA Astrophysics Data System (ADS)

    Yuliana, I.; Santosa, P. I.; Setiawan, N. A.; Sukirman

    2017-02-01

    Billions of people create trillions of connections through social media every single day. The increasing use of social media has led to dramatic changes in the way science, government, healthcare, entertainment and enterprise operate. Large-scale participation in Technology-Mediated Social Participation (TMSP) systems has opened up incredible new opportunities to deploy online crowds. This descriptive-correlational research used social network analysis (SNA) on data gathered from the Facebook fan page of Greenpeace Indonesia related to an important critical issue, the bushfires in 2015. SNA identifies relations among members by sociometric parameters such as three centrality measures (degree, closeness and betweenness) for measuring and finding the most important actor in the online community. This paper uses the Fruchterman-Reingold algorithm to visualize the online community as a graph, while Clauset-Newman-Moore is the technique used to identify groups in the community. The analysis found 3735 vertices corresponding to actors, 6927 edges as relations, 14 main actors ranked by size, and 22 groups in the Greenpeace Indonesia online community. This research contributes by organizing information that helps Greenpeace Indonesia manage the potential of its online community and identify human behaviour.
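
    A small sketch of this analysis pipeline with networkx is shown below, run on a bundled co-appearance graph instead of the Greenpeace Indonesia fan-page data: the three centrality measures rank candidate key actors and the Clauset-Newman-Moore greedy modularity method identifies groups.

      import networkx as nx
      from networkx.algorithms.community import greedy_modularity_communities

      G = nx.les_miserables_graph()
      degree = nx.degree_centrality(G)
      closeness = nx.closeness_centrality(G)
      betweenness = nx.betweenness_centrality(G)

      # Rank candidate "most important actors" by the three centralities.
      top = sorted(G, key=lambda v: (degree[v], betweenness[v], closeness[v]), reverse=True)[:5]
      groups = greedy_modularity_communities(G)       # Clauset-Newman-Moore greedy modularity
      print("key actors:", top)
      print("number of groups:", len(groups))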

  10. Spatial correlation analysis of urban traffic state under a perspective of community detection

    NASA Astrophysics Data System (ADS)

    Yang, Yanfang; Cao, Jiandong; Qin, Yong; Jia, Limin; Dong, Honghui; Zhang, Aomuhan

    2018-05-01

    Understanding the spatial correlation of urban traffic state is essential for identifying the evolution patterns of urban traffic state. However, the distribution of traffic state always has characteristics of large spatial span and heterogeneity. This paper adapts the concept of community detection to the correlation network of urban traffic state and proposes a new perspective to identify the spatial correlation patterns of traffic state. In the proposed urban traffic network, the nodes represent road segments, and an edge between a pair of nodes is added depending on the result of significance test for the corresponding correlation of traffic state. Further, the process of community detection in the urban traffic network (named GWPA-K-means) is applied to analyze the spatial dependency of traffic state. The proposed method extends the traditional K-means algorithm in two steps: (i) redefines the initial cluster centers by two properties of nodes (the GWPA value and the minimum shortest path length); (ii) utilizes the weight signal propagation process to transfer the topological information of the urban traffic network into a node similarity matrix. Finally, numerical experiments are conducted on a simple network and a real urban road network in Beijing. The results show that GWPA-K-means algorithm is valid in spatial correlation analysis of traffic state. The network science and community structure analysis perform well in describing the spatial heterogeneity of traffic state on a large spatial scale.

  11. Direct targeting of risk factors significantly increases the detection of liver cirrhosis in primary care: a cross-sectional diagnostic study utilising transient elastography.

    PubMed

    Harman, David J; Ryder, Stephen D; James, Martin W; Jelpke, Matthew; Ottey, Dominic S; Wilkes, Emilie A; Card, Timothy R; Aithal, Guruprasad P; Guha, Indra Neil

    2015-05-03

    To assess the feasibility of a novel diagnostic algorithm targeting patients with risk factors for chronic liver disease in a community setting. Prospective cross-sectional study. Two primary care practices (adult patient population 10,479) in Nottingham, UK. Adult patients (aged 18 years or over) fulfilling one or more selected risk factors for developing chronic liver disease: (1) hazardous alcohol use, (2) type 2 diabetes or (3) persistently elevated alanine aminotransferase (ALT) liver function enzyme with negative serology. A serial biomarker algorithm, using a simple blood-based marker (aspartate aminotransferase:ALT ratio for hazardous alcohol users, BARD score for other risk groups) and subsequently liver stiffness measurement using transient elastography (TE). Diagnosis of clinically significant liver disease (defined as liver stiffness ≥8 kPa); definitive diagnosis of liver cirrhosis. We identified 920 patients with the defined risk factors of whom 504 patients agreed to undergo investigation. A normal blood biomarker was found in 62 patients (12.3%) who required no further investigation. Subsequently, 378 patients agreed to undergo TE, of whom 98 (26.8% of valid scans) had elevated liver stiffness. Importantly, 71/98 (72.4%) patients with elevated liver stiffness had normal liver enzymes and would be missed by traditional investigation algorithms. We identified 11 new patients with definite cirrhosis, representing a 140% increase in the number of diagnosed cases in this population. A non-invasive liver investigation algorithm based in a community setting is feasible to implement. Targeting risk factors using a non-invasive biomarker approach identified a substantial number of patients with previously undetected cirrhosis. The diagnostic algorithm utilised for this study can be found on clinicaltrials.gov (NCT02037867), and is part of a continuing longitudinal cohort study.
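
    Schematically, the two-step pathway reads as follows (our rendering of the abstract, not the trial protocol); the blood-marker cut-offs shown are illustrative placeholders, while the 8 kPa stiffness threshold is the one stated in the abstract.

      def needs_elastography(risk_group: str, ast: float, alt: float, bard_score: int) -> bool:
          # Step 1: simple blood-based marker, chosen by risk group.
          if risk_group == "hazardous_alcohol":
              return ast / alt >= 0.8          # placeholder cut-off for the AST:ALT ratio
          return bard_score >= 2               # placeholder cut-off for the BARD score

      def significant_liver_disease(stiffness_kpa: float) -> bool:
          # Step 2: transient elastography; threshold stated in the abstract.
          return stiffness_kpa >= 8.0

      if needs_elastography("hazardous_alcohol", ast=60, alt=55, bard_score=0):
          print(significant_liver_disease(stiffness_kpa=9.4))   # True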

  12. Discovering Link Communities in Complex Networks by an Integer Programming Model and a Genetic Algorithm

    PubMed Central

    Li, Zhenping; Zhang, Xiang-Sun; Wang, Rui-Sheng; Liu, Hongwei; Zhang, Shihua

    2013-01-01

    Identification of communities in complex networks is an important issue in many fields such as sociology, biology, and computer science. Communities are often defined as groups of related nodes or links that correspond to functional subunits in the corresponding complex systems. While most conventional approaches have focused on discovering communities of nodes, some recent studies instead partition links, which yields overlapping communities in a straightforward way. In this paper, we propose a new quantity function for link community identification in complex networks. Based on this quantity function, we formulate the link community partition problem as an integer programming model, which allows us to partition a complex network into overlapping communities. We further propose a genetic algorithm for link community detection which can partition a network into overlapping communities without knowing the number of communities in advance. We test our model and algorithm on both artificial and real-world networks. The results demonstrate that the model and algorithm are efficient in detecting overlapping community structure in complex networks. PMID:24386268

  13. A generalised significance test for individual communities in networks.

    PubMed

    Kojaku, Sadamori; Masuda, Naoki

    2018-05-09

    Many empirical networks have community structure, in which nodes are densely interconnected within each community (i.e., a group of nodes) and sparsely across different communities. Like other local and meso-scale structures of networks, communities are generally heterogeneous in various aspects such as size, density of edges, connectivity to other communities and significance. In the present study, we propose a method to statistically test the significance of individual communities in a given network. Compared to previous methods, the present algorithm is unique in that it accepts different community-detection algorithms and the corresponding quality function for single communities. The present method requires that the quality of each community can be quantified and that community detection is performed as optimisation of such a quality function summed over the communities. Various community detection algorithms, including modularity maximisation and graph partitioning, meet this criterion. Our method estimates the distribution of the quality function for randomised networks to calculate a likelihood of each community in the given network. We illustrate our algorithm on synthetic and empirical networks.
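
    One way to read this procedure is: choose a quality function for a single community, build a null distribution of that quality from degree-preserving randomisations of the network, and report the fraction of null samples that match or exceed the observed value. The sketch below uses the internal-edge count of a fixed node set as the quality and networkx's double_edge_swap for randomisation; both choices are assumptions for illustration, not the paper's exact quality function or null model.

        # Hedged sketch: p-value of a single community against a rewired null model.
        import networkx as nx

        def community_quality(g, community):
            # assumed quality: number of edges internal to the node set
            return g.subgraph(community).number_of_edges()

        def community_p_value(g, community, n_null=200, seed=1):
            observed = community_quality(g, community)
            null_ge = 0
            for i in range(n_null):
                g_rand = g.copy()
                # degree-preserving randomisation of the whole network
                nx.double_edge_swap(g_rand, nswap=2 * g.number_of_edges(),
                                    max_tries=20 * g.number_of_edges(), seed=seed + i)
                if community_quality(g_rand, community) >= observed:
                    null_ge += 1
            return (null_ge + 1) / (n_null + 1)

        g = nx.karate_club_graph()
        club = [n for n, d in g.nodes(data=True) if d["club"] == "Mr. Hi"]
        print(community_p_value(g, club, n_null=50))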

  14. The pearls of using real-world evidence to discover social groups

    NASA Astrophysics Data System (ADS)

    Cardillo, Raymond A.; Salerno, John J.

    2005-03-01

    In previous work, we introduced a new paradigm called Uni-Party Data Community Generation (UDCG) and a new methodology to discover social groups (a.k.a., community models) called Link Discovery based on Correlation Analysis (LDCA). We further advanced this work by experimenting with a corpus of evidence obtained from a Ponzi scheme investigation. That work identified several UDCG algorithms, developed what we called "Importance Measures" to compare the accuracy of the algorithms against ground truth, and presented a Concept of Operations (CONOPS) that criminal investigators could use to discover social groups. However, that work used a rather small random sample of manually edited documents because the evidence contained far too many OCR and other extraction errors. Deferring the evidence extraction errors allowed us to continue experimenting with UDCG algorithms, but meant we used only a small fraction of the available evidence. In an attempt to discover techniques that are more practical in the near term, our most recent work focuses on being able to use an entire corpus of real-world evidence to discover social groups. This paper discusses the complications of extracting evidence, suggests a method of performing name resolution, presents a new UDCG algorithm, and discusses our future direction in this area.

  15. Community detection in networks: A user guide

    NASA Astrophysics Data System (ADS)

    Fortunato, Santo; Hric, Darko

    2016-11-01

    Community detection in networks is one of the most popular topics of modern network science. Communities, or clusters, are usually groups of vertices having higher probability of being connected to each other than to members of other groups, though other patterns are possible. Identifying communities is an ill-defined problem. There are no universal protocols on the fundamental ingredients, like the definition of community itself, nor on other crucial issues, like the validation of algorithms and the comparison of their performances. This has generated a number of confusions and misconceptions, which undermine the progress in the field. We offer a guided tour through the main aspects of the problem. We also point out strengths and weaknesses of popular methods, and give directions to their use.

  16. A Collaborative Recommend Algorithm Based on Bipartite Community

    PubMed Central

    Fu, Yuchen; Liu, Quan; Cui, Zhiming

    2014-01-01

    Recommendation algorithms based on bipartite networks are superior to traditional methods in accuracy and diversity, which shows that considering the network topology of recommendation systems can help improve recommendation results. However, existing algorithms mainly focus on the overall topology structure, whereas local characteristics could also play an important role in collaborative recommendation. Therefore, taking into account the data characteristics and application requirements of collaborative recommendation systems, we propose a link community partitioning algorithm based on label propagation and a collaborative recommendation algorithm based on bipartite communities. We then design numerical experiments to verify the validity of the algorithms on benchmark and real-world databases. PMID:24955393

  17. A convolutional neural network neutrino event classifier

    DOE PAGES

    Aurisano, A.; Radovic, A.; Rocco, D.; ...

    2016-09-01

    Here, convolutional neural networks (CNNs) have been widely applied in the computer vision community to solve complex problems in image recognition and analysis. We describe an application of the CNN technology to the problem of identifying particle interactions in sampling calorimeters used commonly in high energy physics and high energy neutrino physics in particular. Following a discussion of the core concepts of CNNs and recent innovations in CNN architectures related to the field of deep learning, we outline a specific application to the NOvA neutrino detector. This algorithm, CVN (Convolutional Visual Network), identifies neutrino interactions based on their topology without the need for detailed reconstruction and outperforms algorithms currently in use by the NOvA collaboration.

  18. Community-aware task allocation for social networked multiagent systems.

    PubMed

    Wang, Wanyuan; Jiang, Yichuan

    2014-09-01

    In this paper, we propose a novel community-aware task allocation model for social networked multiagent systems (SN-MASs), where each agent's cooperation domain is constrained to its community and each agent can negotiate only with its intracommunity member agents. Under such community-aware scenarios, we prove that maximizing the overall system profit remains NP-hard. To solve this problem effectively, we present a heuristic algorithm composed of three phases: 1) task selection: select the most desirable task to allocate first; 2) allocation to community: allocate the selected task to communities based on a significant-task-first heuristic; and 3) allocation to agent: negotiate resources for the selected task based on a non-overlapping-agent-first and breadth-first resource negotiation mechanism. Theoretical analyses and experiments validate the advantages of the presented heuristic algorithm and community-aware task allocation model. 1) The presented heuristic algorithm performs very closely to the benchmark exponential brute-force optimal algorithm and the network-flow-based greedy algorithm in terms of overall system profit in small-scale applications. Moreover, in large-scale applications, the presented heuristic algorithm achieves approximately the same overall system profit but significantly reduces the computational load compared with the greedy algorithm. 2) The presented community-aware task allocation model reduces the system communication cost compared with the previous global-aware task allocation model and greatly improves the overall system profit compared with the previous local neighbor-aware task allocation model.

  19. A cooperative game framework for detecting overlapping communities in social networks

    NASA Astrophysics Data System (ADS)

    Jonnalagadda, Annapurna; Kuppusamy, Lakshmanan

    2018-02-01

    Community detection in social networks is a challenging and complex task, which has received much attention from researchers of multiple domains in recent years. The evolution of communities in social networks arises purely from the self-interest of the nodes. An interesting feature of community structure in social networks is the multi-membership of nodes, which results in overlapping communities. Treating the nodes of the social network as self-interested players, the dynamics of community formation can be captured in the form of a game. In this paper, we propose a greedy algorithm, namely the Weighted Graph Community Game (WGCG), to model the interactions among the self-interested nodes of the social network. The proposed algorithm employs the Shapley value mechanism to discover the inherent communities of the underlying social network. Experimental evaluation on real-world and synthetic benchmark networks demonstrates that the performance of the proposed algorithm is superior to state-of-the-art overlapping community detection algorithms.

  1. Leveraging health social networking communities in translational research.

    PubMed

    Webster, Yue W; Dow, Ernst R; Koehler, Jacob; Gudivada, Ranga C; Palakal, Mathew J

    2011-08-01

    Health social networking communities are emerging resources for translational research. We have designed and implemented a framework called HyGen, which combines Semantic Web technologies, graph algorithms and user profiling to discover and prioritize novel associations across disciplines. This manuscript focuses on the key strategies developed to overcome the challenges in handling patient-generated content in health social networking communities. Heuristic and quantitative evaluations were carried out in colorectal cancer. The results demonstrate the potential of our approach to bridge silos and to identify hidden links among clinical observations, drugs, genes and diseases. In Amyotrophic Lateral Sclerosis case studies, HyGen identified 15 of the 20 published disease genes. Additionally, HyGen has highlighted new candidates for future investigation, as well as a scientifically meaningful connection between riluzole and alcohol abuse. Copyright © 2011 Elsevier Inc. All rights reserved.

  2. Reliability and validity of bilateral ankle accelerometer algorithms for activity recognition and walking speed after stroke.

    PubMed

    Dobkin, Bruce H; Xu, Xiaoyu; Batalin, Maxim; Thomas, Seth; Kaiser, William

    2011-08-01

    Outcome measures of mobility for large stroke trials are limited to timed walks for short distances in a laboratory, step counters and ordinal scales of disability and quality of life. Continuous monitoring and outcome measurements of the type and quantity of activity in the community would provide direct data about daily performance, including compliance with exercise and skills practice during routine care and clinical trials. Twelve adults with impaired ambulation from hemiparetic stroke and 6 healthy controls wore triaxial accelerometers on their ankles. Walking speed for repeated outdoor walks was determined by machine-learning algorithms and compared to a stopwatch calculation of speed for distances not known to the algorithm. The reliability of recognizing walking, exercise, and cycling by the algorithms was compared to activity logs. A high correlation was found between stopwatch-measured outdoor walking speed and algorithm-calculated speed (Pearson coefficient, 0.98; P=0.001) and for repeated measures of algorithm-derived walking speed (P=0.01). Bouts of walking >5 steps, variations in walking speed, cycling, stair climbing, and leg exercises were correctly identified during a day in the community. Compared to healthy subjects, those with stroke were, as expected, more sedentary and slower, and their gait revealed high paretic-to-unaffected leg swing ratios. Test-retest reliability and concurrent and construct validity are high for activity pattern-recognition Bayesian algorithms developed from inertial sensors. This ratio scale data can provide real-world monitoring and outcome measurements of lower extremity activities and walking speed for stroke and rehabilitation studies.

  3. The Operation IceBridge Sea Ice Freeboard, Snow Depth and Thickness Product: An In-Depth Look at Past, Current and Future Versions

    NASA Astrophysics Data System (ADS)

    Harbeck, J.; Kurtz, N. T.; Studinger, M.; Onana, V.; Yi, D.

    2015-12-01

    The NASA Operation IceBridge Project Science Office has recently released an updated version of the sea ice freeboard, snow depth and thickness product (IDCSI4). This product is generated by combining data from multiple IceBridge instruments, primarily the ATM laser altimeter, DMS georeferenced imagery and the CReSIS snow radar, and is available on a campaign-specific basis as all upstream data sets become available. Version 1 (IDCSI2) was the initial data production; community feedback received since then has now been incorporated, allowing us to provide an improved data product. All data now available to the public at the National Snow and Ice Data Center (NSIDC) have been homogeneously reprocessed using the new IDCSI4 algorithm. This algorithm contains significant upgrades that improve the quality and consistency of the dataset, including updated atmospheric and oceanic tidal models and replacement of the geoid with a more representative mean sea surface height product. Corrections for known errors in the IDCSI2 algorithm, identified by the Project Science Office as well as through feedback from the scientific community, have also been incorporated into the new algorithm. We will describe in detail the various steps of the IDCSI4 algorithm, show the improvements made over the IDCSI2 dataset and their beneficial impact, and discuss future upgrades planned for the next version.

  4. Online Community Detection for Large Complex Networks

    PubMed Central

    Pan, Gang; Zhang, Wangsheng; Wu, Zhaohui; Li, Shijian

    2014-01-01

    Complex networks describe a wide range of systems in nature and society. To understand complex networks, it is crucial to investigate their community structure. In this paper, we develop an online community detection algorithm with linear time complexity for large complex networks. Our algorithm processes a network edge by edge in the order in which the network is fed to the algorithm. When a new edge is added, it simply updates the existing community structure in constant time and does not need to re-compute the whole network. Therefore, it can efficiently process large networks in real time. Our algorithm optimizes expected modularity instead of modularity at each step to avoid poor performance. The experiments are carried out using 11 public data sets, and are measured by two criteria, modularity and NMI (Normalized Mutual Information). The results show that our algorithm's running time is less than that of the commonly used Louvain algorithm while it gives competitive performance. PMID:25061683

  5. Overlapping communities detection based on spectral analysis of line graphs

    NASA Astrophysics Data System (ADS)

    Gui, Chun; Zhang, Ruisheng; Hu, Rongjing; Huang, Guoming; Wei, Jiaxuan

    2018-05-01

    Communities in networks are often overlapping, in that one vertex may belong to several clusters. Meanwhile, many networks show hierarchical structure, such that communities are recursively grouped into a hierarchical organization. In order to obtain overlapping communities from a global hierarchy of vertices, a new algorithm (named SAoLG) is proposed to build the hierarchical organization while also detecting the overlap of community structure. SAoLG applies spectral analysis to line graphs to unify the overlapping and hierarchical structure of the communities. In order to avoid the limitations of absolute distances such as Euclidean distance, SAoLG employs angular distance to compute the similarity between vertices. Furthermore, we make a small improvement to partition density to evaluate the quality of community structure and use it to obtain a more reasonable and sensible number of communities. The proposed SAoLG algorithm achieves a balance between overlap and hierarchy by applying spectral analysis to edge community detection. The experimental results on one standard network and six real-world networks show that the SAoLG algorithm achieves higher modularity and more reasonable community numbers than Ahn's algorithm, the classical CPM and the GN algorithm.

  6. A clustering algorithm for determining community structure in complex networks

    NASA Astrophysics Data System (ADS)

    Jin, Hong; Yu, Wei; Li, ShiJun

    2018-02-01

    Clustering algorithms are attractive for the task of community detection in complex networks. DENCLUE is a representative density-based clustering algorithm which has a firm mathematical basis and good clustering properties, allowing for arbitrarily shaped clusters in high-dimensional datasets. However, this method cannot be directly applied to community discovery because it cannot deal with network data. Moreover, it requires careful selection of the density parameter and the noise threshold. To solve these issues, a new community detection method is proposed in this paper. First, we use a spectral analysis technique to map the network data into a low-dimensional Euclidean space that preserves node structural characteristics. Then, DENCLUE is applied to detect the communities in the network. A mathematical method named the Sheather-Jones plug-in is chosen to select the density parameter, which describes the intrinsic clustering structure accurately. Moreover, every node in the network is meaningful, so there are no noise nodes and the noise threshold can be ignored. We test our algorithm on both benchmark and real-life networks, and the results demonstrate the effectiveness of our algorithm over other popular density-based clustering algorithms adapted to community detection.

  7. The Method for Assigning Priority Levels (MAPLe): A new decision-support system for allocating home care resources

    PubMed Central

    Hirdes, John P; Poss, Jeff W; Curtin-Telegdi, Nancy

    2008-01-01

    Background Home care plays a vital role in many health care systems, but there is evidence that appropriate targeting strategies must be used to allocate limited home care resources effectively. The aim of the present study was to develop and validate a methodology for prioritizing access to community and facility-based services for home care clients. Methods Canadian and international data based on the Resident Assessment Instrument – Home Care (RAI-HC) were analyzed to identify predictors for nursing home placement, caregiver distress and for being rated as requiring alternative placement to improve outlook. Results The Method for Assigning Priority Levels (MAPLe) algorithm was a strong predictor of all three outcomes in the derivation sample. The algorithm was validated with additional data from five other countries, three other provinces, and an Ontario sample obtained after the use of the RAI-HC was mandated. Conclusion The MAPLe algorithm provides a psychometrically sound decision-support tool that may be used to inform choices related to allocation of home care resources and prioritization of clients needing community or facility-based services. PMID:18366782

  8. LP-LPA: A link influence-based label propagation algorithm for discovering community structures in networks

    NASA Astrophysics Data System (ADS)

    Berahmand, Kamal; Bouyer, Asgarali

    2018-03-01

    Community detection is an essential approach for analyzing the structural and functional properties of complex networks. Although many community detection algorithms have been presented recently, most of them are weak or limited in different ways. The Label Propagation Algorithm (LPA) is a well-known and efficient community detection technique characterized by nearly linear running time and easy implementation. However, LPA has some significant problems such as instability, randomness, and monster-community formation. In this paper, an algorithm named node's label influence policy for label propagation algorithm (LP-LPA) is proposed for detecting efficient community structures. LP-LPA measures a link strength value for edges and a label influence value for nodes, and uses them in a new label propagation strategy that prefers strong links for initial node selection, avoids random behavior in tie-break states, and applies an efficient update order and update rule. These procedures resolve the randomness issue of the original LPA and stabilize the discovered communities across all runs on the same network. Experiments on synthetic networks and a wide range of real-world social networks indicate that the proposed method achieves significant accuracy and high stability. Indeed, it can effectively solve the monster community problem when detecting communities in networks.
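
    A minimal way to see how a link-strength preference removes randomness from label propagation is to break ties by the total edge weight carried by each candidate label instead of choosing at random. The sketch below is an illustrative simplification under that assumption, not the LP-LPA update rule itself (the node-influence ordering and initial node-selection policy are omitted).

        # Illustrative sketch: label propagation with weight-based, deterministic tie-breaking.
        import networkx as nx
        from collections import defaultdict

        def weighted_lpa(g, max_iter=50):
            labels = {n: n for n in g}          # every node starts with its own label
            for _ in range(max_iter):
                changed = False
                for node in sorted(g, key=g.degree, reverse=True):   # assumed update order
                    score = defaultdict(float)
                    for nbr in g[node]:
                        score[labels[nbr]] += g[node][nbr].get("weight", 1.0)
                    if not score:
                        continue
                    # deterministic tie-break: highest total weight, then smallest label
                    best = min(score, key=lambda lab: (-score[lab], lab))
                    if best != labels[node]:
                        labels[node], changed = best, True
                if not changed:
                    break
            return labels

        g = nx.karate_club_graph()
        print(set(weighted_lpa(g).values()))    # surviving community labels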

  9. Systematic identification and prioritization of communities impacted by residential woodsmoke in British Columbia, Canada.

    PubMed

    Hong, Kris Y; Weichenthal, Scott; Saraswat, Arvind; King, Gavin H; Henderson, Sarah B; Brauer, Michael

    2017-01-01

    Residential woodsmoke is an under-regulated source of fine particulate matter (PM2.5), often surpassing mobile and industrial emissions in rural communities in North America and elsewhere. In the province of British Columbia (BC), Canada, many municipalities are hesitant to adopt stricter regulations for residential wood burning without empirical evidence that smoke is affecting local air quality. The objective of this study was to develop a retrospective algorithm that uses 1-h PM2.5 concentrations and daily temperature data to identify smoky days in order to prioritise communities by smoke impacts. Levoglucosan measurements from one of the smokiest communities were used to establish the most informative values for three algorithmic parameters: the daily standard deviation of 1-h PM2.5 measurements; the daily mean temperature; and the daytime-to-nighttime ratio of PM2.5 concentrations. Alternate parameterizations were tested in 45 sensitivity analyses. Using the most informative parameter values on the most recent two years of data for each community, the number of smoky days ranged from 5 to 277. Heat maps visualizing seasonal and diurnal variation in PM2.5 concentrations showed clear differences between the higher- and lower-ranked communities. Some communities were sensitive to one or more of the parameters, but the overall rankings were consistent across the 45 analyses. This information will allow stakeholder agencies to work with local governments on implementing appropriate intervention strategies for the most smoke-impacted communities. Copyright © 2016 Elsevier Ltd. All rights reserved.
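
    The three parameters named above (the daily standard deviation of 1-h PM2.5, the daily mean temperature, and the daytime-to-nighttime PM2.5 ratio) can be combined into a simple per-day classifier. The sketch below only illustrates that combination: the thresholds, their directions, and the 08:00-20:00 daytime window are placeholder assumptions, not the values calibrated against levoglucosan in the study.

        # Hedged sketch: flag smoky days from hourly PM2.5 and temperature.
        # Thresholds, their directions and the daytime window are illustrative assumptions.
        import pandas as pd

        def smoky_days(df, sd_min=5.0, temp_max=5.0, day_night_ratio_max=0.67):
            """df: DataFrame with an hourly DatetimeIndex and columns 'pm25' and 'temp_c'."""
            hours = df.index.hour
            day = df[(hours >= 8) & (hours < 20)]
            night = df[(hours < 8) | (hours >= 20)]   # night hours grouped by calendar day
            daily = pd.DataFrame({
                "pm25_sd": df["pm25"].resample("D").std(),
                "temp_mean": df["temp_c"].resample("D").mean(),
                "day_night_ratio": day["pm25"].resample("D").mean()
                                   / night["pm25"].resample("D").mean(),
            })
            smoky = ((daily["pm25_sd"] >= sd_min)
                     & (daily["temp_mean"] <= temp_max)
                     & (daily["day_night_ratio"] <= day_night_ratio_max))
            return daily.index[smoky]

        # Usage: smoky = smoky_days(hourly_obs)  # hourly_obs assembled from monitor records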

  10. A Comparative Analysis of Community Detection Algorithms on Artificial Networks

    PubMed Central

    Yang, Zhao; Algesheimer, René; Tessone, Claudio J.

    2016-01-01

    Many community detection algorithms have been developed to uncover the mesoscopic properties of complex networks. However, how good an algorithm is, in terms of accuracy and computing time, remains an open question. Testing algorithms on real-world networks has certain restrictions which make the resulting insights potentially biased: the networks are usually small, and the underlying communities are not defined objectively. In this study, we employ the Lancichinetti-Fortunato-Radicchi benchmark graph to test eight state-of-the-art algorithms. We quantify the accuracy using complementary measures and the algorithms' computing time. Based on simple network properties and the aforementioned results, we provide guidelines that help to choose the most adequate community detection algorithm for a given network. Moreover, these rules allow uncovering limitations in the use of specific algorithms given macroscopic network properties. Our contribution is threefold: firstly, we provide actual techniques to determine which is the most suited algorithm in most circumstances based on observable properties of the network under consideration. Secondly, we use the mixing parameter as an easily measurable indicator for finding the ranges of reliability of the different algorithms. Finally, we study the dependency on network size, focusing on both the algorithms' predictive power and the effective computing time. PMID:27476470
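
    A small version of such a benchmark run can be reproduced with networkx's LFR generator and normalized mutual information from scikit-learn. The sketch below compares a single detector (greedy modularity) against the planted partition for one set of benchmark parameters (taken from the networkx documentation example); it illustrates the protocol only and does not reproduce the paper's eight-algorithm study or its parameter sweeps.

        # Illustrative benchmark run: LFR graph, one detector, NMI against ground truth.
        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities
        from sklearn.metrics import normalized_mutual_info_score

        # LFR benchmark graph; parameter values follow the networkx documentation example.
        g = nx.LFR_benchmark_graph(n=250, tau1=3, tau2=1.5, mu=0.1,
                                   average_degree=5, min_community=20, seed=10)
        g.remove_edges_from(nx.selfloop_edges(g))   # defensive clean-up

        # Planted labels: each node stores its community as a set of member nodes.
        truth = {n: min(g.nodes[n]["community"]) for n in g}

        # Detected labels from one community detection algorithm.
        detected = {}
        for label, comm in enumerate(greedy_modularity_communities(g)):
            for n in comm:
                detected[n] = label

        nodes = sorted(g)
        nmi = normalized_mutual_info_score([truth[n] for n in nodes],
                                           [detected[n] for n in nodes])
        print(f"NMI: {nmi:.3f}")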

  11. Seasonal and Inter-Annual Patterns of Phytoplankton Community Structure in Monterey Bay, CA Derived from AVIRIS Data During the 2013-2015 HyspIRI Airborne Campaign

    NASA Astrophysics Data System (ADS)

    Palacios, S. L.; Thompson, D. R.; Kudela, R. M.; Negrey, K.; Guild, L. S.; Gao, B. C.; Green, R. O.; Torres-Perez, J. L.

    2015-12-01

    There is a need in the ocean color community to discriminate among phytoplankton groups within the bulk chlorophyll pool to understand ocean biodiversity, to track energy flow through ecosystems, and to identify and monitor for harmful algal blooms. Imaging spectrometer measurements enable use of sophisticated spectroscopic algorithms for applications such as differentiating among coral species, evaluating iron stress of phytoplankton, and discriminating phytoplankton taxa. These advanced algorithms rely on the fine scale, subtle spectral shape of the atmospherically corrected remote sensing reflectance (Rrs) spectrum of the ocean surface. As a consequence, these algorithms are sensitive to inaccuracies in the retrieved Rrs spectrum that may be related to the presence of nearby clouds, inadequate sensor calibration, low sensor signal-to-noise ratio, glint correction, and atmospheric correction. For the HyspIRI Airborne Campaign, flight planning considered optimal weather conditions to avoid flights with significant cloud/fog cover. Although best suited for terrestrial targets, the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) has enough signal for some coastal chlorophyll algorithms and meets sufficient calibration requirements for most channels. However, the coastal marine environment has special atmospheric correction needs due to error that may be introduced by aerosols and terrestrially sourced atmospheric dust and riverine sediment plumes. For this HyspIRI campaign, careful attention has been given to the correction of AVIRIS imagery of the Monterey Bay to optimize ocean Rrs retrievals for use in estimating chlorophyll (OC3 algorithm) and phytoplankton functional type (PHYDOTax algorithm) data products. This new correction method has been applied to several image collection dates during two oceanographic seasons - upwelling and the warm, stratified oceanic period for 2013 and 2014. These two periods are dominated by either diatom blooms (occasionally toxic) or red tides. Results presented include chlorophyll and phytoplankton community structure and in-water validation data for these dates during these two seasons.

  12. Label propagation algorithm for community detection based on node importance and label influence

    NASA Astrophysics Data System (ADS)

    Zhang, Xian-Kun; Ren, Jing; Song, Chen; Jia, Jia; Zhang, Qian

    2017-09-01

    Recently, the detection of high-quality communities has become a hot topic in social network research. The label propagation algorithm (LPA) has attracted wide attention since it has linear time complexity and does not require an objective function or the number of communities to be defined in advance. However, LPA suffers from uncertainty and randomness in the label propagation process, which affects the accuracy and stability of the detected communities. For large-scale social networks, this paper proposes a novel label propagation algorithm for community detection based on node importance and label influence (LPA_NI). Experiments with comparative algorithms on real-world and synthetic networks show that LPA_NI can significantly improve the quality of community detection and shorten the iteration period. It also has better accuracy and stability at similar complexity.

  13. Finding community structure in very large networks

    NASA Astrophysics Data System (ADS)

    Clauset, Aaron; Newman, M. E. J.; Moore, Cristopher

    2004-12-01

    The discovery and analysis of community structure in networks is a topic of considerable recent interest within the physics community, but most methods proposed so far are unsuitable for very large networks because of their computational cost. Here we present a hierarchical agglomeration algorithm for detecting community structure which is faster than many competing algorithms: its running time on a network with n vertices and m edges is O(m d log n), where d is the depth of the dendrogram describing the community structure. Many real-world networks are sparse and hierarchical, with m ~ n and d ~ log n, in which case our algorithm runs in essentially linear time, O(n log^2 n). As an example of the application of this algorithm we use it to analyze a network of items for sale on the web site of a large on-line retailer, items in the network being linked if they are frequently purchased by the same buyer. The network has more than 400,000 vertices and 2x10^6 edges. We show that our algorithm can extract meaningful communities from this network, revealing large-scale patterns present in the purchasing habits of customers.
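
    This greedy agglomerative approach is available as a ready-made routine in networkx (greedy_modularity_communities, which implements the Clauset-Newman-Moore algorithm). A minimal usage sketch on a small standard graph:

        # Minimal usage sketch of the Clauset-Newman-Moore greedy modularity algorithm.
        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities, modularity

        g = nx.karate_club_graph()
        communities = greedy_modularity_communities(g)   # list of frozensets of nodes
        print(len(communities), "communities")
        print("modularity:", round(modularity(g, communities), 3))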

  14. ADAM: analysis of discrete models of biological systems using computer algebra.

    PubMed

    Hinkelmann, Franziska; Brandon, Madison; Guang, Bonny; McNeill, Rustin; Blekherman, Grigoriy; Veliz-Cuba, Alan; Laubenbacher, Reinhard

    2011-07-20

    Many biological systems are modeled qualitatively with discrete models, such as probabilistic Boolean networks, logical models, Petri nets, and agent-based models, to gain a better understanding of them. The computational complexity to analyze the complete dynamics of these models grows exponentially in the number of variables, which impedes working with complex models. There exist software tools to analyze discrete models, but they either lack the algorithmic functionality to analyze complex models deterministically or they are inaccessible to many users as they require understanding the underlying algorithm and implementation, do not have a graphical user interface, or are hard to install. Efficient analysis methods that are accessible to modelers and easy to use are needed. We propose a method for efficiently identifying attractors and introduce the web-based tool Analysis of Dynamic Algebraic Models (ADAM), which provides this and other analysis methods for discrete models. ADAM converts several discrete model types automatically into polynomial dynamical systems and analyzes their dynamics using tools from computer algebra. Specifically, we propose a method to identify attractors of a discrete model that is equivalent to solving a system of polynomial equations, a long-studied problem in computer algebra. Based on extensive experimentation with both discrete models arising in systems biology and randomly generated networks, we found that the algebraic algorithms presented in this manuscript are fast for systems with the structure maintained by most biological systems, namely sparseness and robustness. For a large set of published complex discrete models, ADAM identified the attractors in less than one second. Discrete modeling techniques are a useful tool for analyzing complex biological systems and there is a need in the biological community for accessible efficient analysis tools. ADAM provides analysis methods based on mathematical algorithms as a web-based tool for several different input formats, and it makes analysis of complex models accessible to a larger community, as it is platform independent as a web-service and does not require understanding of the underlying mathematics.

  15. Towards Online Multiresolution Community Detection in Large-Scale Networks

    PubMed Central

    Huang, Jianbin; Sun, Heli; Liu, Yaguang; Song, Qinbao; Weninger, Tim

    2011-01-01

    The investigation of community structure in networks has aroused great interest in multiple disciplines. One of the challenges is to find local communities from a starting vertex in a network without global information about the entire network. Many existing methods are accurate only under a priori assumptions about network properties and with predefined parameters. In this paper, we introduce a new quality function for local communities and present a fast local expansion algorithm for uncovering communities in large-scale networks. The proposed algorithm can detect multiresolution communities from a source vertex, or communities covering the whole network. Experimental results show that the proposed algorithm is efficient and well-behaved in both real-world and synthetic networks. PMID:21887325
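
    The general pattern behind local expansion methods, growing a community from a seed vertex by greedily adding the neighbor that most improves a local quality function, can be sketched as below. The quality used here (internal edge endpoints over all edge endpoints touching the community) is an assumed stand-in, not the paper's own quality function.

        # Hedged sketch: greedy local expansion from a seed node.
        # The local quality function below is an assumption, not the paper's definition.
        import networkx as nx

        def local_quality(g, community):
            internal = boundary = 0
            cset = set(community)
            for u in cset:
                for v in g[u]:
                    if v in cset:
                        internal += 1      # internal edges counted once per endpoint
                    else:
                        boundary += 1
            return internal / (internal + boundary) if internal + boundary else 0.0

        def expand_from_seed(g, seed):
            community = {seed}
            while True:
                frontier = {v for u in community for v in g[u]} - community
                best, best_q = None, local_quality(g, community)
                for cand in frontier:
                    q = local_quality(g, community | {cand})
                    if q > best_q:
                        best, best_q = cand, q
                if best is None:           # no neighbor improves the quality: stop
                    return community
                community.add(best)

        g = nx.karate_club_graph()
        print(sorted(expand_from_seed(g, 33)))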

  16. On designing of a low leakage patient-centric provider network.

    PubMed

    Zheng, Yuchen; Lin, Kun; White, Thomas; Pickreign, Jeremy; Yuen-Reed, Gigi

    2018-03-27

    When a patient in a provider network seeks services outside of their community, the community experiences a leakage. Leakage is undesirable as it typically leads to higher out-of-network costs for patients and increases barriers to care coordination, which is particularly problematic for Accountable Care Organizations (ACOs), where the in-network providers are financially responsible for quality of care and outcomes. We aim to design a data-driven method to identify naturally occurring provider networks driven by diabetic patient choices, and to understand the relationship among provider composition, patient composition, and service leakage pattern. By doing so, we learn the features of low-leakage provider networks that can be generalized to different patient populations. Data used for this study include de-identified healthcare insurance administrative data acquired from Capital District Physicians' Health Plan (CDPHP) for diabetic patients who resided in four New York state counties (Albany, Rensselaer, Saratoga, and Schenectady) in 2014. We construct a healthcare provider network based on patients' historical medical insurance claims. A community detection algorithm is used to identify naturally occurring communities of collaborating providers. For each detected community, a profile is built using several new key measures to elucidate our findings for stakeholders. Finally, import-export analysis is conducted to benchmark each community's leakage pattern and identify further leakage reduction opportunities. The design yields six major provider communities with diverse profiles. Some communities are geographically concentrated, while others tend to draw patients with certain diabetic co-morbidities. Providers from the same healthcare institution are likely to be assigned to the same community. While most communities have high within-community utilization and spending, at 85% and 86% respectively, leakage still persists. Hence, we utilize a metric from import-export analysis to detect leakage and gain insight on how to minimize it. We identify patient-driven provider organization by surfacing providers who share a large number of patients. By analyzing the import-export behavior of each identified community with a novel approach and profiling each community's patient and provider composition, we find that a balanced number of PCPs and specialists and provider heterogeneity are key features of low-leakage networks.
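
    The core construction, a provider graph whose edge weights count the patients two providers share, followed by community detection and a within-community utilization check, can be sketched as follows. The claim-record layout and the use of networkx's greedy modularity routine are assumptions for illustration; the study's own detection algorithm and leakage metrics are not reproduced here.

        # Hedged sketch: shared-patient provider graph, communities, and an in-network share.
        import networkx as nx
        from itertools import combinations
        from collections import defaultdict
        from networkx.algorithms.community import greedy_modularity_communities

        # claims: (patient_id, provider_id) visit records -- hypothetical layout
        claims = [("p1", "A"), ("p1", "B"), ("p2", "A"), ("p2", "B"), ("p1", "C"),
                  ("p3", "C"), ("p3", "D"), ("p4", "C"), ("p4", "D")]

        patient_providers = defaultdict(set)
        for patient, provider in claims:
            patient_providers[patient].add(provider)

        g = nx.Graph()
        for providers in patient_providers.values():
            for u, v in combinations(sorted(providers), 2):
                if g.has_edge(u, v):
                    g[u][v]["weight"] += 1
                else:
                    g.add_edge(u, v, weight=1)     # edge weight = number of shared patients

        communities = greedy_modularity_communities(g, weight="weight")
        provider_comm = {p: i for i, comm in enumerate(communities) for p in comm}

        def in_network_share(patient):
            """Fraction of a patient's providers inside their most-used community."""
            comms = [provider_comm[p] for p in patient_providers[patient]]
            home = max(set(comms), key=comms.count)
            return comms.count(home) / len(comms)

        for pt in sorted(patient_providers):
            print(pt, round(in_network_share(pt), 2))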

  17. MISAGA: An Algorithm for Mining Interesting Subgraphs in Attributed Graphs.

    PubMed

    He, Tiantian; Chan, Keith C C

    2018-05-01

    An attributed graph contains vertices that are associated with a set of attribute values. Mining clusters or communities, which are interesting subgraphs in the attributed graph, is one of the most important tasks of graph analytics. Many problems can be defined as the mining of interesting subgraphs in attributed graphs. Algorithms that discover subgraphs based on predefined topologies cannot be used to tackle these problems. To discover interesting subgraphs in the attributed graph, we propose an algorithm called the mining interesting subgraphs in attributed graph algorithm (MISAGA). MISAGA performs its tasks by first using a probabilistic measure to determine whether the strength of association between a pair of attribute values is strong enough to be interesting. Given the interesting pairs of attribute values, the degree of association is then computed for each pair of vertices using an information-theoretic measure. Based on the edge structure and the degree of association between each pair of vertices, MISAGA identifies interesting subgraphs by formulating the task as a constrained optimization problem and solving it by identifying the optimal affiliation of subgraphs for the vertices in the attributed graph. MISAGA has been tested with several large real graphs and is found to be potentially very useful for various applications.

  18. A Semi-Supervised Learning Approach to Enhance Health Care Community–Based Question Answering: A Case Study in Alcoholism

    PubMed Central

    Klabjan, Diego; Jonnalagadda, Siddhartha Reddy

    2016-01-01

    Background Community-based question answering (CQA) sites play an important role in addressing health information needs. However, a significant number of posted questions remain unanswered. Automatically answering the posted questions can provide a useful source of information for Web-based health communities. Objective In this study, we developed an algorithm to automatically answer health-related questions based on past questions and answers (QA). We also aimed to understand what information embedded within Web-based health content provides good features for identifying valid answers. Methods Our proposed algorithm uses information retrieval techniques to identify candidate answers from resolved QA. To rank these candidates, we implemented a semi-supervised learning algorithm that extracts the best answer to a question. We assessed this approach on a curated corpus from Yahoo! Answers and compared it against a rule-based string similarity baseline. Results On our dataset, the semi-supervised learning algorithm achieved an accuracy of 86.2%. Unified Medical Language System-based (health-related) features used in the model enhanced the algorithm's performance by approximately 8%. A reasonably high rate of accuracy was obtained given that the data are considerably noisy. Important features distinguishing a valid answer from an invalid answer include text length, the number of stop words contained in a test question, the distance between the test question and other questions in the corpus, and the number of overlapping health-related terms between questions. Conclusions Overall, our automated QA system based on historical QA pairs is shown to be effective on the dataset in this case study. It is developed for general use in the health care domain and can also be applied to other CQA sites. PMID:27485666
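
    The first stage described here, retrieving candidate answers from previously resolved question-answer pairs with information retrieval techniques, can be sketched with a TF-IDF index over past questions. This is a generic illustration with made-up example text, not the study's retrieval pipeline or its semi-supervised ranker.

        # Generic sketch of the retrieval stage: find resolved questions similar to a new one.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        resolved_qa = [
            ("How can I cut down on drinking at social events?", "Try alternating drinks ..."),
            ("What are early signs of alcohol dependence?", "Tolerance and withdrawal ..."),
            ("Does exercise help with alcohol cravings?", "Moderate exercise may ..."),
        ]

        questions = [q for q, _ in resolved_qa]
        vectorizer = TfidfVectorizer(stop_words="english")
        index = vectorizer.fit_transform(questions)

        def candidate_answers(new_question, top_k=2):
            scores = cosine_similarity(vectorizer.transform([new_question]), index)[0]
            ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
            return [(questions[i], resolved_qa[i][1], scores[i]) for i in ranked[:top_k]]

        for q, a, s in candidate_answers("signs that drinking is becoming a problem"):
            print(f"{s:.2f}  {q} -> {a}")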

  19. Community detection in complex networks by using membrane algorithm

    NASA Astrophysics Data System (ADS)

    Liu, Chuang; Fan, Linan; Liu, Zhou; Dai, Xiang; Xu, Jiamei; Chang, Baoren

    Community detection in complex networks is a key problem of network analysis. In this paper, a new membrane algorithm is proposed to solve community detection in complex networks. The proposed algorithm is based on membrane systems, which consist of objects, reaction rules, and a membrane structure. Each object represents a candidate partition of a complex network, and the quality of objects is evaluated according to network modularity. The reaction rules include evolutionary rules and communication rules. Evolutionary rules are responsible for improving the quality of objects and employ the differential evolution algorithm to evolve objects. Communication rules implement the information exchange among membranes. Finally, the proposed algorithm is evaluated on synthetic and real-world networks with known real partitions, and on large-scale networks whose real partitions are unknown. The experimental results indicate the superior performance of the proposed algorithm in comparison with other experimental algorithms.

  20. Approximation of Nash equilibria and the network community structure detection problem

    PubMed Central

    2017-01-01

    Game theory based methods designed to solve the problem of community structure detection in complex networks have emerged in recent years as an alternative to classical and optimization based approaches. The Mixed Nash Extremal Optimization uses a generative relation for the characterization of Nash equilibria to identify the community structure of a network by converting the problem into a non-cooperative game. This paper proposes a method to enhance this algorithm by reducing the number of payoff function evaluations. Numerical experiments performed on synthetic and real-world networks show that this approach is efficient, with results better or just as good as other state-of-the-art methods. PMID:28467496

  1. Uncovering the overlapping community structure of complex networks by maximal cliques

    NASA Astrophysics Data System (ADS)

    Li, Junqiu; Wang, Xingyuan; Cui, Yaozu

    2014-12-01

    In this paper, a novel algorithm is proposed to detect overlapping communities in unweighted and weighted networks with considerable accuracy. The notions of maximal clique, overlapping vertex, bridge vertex and isolated vertex are introduced. First, all the maximal cliques are extracted by an algorithm based on depth-first and breadth-first search. Then two maximal cliques can be merged into a larger sub-graph according to given rules. In addition, the proposed algorithm successfully finds overlapping vertices and bridge vertices between communities. Experimental results on several real-world network datasets show that the performance of the proposed algorithm is satisfactory.
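
    The first two steps, enumerating maximal cliques and merging cliques that share enough vertices into larger sub-graphs, can be sketched with networkx's clique enumeration. The merge rule used below (merge when two cliques share at least k-1 vertices, in the spirit of clique percolation) is an assumed stand-in for the paper's own rules; overlapping vertices then fall out as nodes assigned to more than one merged community.

        # Hedged sketch: maximal cliques merged into overlapping communities.
        # Merge rule (share >= k-1 nodes) is an assumption borrowed from clique percolation.
        import networkx as nx

        def clique_communities(g, k=3):
            cliques = [set(c) for c in nx.find_cliques(g) if len(c) >= k]
            merged = True
            while merged:
                merged = False
                for i in range(len(cliques)):
                    for j in range(i + 1, len(cliques)):
                        if len(cliques[i] & cliques[j]) >= k - 1:
                            cliques[i] |= cliques.pop(j)
                            merged = True
                            break
                    if merged:
                        break
            return cliques

        g = nx.karate_club_graph()
        comms = clique_communities(g, k=4)
        overlap = [n for n in g if sum(n in c for c in comms) > 1]   # overlapping vertices
        print(len(comms), "communities; overlapping vertices:", overlap)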

  2. Using Passive Sensing to Estimate Relative Energy Expenditure for Eldercare Monitoring

    PubMed Central

    2012-01-01

    This paper describes ongoing work in analyzing sensor data logged in the homes of seniors. An estimation of relative energy expenditure is computed using motion density from passive infrared motion sensors mounted in the environment. We introduce a new algorithm for detecting visitors in the home using motion sensor data and a set of fuzzy rules. The visitor algorithm, as well as a previous algorithm for identifying time-away-from-home (TAFH), are used to filter the logged motion sensor data. Thus, the energy expenditure estimate uses data collected only when the resident is home alone. Case studies are included from TigerPlace, an Aging in Place community, to illustrate how the relative energy expenditure estimate can be used to track health conditions over time. PMID:25266777

  3. Active Semi-Supervised Community Detection Based on Must-Link and Cannot-Link Constraints

    PubMed Central

    Cheng, Jianjun; Leng, Mingwei; Li, Longjie; Zhou, Hanhai; Chen, Xiaoyun

    2014-01-01

    Community structure detection is of great importance because it can help in discovering the relationship between the function and the topology of a network. Many community detection algorithms have been proposed, but how to incorporate prior knowledge in the detection process remains a challenging problem. In this paper, we propose a semi-supervised community detection algorithm, which makes full use of must-link and cannot-link constraints to guide the process of community detection and thereby extracts high-quality community structures from networks. To acquire high-quality must-link and cannot-link constraints, we also propose a semi-supervised component generation algorithm based on active learning, which actively selects nodes with maximum utility for the proposed semi-supervised community detection algorithm step by step, and then generates the must-link and cannot-link constraints by accessing a noiseless oracle. Extensive experiments were carried out, and the results show that introducing active learning into the problem of community detection is successful. Our proposed method can extract high-quality community structures from networks and significantly outperforms the comparison methods. PMID:25329660

  4. Identification of Physician-Diagnosed Alzheimer's Disease and Related Dementias in Population-Based Administrative Data: A Validation Study Using Family Physicians' Electronic Medical Records.

    PubMed

    Jaakkimainen, R Liisa; Bronskill, Susan E; Tierney, Mary C; Herrmann, Nathan; Green, Diane; Young, Jacqueline; Ivers, Noah; Butt, Debra; Widdifield, Jessica; Tu, Karen

    2016-08-10

    Population-based surveillance of Alzheimer's and related dementias (AD-RD) incidence and prevalence is important for chronic disease management and health system capacity planning. Algorithms based on health administrative data have been successfully developed for many chronic conditions. The increasing use of electronic medical records (EMRs) by family physicians (FPs) provides a novel reference standard by which to evaluate these algorithms as FPs are the first point of contact and providers of ongoing medical care for persons with AD-RD. We used FP EMR data as the reference standard to evaluate the accuracy of population-based health administrative data in identifying older adults with AD-RD over time. This retrospective chart abstraction study used a random sample of EMRs for 3,404 adults over 65 years of age from 83 community-based FPs in Ontario, Canada. AD-RD patients identified in the EMR were used as the reference standard against which algorithms identifying cases of AD-RD in administrative databases were compared. The highest performing algorithm was "one hospitalization code OR (three physician claims codes at least 30 days apart in a two year period) OR a prescription filled for an AD-RD specific medication" with sensitivity 79.3% (confidence interval (CI) 72.9-85.8%), specificity 99.1% (CI 98.8-99.4%), positive predictive value 80.4% (CI 74.0-86.8%), and negative predictive value 99.0% (CI 98.7-99.4%). This resulted in an age- and sex-adjusted incidence of 18.1 per 1,000 persons and adjusted prevalence of 72.0 per 1,000 persons in 2010/11. Algorithms developed from health administrative data are sensitive and specific for identifying older adults with AD-RD.
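.
    The best-performing case definition quoted above can be written down directly as a rule over a patient's administrative records. In the sketch below, the record layout and field names are hypothetical placeholders, and the 30-day/two-year logic is a simple greedy reading of the rule; only the overall structure (one hospitalization code, OR three physician claims at least 30 days apart within two years, OR one AD-RD-specific prescription) comes from the abstract.

        # Sketch of the case-finding rule; record fields are hypothetical placeholders.
        from datetime import date

        def is_adrd_case(hospital_codes, physician_claim_dates, rx_dispensed):
            """hospital_codes: list of AD-RD hospitalization records (any entry counts)
            physician_claim_dates: dates of AD-RD physician claims
            rx_dispensed: True if an AD-RD-specific medication was ever filled"""
            if hospital_codes or rx_dispensed:
                return True
            # three claims, each >= 30 days after the previous one, within a two-year window
            dates = sorted(physician_claim_dates)
            for i in range(len(dates) - 2):
                window = [dates[i]]
                for d in dates[i + 1:]:
                    if (d - window[-1]).days >= 30 and (d - dates[i]).days <= 730:
                        window.append(d)
                        if len(window) == 3:
                            return True
            return False

        claims = [date(2010, 1, 5), date(2010, 3, 20), date(2011, 2, 1)]
        print(is_adrd_case([], claims, rx_dispensed=False))   # True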

  5. miRvestigator: web application to identify miRNAs responsible for co-regulated gene expression patterns discovered through transcriptome profiling.

    PubMed

    Plaisier, Christopher L; Bare, J Christopher; Baliga, Nitin S

    2011-07-01

    Transcriptome profiling studies have produced staggering numbers of gene co-expression signatures for a variety of biological systems. A significant fraction of these signatures will be partially or fully explained by miRNA-mediated targeted transcript degradation. miRvestigator takes as input lists of co-expressed genes from Caenorhabditis elegans, Drosophila melanogaster, G. gallus, Homo sapiens, Mus musculus or Rattus norvegicus and identifies the specific miRNAs that are likely to bind to 3' untranslated region (UTR) sequences to mediate the observed co-regulation. The novelty of our approach is the miRvestigator hidden Markov model (HMM) algorithm which systematically computes a similarity P-value for each unique miRNA seed sequence from the miRNA database miRBase to an overrepresented sequence motif identified within the 3'-UTR of the query genes. We have made this miRNA discovery tool accessible to the community by integrating our HMM algorithm with a proven algorithm for de novo discovery of miRNA seed sequences and wrapping these algorithms into a user-friendly interface. Additionally, the miRvestigator web server also produces a list of putative miRNA binding sites within 3'-UTRs of the query transcripts to facilitate the design of validation experiments. The miRvestigator is freely available at http://mirvestigator.systemsbiology.net.

  6. Modeling and Analysis of Remote, Off-grid Microgrids

    NASA Astrophysics Data System (ADS)

    Madathil, Sreenath Chalil

    Over the past century the electric power industry has evolved to support the delivery of power over long distances with highly interconnected transmission systems. Despite this evolution, some remote communities are not connected to these systems. These communities rely on small, disconnected distribution systems, i.e., microgrids, to deliver power. Power distribution in most of these remote communities depends on a type of microgrid called an "off-grid microgrid". However, as microgrids often are not held to the same reliability standards as transmission grids, remote communities can be at risk of extended blackouts. Recent trends have also shown an increased use of renewable energy resources in power systems for remote communities. The increased penetration of renewable resources in power generation requires complex decision making when designing a resilient power system, mainly because of the stochastic nature of renewable resources, which can lead to loss of load or line overloads during operation. In the first part of this thesis, we develop an optimization model and accompanying solution algorithm for capacity planning and operation of microgrids that include N-1 security and other practical modeling features (e.g., AC power flow physics, component efficiencies and thermal limits). We demonstrate the effectiveness of our model and solution approach on two test systems: a modified version of the IEEE 13-node test feeder and a model of a distribution system in a remote Alaskan community. Having identified a tractable algorithm for this problem, we then develop a mathematical model that includes topology design of microgrids. The topology design includes building new lines, adding redundant lines, and analyzing N-1 contingencies on generators and lines. We develop a rolling-horizon algorithm to analyze the model efficiently and demonstrate the strength of our algorithm on the same network. Finally, we develop a stochastic model that considers generation uncertainties along with N-1 security on generation assets. We develop a chance-constrained model to analyze the problem under consideration and present a case study on an adapted IEEE 13-node network. A successful implementation of this research could help remote communities around the world enhance their quality of life by providing them with cost-effective, reliable electricity.

  7. An improved label propagation algorithm based on node importance and random walk for community detection

    NASA Astrophysics Data System (ADS)

    Ma, Tianren; Xia, Zhengyou

    2017-05-01

    Currently, with the rapid development of information technology, electronic media for social communication are becoming more and more popular. Discovery of communities is a very effective way to understand the properties of complex networks. However, traditional community detection algorithms consider only the structural characteristics of a social organization, wasting additional information about nodes and edges. Moreover, these algorithms do not consider each node on its merits. The label propagation algorithm (LPA) is a near-linear-time algorithm that aims to find communities in a network. It has attracted many scholars owing to its high efficiency, and in recent years many improved algorithms based on LPA have been put forward. In this paper, an improved LPA based on random walk and node importance (NILPA) is proposed. Firstly, a list of node importance values is obtained through calculation, and the nodes in the network are sorted in descending order of importance. On the basis of random walks, a matrix is constructed to measure the similarity of nodes, which avoids the random choices made in LPA. Secondly, a new metric, IAS (importance and similarity), is calculated from node importance and the similarity matrix; it is used to avoid the random selection of the original LPA and to improve the algorithm's stability. Finally, tests on real-world and synthetic networks are given. The results show that this algorithm has better performance than existing methods in finding community structure.

  8. Markov Dynamics as a Zooming Lens for Multiscale Community Detection: Non Clique-Like Communities and the Field-of-View Limit

    PubMed Central

    Schaub, Michael T.; Delvenne, Jean-Charles; Yaliraki, Sophia N.; Barahona, Mauricio

    2012-01-01

    In recent years, there has been a surge of interest in community detection algorithms for complex networks. A variety of computational heuristics, some with a long history, have been proposed for the identification of communities or, alternatively, of good graph partitions. In most cases, the algorithms maximize a particular objective function, thereby finding the ‘right’ split into communities. Although a thorough comparison of algorithms is still lacking, there has been an effort to design benchmarks, i.e., random graph models with known community structure against which algorithms can be evaluated. However, popular community detection methods and benchmarks normally assume an implicit notion of community based on clique-like subgraphs, a form of community structure that is not always characteristic of real networks. Specifically, networks that emerge from geometric constraints can have natural non clique-like substructures with large effective diameters, which can be interpreted as long-range communities. In this work, we show that long-range communities escape detection by popular methods, which are blinded by a restricted ‘field-of-view’ limit, an intrinsic upper scale on the communities they can detect. The field-of-view limit means that long-range communities tend to be overpartitioned. We show how by adopting a dynamical perspective towards community detection [1], [2], in which the evolution of a Markov process on the graph is used as a zooming lens over the structure of the network at all scales, one can detect both clique- or non clique-like communities without imposing an upper scale to the detection. Consequently, the performance of algorithms on inherently low-diameter, clique-like benchmarks may not always be indicative of equally good results in real networks with local, sparser connectivity. We illustrate our ideas with constructive examples and through the analysis of real-world networks from imaging, protein structures and the power grid, where a multiscale structure of non clique-like communities is revealed. PMID:22384178

  9. Multiway spectral community detection in networks

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao; Newman, M. E. J.

    2015-11-01

    One of the most widely used methods for community detection in networks is the maximization of the quality function known as modularity. Of the many maximization techniques that have been used in this context, some of the most conceptually attractive are the spectral methods, which are based on the eigenvectors of the modularity matrix. Spectral algorithms have, however, been limited, by and large, to the division of networks into only two or three communities, with divisions into more than three being achieved by repeated two-way division. Here we present a spectral algorithm that can directly divide a network into any number of communities. The algorithm makes use of a mapping from modularity maximization to a vector partitioning problem, combined with a fast heuristic for vector partitioning. We compare the performance of this spectral algorithm with previous approaches and find it to give superior results, particularly in cases where community sizes are unbalanced. We also give demonstrative applications of the algorithm to two real-world networks and find that it produces results in good agreement with expectations for the networks studied.
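
    A rough sketch of the spectral ingredients described above: build the modularity matrix B = A - kk^T/2m, take its leading eigenvectors as node vectors, and cluster them. Here k-means stands in for the paper's vector-partitioning heuristic, and the choice of three communities on the karate club graph is only a placeholder.

```python
# Spectral sketch: leading eigenvectors of the modularity matrix give node
# vectors; a clustering step splits the network into k communities.
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)
degrees = A.sum(axis=1)
m = A.sum() / 2.0
B = A - np.outer(degrees, degrees) / (2.0 * m)   # modularity matrix

k = 3                                            # desired number of communities
vals, vecs = np.linalg.eigh(B)
U = vecs[:, np.argsort(vals)[-(k - 1):]]         # leading k-1 eigenvectors as node vectors

# k-means stands in for the paper's vector-partitioning heuristic
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
print("community sizes:", np.bincount(labels))
```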

  10. Ubiquitousness of link-density and link-pattern communities in real-world networks

    NASA Astrophysics Data System (ADS)

    Šubelj, L.; Bajec, M.

    2012-01-01

    Community structure appears to be an intrinsic property of many complex real-world networks. However, recent work shows that real-world networks reveal even more sophisticated modules than classical cohesive (link-density) communities. In particular, networks can also be naturally partitioned according to similar patterns of connectedness among the nodes, revealing link-pattern communities. We here propose a propagation-based algorithm that can extract both link-density and link-pattern communities, without any prior knowledge of the true structure. The algorithm was first validated on different classes of synthetic benchmark networks with community structure, and also on random networks. We have further applied the algorithm to different social, information, technological and biological networks, where it indeed reveals meaningful (composites of) link-density and link-pattern communities. The results thus seem to imply that, similarly to their link-density counterparts, link-pattern communities appear ubiquitous in nature and design.

  11. Ensemble method: Community detection based on game theory

    NASA Astrophysics Data System (ADS)

    Zhang, Xia; Xia, Zhengyou; Xu, Shengwu; Wang, J. D.

    2014-08-01

    Timely and cost-effective analytics over social networks has emerged as a key ingredient for success in many businesses and government endeavors. Community detection is an active research area of relevance to the analysis of online social networks. The problem of selecting a particular community detection algorithm is crucial if the aim is to unveil the community structure of a network. The choice of a given methodology could affect the outcome of the experiments because different algorithms have different advantages and depend on tuning specific parameters. In this paper, we propose a community division model based on the notion of game theory, which can combine the advantages of previous algorithms effectively to get a better community classification result. Experiments on standard datasets verify that our game-theory-based community detection model is valid and yields better community classifications.

  12. Reconfiguration of Cortical Networks in MDD Uncovered by Multiscale Community Detection with fMRI.

    PubMed

    He, Ye; Lim, Sol; Fortunato, Santo; Sporns, Olaf; Zhang, Lei; Qiu, Jiang; Xie, Peng; Zuo, Xi-Nian

    2018-04-01

    Major depressive disorder (MDD) is known to be associated with altered interactions between distributed brain regions. How these regional changes relate to the reorganization of cortical functional systems, and their modulation by antidepressant medication, is relatively unexplored. To identify changes in the community structure of cortical functional networks in MDD, we applied a multiscale community detection algorithm to resting-state functional connectivity networks of unmedicated MDD (uMDD) patients (n = 46), medicated MDD (mMDD) patients (n = 38), and healthy controls (n = 50), which yielded a spectrum of multiscale community partitions. We selected an optimal resolution level by identifying the most stable community partition for each group. uMDD and mMDD groups exhibited a similar reconfiguration of the community structure of the visual association and the default mode systems but showed different reconfiguration profiles in the frontoparietal control (FPC) subsystems. Furthermore, the central system (somatomotor/salience) and 3 frontoparietal subsystems showed strengthened connectivity with other communities in uMDD but, with the exception of 1 frontoparietal subsystem, returned to control levels in mMDD. These findings provide evidence for reconfiguration of specific cortical functional systems associated with MDD, as well as potential effects of medication in restoring disease-related network alterations, especially those of the FPC system.

  13. Estimating the resolution limit of the map equation in community detection

    NASA Astrophysics Data System (ADS)

    Kawamoto, Tatsuro; Rosvall, Martin

    2015-01-01

    A community detection algorithm is considered to have a resolution limit if the scale of the smallest modules that can be resolved depends on the size of the analyzed subnetwork. The resolution limit is known to prevent some community detection algorithms from accurately identifying the modular structure of a network. In fact, any global objective function for measuring the quality of a two-level assignment of nodes into modules must have some sort of resolution limit or an external resolution parameter. However, it is yet unknown how the resolution limit affects the so-called map equation, which is known to be an efficient objective function for community detection. We derive an analytical estimate and conclude that the resolution limit of the map equation is set by the total number of links between modules instead of the total number of links in the full network as for modularity. This mechanism makes the resolution limit much less restrictive for the map equation than for modularity; in practice, it is orders of magnitude smaller. Furthermore, we argue that the effect of the resolution limit often results from shoehorning multilevel modular structures into two-level descriptions. As we show, the hierarchical map equation effectively eliminates the resolution limit for networks with nested multilevel modular structures.

  14. Continuity of care in community midwifery.

    PubMed

    Bowers, John; Cheyne, Helen; Mould, Gillian; Page, Miranda

    2015-06-01

    Continuity of care is often critical in delivering high quality health care. However, it is difficult to achieve in community health care where shift patterns and a need to minimise travelling time can reduce the scope for allocating staff to patients. Community midwifery is one example of such a challenge in the National Health Service where postnatal care typically involves a series of home visits. Ideally mothers would receive all of their antenatal and postnatal care from the same midwife. Minimising the number of staff-handovers helps ensure a better relationship between mothers and midwives, and provides more opportunity for staff to identify emerging problems over a series of home visits. This study examines the allocation and routing of midwives in the community using a variant of a multiple travelling salesmen problem algorithm incorporating staff preferences to explore trade-offs between travel time and continuity of care. This algorithm was integrated in a simulation to assess the additional effect of staff availability due to shift patterns and part-time working. The results indicate that continuity of care can be achieved with relatively small increases in travel time. However, shift patterns are problematic: perfect continuity of care is impractical but if there is a degree of flexibility in the visit schedule, reasonable continuity is feasible.

  15. Community Detection Algorithm Combining Stochastic Block Model and Attribute Data Clustering

    NASA Astrophysics Data System (ADS)

    Kataoka, Shun; Kobayashi, Takuto; Yasuda, Muneki; Tanaka, Kazuyuki

    2016-11-01

    We propose a new algorithm to detect the community structure in a network that utilizes both the network structure and vertex attribute data. Suppose we have the network structure together with the vertex attribute data, that is, the information assigned to each vertex associated with the community to which it belongs. The problem addressed in this paper is the detection of the community structure from the information of both the network structure and the vertex attribute data. Our method is based on a Bayesian approach that models the posterior probability distribution of the community labels. The detection of the community structure in our method is achieved by using belief propagation and an EM algorithm. We numerically verified the performance of our method using computer-generated networks and real-world networks.
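
    The method above couples a stochastic block model with attribute data through belief propagation and EM. As a much smaller, hedged illustration of one ingredient, the sketch below computes the Bernoulli SBM profile log-likelihood of a candidate node-to-block assignment, the kind of quantity such EM schemes try to increase; the attribute data and the BP step are omitted.

```python
# Profile log-likelihood of a node-to-block assignment under a simple
# Bernoulli stochastic block model (one piece of what an EM/BP scheme optimises).
import numpy as np
import networkx as nx

def sbm_loglik(A, z, k):
    ll = 0.0
    for r in range(k):
        for s in range(r, k):
            idx_r, idx_s = np.where(z == r)[0], np.where(z == s)[0]
            if r == s:
                possible = len(idx_r) * (len(idx_r) - 1) / 2.0
                edges = A[np.ix_(idx_r, idx_r)].sum() / 2.0
            else:
                possible = float(len(idx_r) * len(idx_s))
                edges = A[np.ix_(idx_r, idx_s)].sum()
            if possible == 0:
                continue
            p = min(max(edges / possible, 1e-9), 1 - 1e-9)   # estimated block density
            ll += edges * np.log(p) + (possible - edges) * np.log(1 - p)
    return ll

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)
z = np.array([0 if G.nodes[v]["club"] == "Mr. Hi" else 1 for v in G.nodes()])
print("log-likelihood of the two-block split:", round(sbm_loglik(A, z, 2), 2))
```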

  16. Seasonal and Inter-Annual Patterns of Chlorophyll and Phytoplankton Community Structure in Monterey Bay, CA Derived from AVIRIS Data During the 2013-2015 HyspIRI Airborne Campaign

    NASA Astrophysics Data System (ADS)

    Palacios, S. L.; Thompson, D. R.; Kudela, R. M.; Negrey, K.; Guild, L. S.; Gao, B. C.; Green, R. O.; Torres-Perez, J. L.

    2016-02-01

    There is a need in the ocean color community to discriminate among phytoplankton groups within the bulk chlorophyll pool to understand ocean biodiversity, track energy flow through ecosystems, and identify and monitor for harmful algal blooms. Imaging spectrometer measurements enable the use of sophisticated spectroscopic algorithms for applications such as differentiating among coral species and discriminating phytoplankton taxa. These advanced algorithms rely on the fine scale, subtle spectral shape of the atmospherically corrected remote sensing reflectance (Rrs) spectrum of the ocean surface. Consequently, these algorithms are sensitive to inaccuracies in the retrieved Rrs spectrum that may be related to the presence of nearby clouds, inadequate sensor calibration, low sensor signal-to-noise ratio, glint correction, and atmospheric correction. For the HyspIRI Airborne Campaign, flight planning considered optimal weather conditions to avoid flights with significant cloud/fog cover. Although best suited for terrestrial targets, the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) has enough signal for some coastal chlorophyll algorithms and meets sufficient calibration requirements for most channels. The coastal marine environment has special atmospheric correction needs due to error introduced by aerosols and terrestrially sourced atmospheric dust and riverine sediment plumes. For this HyspIRI campaign, careful attention has been given to the correction of AVIRIS imagery of the Monterey Bay to optimize ocean Rrs retrievals to estimate chlorophyll (OC3) and phytoplankton functional type (PHYDOTax) data products. This new correction method has been applied to several image collection dates during two oceanographic seasons in 2013 and 2014. These two periods are dominated by either diatom blooms or red tides. Results to be presented include chlorophyll and phytoplankton community structure and in-water validation data for these dates during the two seasons.

  17. Analysis of Community Detection Algorithms for Large Scale Cyber Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mane, Prachita; Shanbhag, Sunanda; Kamath, Tanmayee

    The aim of this project is to use existing community detection algorithms on an IP network dataset to create supernodes within the network. This study compares the performance of different algorithms on the network in terms of running time. The paper begins with an introduction to the concept of clustering and community detection followed by the research question that the team aimed to address. Further, the paper describes the graph metrics that were considered in order to shortlist algorithms, followed by a brief explanation of each algorithm with respect to the graph metric on which it is based. The next section in the paper describes the methodology used by the team in order to run the algorithms and determine which algorithm is most efficient with respect to running time. Finally, the last section of the paper includes the results obtained by the team and a conclusion based on those results as well as future work.

  18. Exploiting social influence to magnify population-level behaviour change in maternal and child health: study protocol for a randomised controlled trial of network targeting algorithms in rural Honduras

    PubMed Central

    Shakya, Holly B; Stafford, Derek; Hughes, D Alex; Keegan, Thomas; Negron, Rennie; Broome, Jai; McKnight, Mark; Nicoll, Liza; Nelson, Jennifer; Iriarte, Emma; Ordonez, Maria; Airoldi, Edo; Fowler, James H; Christakis, Nicholas A

    2017-01-01

    Introduction Despite global progress on many measures of child health, rates of neonatal mortality remain high in the developing world. Evidence suggests that substantial improvements can be achieved with simple, low-cost interventions within family and community settings, particularly those designed to change knowledge and behaviour at the community level. Using social network analysis to identify structurally influential community members and then targeting them for intervention shows promise for the implementation of sustainable community-wide behaviour change. Methods and analysis We will use a detailed understanding of social network structure and function to identify novel ways of targeting influential individuals to foster cascades of behavioural change at a population level. Our work will involve experimental and observational analyses. We will map face-to-face social networks of 30 000 people in 176 villages in Western Honduras, and then conduct a randomised controlled trial of a friendship-based network-targeting algorithm with a set of well-established care interventions. We will also test whether the proportion of the population targeted affects the degree to which the intervention spreads throughout the network. We will test scalable methods of network targeting that would not, in the future, require the actual mapping of social networks but would still offer the prospect of rapidly identifying influential targets for public health interventions. Ethics and dissemination The Yale IRB and the Honduran Ministry of Health approved all data collection procedures (Protocol number 1506016012) and all participants will provide informed consent before enrolment. We will publish our findings in peer-reviewed journals as well as engage non-governmental organisations and other actors through venues for exchanging practical methods for behavioural health interventions, such as global health conferences. We will also develop a ‘toolkit’ for practitioners to use in network-based intervention efforts, including public release of our network mapping software. Trial registration number NCT02694679; Pre-results. PMID:28289044

  19. Coding algorithms for identifying patients with cirrhosis and hepatitis B or C virus using administrative data.

    PubMed

    Niu, Bolin; Forde, Kimberly A; Goldberg, David S

    2015-01-01

    Despite the use of administrative data to perform epidemiological and cost-effectiveness research on patients with hepatitis B or C virus (HBV, HCV), there are no data outside of the Veterans Health Administration validating whether International Classification of Disease, Ninth Revision, Clinical Modification (ICD-9-CM) codes can accurately identify cirrhotic patients with HBV or HCV. The validation of such algorithms is necessary for future epidemiological studies. We evaluated the positive predictive value (PPV) of ICD-9-CM codes for identifying chronic HBV or HCV among cirrhotic patients within the University of Pennsylvania Health System, a large network that includes a tertiary care referral center, a community-based hospital, and multiple outpatient practices across southeastern Pennsylvania and southern New Jersey. We reviewed a random sample of 200 cirrhotic patients with ICD-9-CM codes for HCV and 150 cirrhotic patients with ICD-9-CM codes for HBV. The PPV of 1 inpatient or 2 outpatient HCV codes was 88.0% (168/191, 95% CI: 82.5-92.2%), while the PPV of 1 inpatient or 2 outpatient HBV codes was 81.3% (113/139, 95% CI: 73.8-87.4%). Several variations of the primary coding algorithm were evaluated to determine if different combinations of inpatient and/or outpatient ICD-9-CM codes could increase the PPV of the coding algorithm. ICD-9-CM codes can identify chronic HBV or HCV in cirrhotic patients with a high PPV and can be used in future epidemiologic studies to examine disease burden and the proper allocation of resources. Copyright © 2014 John Wiley & Sons, Ltd.

  20. Detection of core-periphery structure in networks based on 3-tuple motifs

    NASA Astrophysics Data System (ADS)

    Ma, Chuang; Xiang, Bing-Bing; Chen, Han-Shuang; Small, Michael; Zhang, Hai-Feng

    2018-05-01

    Detecting mesoscale structure, such as community structure, is of vital importance for analyzing complex networks. Recently, a new mesoscale structure, core-periphery (CP) structure, has been identified in many real-world systems. In this paper, we propose an effective algorithm for detecting CP structure based on a 3-tuple motif. In this algorithm, we first define a 3-tuple motif in terms of the patterns of edges as well as the property of nodes, and then a motif adjacency matrix is constructed based on the 3-tuple motif. Finally, the problem is converted into finding a cluster with the smallest motif conductance. Our algorithm works well on different CP structures, including single or multiple CP structures and local or global CP structures. Results on synthetic and empirical networks validate the high performance of our method.
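
    The paper's 3-tuple motif also encodes node properties, which are not reproduced here; as a simplified stand-in, the sketch below builds the standard triangle-motif adjacency matrix, in which entry (i, j) counts the motif instances (triangles) supported by edge (i, j).

```python
# Triangle-motif adjacency matrix: W[i, j] = number of triangles that the
# edge (i, j) participates in, computed as (A @ A) * A (elementwise).
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)
W = (A @ A) * A        # motif adjacency for the plain triangle motif

# the edge supported by the most triangles (such edges tend to sit in the core)
i, j = np.unravel_index(np.argmax(np.triu(W)), W.shape)
print(f"edge ({i}, {j}) participates in {int(W[i, j])} triangles")
```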

  1. Subtyping attention-deficit/hyperactivity disorder using temperament dimensions: toward biologically based nosologic criteria.

    PubMed

    Karalunas, Sarah L; Fair, Damien; Musser, Erica D; Aykes, Kamari; Iyer, Swathi P; Nigg, Joel T

    2014-09-01

    Psychiatric nosology is limited by behavioral and biological heterogeneity within existing disorder categories. The imprecise nature of current nosologic distinctions limits both mechanistic understanding and clinical prediction. We demonstrate an approach consistent with the National Institute of Mental Health Research Domain Criteria initiative to identify superior, neurobiologically valid subgroups with better predictive capacity than existing psychiatric categories for childhood attention-deficit/hyperactivity disorder (ADHD). To refine subtyping of childhood ADHD by using biologically based behavioral dimensions (i.e., temperament), novel classification algorithms, and multiple external validators. A total of 437 clinically well-characterized, community-recruited children, with and without ADHD, participated in an ongoing longitudinal study. Baseline data were used to classify children into subgroups based on temperament dimensions and examine external validators including physiological and magnetic resonance imaging measures. One-year longitudinal follow-up data are reported for a subgroup of the ADHD sample to address stability and clinical prediction. Parent/guardian ratings of children on a measure of temperament were used as input features in novel community detection analyses to identify subgroups within the sample. Groups were validated using 3 widely accepted external validators: peripheral physiological characteristics (cardiac measures of respiratory sinus arrhythmia and pre-ejection period), central nervous system functioning (via resting-state functional connectivity magnetic resonance imaging), and clinical outcomes (at 1-year longitudinal follow-up). The community detection algorithm suggested 3 novel types of ADHD, labeled as mild (normative emotion regulation), surgent (extreme levels of positive approach-motivation), and irritable (extreme levels of negative emotionality, anger, and poor soothability). Types were independent of existing clinical demarcations including DSM-5 presentations or symptom severity. These types showed stability over time and were distinguished by unique patterns of cardiac physiological response, resting-state functional brain connectivity, and clinical outcomes 1 year later. Results suggest that a biologically informed temperament-based typology, developed with a discovery-based community detection algorithm, provides a superior description of heterogeneity in the ADHD population than does any current clinical nosologic criteria. This demonstration sets the stage for more aggressive attempts at a tractable, biologically based nosology.

  2. Prevalence of migraine in a diverse community--electronic methods for migraine ascertainment in a large integrated health plan.

    PubMed

    Pressman, Alice; Jacobson, Alice; Eguilos, Roderick; Gelfand, Amy; Huynh, Cynthia; Hamilton, Luisa; Avins, Andrew; Bakshi, Nandini; Merikangas, Kathleen

    2016-04-01

    The growing availability of electronic health data provides an opportunity to ascertain diagnosis-specific cases via systematic methods for sample recruitment for clinical research and health services evaluation. We developed and implemented a migraine probability algorithm (MPA) to identify migraine from electronic health records (EHR) in an integrated health plan. We identified all migraine outpatient diagnoses and all migraine-specific prescriptions for a five-year period (April 2008-March 2013) from the Kaiser Permanente, Northern California (KPNC) EHR. We developed and evaluated the MPA in two independent samples, and derived prevalence estimates of medically-ascertained migraine in KPNC by age, sex, and race. The period prevalence of medically-ascertained migraine among KPNC adults during April 2008-March 2013 was 10.3% (women: 15.5%, men: 4.5%). Estimates peaked with age in women but remained flat for men. Prevalence among Asians was half that of whites. We demonstrate the feasibility of an EHR-based algorithm to identify cases of diagnosed migraine and determine that prevalence patterns by our methods yield results comparable to aggregate estimates of treated migraine based on direct interviews in population-based samples. This inexpensive, easily applied EHR-based algorithm provides a new opportunity for monitoring changes in migraine prevalence and identifying potential participants for research studies. © International Headache Society 2015.

  3. Cell Membrane Tracking in Living Brain Tissue Using Differential Interference Contrast Microscopy.

    PubMed

    Lee, John; Kolb, Ilya; Forest, Craig R; Rozell, Christopher J

    2018-04-01

    Differential interference contrast (DIC) microscopy is widely used for observing unstained biological samples that are otherwise optically transparent. Combining this optical technique with machine vision could enable the automation of many life science experiments; however, identifying relevant features under DIC is challenging. In particular, precise tracking of cell boundaries in a thick ( ) slice of tissue has not previously been accomplished. We present a novel deconvolution algorithm that achieves the state-of-the-art performance at identifying and tracking these membrane locations. Our proposed algorithm is formulated as a regularized least squares optimization that incorporates a filtering mechanism to handle organic tissue interference and a robust edge-sparsity regularizer that integrates dynamic edge tracking capabilities. As a secondary contribution, this paper also describes new community infrastructure in the form of a MATLAB toolbox for accurately simulating DIC microscopy images of in vitro brain slices. Building on existing DIC optics modeling, our simulation framework additionally contributes an accurate representation of interference from organic tissue, neuronal cell-shapes, and tissue motion due to the action of the pipette. This simulator allows us to better understand the image statistics (to improve algorithms), as well as quantitatively test cell segmentation and tracking algorithms in scenarios, where ground truth data is fully known.
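
    The tracking algorithm above is posed as a regularized least-squares problem with an edge-sparsity term. As a hedged, generic illustration of that family of solvers (not the authors' formulation or data), the sketch below runs a few ISTA iterations on an l1-regularized least-squares problem with a synthetic operator.

```python
# Generic ISTA sketch for an l1-regularised least-squares problem:
#   minimise 0.5 * ||y - H x||^2 + lam * ||x||_1
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((100, 300))                 # stand-in measurement operator
x_true = np.zeros(300)
x_true[rng.choice(300, size=10, replace=False)] = 1.0
y = H @ x_true + 0.01 * rng.standard_normal(100)

lam = 0.1
step = 1.0 / np.linalg.norm(H, 2) ** 2              # 1 / Lipschitz constant of the gradient
x = np.zeros(300)
for _ in range(200):
    grad = H.T @ (H @ x - y)
    z = x - step * grad
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding

print("recovered support size:", int((np.abs(x) > 1e-3).sum()))
```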

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S; Sen, Satyabrata; Berry, M. L.

    The Domestic Nuclear Detection Office's (DNDO) Intelligence Radiation Sensors Systems (IRSS) program supported the development of networks of commercial-off-the-shelf (COTS) radiation counters for detecting, localizing, and identifying low-level radiation sources. Under this program, a series of indoor and outdoor tests were conducted with multiple source strengths and types, different background profiles, and various types of source and detector movements. Following the tests, network algorithms were replayed in various re-constructed scenarios using sub-networks. These measurements and algorithm traces together provide a rich collection of highly valuable datasets for testing the current and next generation radiation network algorithms, including the ones (to be) developed by broader R&D communities such as distributed detection, information fusion, and sensor networks. From this multi-terabyte IRSS database, we distilled out and packaged the first batch of canonical datasets for public release. They include measurements from ten indoor and two outdoor tests which represent increasingly challenging baseline scenarios for robustly testing radiation network algorithms.

  5. Community detection in complex networks using link prediction

    NASA Astrophysics Data System (ADS)

    Cheng, Hui-Min; Ning, Yi-Zi; Yin, Zhao; Yan, Chao; Liu, Xin; Zhang, Zhong-Yuan

    2018-01-01

    Community detection and link prediction are both of great significance in network analysis, which provide very valuable insights into topological structures of the network from different perspectives. In this paper, we propose a novel community detection algorithm with inclusion of link prediction, motivated by the question whether link prediction can be devoted to improving the accuracy of community partition. For link prediction, we propose two novel indices to compute the similarity between each pair of nodes, one of which aims to add missing links, and the other tries to remove spurious edges. Extensive experiments are conducted on benchmark data sets, and the results of our proposed algorithm are compared with two classes of baselines. In conclusion, our proposed algorithm is competitive, revealing that link prediction does improve the precision of community detection.
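
    A minimal sketch of the overall idea, under the assumption that a standard index and a standard detector may stand in for the paper's own components: non-edges are scored with the Adamic-Adar index, the top-scoring ones are added as predicted links, and greedy modularity maximization is run on the original and augmented graphs.

```python
# Sketch: augment the graph with top-scoring predicted links before running
# a standard community detection routine (Adamic-Adar is a stand-in index).
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()

# score all non-edges and add the 10 strongest as predicted missing links
scores = sorted(nx.adamic_adar_index(G), key=lambda t: t[2], reverse=True)
G_aug = G.copy()
G_aug.add_edges_from((u, v) for u, v, _ in scores[:10])

for name, graph in [("original", G), ("augmented", G_aug)]:
    communities = greedy_modularity_communities(graph)
    print(f"{name}: {len(communities)} communities")
```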

  6. ADAM: Analysis of Discrete Models of Biological Systems Using Computer Algebra

    PubMed Central

    2011-01-01

    Background Many biological systems are modeled qualitatively with discrete models, such as probabilistic Boolean networks, logical models, Petri nets, and agent-based models, to gain a better understanding of them. The computational complexity to analyze the complete dynamics of these models grows exponentially in the number of variables, which impedes working with complex models. There exist software tools to analyze discrete models, but they either lack the algorithmic functionality to analyze complex models deterministically or they are inaccessible to many users as they require understanding the underlying algorithm and implementation, do not have a graphical user interface, or are hard to install. Efficient analysis methods that are accessible to modelers and easy to use are needed. Results We propose a method for efficiently identifying attractors and introduce the web-based tool Analysis of Dynamic Algebraic Models (ADAM), which provides this and other analysis methods for discrete models. ADAM converts several discrete model types automatically into polynomial dynamical systems and analyzes their dynamics using tools from computer algebra. Specifically, we propose a method to identify attractors of a discrete model that is equivalent to solving a system of polynomial equations, a long-studied problem in computer algebra. Based on extensive experimentation with both discrete models arising in systems biology and randomly generated networks, we found that the algebraic algorithms presented in this manuscript are fast for systems with the structure maintained by most biological systems, namely sparseness and robustness. For a large set of published complex discrete models, ADAM identified the attractors in less than one second. Conclusions Discrete modeling techniques are a useful tool for analyzing complex biological systems and there is a need in the biological community for accessible efficient analysis tools. ADAM provides analysis methods based on mathematical algorithms as a web-based tool for several different input formats, and it makes analysis of complex models accessible to a larger community, as it is platform independent as a web-service and does not require understanding of the underlying mathematics. PMID:21774817
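
    ADAM avoids exhaustive state-space enumeration by converting models to polynomial dynamical systems; purely to illustrate what an attractor is, the sketch below brute-forces the attractors of a 3-node toy Boolean network with assumed update rules. This approach scales exponentially and is not the tool's method.

```python
# Brute-force attractor search for a 3-node toy Boolean network
# (conceptual only; ADAM uses algebraic methods that avoid full enumeration).
from itertools import product

def update(state):
    a, b, c = state
    return (b and c, not a, a or b)        # assumed toy update rules

attractors = set()
for start in product([False, True], repeat=3):
    seen, s = [], start
    while s not in seen:                   # walk forward until a state repeats
        seen.append(s)
        s = update(s)
    cycle = tuple(seen[seen.index(s):])    # the periodic part reached from `start`
    attractors.add(frozenset(cycle))

for att in attractors:
    print(sorted(tuple(int(x) for x in state) for state in att))
```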

  7. A novel dynamical community detection algorithm based on weighting scheme

    NASA Astrophysics Data System (ADS)

    Li, Ju; Yu, Kai; Hu, Ke

    2015-12-01

    Network dynamics plays an important role in analyzing the correlation between functional properties and topological structure. In this paper, we propose a novel dynamical iteration (DI) algorithm, which incorporates the iterative process of the membership vector with a weighting scheme, i.e. weighting W and tightness T. These new elements can be used to adjust the link strength and the node compactness for improving the speed and accuracy of community structure detection. To estimate the optimal stopping time of the iteration, we utilize a new stability measure which is defined as the Markov random walk auto-covariance. We do not need to specify the number of communities in advance. The algorithm naturally supports overlapping communities by associating each node with a membership vector describing the node's involvement in each community. Theoretical analysis and experiments show that the algorithm can uncover communities effectively and efficiently.

  8. A synthetic genetic edge detection program.

    PubMed

    Tabor, Jeffrey J; Salis, Howard M; Simpson, Zachary Booth; Chevalier, Aaron A; Levskaya, Anselm; Marcotte, Edward M; Voigt, Christopher A; Ellington, Andrew D

    2009-06-26

    Edge detection is a signal processing algorithm common in artificial intelligence and image recognition programs. We have constructed a genetically encoded edge detection algorithm that programs an isogenic community of E. coli to sense an image of light, communicate to identify the light-dark edges, and visually present the result of the computation. The algorithm is implemented using multiple genetic circuits. An engineered light sensor enables cells to distinguish between light and dark regions. In the dark, cells produce a diffusible chemical signal that diffuses into light regions. Genetic logic gates are used so that only cells that sense light and the diffusible signal produce a positive output. A mathematical model constructed from first principles and parameterized with experimental measurements of the component circuits predicts the performance of the complete program. Quantitatively accurate models will facilitate the engineering of more complex biological behaviors and inform bottom-up studies of natural genetic regulatory networks.

  9. A Synthetic Genetic Edge Detection Program

    PubMed Central

    Tabor, Jeffrey J.; Salis, Howard; Simpson, Zachary B.; Chevalier, Aaron A.; Levskaya, Anselm; Marcotte, Edward M.; Voigt, Christopher A.; Ellington, Andrew D.

    2009-01-01

    Summary Edge detection is a signal processing algorithm common in artificial intelligence and image recognition programs. We have constructed a genetically encoded edge detection algorithm that programs an isogenic community of E.coli to sense an image of light, communicate to identify the light-dark edges, and visually present the result of the computation. The algorithm is implemented using multiple genetic circuits. An engineered light sensor enables cells to distinguish between light and dark regions. In the dark, cells produce a diffusible chemical signal that diffuses into light regions. Genetic logic gates are used so that only cells that sense light and the diffusible signal produce a positive output. A mathematical model constructed from first principles and parameterized with experimental measurements of the component circuits predicts the performance of the complete program. Quantitatively accurate models will facilitate the engineering of more complex biological behaviors and inform bottom-up studies of natural genetic regulatory networks. PMID:19563759

  10. Validation of Community Models: Identifying Events in Space Weather Model Timelines

    NASA Technical Reports Server (NTRS)

    MacNeice, Peter

    2009-01-01

    I develop and document a set of procedures which test the quality of predictions of solar wind speed and polarity of the interplanetary magnetic field (IMF) made by coupled models of the ambient solar corona and heliosphere. The Wang-Sheeley-Arge (WSA) model is used to illustrate the application of these validation procedures. I present an algorithm which detects transitions of the solar wind from slow to high speed. I also present an algorithm which processes the measured polarity of the outward directed component of the IMF. This removes high-frequency variations to expose the longer-scale changes that reflect IMF sector changes. I apply these algorithms to WSA model predictions made using a small set of photospheric synoptic magnetograms obtained by the Global Oscillation Network Group as input to the model. The results of this preliminary validation of the WSA model (version 1.6) are summarized.
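
    As a hedged toy version of the transition-detection step described above, the sketch below smooths a synthetic solar wind speed series and flags upward crossings of a speed threshold; the window length and the 450 km/s threshold are placeholder values, not those of the documented procedure.

```python
# Toy detection of slow-to-fast solar wind transitions: smooth the speed
# series and flag upward threshold crossings (placeholder values throughout).
import numpy as np

rng = np.random.default_rng(2)
hours = np.arange(24 * 27)                      # roughly one solar rotation, hourly
speed = 380 + 150 * (np.sin(hours / 40.0) > 0.6) + 20 * rng.standard_normal(hours.size)

window = 12                                     # smoothing window (hours), assumed
smooth = np.convolve(speed, np.ones(window) / window, mode="same")
threshold = 450.0                               # km/s, assumed
fast = smooth > threshold
onsets = np.where(~fast[:-1] & fast[1:])[0] + 1 # upward threshold crossings

print("detected slow-to-fast transitions at hours:", onsets.tolist())
```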

  11. Application-Specific Graph Sampling for Frequent Subgraph Mining and Community Detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Purohit, Sumit; Choudhury, Sutanay; Holder, Lawrence B.

    Graph mining is an important data analysis methodology, but struggles as the input graph size increases. The scalability and usability challenges posed by such large graphs make it imperative to sample the input graph and reduce its size. The critical challenge in sampling is to identify the appropriate algorithm to ensure the resulting analysis does not suffer heavily from the data reduction. Predicting the expected performance degradation for a given graph and sampling algorithm is also useful. In this paper, we present different sampling approaches for graph mining applications such as Frequent Subgraph Mining (FSM) and Community Detection (CD). We explore graph metrics such as PageRank, Triangles, and Diversity to sample a graph and conclude that for heterogeneous graphs Triangles and Diversity perform better than degree-based metrics. We also present two new sampling variations for targeted graph mining applications. We present empirical results to show that knowledge of the target application, along with input graph properties, can be used to select the best sampling algorithm. We also conclude that performance degradation is an abrupt, rather than gradual, phenomenon as the sample size decreases. We present empirical results to show that the performance degradation follows a logistic function.
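
    A rough sketch of one metric-driven strategy compared in the report: sample nodes with probability proportional to their triangle counts and keep the induced subgraph. The sampling fraction and the (1 + triangles) weighting are assumptions for illustration.

```python
# Node sampling weighted by triangle counts, then induced-subgraph extraction.
import random
import networkx as nx

def triangle_sample(G, frac=0.3, seed=0):
    random.seed(seed)
    tri = nx.triangles(G)
    # weight nodes by (1 + triangle count) so triangle-free nodes keep some chance
    nodes, weights = zip(*[(v, 1 + t) for v, t in tri.items()])
    k = max(1, int(frac * G.number_of_nodes()))
    sampled = set()
    while len(sampled) < k:
        sampled.update(random.choices(nodes, weights=weights, k=k - len(sampled)))
    return G.subgraph(sampled).copy()

G = nx.karate_club_graph()
S = triangle_sample(G)
print(S.number_of_nodes(), "nodes and", S.number_of_edges(), "edges in the sample")
```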

  12. Monitoring walking and cycling of middle-aged to older community dwellers using wireless wearable accelerometers.

    PubMed

    Zhang, Yuting; Beenakker, Karel G M; Butala, Pankil M; Lin, Cheng-Chieh; Little, Thomas D C; Maier, Andrea B; Stijntjes, Marjon; Vartanian, Richard; Wagenaar, Robert C

    2012-01-01

    Changes in gait parameters have been shown to be an important indicator of several age-related cognitive and physical declines of older adults. In this paper we propose a method to monitor and analyze walking and cycling activities based on a triaxial accelerometer worn on one ankle. We use an algorithm that can (1) distinguish between static and dynamic functional activities, (2) detect walking and cycling events, (3) identify gait parameters, including step frequency, number of steps, number of walking periods, and total walking duration per day, and (4) evaluate cycling parameters, including cycling frequency, number of cycling periods, and total cycling duration. Our algorithm is evaluated against the triaxial accelerometer data obtained from a group of 297 middle-aged to older adults wearing an activity monitor on the right ankle for approximately one week while performing unconstrained daily activities in the home and community setting. The correlation coefficients between each of detected gait and cycling parameters on two weekdays are all statistically significant, ranging from 0.668 to 0.873. These results demonstrate good test-retest reliability of our method in monitoring walking and cycling activities and analyzing gait and cycling parameters. This algorithm is efficient and causal in time and thus implementable for real-time monitoring and feedback.

  13. Coding algorithms for identifying patients with cirrhosis and hepatitis B or C virus using administrative data

    PubMed Central

    Niu, Bolin; Forde, Kimberly A; Goldberg, David S.

    2014-01-01

    Background & Aims Despite the use of administrative data to perform epidemiological and cost-effectiveness research on patients with hepatitis B or C virus (HBV, HCV), there are no data outside of the Veterans Health Administration validating whether International Classification of Disease, Ninth Revision, Clinical Modification (ICD-9-CM) codes can accurately identify cirrhotic patients with HBV or HCV. The validation of such algorithms is necessary for future epidemiological studies. Methods We evaluated the positive predictive value (PPV) of ICD-9-CM codes for identifying chronic HBV or HCV among cirrhotic patients within the University of Pennsylvania Health System, a large network that includes a tertiary care referral center, a community-based hospital, and multiple outpatient practices across southeastern Pennsylvania and southern New Jersey. We reviewed a random sample of 200 cirrhotic patients with ICD-9-CM codes for HCV and 150 cirrhotic patients with ICD-9-CM codes for HBV. Results The PPV of 1 inpatient or 2 outpatient HCV codes was 88.0% (168/191, 95% CI: 82.5–92.2%), while the PPV of 1 inpatient or 2 outpatient HBV codes was 81.3% (113/139, 95% CI: 73.8–87.4%). Several variations of the primary coding algorithm were evaluated to determine if different combinations of inpatient and/or outpatient ICD-9-CM codes could increase the PPV of the coding algorithm. Conclusions ICD-9-CM codes can identify chronic HBV or HCV in cirrhotic patients with a high PPV, and can be used in future epidemiologic studies to examine disease burden and the proper allocation of resources. PMID:25335773

  14. Detecting livestock production zones.

    PubMed

    Grisi-Filho, J H H; Amaku, M; Ferreira, F; Dias, R A; Neto, J S Ferreira; Negreiros, R L; Ossada, R

    2013-07-01

    Communities are sets of nodes that are related in an important way, most likely sharing common properties and/or playing similar roles within a network. Unraveling a network structure, and hence the trade preferences and pathways, could be useful to a researcher or a decision maker. We implemented a community detection algorithm to find livestock communities, which is consistent with the definition of a livestock production zone, assuming that a community is a group of farm premises in which an animal is more likely to stay during its lifetime than expected by chance. We applied this algorithm to the network of animal movements within the state of Mato Grosso for 2007. This database holds information concerning 87,899 premises and 521,431 movements throughout the year, totaling 15,844,779 animals moved. The community detection algorithm achieved a network partition that shows a clear geographical and commercial pattern, two crucial features for preventive veterinary medicine applications; this algorithm also provides a meaningful interpretation of trade networks where links emerge based on trader node choices. Copyright © 2013 Elsevier B.V. All rights reserved.

  15. Feasibility and demonstration of a cloud-based RIID analysis system

    NASA Astrophysics Data System (ADS)

    Wright, Michael C.; Hertz, Kristin L.; Johnson, William C.; Sword, Eric D.; Younkin, James R.; Sadler, Lorraine E.

    2015-06-01

    A significant limitation in the operational utility of handheld and backpack radioisotope identifiers (RIIDs) is the inability of their onboard algorithms to accurately and reliably identify the isotopic sources of the measured gamma-ray energy spectrum. A possible solution is to move the spectral analysis computations to an external device, the cloud, where significantly greater capabilities are available. The implementation and demonstration of a prototype cloud-based RIID analysis system have shown this type of system to be feasible with currently available communication and computational technology. A system study has shown that the potential user community could derive significant benefits from an appropriately implemented cloud-based analysis system and has identified the design and operational characteristics required by the users and stakeholders for such a system. A general description of the hardware and software necessary to implement reliable cloud-based analysis, the value of the cloud expressed by the user community, and the aspects of the cloud implemented in the demonstrations are discussed.

  16. A Deep Stochastic Model for Detecting Community in Complex Networks

    NASA Astrophysics Data System (ADS)

    Fu, Jingcheng; Wu, Jianliang

    2017-01-01

    Discovering community structures is an important step to understanding the structure and dynamics of real-world networks in social science, biology and technology. In this paper, we develop a deep stochastic model based on non-negative matrix factorization to identify communities, in which there are two sets of parameters. One is the community membership matrix, whose elements in a row correspond to the probabilities that the given node belongs to each of the given number of communities in our model; the other is the community-community connection matrix, whose element in the i-th row and j-th column represents the probability of there being an edge between a randomly chosen node from the i-th community and a randomly chosen node from the j-th community. The parameters can be evaluated by an efficient updating rule whose convergence can be guaranteed. The community-community connection matrix in our model is more precise than the community-community connection matrix in traditional non-negative matrix factorization methods. Furthermore, the method called symmetric non-negative matrix factorization is a special case of our model. Finally, experiments on both synthetic and real-world network data demonstrate that our algorithm is highly effective in detecting communities.
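
    A minimal sketch of the non-negative matrix factorization ingredient, not the paper's deep stochastic model or its update rules: factorize the adjacency matrix with standard NMF and assign each node to the community with the largest membership weight.

```python
# NMF-based community assignment sketch on the adjacency matrix.
import numpy as np
import networkx as nx
from sklearn.decomposition import NMF

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)

k = 2
model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
U = model.fit_transform(A)            # n-by-k membership-like matrix
labels = U.argmax(axis=1)             # hard assignment: largest membership weight

for c in range(k):
    print(f"community {c}: {np.where(labels == c)[0].tolist()}")
```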

  17. Community Detection for Correlation Matrices

    NASA Astrophysics Data System (ADS)

    MacMahon, Mel; Garlaschelli, Diego

    2015-04-01

    A challenging problem in the study of complex systems is that of resolving, without prior information, the emergent, mesoscopic organization determined by groups of units whose dynamical activity is more strongly correlated internally than with the rest of the system. The existing techniques to filter correlations are not explicitly oriented towards identifying such modules and can suffer from an unavoidable information loss. A promising alternative is that of employing community detection techniques developed in network theory. Unfortunately, this approach has focused predominantly on replacing network data with correlation matrices, a procedure that we show to be intrinsically biased because of its inconsistency with the null hypotheses underlying the existing algorithms. Here, we introduce, via a consistent redefinition of null models based on random matrix theory, the appropriate correlation-based counterparts of the most popular community detection techniques. Our methods can filter out both unit-specific noise and system-wide dependencies, and the resulting communities are internally correlated and mutually anticorrelated. We also implement multiresolution and multifrequency approaches revealing hierarchically nested subcommunities with "hard" cores and "soft" peripheries. We apply our techniques to several financial time series and identify mesoscopic groups of stocks which are irreducible to a standard, sectorial taxonomy; detect "soft stocks" that alternate between communities; and discuss implications for portfolio optimization and risk management.
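
    A small sketch of the random-matrix step that this line of work builds on: eigenvalues of an empirical correlation matrix below the Marchenko-Pastur upper edge lambda_plus = (1 + sqrt(N/T))^2 are treated as noise and filtered out. The synthetic returns, the planted correlated group, and the decision to keep every mode above the bound (including a possible global mode) are simplifying assumptions.

```python
# Marchenko-Pastur filtering of a correlation matrix: keep only eigenmodes
# unlikely to arise from pure noise (the paper builds community-detection
# null models on top of this idea).
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 400                                          # series and observations
returns = rng.standard_normal((T, N))
returns[:, :10] += 0.5 * rng.standard_normal((T, 1))    # plant one correlated group

C = np.corrcoef(returns, rowvar=False)
lam_plus = (1 + np.sqrt(N / T)) ** 2                    # Marchenko-Pastur upper edge

vals, vecs = np.linalg.eigh(C)
keep = vals > lam_plus                                  # modes above the noise band
C_filtered = (vecs[:, keep] * vals[keep]) @ vecs[:, keep].T

print("eigenvalues above the noise band:", int(keep.sum()))
```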

  18. A Systematic Evaluation of Field-Based Screening Methods for the Assessment of Anterior Cruciate Ligament (ACL) Injury Risk.

    PubMed

    Fox, Aaron S; Bonacci, Jason; McLean, Scott G; Spittle, Michael; Saunders, Natalie

    2016-05-01

    Laboratory-based measures provide an accurate method to identify risk factors for anterior cruciate ligament (ACL) injury; however, these methods are generally prohibitive to the wider community. Screening methods that can be completed in a field or clinical setting may be more applicable for wider community use. Examination of field-based screening methods for ACL injury risk can aid in identifying the most applicable method(s) for use in these settings. The objective of this systematic review was to evaluate and compare field-based screening methods for ACL injury risk to determine their efficacy of use in wider community settings. An electronic database search was conducted on the SPORTDiscus™, MEDLINE, AMED and CINAHL databases (January 1990-July 2015) using a combination of relevant keywords. A secondary search of the same databases, using relevant keywords from identified screening methods, was also undertaken. Studies identified as potentially relevant were independently examined by two reviewers for inclusion. Where consensus could not be reached, a third reviewer was consulted. Original research articles that examined screening methods for ACL injury risk that could be undertaken outside of a laboratory setting were included for review. Two reviewers independently assessed the quality of included studies. Included studies were categorized according to the screening method they examined. A description of each screening method, and data pertaining to the ability to prospectively identify ACL injuries, validity and reliability, recommendations for identifying 'at-risk' athletes, equipment and training required to complete screening, time taken to screen athletes, and applicability of the screening method across sports and athletes were extracted from relevant studies. Of 1077 citations from the initial search, a total of 25 articles were identified as potentially relevant, with 12 meeting all inclusion/exclusion criteria. From the secondary search, eight further studies met all criteria, resulting in 20 studies being included for review. Five ACL-screening methods-the Landing Error Scoring System (LESS), Clinic-Based Algorithm, Observational Screening of Dynamic Knee Valgus (OSDKV), 2D-Cam Method, and Tuck Jump Assessment-were identified. There was limited evidence supporting the use of field-based screening methods in predicting ACL injuries across a range of populations. Differences relating to the equipment and time required to complete screening methods were identified. Only screening methods for ACL injury risk were included for review. Field-based screening methods developed for lower-limb injury risk in general may also incorporate, and be useful in, screening for ACL injury risk. Limited studies were available relating to the OSDKV and 2D-Cam Method. The LESS showed predictive validity in identifying ACL injuries, however only in a youth athlete population. The LESS also appears practical for community-wide use due to the minimal equipment and set-up/analysis time required. The Clinic-Based Algorithm may have predictive value for ACL injury risk as it identifies athletes who exhibit high frontal plane knee loads during a landing task, but requires extensive additional equipment and time, which may limit its application to wider community settings.

  19. Identifying influential user communities on the social network

    NASA Astrophysics Data System (ADS)

    Hu, Weishu; Gong, Zhiguo; Hou U, Leong; Guo, Jingzhi

    2015-10-01

    Nowadays, social network services are widely used in electronic commerce systems. Users on the social network can develop different relationships based on their common interests and activities. In order to promote the business, it is interesting to explore hidden relationships among users developed on the social network. Such knowledge can be used to locate target users for different advertisements and to provide effective product recommendations. In this paper, we define and study a novel community detection problem: discovering the hidden community structure in large social networks based on common interests. We observe that users typically pay more attention to those users who share similar interests, which enables a way to partition the users into different communities according to their common interests. We propose two algorithms to detect influential communities using common interests in large social networks efficiently and effectively. We conduct our experimental evaluation using a data set from Epinions, which demonstrates that our method achieves a 4-11.8% accuracy improvement over the state-of-the-art method.

  20. A game theoretic algorithm to detect overlapping community structure in networks

    NASA Astrophysics Data System (ADS)

    Zhou, Xu; Zhao, Xiaohui; Liu, Yanheng; Sun, Geng

    2018-04-01

    Community detection can be used as an important technique for product and personalized service recommendation. A game-theory-based approach to detect overlapping community structure is introduced in this paper. The process of community formation is converted into a game; when no agent (node) can improve its own utility, the game terminates. The utility function is composed of a gain and a loss function, and we present a new gain function in this paper. In addition, instead of each agent choosing an action randomly among join, quit and switch to obtain a new label, two new label-update strategies are designed for each agent during the game, and these strategies are evaluated and compared for each agent in order to find its best result. The overlapping community structure emerges naturally when the stopping criterion is satisfied. The experimental results demonstrate that the proposed algorithm outperforms other similar algorithms for detecting overlapping communities in networks.
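
    A toy best-response sketch of the game formulation, with placeholder utilities rather than the paper's gain and loss functions: each agent holds a set of labels and repeatedly tries join, quit and switch moves, keeping whichever maximizes gain minus a per-label cost.

```python
# Toy best-response loop for a label game with join / quit / switch actions.
import networkx as nx

def utility(G, labels, v, label_set, cost=0.35):
    """gain = average label overlap with neighbours, loss = cost per label held."""
    nbrs = list(G.neighbors(v))
    gain = sum(len(labels[u] & label_set) for u in nbrs) / max(len(nbrs), 1)
    return gain - cost * len(label_set)

G = nx.karate_club_graph()
labels = {v: {v} for v in G.nodes()}               # start: one label per node

for _ in range(20):                                # best-response rounds
    changed = False
    for v in G.nodes():
        candidates = {l for u in G.neighbors(v) for l in labels[u]}
        best, best_u = labels[v], utility(G, labels, v, labels[v])
        for l in candidates:
            for option in (labels[v] | {l}, labels[v] - {l}, {l}):   # join / quit / switch
                if option and utility(G, labels, v, option) > best_u:
                    best, best_u = option, utility(G, labels, v, option)
        if best != labels[v]:
            labels[v], changed = best, True
    if not changed:
        break

print(len({l for s in labels.values() for l in s}), "overlapping communities remain")
```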

  1. Diversity in Detection Algorithms for Atmospheric Rivers: A Community Effort to Understand the Consequences

    NASA Astrophysics Data System (ADS)

    Shields, C. A.; Ullrich, P. A.; Rutz, J. J.; Wehner, M. F.; Ralph, M.; Ruby, L.

    2017-12-01

    Atmospheric rivers (ARs) are long, narrow filamentary structures that transport large amounts of moisture in the lower layers of the atmosphere, typically from subtropical regions to mid-latitudes. ARs play an important role in regional hydroclimate by supplying significant amounts of precipitation that can alleviate drought, or in extreme cases, produce dangerous floods. Accurately detecting, or tracking, ARs is important not only for weather forecasting, but is also necessary to understand how these events may change under global warming. Detection algorithms are used on both regional and global scales, most accurately with high-resolution datasets or model output. Different detection algorithms can produce different answers. Detection algorithms found in the current literature fall broadly into two categories: "time-stitching", where the AR is tracked with a Lagrangian approach through time and space; and "counting", where ARs are identified for a single point in time for a single location. Counting routines can be further subdivided into algorithms that use absolute thresholds with specific geometry, algorithms that use relative thresholds, algorithms based on statistics, and pattern recognition and machine learning techniques. With such a large diversity in detection code, AR tracks and "counts" can vary widely from technique to technique. Uncertainty increases for future climate scenarios, where relative and absolute thresholding produce vastly different counts, simply due to the moister background state in a warmer world. In an effort to quantify the uncertainty associated with tracking algorithms, the AR detection community has come together to participate in ARTMIP, the Atmospheric River Tracking Method Intercomparison Project. Each participant will provide AR metrics to the greater group by applying their code to a common reanalysis dataset. MERRA2 data was chosen for its temporal and spatial resolution. After completion of this first phase, Tier 1, ARTMIP participants may choose to contribute to Tier 2, which will range from reanalysis uncertainty, to analysis of future climate scenarios from high resolution model output. ARTMIP's experimental design, techniques, and preliminary metrics will be presented.
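
    To illustrate the absolute-versus-relative thresholding issue raised above, the toy sketch below applies both kinds of threshold to a synthetic moisture-transport (IVT) field and to a uniformly moistened copy of it; the field, the 250 kg m-1 s-1 absolute threshold and the 85th-percentile relative threshold are all placeholder values.

```python
# Toy contrast of absolute vs relative IVT thresholding under a moister
# background state (synthetic values throughout).
import numpy as np

rng = np.random.default_rng(0)
ivt_present = rng.gamma(shape=2.0, scale=120.0, size=(90, 180))   # synthetic IVT field
ivt_warmer = ivt_present * 1.2                                    # moister background state

for name, field in [("present", ivt_present), ("warmer", ivt_warmer)]:
    absolute = field > 250.0                        # fixed threshold in kg m-1 s-1
    relative = field > np.percentile(field, 85)     # threshold tied to the field itself
    print(f"{name:8s} flagged area -> absolute: {absolute.mean():.1%}, relative: {relative.mean():.1%}")
```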

  2. Research prioritization through prediction of future impact on biomedical science: a position paper on inference-analytics.

    PubMed

    Ganapathiraju, Madhavi K; Orii, Naoki

    2013-08-30

    Advances in biotechnology have created "big-data" situations in molecular and cellular biology. Several sophisticated algorithms have been developed that process big data to generate hundreds of biomedical hypotheses (or predictions). The bottleneck to translating this large number of biological hypotheses is that each of them needs to be studied by experimentation for interpreting its functional significance. Even when the predictions are estimated to be very accurate, from a biologist's perspective, the choice of which of these predictions is to be studied further is made based on factors like availability of reagents and resources and the possibility of formulating some reasonable hypothesis about its biological relevance. When viewed from a global perspective, say from that of a federal funding agency, ideally the choice of which prediction should be studied would be made based on which of them can make the most translational impact. We propose that algorithms be developed to identify which of the computationally generated hypotheses have potential for high translational impact; this way, funding agencies and scientific community can invest resources and drive the research based on a global view of biomedical impact without being deterred by local view of feasibility. In short, data-analytic algorithms analyze big-data and generate hypotheses; in contrast, the proposed inference-analytic algorithms analyze these hypotheses and rank them by predicted biological impact. We demonstrate this through the development of an algorithm to predict biomedical impact of protein-protein interactions (PPIs) which is estimated by the number of future publications that cite the paper which originally reported the PPI. This position paper describes a new computational problem that is relevant in the era of big-data and discusses the challenges that exist in studying this problem, highlighting the need for the scientific community to engage in this line of research. The proposed class of algorithms, namely inference-analytic algorithms, is necessary to ensure that resources are invested in translating those computational outcomes that promise maximum biological impact. Application of this concept to predict biomedical impact of PPIs illustrates not only the concept, but also the challenges in designing these algorithms.

  3. Hybrid three-dimensional and support vector machine approach for automatic vehicle tracking and classification using a single camera

    NASA Astrophysics Data System (ADS)

    Kachach, Redouane; Cañas, José María

    2016-05-01

    Using video in traffic monitoring is one of the most active research domains in the computer vision community. TrafficMonitor, a system that employs a hybrid approach for automatic vehicle tracking and classification on highways using a simple stationary calibrated camera, is presented. The proposed system consists of three modules: vehicle detection, vehicle tracking, and vehicle classification. Moving vehicles are detected by an enhanced Gaussian mixture model background estimation algorithm. The design includes a technique to resolve the occlusion problem by combining a two-dimensional proximity tracking algorithm with the Kanade-Lucas-Tomasi feature tracking algorithm. The last module classifies the identified shapes into five vehicle categories (motorcycle, car, van, bus, and truck) by using three-dimensional templates and an algorithm based on histograms of oriented gradients and a support vector machine classifier. Several experiments have been performed using both real and simulated traffic in order to validate the system. The experiments were conducted on the GRAM-RTM dataset and on a new real-world video dataset that is made publicly available as part of this work.
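
    A minimal sketch of the detection stage only, loosely inspired by the pipeline described above (Gaussian mixture background subtraction followed by blob extraction); it is not the TrafficMonitor code. The file name "traffic.mp4", the blur/threshold values, and the minimum blob area are assumptions, and OpenCV 4.x semantics for cv2.findContours are assumed.

    ```python
    import cv2

    cap = cv2.VideoCapture("traffic.mp4")                 # placeholder video file
    bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                            detectShadows=True)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = bg.apply(frame)                            # per-pixel foreground mask
        mask = cv2.medianBlur(mask, 5)                    # suppress speckle noise
        _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadow pixels
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            if cv2.contourArea(c) > 500:                  # ignore small blobs
                x, y, w, h = cv2.boundingRect(c)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("detections", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
    ```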

  4. Utility of an Algorithm to Increase the Accuracy of Medication History in an Obstetrical Setting.

    PubMed

    Corbel, Aline; Baud, David; Chaouch, Aziz; Beney, Johnny; Csajka, Chantal; Panchaud, Alice

    2016-01-01

    In an obstetrical setting, inaccurate medication histories at hospital admission may result in failure to identify potentially harmful treatments for patients and/or their fetus(es). This prospective study was conducted to assess average concordance rates between (1) a medication list obtained with a one-page structured medication history algorithm developed for the obstetrical setting and (2) the medication list reported in medical records and obtained by open-ended questions based on standard procedures. Both lists were converted into concordance rates using a best possible medication history as the reference (information obtained through interviews with patients, prescribers, and community pharmacists). The algorithm-based method obtained a higher average concordance rate than the standard method: 90.2% [CI95% 85.8-94.3] versus 24.6% [CI95% 15.3-34.4], respectively (p<0.01). Our algorithm-based method strongly enhanced the accuracy of the medication history in our obstetric population, without using substantial resources. Its implementation is an effective first step in the medication reconciliation process, which has been recognized as a very important component of patients' drug safety.

  5. Exploiting social influence to magnify population-level behaviour change in maternal and child health: study protocol for a randomised controlled trial of network targeting algorithms in rural Honduras.

    PubMed

    Shakya, Holly B; Stafford, Derek; Hughes, D Alex; Keegan, Thomas; Negron, Rennie; Broome, Jai; McKnight, Mark; Nicoll, Liza; Nelson, Jennifer; Iriarte, Emma; Ordonez, Maria; Airoldi, Edo; Fowler, James H; Christakis, Nicholas A

    2017-03-13

    Despite global progress on many measures of child health, rates of neonatal mortality remain high in the developing world. Evidence suggests that substantial improvements can be achieved with simple, low-cost interventions within family and community settings, particularly those designed to change knowledge and behaviour at the community level. Using social network analysis to identify structurally influential community members and then targeting them for intervention shows promise for the implementation of sustainable community-wide behaviour change. We will use a detailed understanding of social network structure and function to identify novel ways of targeting influential individuals to foster cascades of behavioural change at a population level. Our work will involve experimental and observational analyses. We will map face-to-face social networks of 30 000 people in 176 villages in Western Honduras, and then conduct a randomised controlled trial of a friendship-based network-targeting algorithm with a set of well-established care interventions. We will also test whether the proportion of the population targeted affects the degree to which the intervention spreads throughout the network. We will test scalable methods of network targeting that would not, in the future, require the actual mapping of social networks but would still offer the prospect of rapidly identifying influential targets for public health interventions. The Yale IRB and the Honduran Ministry of Health approved all data collection procedures (Protocol number 1506016012) and all participants will provide informed consent before enrolment. We will publish our findings in peer-reviewed journals as well as engage non-governmental organisations and other actors through venues for exchanging practical methods for behavioural health interventions, such as global health conferences. We will also develop a 'toolkit' for practitioners to use in network-based intervention efforts, including public release of our network mapping software. NCT02694679; Pre-results. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
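
    A hedged sketch of one friendship-based targeting heuristic of the general kind such trials evaluate: nominate a random friend of a randomly chosen person, which tends to reach higher-degree individuals (the friendship paradox). This is an illustration only, not the trial's targeting algorithm; the synthetic network and the budget of 50 targets are assumptions.

    ```python
    import random
    import networkx as nx

    def friend_nomination_targets(G, budget, rng=random.Random(0)):
        """Pick `budget` targets by nominating a friend of a random ego."""
        targets, nodes = set(), list(G)
        while len(targets) < budget:
            ego = rng.choice(nodes)
            friends = list(G[ego])
            if friends:
                targets.add(rng.choice(friends))   # the nominated friend, not the ego
        return targets

    G = nx.barabasi_albert_graph(1000, 4, seed=1)  # stand-in for a village network
    targets = friend_nomination_targets(G, budget=50)
    mean_target_deg = sum(G.degree(v) for v in targets) / len(targets)
    mean_pop_deg = sum(d for _, d in G.degree()) / G.number_of_nodes()
    print("mean degree of targets:", round(mean_target_deg, 2),
          "vs population mean:", round(mean_pop_deg, 2))
    ```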

  6. Barriers to community case management of malaria in Saraya, Senegal: training, and supply-chains.

    PubMed

    Blanas, Demetri A; Ndiaye, Youssoupha; Nichols, Kim; Jensen, Andrew; Siddiqui, Ammar; Hennig, Nils

    2013-03-14

    Health workers in sub-Saharan Africa can now diagnose and treat malaria in the field, using rapid diagnostic tests and artemisinin-based combination therapy in areas without microscopy and where there is widespread resistance to previously effective drugs. This study evaluates communities' perceptions of a new community case management of malaria programme in the district of Saraya, south-eastern Senegal, the effectiveness of lay health worker trainings, and the availability of rapid diagnostic tests and artemisinin-based combination therapy in the field. The study employed qualitative and quantitative methods including focus groups with villagers, and pre- and post-training questionnaires with lay health workers. Communities approved of the community case management programme, but expressed concern about other general barriers to care, particularly transportation challenges. Most lay health workers acquired important skills, but a sizeable minority did not understand the rapid diagnostic test algorithm and were not able to correctly prescribe artemisinin-based combination therapy soon after the training. Further, few women lay health workers participated in the programme. Finally, the study identified stock-outs of rapid tests and anti-malarial medication products in over half of the programme sites two months after the start of the programme, thought to be due to a regional shortage. This study identified barriers to implementation of the community case management of malaria programme in Saraya that include lay health worker training, low numbers of women participants, and generalized stock-outs. These barriers warrant investigation into possible solutions of relevance to community case management generally.

  7. Epidemic spreading on complex networks with overlapping and non-overlapping community structure

    NASA Astrophysics Data System (ADS)

    Shang, Jiaxing; Liu, Lianchen; Li, Xin; Xie, Feng; Wu, Cheng

    2015-02-01

    Many real-world networks exhibit community structure where vertices belong to one or more communities. Recent studies show that community structure plays an important role in epidemic spreading. In this paper, we investigate how the extent of overlap among communities affects epidemics. In order to experiment on the characteristics of overlapping communities, we propose a rewiring algorithm that can change the community structure from overlapping to non-overlapping while maintaining the degree distribution of the network. We simulate the Susceptible-Infected-Susceptible (SIS) epidemic process on synthetic scale-free networks and real-world networks by applying our rewiring algorithm. Experiments show that epidemics spread faster on networks with a higher level of community overlap. Furthermore, the effect of overlapping communities interacts with the effect of the average degree. Our work further illustrates the important role of overlapping communities in the process of epidemic spreading.
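
    An illustrative sketch of the experimental ingredients mentioned above: degree-preserving rewiring (here via generic double edge swaps, a stand-in for the paper's community-aware rewiring) followed by a simple discrete-time SIS simulation. The network, the infection rate beta, and the recovery rate mu are assumed values.

    ```python
    import random
    import networkx as nx

    G = nx.barabasi_albert_graph(1000, 4, seed=1)        # synthetic scale-free network
    H = G.copy()
    nx.double_edge_swap(H, nswap=4 * H.number_of_edges(),
                        max_tries=40 * H.number_of_edges(), seed=1)  # degrees preserved

    def sis(graph, beta=0.05, mu=0.1, steps=200, seed_frac=0.01, rng=random.Random(0)):
        """Discrete-time SIS dynamics; returns prevalence over time."""
        infected = set(rng.sample(list(graph), int(seed_frac * graph.number_of_nodes())))
        prevalence = []
        for _ in range(steps):
            new_inf = {v for u in infected for v in graph[u]
                       if v not in infected and rng.random() < beta}
            recovered = {u for u in infected if rng.random() < mu}
            infected = (infected | new_inf) - recovered
            prevalence.append(len(infected) / graph.number_of_nodes())
        return prevalence

    print("final prevalence, original:", sis(G)[-1], " rewired:", sis(H)[-1])
    ```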

  8. Subtyping attention-deficit/hyperactivity disorder using temperament dimensions: toward biologically based nosologic criteria

    PubMed Central

    Karalunas, Sarah L.; Fair, Damien; Musser, Erica D.; Aykes, Kamari; Iyer, Swathi P.; Nigg, Joel T.

    2014-01-01

    Importance Psychiatric nosology is limited by behavioral and biological heterogeneity within existing disorder categories. The imprecise nature of current nosological distinctions limits both mechanistic understanding and clinical prediction. Here, we demonstrate an approach consistent with the NIMH Research Domain Criteria (RDoC) initiative to identifying superior, neurobiologically-valid subgroups with better predictive capacity than existing psychiatric categories for childhood Attention-Deficit Hyperactivity Disorder (ADHD). Objective Refine subtyping of childhood ADHD by using biologically-based behavioral dimensions (i.e. temperament), novel classification algorithms, and multiple external validators. In doing so, we demonstrate how refined nosology is capable of improving on current predictive capacity of long-term outcomes relative to current DSM-based nosology. Design, Setting, Participants 437 clinically well-characterized, community-recruited children with and without ADHD participated in an on-going longitudinal study. Baseline data were used to classify children into subgroups based on temperament dimensions and to examine external validators including physiological and MRI measures. One-year longitudinal follow-up data are reported for a subgroup of the ADHD sample to address stability and clinical prediction. Main Outcome Measures Parent/guardian ratings of children on a measure of temperament were used as input features in novel community detection analyses to identify subgroups within the sample. Groups were validated using three widely-accepted external validators: peripheral physiology (cardiac measures of respiratory sinus arrhythmia and pre-ejection period), central nervous system functioning (via resting-state functional connectivity MRI), and clinical outcomes (at one-year longitudinal follow-up). Results The community detection algorithm suggested three novel types of ADHD, labeled as “Mild” (normative emotion regulation); “Surgent” (extreme levels of positive approach-motivation); and “Irritable” (extreme levels of negative emotionality, anger, and poor soothability). Types were independent of existing clinical demarcations, including DSM-5 presentations or symptom severity. These types showed stability over time and were distinguished by unique patterns of cardiac physiological response, resting-state functional brain connectivity, and clinical outcome one year later. Conclusions and Relevance Results suggest that a biologically-informed temperament-based typology, developed with a discovery-based community detection algorithm, provided a superior description of heterogeneity in the ADHD population than any current clinical nosology. This demonstration sets the stage for more aggressive attempts at a tractable, biologically-based nosology. PMID:25006969

  9. Detecting and analyzing research communities in longitudinal scientific networks.

    PubMed

    Leone Sciabolazza, Valerio; Vacca, Raffaele; Kennelly Okraku, Therese; McCarty, Christopher

    2017-01-01

    A growing body of evidence shows that collaborative teams and communities tend to produce the highest-impact scientific work. This paper proposes a new method to (1) Identify collaborative communities in longitudinal scientific networks, and (2) Evaluate the impact of specific research institutes, services or policies on the interdisciplinary collaboration between these communities. First, we apply community-detection algorithms to cross-sectional scientific collaboration networks and analyze different types of co-membership in the resulting subgroups over time. This analysis summarizes large amounts of longitudinal network data to extract sets of research communities whose members have consistently collaborated or shared collaborators over time. Second, we construct networks of cross-community interactions and estimate Exponential Random Graph Models to predict the formation of interdisciplinary collaborations between different communities. The method is applied to longitudinal data on publication and grant collaborations at the University of Florida. Results show that similar institutional affiliation, spatial proximity, transitivity effects, and use of the same research services predict higher degree of interdisciplinary collaboration between research communities. Our application also illustrates how the identification of research communities in longitudinal data and the analysis of cross-community network formation can be used to measure the growth of interdisciplinary team science at a research university, and to evaluate its association with research policies, services or institutes.

  10. Detecting and analyzing research communities in longitudinal scientific networks

    PubMed Central

    Vacca, Raffaele; Kennelly Okraku, Therese; McCarty, Christopher

    2017-01-01

    A growing body of evidence shows that collaborative teams and communities tend to produce the highest-impact scientific work. This paper proposes a new method to (1) Identify collaborative communities in longitudinal scientific networks, and (2) Evaluate the impact of specific research institutes, services or policies on the interdisciplinary collaboration between these communities. First, we apply community-detection algorithms to cross-sectional scientific collaboration networks and analyze different types of co-membership in the resulting subgroups over time. This analysis summarizes large amounts of longitudinal network data to extract sets of research communities whose members have consistently collaborated or shared collaborators over time. Second, we construct networks of cross-community interactions and estimate Exponential Random Graph Models to predict the formation of interdisciplinary collaborations between different communities. The method is applied to longitudinal data on publication and grant collaborations at the University of Florida. Results show that similar institutional affiliation, spatial proximity, transitivity effects, and use of the same research services predict higher degree of interdisciplinary collaboration between research communities. Our application also illustrates how the identification of research communities in longitudinal data and the analysis of cross-community network formation can be used to measure the growth of interdisciplinary team science at a research university, and to evaluate its association with research policies, services or institutes. PMID:28797047

  11. Community structure from spectral properties in complex networks

    NASA Astrophysics Data System (ADS)

    Servedio, V. D. P.; Colaiori, F.; Capocci, A.; Caldarelli, G.

    2005-06-01

    We analyze the spectral properties of complex networks, focusing on their relation to the community structure, and develop an algorithm based on correlations among components of different eigenvectors. The algorithm applies to general weighted networks and, in a suitably modified version, to the case of directed networks. Our method allows us to correctly detect communities in sharply partitioned graphs, but it is also useful for the analysis of more complex networks without a well-defined cluster structure, such as social and information networks. As an example, we test the algorithm on a large-scale dataset from a psychological experiment on free word association, where it proves to be successful both in clustering words and in uncovering mental association patterns.
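
    A generic spectral sketch in the same spirit, not the authors' eigenvector-correlation procedure: embed nodes using the leading eigenvectors of the (weighted) adjacency matrix and cluster the rows with k-means. The example network and the assumption that the number of communities k is known are illustrative choices.

    ```python
    import numpy as np
    import networkx as nx
    from sklearn.cluster import KMeans

    G = nx.karate_club_graph()
    A = nx.to_numpy_array(G, weight="weight")     # symmetric adjacency (undirected case)
    vals, vecs = np.linalg.eigh(A)

    k = 2                                         # assumed number of communities
    embedding = vecs[:, -k:]                      # components of the k leading eigenvectors
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)
    for node, lab in zip(G.nodes(), labels):
        print(node, lab)
    ```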

  12. REVIEW OF THE GOVERNING EQUATIONS, COMPUTATIONAL ALGORITHMS, AND OTHER COMPONENTS OF THE MODELS-3 COMMUNITY MULTISCALE AIR QUALITY (CMAQ) MODELING SYSTEM

    EPA Science Inventory

    This article describes the governing equations, computational algorithms, and other components entering into the Community Multiscale Air Quality (CMAQ) modeling system. This system has been designed to approach air quality as a whole by including state-of-the-science capabiliti...

  13. The ground truth about metadata and community detection in networks.

    PubMed

    Peel, Leto; Larremore, Daniel B; Clauset, Aaron

    2017-05-01

    Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system's components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks' links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures.
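
    A hedged sketch of one simple way to quantify how well node metadata align with detected communities, using normalized mutual information; this is an illustration of the general comparison, not the statistical techniques introduced in the paper. The example graph, the "club" attribute, and the greedy modularity method are stand-ins.

    ```python
    import networkx as nx
    from networkx.algorithms import community
    from sklearn.metrics import normalized_mutual_info_score

    G = nx.karate_club_graph()
    parts = community.greedy_modularity_communities(G)
    detected = {v: i for i, part in enumerate(parts) for v in part}
    metadata = {v: G.nodes[v]["club"] for v in G}      # observed node attribute

    nodes = list(G)
    nmi = normalized_mutual_info_score([metadata[v] for v in nodes],
                                       [detected[v] for v in nodes])
    print(f"NMI between metadata labels and detected communities: {nmi:.3f}")
    ```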

  14. Costs per Diagnosis of Acute HIV Infection in Community-based Screening Strategies: A Comparative Analysis of Four Screening Algorithms

    PubMed Central

    Hoenigl, Martin; Graff-Zivin, Joshua; Little, Susan J.

    2016-01-01

    Background. In nonhealthcare settings, widespread screening for acute human immunodeficiency virus (HIV) infection (AHI) is limited by cost and decision algorithms to better prioritize use of resources. Comparative cost analyses for available strategies are lacking. Methods. To determine cost-effectiveness of community-based testing strategies, we evaluated annual costs of 3 algorithms that detect AHI based on HIV nucleic acid amplification testing (EarlyTest algorithm) or on HIV p24 antigen (Ag) detection via Architect (Architect algorithm) or Determine (Determine algorithm) as well as 1 algorithm that relies on HIV antibody testing alone (Antibody algorithm). The cost model used data on men who have sex with men (MSM) undergoing community-based AHI screening in San Diego, California. Incremental cost-effectiveness ratios (ICERs) per diagnosis of AHI were calculated for programs with HIV prevalence rates between 0.1% and 2.9%. Results. Among MSM in San Diego, EarlyTest was cost-saving (ie, ICERs per AHI diagnosis less than $13,000) when compared with the 3 other algorithms. Cost analyses relative to regional HIV prevalence showed that EarlyTest was cost-effective (ie, ICERs less than $69,547) for similar populations of MSM with an HIV prevalence rate >0.4%; Architect was the second best alternative for HIV prevalence rates >0.6%. Conclusions. Identification of AHI by the dual EarlyTest screening algorithm is likely to be cost-effective not only among at-risk MSM in San Diego but also among similar populations of MSM with HIV prevalence rates >0.4%. PMID:26508512
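
    A toy illustration of the incremental cost-effectiveness ratio (ICER) per additional AHI diagnosis used above; the cost and diagnosis numbers below are made up for the example, not the study's inputs.

    ```python
    def icer(cost_new, dx_new, cost_ref, dx_ref):
        """Incremental cost per additional acute-HIV diagnosis versus a reference strategy."""
        return (cost_new - cost_ref) / (dx_new - dx_ref)

    # hypothetical annual program costs and diagnosis counts for two algorithms
    print(icer(cost_new=250_000, dx_new=30, cost_ref=180_000, dx_ref=22))  # dollars per extra diagnosis
    ```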

  15. The UCLan SDO Data Hub

    NASA Astrophysics Data System (ADS)

    Dalla, S.; Walsh, R. W.; Chapman, S. A.; Marsh, M.; Regnier, S.; Bewsher, D.; Brown, D. S.; Kelly, J.; Laitinen, T.; Alexander, C.

    2010-12-01

    A data pipeline for the distribution of SDO data products has been developed throughout a number of countries in the US, Europe and Asia. The UK node within this pipeline is at the University of Central Lancashire (UCLan), where a data center has been established to host a rolling AIA and HMI archive, aimed at supplying data to the country's large solar scientific community. This presentation will describe the hardware and software structures of the archive, as well as the best practice identified and feedback received from users of the facility. We will also discuss algorithms that are run locally in order to identify solar features and events.

  16. Scalable detection of statistically significant communities and hierarchies, using message passing for modularity

    PubMed Central

    Zhang, Pan; Moore, Cristopher

    2014-01-01

    Modularity is a popular measure of community structure. However, maximizing the modularity can lead to many competing partitions, with almost the same modularity, that are poorly correlated with each other. It can also produce illusory ‘‘communities’’ in random graphs where none exist. We address this problem by using the modularity as a Hamiltonian at finite temperature and using an efficient belief propagation algorithm to obtain the consensus of many partitions with high modularity, rather than looking for a single partition that maximizes it. We show analytically and numerically that the proposed algorithm works all of the way down to the detectability transition in networks generated by the stochastic block model. It also performs well on real-world networks, revealing large communities in some networks where previous work has claimed no communities exist. Finally we show that by applying our algorithm recursively, subdividing communities until no statistically significant subcommunities can be found, we can detect hierarchical structure in real-world networks more efficiently than previous methods. PMID:25489096

  17. The Texas Children's Medication Algorithm Project: Revision of the Algorithm for Pharmacotherapy of Attention-Deficit/Hyperactivity Disorder

    ERIC Educational Resources Information Center

    Pliszka, Steven R.; Crismon, M. Lynn; Hughes, Carroll W.; Corners, C. Keith; Emslie, Graham J.; Jensen, Peter S.; McCracken, James T.; Swanson, James M.; Lopez, Molly

    2006-01-01

    Objective: In 1998, the Texas Department of Mental Health and Mental Retardation developed algorithms for medication treatment of attention-deficit/hyperactivity disorder (ADHD). Advances in the psychopharmacology of ADHD and results of a feasibility study of algorithm use in community mental health centers caused the algorithm to be modified and…

  18. WE-F-201-00: Practical Guidelines for Commissioning Advanced Brachytherapy Dose Calculation Algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    2015-06-15

    With the recent introduction of heterogeneity correction algorithms for brachytherapy, the AAPM community is still unclear on how to commission and implement these into clinical practice. The recently-published AAPM TG-186 report discusses important issues for clinical implementation of these algorithms. A charge of the AAPM-ESTRO-ABG Working Group on MBDCA in Brachytherapy (WGMBDCA) is the development of a set of well-defined test case plans, available as references in the software commissioning process to be performed by clinical end-users. In this practical medical physics course, specific examples on how to perform the commissioning process are presented, as well as descriptions of the clinical impact from recent literature reporting comparisons of TG-43 and heterogeneity-based dosimetry. Learning Objectives: Identify key clinical applications needing advanced dose calculation in brachytherapy. Review TG-186 and WGMBDCA guidelines, commission process, and dosimetry benchmarks. Evaluate clinical cases using commercially available systems and compare to TG-43 dosimetry.

  19. Effectiveness and safety of procalcitonin-guided antibiotic therapy in lower respiratory tract infections in "real life": an international, multicenter poststudy survey (ProREAL).

    PubMed

    Albrich, Werner C; Dusemund, Frank; Bucher, Birgit; Meyer, Stefan; Thomann, Robert; Kühn, Felix; Bassetti, Stefano; Sprenger, Martin; Bachli, Esther; Sigrist, Thomas; Schwietert, Martin; Amin, Devendra; Hausfater, Pierre; Carre, Eric; Gaillat, Jacques; Schuetz, Philipp; Regez, Katharina; Bossart, Rita; Schild, Ursula; Mueller, Beat

    2012-05-14

    In controlled studies, procalcitonin (PCT) has safely and effectively reduced antibiotic drug use for lower respiratory tract infections (LRTIs). However, controlled trial data may not reflect real life. We performed an observational quality surveillance in 14 centers in Switzerland, France, and the United States. Consecutive adults with LRTI presenting to emergency departments or outpatient offices were enrolled and registered on a website, which provided a previously published PCT algorithm for antibiotic guidance. The primary end point was duration of antibiotic therapy within 30 days. Of 1759 patients, 86.4% had a final diagnosis of LRTI (community-acquired pneumonia, 53.7%; acute exacerbation of chronic obstructive pulmonary disease, 17.1%; and bronchitis, 14.4%). Algorithm compliance overall was 68.2%, with differences between diagnoses (bronchitis, 81.0%; AECOPD, 70.1%; and community-acquired pneumonia, 63.7%; P < .001), outpatients (86.1%) and inpatients (65.9%) (P < .001), algorithm-experienced (82.5%) and algorithm-naive (60.1%) centers (P < .001), and countries (Switzerland, 75.8%; France, 73.5%; and the United States, 33.5%; P < .001). After multivariate adjustment, antibiotic therapy duration was significantly shorter if the PCT algorithm was followed compared with when it was overruled (5.9 vs 7.4 days; difference, -1.51 days; 95% CI, -2.04 to -0.98; P < .001). No increase was noted in the risk of the combined adverse outcome end point within 30 days of follow-up when the PCT algorithm was followed regarding withholding antibiotics on hospital admission (adjusted odds ratio, 0.83; 95% CI, 0.44 to 1.55; P = .56) and regarding early cessation of antibiotics (adjusted odds ratio, 0.61; 95% CI, 0.36 to 1.04; P = .07). This study validates previous results from controlled trials in real-life conditions and demonstrates that following a PCT algorithm effectively reduces antibiotic use without increasing the risk of complications. Preexisting differences in antibiotic prescribing affect compliance with antibiotic stewardship efforts. isrctn.org Identifier: ISRCTN40854211.

  20. A new hierarchical method to find community structure in networks

    NASA Astrophysics Data System (ADS)

    Saoud, Bilal; Moussaoui, Abdelouahab

    2018-04-01

    Community structure is very important for understanding a network and the context it represents. Many community detection methods have been proposed, such as hierarchical methods. In our study, we propose a new hierarchical method for community detection in networks based on a genetic algorithm. In this method, we use a genetic algorithm to split a network into two subnetworks in a way that maximizes the modularity. Each new subnetwork represents a cluster (community). Then we repeat the splitting process until we get one node in each cluster. We use the modularity function to measure the strength of the community structure found by our method, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our method is highly effective at discovering community structure in both computer-generated and real-world network data.
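
    A minimal divisive sketch in the same spirit, but using Kernighan-Lin bisection in place of the genetic-algorithm split and stopping when a split no longer increases modularity (rather than splitting all the way to single nodes); the example network and these simplifications are assumptions, not the authors' method.

    ```python
    import networkx as nx
    from networkx.algorithms import community

    def divisive_communities(G):
        def split(nodes):
            sub = G.subgraph(nodes)
            if len(nodes) < 4 or sub.number_of_edges() == 0:
                return [set(nodes)]
            a, b = community.kernighan_lin_bisection(sub, seed=0)
            # modularity of the unsplit subgraph is 0, so a positive value means
            # the bisection found internal structure worth keeping
            if community.modularity(sub, [a, b]) <= 0:
                return [set(nodes)]
            return split(a) + split(b)
        return split(set(G))

    G = nx.karate_club_graph()
    parts = divisive_communities(G)
    print(len(parts), "communities, modularity =",
          round(community.modularity(G, parts), 3))
    ```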

  1. Compiler Optimization Pass Visualization: The Procedural Abstraction Case

    ERIC Educational Resources Information Center

    Schaeckeler, Stefan; Shang, Weijia; Davis, Ruth

    2009-01-01

    There is an active research community concentrating on visualizations of algorithms taught in CS1 and CS2 courses. These visualizations can help students to create concrete visual images of the algorithms and their underlying concepts. Not only "fundamental algorithms" can be visualized, but also algorithms used in compilers. Visualizations that…

  2. A Hybrid Remote Sensing Approach for Detecting the Florida Red Tide

    NASA Astrophysics Data System (ADS)

    Carvalho, G. A.; Minnett, P. J.; Banzon, V.; Baringer, W.

    2008-12-01

    Harmful algal blooms (HABs) have caused major worldwide economic losses commonly linked with health problems for humans and wildlife. In the Eastern Gulf of Mexico the toxic marine dinoflagellate Karenia brevis is responsible for nearly annual, massive red tides causing fish kills, shellfish poisoning, and acute respiratory irritation in humans: the so-called Florida Red Tide. Near real-time satellite measurements could be an effective method for identifying HABs. The use of space-borne data would be a highly desired, low-cost technique offering the remote and accurate detection of K. brevis blooms over the West Florida Shelf, bringing tremendous societal benefits to the general public, scientific community, resource managers and medical health practitioners. An extensive in situ database provided by the Florida Fish and Wildlife Conservation Commission's Research Institute was used to examine the long-term accuracy of two satellite- based algorithms at detecting the Florida Red Tide. Using MODIS data from 2002 to 2006, the two algorithms are optimized and their accuracy assessed. It has been found that the sequential application of the algorithms results in improved predictability characteristics, correctly identifying ~80% of the cases (for both sensitivity and specificity, as well as overall accuracy), and exhibiting strong positive (70%) and negative (86%) predictive values.

  3. Detection of gene communities in multi-networks reveals cancer drivers

    NASA Astrophysics Data System (ADS)

    Cantini, Laura; Medico, Enzo; Fortunato, Santo; Caselle, Michele

    2015-12-01

    We propose a new multi-network-based strategy to integrate different layers of genomic information and use them in a coordinated way to identify driver cancer genes. The multi-networks that we consider combine transcription factor co-targeting, microRNA co-targeting, protein-protein interaction and gene co-expression networks. The rationale behind this choice is that gene co-expression and protein-protein interactions require a tight coregulation of the partners and that such a fine-tuned regulation can be obtained only by combining the transcriptional and post-transcriptional layers of regulation. To extract the relevant biological information from the multi-network, we studied its partition into communities. To this end, we applied a consensus clustering algorithm based on state-of-the-art community detection methods. Although our procedure is in principle valid for any pathology, in this work we concentrate on gastric, lung, pancreatic, and colorectal cancer, and from the enrichment analysis of the multi-network communities we identify a set of candidate driver cancer genes. Some of them are already known oncogenes, while a few are new. The combination of the different layers of information allowed us to extract from the multi-network indications on the regulatory pattern and functional role of both the already known and the new candidate driver genes.

  4. Seeding for pervasively overlapping communities

    NASA Astrophysics Data System (ADS)

    Lee, Conrad; Reid, Fergal; McDaid, Aaron; Hurley, Neil

    2011-06-01

    In some social and biological networks, the majority of nodes belong to multiple communities. It has recently been shown that a number of the algorithms specifically designed to detect overlapping communities do not perform well in such highly overlapping settings. Here, we consider one class of these algorithms, those which optimize a local fitness measure, typically by using a greedy heuristic to expand a seed into a community. We perform synthetic benchmarks which indicate that an appropriate seeding strategy becomes more important as the extent of community overlap increases. We find that distinct cliques provide the best seeds. We find further support for this seeding strategy with benchmarks on a Facebook network and the yeast interactome.
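
    An illustrative sketch of the seeding-and-expansion class of algorithms discussed above: start from a large clique and greedily add neighbours that improve a local fitness score f(C) = k_in / (k_in + k_out)^alpha. The fitness definition, alpha = 1.0, and the example network are assumptions rather than the benchmarked methods themselves.

    ```python
    import networkx as nx

    def fitness(G, nodes, alpha=1.0):
        """Local community fitness: internal degree over total boundary-weighted degree."""
        k_in = 2 * G.subgraph(nodes).number_of_edges()
        k_out = sum(1 for u in nodes for v in G[u] if v not in nodes)
        return k_in / (k_in + k_out) ** alpha if (k_in + k_out) else 0.0

    def expand_from_clique(G, seed, alpha=1.0):
        comm = set(seed)
        improved = True
        while improved:
            improved = False
            frontier = {v for u in comm for v in G[u]} - comm
            best = max(frontier, key=lambda v: fitness(G, comm | {v}, alpha), default=None)
            if best is not None and fitness(G, comm | {best}, alpha) > fitness(G, comm, alpha):
                comm.add(best)
                improved = True
        return comm

    G = nx.karate_club_graph()
    seed = max(nx.find_cliques(G), key=len)      # a large clique as the seed
    print(sorted(expand_from_clique(G, seed)))
    ```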

  5. Think locally, act locally: detection of small, medium-sized, and large communities in large networks.

    PubMed

    Jeub, Lucas G S; Balachandran, Prakash; Porter, Mason A; Mucha, Peter J; Mahoney, Michael W

    2015-01-01

    It is common in the study of networks to investigate intermediate-sized (or "meso-scale") features to try to gain an understanding of network structure and function. For example, numerous algorithms have been developed to try to identify "communities," which are typically construed as sets of nodes with denser connections internally than with the remainder of a network. In this paper, we adopt a complementary perspective that communities are associated with bottlenecks of locally biased dynamical processes that begin at seed sets of nodes, and we employ several different community-identification procedures (using diffusion-based and geodesic-based dynamics) to investigate community quality as a function of community size. Using several empirical and synthetic networks, we identify several distinct scenarios for "size-resolved community structure" that can arise in real (and realistic) networks: (1) the best small groups of nodes can be better than the best large groups (for a given formulation of the idea of a good community); (2) the best small groups can have a quality that is comparable to the best medium-sized and large groups; and (3) the best small groups of nodes can be worse than the best large groups. As we discuss in detail, which of these three cases holds for a given network can make an enormous difference when investigating and making claims about network community structure, and it is important to take this into account to obtain reliable downstream conclusions. Depending on which scenario holds, one may or may not be able to successfully identify "good" communities in a given network (and good communities might not even exist for a given community quality measure), the manner in which different small communities fit together to form meso-scale network structures can be very different, and processes such as viral propagation and information diffusion can exhibit very different dynamics. In addition, our results suggest that, for many large realistic networks, the output of locally biased methods that focus on communities that are centered around a given seed node (or set of seed nodes) might have better conceptual grounding and greater practical utility than the output of global community-detection methods. They also illustrate structural properties that are important to consider in the development of better benchmark networks to test methods for community detection.
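
    A hedged sketch of a locally biased, seed-centred procedure of the general kind discussed above: rank nodes by degree-normalized personalized PageRank from a seed and keep the prefix with the lowest conductance (a simple sweep cut). This is an illustration, not the authors' specific diffusion- or geodesic-based dynamics; the example network and parameters are assumptions.

    ```python
    import networkx as nx

    def local_community(G, seed, alpha=0.85, max_size=200):
        personalization = {v: (1.0 if v == seed else 0.0) for v in G}
        ppr = nx.pagerank(G, alpha=alpha, personalization=personalization)
        order = sorted(ppr, key=lambda v: ppr[v] / max(G.degree(v), 1), reverse=True)
        best_set, best_cond = {seed}, float("inf")
        current = set()
        for v in order[:max_size]:
            current.add(v)
            if 0 < len(current) < G.number_of_nodes():
                cond = nx.conductance(G, current)   # lower conductance = better bottleneck
                if cond < best_cond:
                    best_set, best_cond = set(current), cond
        return best_set, best_cond

    G = nx.karate_club_graph()
    comm, cond = local_community(G, seed=0)
    print(sorted(comm), round(cond, 3))
    ```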

  6. Think locally, act locally: Detection of small, medium-sized, and large communities in large networks

    NASA Astrophysics Data System (ADS)

    Jeub, Lucas G. S.; Balachandran, Prakash; Porter, Mason A.; Mucha, Peter J.; Mahoney, Michael W.

    2015-01-01

    It is common in the study of networks to investigate intermediate-sized (or "meso-scale") features to try to gain an understanding of network structure and function. For example, numerous algorithms have been developed to try to identify "communities," which are typically construed as sets of nodes with denser connections internally than with the remainder of a network. In this paper, we adopt a complementary perspective that communities are associated with bottlenecks of locally biased dynamical processes that begin at seed sets of nodes, and we employ several different community-identification procedures (using diffusion-based and geodesic-based dynamics) to investigate community quality as a function of community size. Using several empirical and synthetic networks, we identify several distinct scenarios for "size-resolved community structure" that can arise in real (and realistic) networks: (1) the best small groups of nodes can be better than the best large groups (for a given formulation of the idea of a good community); (2) the best small groups can have a quality that is comparable to the best medium-sized and large groups; and (3) the best small groups of nodes can be worse than the best large groups. As we discuss in detail, which of these three cases holds for a given network can make an enormous difference when investigating and making claims about network community structure, and it is important to take this into account to obtain reliable downstream conclusions. Depending on which scenario holds, one may or may not be able to successfully identify "good" communities in a given network (and good communities might not even exist for a given community quality measure), the manner in which different small communities fit together to form meso-scale network structures can be very different, and processes such as viral propagation and information diffusion can exhibit very different dynamics. In addition, our results suggest that, for many large realistic networks, the output of locally biased methods that focus on communities that are centered around a given seed node (or set of seed nodes) might have better conceptual grounding and greater practical utility than the output of global community-detection methods. They also illustrate structural properties that are important to consider in the development of better benchmark networks to test methods for community detection.

  7. Detecting communities in large networks

    NASA Astrophysics Data System (ADS)

    Capocci, A.; Servedio, V. D. P.; Caldarelli, G.; Colaiori, F.

    2005-07-01

    We develop an algorithm to detect community structure in complex networks. The algorithm is based on spectral methods and takes into account weights and link orientation. Since the method detects efficiently clustered nodes in large networks even when these are not sharply partitioned, it turns to be specially suitable for the analysis of social and information networks. We test the algorithm on a large-scale data-set from a psychological experiment of word association. In this case, it proves to be successful both in clustering words, and in uncovering mental association patterns.

  8. Housing Instability Among Current and Former Welfare Recipients

    PubMed Central

    Phinney, Robin; Danziger, Sheldon; Pollack, Harold A.; Seefeldt, Kristin

    2007-01-01

    Objectives. We examined correlates of eviction and homelessness among current and former welfare recipients from 1997 to 2003 in an urban Michigan community. Methods. Longitudinal cohort data were drawn from the Women’s Employment Study, a representative panel study of mothers who were receiving cash welfare in February 1997. We used logistic regression analysis to identify risk factors for both eviction and homelessness over the survey period. Results. Twenty percent (95% confidence interval [CI]=16%, 23%) of respondents were evicted and 12% (95% CI=10%, 15%) experienced homelessness at least once between fall 1997 and fall 2003. Multivariate analyses indicated 2 consistent risk factors: having less than a high school education and having used illicit drugs other than marijuana. Mental and physical health problems were significantly associated with homelessness but not evictions. A multivariate screening algorithm achieved 75% sensitivity and 67% specificity in identifying individuals at risk for homelessness. A corresponding algorithm for eviction achieved 75% sensitivity and 50% specificity. Conclusions. The high prevalence of housing instability among our respondents suggests the need to better target housing assistance and other social services to current and former welfare recipients with identifiable personal problems. PMID:17267717

  9. Costs per Diagnosis of Acute HIV Infection in Community-based Screening Strategies: A Comparative Analysis of Four Screening Algorithms.

    PubMed

    Hoenigl, Martin; Graff-Zivin, Joshua; Little, Susan J

    2016-02-15

    In nonhealthcare settings, widespread screening for acute human immunodeficiency virus (HIV) infection (AHI) is limited by cost and decision algorithms to better prioritize use of resources. Comparative cost analyses for available strategies are lacking. To determine cost-effectiveness of community-based testing strategies, we evaluated annual costs of 3 algorithms that detect AHI based on HIV nucleic acid amplification testing (EarlyTest algorithm) or on HIV p24 antigen (Ag) detection via Architect (Architect algorithm) or Determine (Determine algorithm) as well as 1 algorithm that relies on HIV antibody testing alone (Antibody algorithm). The cost model used data on men who have sex with men (MSM) undergoing community-based AHI screening in San Diego, California. Incremental cost-effectiveness ratios (ICERs) per diagnosis of AHI were calculated for programs with HIV prevalence rates between 0.1% and 2.9%. Among MSM in San Diego, EarlyTest was cost-savings (ie, ICERs per AHI diagnosis less than $13.000) when compared with the 3 other algorithms. Cost analyses relative to regional HIV prevalence showed that EarlyTest was cost-effective (ie, ICERs less than $69.547) for similar populations of MSM with an HIV prevalence rate >0.4%; Architect was the second best alternative for HIV prevalence rates >0.6%. Identification of AHI by the dual EarlyTest screening algorithm is likely to be cost-effective not only among at-risk MSM in San Diego but also among similar populations of MSM with HIV prevalence rates >0.4%. © The Author 2015. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail journals.permissions@oup.com.

  10. The ground truth about metadata and community detection in networks

    PubMed Central

    Peel, Leto; Larremore, Daniel B.; Clauset, Aaron

    2017-01-01

    Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks’ links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures. PMID:28508065

  11. Exploring anti-community structure in networks with application to incompatibility of traditional Chinese medicine

    NASA Astrophysics Data System (ADS)

    Zhu, Jiajing; Liu, Yongguo; Zhang, Yun; Liu, Xiaofeng; Xiao, Yonghua; Wang, Shidong; Wu, Xindong

    2017-11-01

    Community structure is one of the most important properties of networks, in which a node shares most of its connections with others in the same community. By contrast, the anti-community structure means that nodes in the same group have few or no connections with each other. In Traditional Chinese Medicine (TCM), the incompatibility problem of herbs is a challenge to clinical medication safety. In this paper, we propose a new anti-community detection algorithm, Random non-nEighboring nOde expansioN (REON), to find anti-communities in networks, in which a new evaluation criterion, anti-modularity, is designed to measure the quality of the obtained anti-community structure. In order to establish anti-communities in REON, we expand the node set by non-neighboring node expansion and regard the node set with the highest anti-modularity as an anti-community. Inspired by the observation that nodes with higher degree contribute more to the anti-modularity, an improved algorithm called REONI is developed that expands the node set using the non-neighboring node with the maximum degree, which greatly enhances the efficiency of REON. Experiments on synthetic and real-world networks demonstrate the superiority of the proposed algorithms over existing methods. In addition, by applying REONI to the herb network, we find that it can discover incompatible herb combinations.

  12. Elucidation of Seventeen Human Peripheral Blood B cell Subsets and Quantification of the Tetanus Response Using a Density-Based Method for the Automated Identification of Cell Populations in Multidimensional Flow Cytometry Data

    PubMed Central

    Qian, Yu; Wei, Chungwen; Lee, F. Eun-Hyung; Campbell, John; Halliley, Jessica; Lee, Jamie A.; Cai, Jennifer; Kong, Megan; Sadat, Eva; Thomson, Elizabeth; Dunn, Patrick; Seegmiller, Adam C.; Karandikar, Nitin J.; Tipton, Chris; Mosmann, Tim; Sanz, Iñaki; Scheuermann, Richard H.

    2011-01-01

    Background Advances in multi-parameter flow cytometry (FCM) now allow for the independent detection of larger numbers of fluorochromes on individual cells, generating data with increasingly higher dimensionality. The increased complexity of these data has made it difficult to identify cell populations from high-dimensional FCM data using traditional manual gating strategies based on single-color or two-color displays. Methods To address this challenge, we developed a novel program, FLOCK (FLOw Clustering without K), that uses a density-based clustering approach to algorithmically identify biologically relevant cell populations from multiple samples in an unbiased fashion, thereby eliminating operator-dependent variability. Results FLOCK was used to objectively identify seventeen distinct B cell subsets in a human peripheral blood sample and to identify and quantify novel plasmablast subsets responding transiently to tetanus and other vaccinations in peripheral blood. FLOCK has been implemented in the publicly available Immunology Database and Analysis Portal – ImmPort (http://www.immport.org) for open use by the immunology research community. Conclusions FLOCK is able to identify cell subsets in experiments that use multi-parameter flow cytometry through an objective, automated computational approach. The use of algorithms like FLOCK for FCM data analysis obviates the need for subjective and labor-intensive manual gating to identify and quantify cell subsets. Novel populations identified by these computational approaches can serve as hypotheses for further experimental study. PMID:20839340
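
    An analogous sketch only: density-based clustering of multidimensional events with DBSCAN from scikit-learn, as a stand-in for FLOCK's density-based approach (FLOCK itself uses its own grid/density algorithm). The synthetic "marker" data, eps, and min_samples values are assumptions.

    ```python
    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    # two synthetic "cell populations" in a 4-marker space plus uniform background noise
    events = np.vstack([
        rng.normal(loc=[2, 2, 0, 0], scale=0.3, size=(2000, 4)),
        rng.normal(loc=[0, 0, 3, 3], scale=0.3, size=(1500, 4)),
        rng.uniform(low=-1, high=4, size=(300, 4)),
    ])

    X = StandardScaler().fit_transform(events)
    labels = DBSCAN(eps=0.3, min_samples=20).fit_predict(X)
    for lab in sorted(set(labels)):
        name = "noise" if lab == -1 else f"population {lab}"
        print(name, int((labels == lab).sum()))
    ```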

  13. Think Locally, Act Locally: The Detection of Small, Medium-Sized, and Large Communities in Large Networks

    PubMed Central

    Jeub, Lucas G. S.; Balachandran, Prakash; Porter, Mason A.; Mucha, Peter J.; Mahoney, Michael W.

    2016-01-01

    It is common in the study of networks to investigate intermediate-sized (or “meso-scale”) features to try to gain an understanding of network structure and function. For example, numerous algorithms have been developed to try to identify “communities,” which are typically construed as sets of nodes with denser connections internally than with the remainder of a network. In this paper, we adopt a complementary perspective that “communities” are associated with bottlenecks of locally-biased dynamical processes that begin at seed sets of nodes, and we employ several different community-identification procedures (using diffusion-based and geodesic-based dynamics) to investigate community quality as a function of community size. Using several empirical and synthetic networks, we identify several distinct scenarios for “size-resolved community structure” that can arise in real (and realistic) networks: (i) the best small groups of nodes can be better than the best large groups (for a given formulation of the idea of a good community); (ii) the best small groups can have a quality that is comparable to the best medium-sized and large groups; and (iii) the best small groups of nodes can be worse than the best large groups. As we discuss in detail, which of these three cases holds for a given network can make an enormous difference when investigating and making claims about network community structure, and it is important to take this into account to obtain reliable downstream conclusions. Depending on which scenario holds, one may or may not be able to successfully identify “good” communities in a given network (and good communities might not even exist for a given community quality measure), the manner in which different small communities fit together to form meso-scale network structures can be very different, and processes such as viral propagation and information diffusion can exhibit very different dynamics. In addition, our results suggest that, for many large realistic networks, the output of locally-biased methods that focus on communities that are centered around a given seed node might have better conceptual grounding and greater practical utility than the output of global community-detection methods. They also illustrate subtler structural properties that are important to consider in the development of better benchmark networks to test methods for community detection. PMID:25679670

  14. Enhancing Community Detection By Affinity-based Edge Weighting Scheme

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoo, Andy; Sanders, Geoffrey; Henson, Van

    Community detection refers to an important graph analytics problem of finding a set of densely-connected subgraphs in a graph and has gained a great deal of interest recently. The performance of current community detection algorithms is limited by an inherent constraint of unweighted graphs that offer very little information on their internal community structures. In this paper, we propose a new scheme to address this issue that weights the edges in a given graph based on recently proposed vertex affinity. The vertex affinity quantifies the proximity between two vertices in terms of their clustering strength, and therefore, it is ideal for graph analytics applications such as community detection. We also demonstrate that the affinity-based edge weighting scheme can improve the performance of community detection algorithms significantly.
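
    A rough sketch of the general idea: weight each edge by a neighborhood-overlap score (the Jaccard coefficient here, used as a simple stand-in for the vertex affinity referenced above) and feed the weighted graph to a modularity-based method. The example graph and the choice of Jaccard weighting are assumptions, not the paper's scheme.

    ```python
    import networkx as nx
    from networkx.algorithms import community

    G = nx.karate_club_graph()
    for u, v in G.edges():
        nu, nv = set(G[u]) | {u}, set(G[v]) | {v}
        G[u][v]["weight"] = len(nu & nv) / len(nu | nv)   # edge affinity proxy

    weighted = community.greedy_modularity_communities(G, weight="weight")
    unweighted = community.greedy_modularity_communities(G, weight=None)
    print("weighted:", len(weighted), "communities;",
          "unweighted:", len(unweighted), "communities")
    ```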

  15. A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data.

    PubMed

    Goldstein, Markus; Uchida, Seiichi

    2016-01-01

    Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection and fraud detection, as well as in the life science and medical domains. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to provide a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides anomaly detection performance, the computational effort, the impact of parameter settings, and the global/local anomaly detection behavior are outlined. As a conclusion, we give advice on algorithm selection for typical real-world tasks.
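
    A minimal evaluation sketch in the spirit of the study: score two unsupervised detectors on a labelled dataset with ROC AUC, where labels are used only for evaluation. The synthetic data and the two detector choices stand in for the paper's 19 algorithms and 10 benchmark datasets.

    ```python
    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.neighbors import LocalOutlierFactor
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    normal = rng.normal(size=(950, 5))
    anomalies = rng.normal(loc=4.0, size=(50, 5))
    X = np.vstack([normal, anomalies])
    y = np.r_[np.zeros(950), np.ones(50)]          # 1 = anomaly (evaluation only)

    iso = IsolationForest(random_state=0).fit(X)
    scores_iso = -iso.score_samples(X)             # higher = more anomalous
    lof = LocalOutlierFactor(n_neighbors=20)
    lof.fit(X)
    scores_lof = -lof.negative_outlier_factor_

    print("IsolationForest AUC:", round(roc_auc_score(y, scores_iso), 3))
    print("LOF AUC:           ", round(roc_auc_score(y, scores_lof), 3))
    ```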

  16. CommWalker: correctly evaluating modules in molecular networks in light of annotation bias.

    PubMed

    Luecken, M D; Page, M J T; Crosby, A J; Mason, S; Reinert, G; Deane, C M

    2018-03-15

    Detecting novel functional modules in molecular networks is an important step in biological research. In the absence of gold standard functional modules, functional annotations are often used to verify whether detected modules/communities have biological meaning. However, as we show, the uneven distribution of functional annotations means that such evaluation methods favor communities of well-studied proteins. We propose a novel framework for the evaluation of communities as functional modules. Our proposed framework, CommWalker, takes communities as inputs and evaluates them in their local network environment by performing short random walks. We test CommWalker's ability to overcome annotation bias using input communities from four community detection methods on two protein interaction networks. We find that modules accepted by CommWalker are similarly co-expressed as those accepted by current methods. Crucially, CommWalker performs well not only in well-annotated regions, but also in regions otherwise obscured by poor annotation. CommWalker community prioritization both faithfully captures well-validated communities and identifies functional modules that may correspond to more novel biology. The CommWalker algorithm is freely available at opig.stats.ox.ac.uk/resources or as a docker image on the Docker Hub at hub.docker.com/r/lueckenmd/commwalker/. deane@stats.ox.ac.uk. Supplementary data are available at Bioinformatics online.

  17. Evidence of community structure in biomedical research grant collaborations.

    PubMed

    Nagarajan, Radhakrishnan; Kalinka, Alex T; Hogan, William R

    2013-02-01

    Recent studies have clearly demonstrated a shift towards collaborative research and team science approaches across a spectrum of disciplines. Such collaborative efforts have also been acknowledged and nurtured by popular extramurally funded programs including the Clinical Translational Science Award (CTSA) conferred by the National Institutes of Health. Since its inception, the number of CTSA awardees has steadily increased to 60 institutes across 30 states. One of the objectives of CTSA is to accelerate translation of research from bench to bedside to community and train a new genre of researchers under the translational research umbrella. Feasibility of such a translation implicitly demands multi-disciplinary collaboration and mentoring. Networks have proven to be convenient abstractions for studying research collaborations. The present study is a part of the CTSA baseline study and investigates the existence of possible community-structure in Biomedical Research Grant Collaboration (BRGC) networks across data sets retrieved from the internally developed grants management system, the Automated Research Information Administrator (ARIA), at the University of Arkansas for Medical Sciences (UAMS). Fastgreedy and link-community community-structure detection algorithms were used to investigate the presence of non-overlapping and overlapping community-structure and their variation across the years 2006 and 2009. A surrogate testing approach, in conjunction with appropriate discriminant statistics (namely, the modularity index and the maximum partition density), is proposed to investigate whether the community-structure of the BRGC networks differed from that generated by certain types of random graphs. Non-overlapping as well as overlapping community-structure detection algorithms indicated the presence of community-structure in the BRGC network. Subsequent surrogate testing revealed that random graph models considered in the present study may not necessarily be appropriate generative mechanisms of the community-structure in the BRGC networks. The discrepancy in the community-structure between the BRGC networks and the random graph surrogates was especially pronounced in 2009 as opposed to 2006, indicating a possible shift towards team-science and formation of non-trivial modular patterns with time. The results also clearly demonstrate the presence of inter-departmental and multi-disciplinary collaborations in BRGC networks. While the results are presented for BRGC networks as a part of the CTSA baseline study at UAMS, the proposed methodologies are generic and could be extended across other CTSA organizations. Understanding the presence of community-structure can supplement more traditional network analysis, as it is useful in identifying research teams and their inter-connections, as opposed to the roles of individual nodes in the network. Such an understanding can be a critical step prior to devising meaningful interventions for promoting team-science, multi-disciplinary collaborations, cross-fertilization of ideas across research teams, and identifying suitable mentors. Understanding the temporal evolution of these communities may also be useful in CTSA evaluation. Copyright © 2012. Published by Elsevier Inc.
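
    A hedged sketch of the surrogate-testing idea: compare the modularity of the observed network's best partition with modularities obtained from degree-preserving random surrogates. The example graph, the greedy modularity method, the swap-based surrogate generator, and the number of surrogates are assumptions; the BRGC data are not reproduced here.

    ```python
    import networkx as nx
    from networkx.algorithms import community

    def best_modularity(G):
        return community.modularity(G, community.greedy_modularity_communities(G))

    G = nx.karate_club_graph()                 # placeholder for a BRGC network
    observed = best_modularity(G)

    surrogate_q = []
    for i in range(100):
        H = G.copy()
        nx.double_edge_swap(H, nswap=10 * H.number_of_edges(),
                            max_tries=100 * H.number_of_edges(), seed=i)
        surrogate_q.append(best_modularity(H))

    p_value = sum(q >= observed for q in surrogate_q) / len(surrogate_q)
    print(f"observed Q = {observed:.3f}, surrogate p-value = {p_value:.3f}")
    ```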

  18. Time-saving impact of an algorithm to identify potential surgical site infections.

    PubMed

    Knepper, B C; Young, H; Jenkins, T C; Price, C S

    2013-10-01

    To develop and validate a partially automated algorithm to identify surgical site infections (SSIs) using commonly available electronic data to reduce manual chart review. Retrospective cohort study of patients undergoing specific surgical procedures over a 4-year period from 2007 through 2010 (algorithm development cohort) or over a 3-month period from January 2011 through March 2011 (algorithm validation cohort). A single academic safety-net hospital in a major metropolitan area. Patients undergoing at least 1 included surgical procedure during the study period. Procedures were identified in the National Healthcare Safety Network; SSIs were identified by manual chart review. Commonly available electronic data, including microbiologic, laboratory, and administrative data, were identified via a clinical data warehouse. Algorithms using combinations of these electronic variables were constructed and assessed for their ability to identify SSIs and reduce chart review. The most efficient algorithm identified in the development cohort combined microbiologic data with postoperative procedure and diagnosis codes. This algorithm resulted in 100% sensitivity and 85% specificity. Time savings from the algorithm was almost 600 person-hours of chart review. The algorithm demonstrated similar sensitivity on application to the validation cohort. A partially automated algorithm to identify potential SSIs was highly sensitive and dramatically reduced the amount of manual chart review required of infection control personnel during SSI surveillance.
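
    As a rough illustration of how such a partially automated rule can work, the pandas sketch below flags admissions for chart review when either a positive post-operative culture or a post-operative infection-related code is present, and then scores the rule against manual review. The column names, the rule, and the toy data are hypothetical, not the published algorithm's exact variables.

    ```python
    import pandas as pd

    # Hypothetical extract from a clinical data warehouse: one row per surgical admission.
    df = pd.DataFrame({
        "admission_id":            [1, 2, 3, 4, 5],
        "positive_culture_postop": [True, False, True, False, False],
        "postop_infection_code":   [True, True, False, False, False],   # dx/procedure code signal
        "ssi_chart_review":        [True, False, True, False, False],   # manual gold standard
    })

    # Flag for manual review if either electronic signal fires (illustrative rule only).
    df["flagged"] = df["positive_culture_postop"] | df["postop_infection_code"]

    tp = ((df.flagged) & (df.ssi_chart_review)).sum()
    fn = ((~df.flagged) & (df.ssi_chart_review)).sum()
    fp = ((df.flagged) & (~df.ssi_chart_review)).sum()
    tn = ((~df.flagged) & (~df.ssi_chart_review)).sum()
    print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
    print("charts needing manual review:", int(df.flagged.sum()), "of", len(df))
    ```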

  19. Community detection in complex networks using proximate support vector clustering

    NASA Astrophysics Data System (ADS)

    Wang, Feifan; Zhang, Baihai; Chai, Senchun; Xia, Yuanqing

    2018-03-01

    Community structure, one of the most attention attracting properties in complex networks, has been a cornerstone in advances of various scientific branches. A number of tools have been involved in recent studies concentrating on the community detection algorithms. In this paper, we propose a support vector clustering method based on a proximity graph, owing to which the introduced algorithm surpasses the traditional support vector approach both in accuracy and complexity. Results of extensive experiments undertaken on computer generated networks and real world data sets illustrate competent performances in comparison with the other counterparts.

  20. Traveling salesman problems with PageRank Distance on complex networks reveal community structure

    NASA Astrophysics Data System (ADS)

    Jiang, Zhongzhou; Liu, Jing; Wang, Shuai

    2016-12-01

    In this paper, we propose a new algorithm for community detection problems (CDPs) based on traveling salesman problems (TSPs), labeled as TSP-CDA. Since TSPs need to find a tour with minimum cost, cities close to each other are usually clustered in the tour. This inspired us to model CDPs as TSPs by taking each vertex as a city. Then, in the final tour, the vertices in the same community tend to cluster together, and the community structure can be obtained by cutting the tour into a couple of paths. There are two challenges. The first is to define a suitable distance between each pair of vertices which can reflect the probability that they belong to the same community. The second is to design a suitable strategy to cut the final tour into paths which can form communities. In TSP-CDA, we deal with these two challenges by defining a PageRank Distance and an automatic threshold-based cutting strategy. The PageRank Distance is designed with the intrinsic properties of CDPs in mind, and can be calculated efficiently. In the experiments, benchmark networks with 1000-10,000 nodes and varying structures are used to test the performance of TSP-CDA. A comparison is also made between TSP-CDA and two well-established community detection algorithms. The results show that TSP-CDA can find accurate community structure efficiently and outperforms the two existing algorithms.
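
    A minimal sketch of the TSP-CDA idea is given below using networkx: a personalized-PageRank affinity stands in for the paper's PageRank Distance, an approximate TSP tour orders the vertices, and tour edges whose distance exceeds a median-based threshold are cut to form communities. Both the distance definition and the cutting threshold are illustrative assumptions, not the published formulas.

    ```python
    from itertools import combinations
    import networkx as nx
    from networkx.algorithms.approximation import traveling_salesman_problem

    def pagerank_distance_graph(G, eps=1e-9):
        """Complete weighted graph whose edge weights grow as personalized-PageRank
        affinity shrinks (an illustrative stand-in for the paper's PageRank Distance)."""
        pr = {u: nx.pagerank(G, personalization={u: 1.0}) for u in G}
        D = nx.Graph()
        D.add_nodes_from(G)
        for u, v in combinations(G, 2):
            affinity = pr[u][v] + pr[v][u]
            D.add_edge(u, v, weight=1.0 / (affinity + eps))
        return D

    def tsp_cda_like(G, cut_factor=2.0):
        """Order vertices by an approximate TSP tour on the distance graph, then cut
        tour edges whose distance exceeds cut_factor * median (threshold is an assumption)."""
        D = pagerank_distance_graph(G)
        tour = traveling_salesman_problem(D, weight="weight", cycle=True)
        order = tour[:-1]                              # drop the repeated start node
        edge_w = [D[u][v]["weight"] for u, v in zip(order, order[1:])]
        threshold = cut_factor * sorted(edge_w)[len(edge_w) // 2]
        communities, current = [], [order[0]]
        for (u, v), w in zip(zip(order, order[1:]), edge_w):
            if w > threshold:
                communities.append(current)
                current = []
            current.append(v)
        communities.append(current)
        return communities

    print(tsp_cda_like(nx.karate_club_graph()))
    ```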

  1. Community Detection on the GPU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Naim, Md; Manne, Fredrik; Halappanavar, Mahantesh

    We present and evaluate a new GPU algorithm based on the Louvain method for community detection. Our algorithm is the first for this problem that parallelizes the access to individual edges. In this way we can fine tune the load balance when processing networks with nodes of highly varying degrees. This is achieved by scaling the number of threads assigned to each node according to its degree. Extensive experiments show that we obtain speedups up to a factor of 270 compared to the sequential algorithm. The algorithm consistently outperforms other recent shared memory implementations and is only one order of magnitude slower than the current fastest parallel Louvain method running on a Blue Gene/Q supercomputer using more than 500K threads.

  2. Homophyly/kinship hypothesis: Natural communities, and predicting in networks

    NASA Astrophysics Data System (ADS)

    Li, Angsheng; Li, Jiankou; Pan, Yicheng

    2015-02-01

    It has been a longstanding challenge to understand natural communities in real world networks. We proposed a community finding algorithm based on fitness of networks, two algorithms for prediction, accurate prediction and confirmation of keywords for papers in the citation network Arxiv HEP-TH (high energy physics theory), and the measures of internal centrality, external de-centrality, internal and external slopes to characterize the structures of communities. We implemented our algorithms on 2 citation and 5 cooperation graphs. Our experiments explored and validated a homophyly/kinship principle of real world networks. The homophyly/kinship principle includes: (1) homophyly is the natural selection in real world networks, similar to Darwin's kinship selection in nature, (2) real world networks consist of natural communities generated by the natural selection of homophyly, (3) most individuals in a natural community share a short list of common attributes, (4) natural communities have an internal centrality (or internal heterogeneity) that a natural community has a few nodes dominating most of the individuals in the community, (5) natural communities have an external de-centrality (or external homogeneity) that external links of a natural community homogeneously distributed in different communities, and (6) natural communities of a given network have typical structures determined by the internal slopes, and have typical patterns of outgoing links determined by external slopes, etc. Our homophyly/kinship principle perfectly matches Darwin's observation that animals from ants to people form social groups in which most individuals work for the common good, and that kinship could encourage altruistic behavior. Our homophyly/kinship principle is the network version of Darwinian theory, and builds a bridge between Darwinian evolution and network science.

  3. A Performance Evaluation of Lightning-NO Algorithms in CMAQ

    EPA Science Inventory

    In the Community Multiscale Air Quality (CMAQv5.2) model, we have implemented two algorithms for lightning NO production; one algorithm is based on the hourly observed cloud-to-ground lightning strike data from National Lightning Detection Network (NLDN) to replace the previous m...

  4. An algorithmic and information-theoretic approach to multimetric index construction

    USGS Publications Warehouse

    Schoolmaster, Donald R.; Grace, James B.; Schweiger, E. William; Guntenspergen, Glenn R.; Mitchell, Brian R.; Miller, Kathryn M.; Little, Amanda M.

    2013-01-01

    The use of multimetric indices (MMIs), such as the widely used index of biological integrity (IBI), to measure, track, summarize and infer the overall impact of human disturbance on biological communities has been steadily growing in recent years. Initially, MMIs were developed for aquatic communities using pre-selected biological metrics as indicators of system integrity. As interest in these bioassessment tools has grown, so have the types of biological systems to which they are applied. For many ecosystem types the appropriate biological metrics to use as measures of biological integrity are not known a priori. As a result, a variety of ad hoc protocols for selecting metrics empirically has been developed. However, the assumptions made by proposed protocols have not been explicitly described or justified, causing many investigators to call for a clear, repeatable methodology for developing empirically derived metrics and indices that can be applied to any biological system. An issue of particular importance that has not been sufficiently addressed is the way that individual metrics combine to produce an MMI that is a sensitive composite indicator of human disturbance. In this paper, we present and demonstrate an algorithm for constructing MMIs given a set of candidate metrics and a measure of human disturbance. The algorithm uses each metric to inform a candidate MMI, and then uses information-theoretic principles to select MMIs that capture the information in the multidimensional system response from among possible MMIs. Such an approach can be used to create purely empirical (data-based) MMIs or can, optionally, be influenced by expert opinion or biological theory through the use of a weighting vector to create value-weighted MMIs. We demonstrate the algorithm with simulated data to show the predictive capacity of the final MMIs and with real data from wetlands in Acadia and Rocky Mountain National Parks. For the Acadia wetland data, the algorithm identified 4 metrics that combined to produce a -0.88 correlation with the human disturbance index. When compared to other methods, we find this algorithmic approach resulted in MMIs that were more predictive and comprised fewer metrics.
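
    The selection step can be illustrated with a small sketch: standardized candidate metrics are averaged into candidate MMIs (optionally value-weighted), and the subset most strongly associated with the disturbance gradient is retained. Averaging and the simple correlation criterion below are simplifying assumptions standing in for the information-theoretic selection the paper describes.

    ```python
    from itertools import combinations
    import numpy as np

    def build_mmi(metrics, disturbance, max_size=4, weights=None):
        """Average standardized candidate metrics into candidate MMIs (optionally
        value-weighted) and keep the subset most strongly correlated with the
        disturbance gradient.  The correlation criterion is a simplification of the
        information-theoretic selection described in the paper."""
        X = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)
        if weights is not None:
            X = X * weights
        best_score, best_subset = 0.0, None
        for k in range(1, max_size + 1):
            for subset in combinations(range(X.shape[1]), k):
                mmi = X[:, subset].mean(axis=1)
                r = abs(np.corrcoef(mmi, disturbance)[0, 1])
                if r > best_score:
                    best_score, best_subset = r, subset
        return best_score, best_subset

    # Simulated example: two metrics respond to disturbance, one is pure noise.
    rng = np.random.default_rng(0)
    disturbance = rng.uniform(0, 1, 60)
    metrics = np.column_stack([-0.9 * disturbance + rng.normal(0, 0.2, 60),
                                0.7 * disturbance + rng.normal(0, 0.3, 60),
                                rng.normal(0, 1, 60)])
    print(build_mmi(metrics, disturbance))
    ```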

  5. A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

    PubMed Central

    Tang, Haixu; Li, Sujun; Ye, Yuzhen

    2016-01-01

    Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on github at https://github.com/COL-IU/Graph2Pro. PMID:27918579

  6. Combining a popularity-productivity stochastic block model with a discriminative-content model for general structure detection.

    PubMed

    Chai, Bian-fang; Yu, Jian; Jia, Cai-Yan; Yang, Tian-bao; Jiang, Ya-wen

    2013-07-01

    Latent community discovery that combines links and contents of a text-associated network has drawn more attention with the advance of social media. Most of the previous studies aim at detecting densely connected communities and are not able to identify general structures, e.g., bipartite structure. Several variants based on the stochastic block model are more flexible for exploring general structures by introducing link probabilities between communities. However, these variants cannot identify the degree distributions of real networks due to a lack of modeling of the differences among nodes, and they are not suitable for discovering communities in text-associated networks because they ignore the contents of nodes. In this paper, we propose a popularity-productivity stochastic block (PPSB) model by introducing two random variables, popularity and productivity, to model the differences among nodes in receiving links and producing links, respectively. This model has the flexibility of existing stochastic block models in discovering general community structures and inherits the richness of previous models that also exploit popularity and productivity in modeling the real scale-free networks with power law degree distributions. To incorporate the contents in text-associated networks, we propose a combined model which combines the PPSB model with a discriminative model that models the community memberships of nodes by their contents. We then develop expectation-maximization (EM) algorithms to infer the parameters in the two models. Experiments on synthetic and real networks have demonstrated that the proposed models can yield better performances than previous models, especially on networks with general structures.

  7. BridgeRank: A novel fast centrality measure based on local structure of the network

    NASA Astrophysics Data System (ADS)

    Salavati, Chiman; Abdollahpouri, Alireza; Manbari, Zhaleh

    2018-04-01

    Ranking nodes in complex networks has become an important task in many application domains. In a complex network, influential nodes are those that have the most spreading ability. Thus, identifying influential nodes based on their spreading ability is a fundamental task in different applications such as viral marketing. One of the most important centrality measures for ranking nodes is closeness centrality, which is effective but suffers from high computational complexity, O(n³). This paper tries to improve closeness centrality by utilizing the local structure of nodes and presents a new ranking algorithm, called BridgeRank centrality. The proposed method computes a local centrality value for each node. For this purpose, at first, communities are detected and the relationship between communities is completely ignored. Then, by applying a centrality in each community, only one best critical node from each community is extracted. Finally, the nodes are ranked based on computing the sum of the shortest path lengths of nodes to the obtained critical nodes. We have also modified the proposed method by weighting the original BridgeRank and selecting several nodes from each community based on the density of that community. Our method can find the best nodes with high spread ability and low time complexity, which makes it applicable to large-scale networks. To evaluate the performance of the proposed method, we use the SIR diffusion model. Finally, experiments on real and artificial networks show that our method is able to identify influential nodes efficiently and achieves better performance compared to other recent methods.
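
    The BridgeRank procedure described above translates almost directly into code: detect communities, keep one high-centrality node per community, and rank all nodes by their summed shortest-path distance to those critical nodes. The sketch below uses networkx; the choice of community detector and of per-community centrality are assumptions.

    ```python
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    def bridgerank(G):
        """Pick one high-centrality node per detected community, then rank every node
        by the sum of its shortest-path lengths to those critical nodes
        (a smaller sum means a more influential node)."""
        critical = []
        for comm in greedy_modularity_communities(G):
            cent = nx.degree_centrality(G.subgraph(comm))
            critical.append(max(cent, key=cent.get))
        scores = {}
        for node in G:
            lengths = nx.single_source_shortest_path_length(G, node)
            scores[node] = sum(lengths.get(c, len(G)) for c in critical)
        return sorted(scores, key=scores.get)          # most influential first

    G = nx.les_miserables_graph()
    print(bridgerank(G)[:5])
    ```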

  8. Combining a popularity-productivity stochastic block model with a discriminative-content model for general structure detection

    NASA Astrophysics Data System (ADS)

    Chai, Bian-fang; Yu, Jian; Jia, Cai-yan; Yang, Tian-bao; Jiang, Ya-wen

    2013-07-01

    Latent community discovery that combines links and contents of a text-associated network has drawn more attention with the advance of social media. Most of the previous studies aim at detecting densely connected communities and are not able to identify general structures, e.g., bipartite structure. Several variants based on the stochastic block model are more flexible for exploring general structures by introducing link probabilities between communities. However, these variants cannot identify the degree distributions of real networks due to a lack of modeling of the differences among nodes, and they are not suitable for discovering communities in text-associated networks because they ignore the contents of nodes. In this paper, we propose a popularity-productivity stochastic block (PPSB) model by introducing two random variables, popularity and productivity, to model the differences among nodes in receiving links and producing links, respectively. This model has the flexibility of existing stochastic block models in discovering general community structures and inherits the richness of previous models that also exploit popularity and productivity in modeling the real scale-free networks with power law degree distributions. To incorporate the contents in text-associated networks, we propose a combined model which combines the PPSB model with a discriminative model that models the community memberships of nodes by their contents. We then develop expectation-maximization (EM) algorithms to infer the parameters in the two models. Experiments on synthetic and real networks have demonstrated that the proposed models can yield better performances than previous models, especially on networks with general structures.

  9. Hillslope characterization: Identifying key controls on local-scale plant communities' distribution using remote sensing and subsurface data fusion.

    NASA Astrophysics Data System (ADS)

    Falco, N.; Wainwright, H. M.; Dafflon, B.; Leger, E.; Peterson, J.; Steltzer, H.; Wilmer, C.; Williams, K. H.; Hubbard, S. S.

    2017-12-01

    Mountainous watershed systems are characterized by extreme heterogeneity in hydrological and pedological properties that influence biotic activities, plant communities and their dynamics. To gain predictive understanding of how ecosystem and watershed system evolve under climate change, it is critical to capture such heterogeneity and to quantify the effect of key environmental variables such as topography, and soil properties. In this study, we exploit advanced geophysical and remote sensing techniques - coupled with machine learning - to better characterize and quantify the interactions between plant communities' distribution and subsurface properties. First, we have developed a remote sensing data fusion framework based on the random forest (RF) classification algorithm to estimate the spatial distribution of plant communities. The framework allows the integration of both plant spectral and structural information, which are derived from multispectral satellite images and airborne LiDAR data. We then use the RF method to evaluate the estimated plant community map, exploiting the subsurface properties (such as bedrock depth, soil moisture and other properties) and geomorphological parameters (such as slope, curvature) as predictors. Datasets include high-resolution geophysical data (electrical resistivity tomography) and LiDAR digital elevation maps. We demonstrate our approach on a mountain hillslope and meadow within the East River watershed in Colorado, which is considered to be a representative headwater catchment in the Upper Colorado Basin. The obtained results show the existence of co-evolution between above and below-ground processes; in particular, dominant shrub communities in wet and flat areas. We show that successful integration of remote sensing data with geophysical measurements allows identifying and quantifying the key environmental controls on plant communities' distribution, and provides insights into their potential changes in the future climate conditions.
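
    A stripped-down version of the classification step might look like the sketch below: a random forest trained on stacked spectral and LiDAR-derived structural features, with feature importances hinting at the key environmental controls. All feature names and the toy labels are hypothetical, not the East River data.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    # Hypothetical per-pixel features: multispectral bands plus LiDAR-derived structure.
    rng = np.random.default_rng(1)
    n = 500
    X = np.column_stack([
        rng.normal(0.3, 0.1, n),    # e.g. red reflectance
        rng.normal(0.5, 0.1, n),    # e.g. near-infrared reflectance
        rng.gamma(2.0, 0.5, n),     # e.g. LiDAR canopy height
        rng.normal(10.0, 5.0, n),   # e.g. slope from the LiDAR DEM
    ])
    y = (X[:, 2] > 1.0).astype(int) + (X[:, 3] > 12.0)   # toy plant-community labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print(classification_report(y_te, rf.predict(X_te)))
    print("feature importances:", rf.feature_importances_)
    ```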

  10. Evolutionary method for finding communities in bipartite networks.

    PubMed

    Zhan, Weihua; Zhang, Zhongzhi; Guan, Jihong; Zhou, Shuigeng

    2011-06-01

    An important step in unveiling the relation between network structure and dynamics defined on networks is to detect communities, and numerous methods have been developed separately to identify community structure in different classes of networks, such as unipartite networks, bipartite networks, and directed networks. Here, we show that the finding of communities in such networks can be unified in a general framework-detection of community structure in bipartite networks. Moreover, we propose an evolutionary method for efficiently identifying communities in bipartite networks. To this end, we show that both unipartite and directed networks can be represented as bipartite networks, and their modularity is completely consistent with that for bipartite networks, the detection of modular structure on which can be reformulated as modularity maximization. To optimize the bipartite modularity, we develop a modified adaptive genetic algorithm (MAGA), which is shown to be especially efficient for community structure detection. The high efficiency of the MAGA is based on the following three improvements we make. First, we introduce a different measure for the informativeness of a locus instead of the standard deviation, which can exactly determine which loci mutate. This measure is the bias between the distribution of a locus over the current population and the uniform distribution of the locus, i.e., the Kullback-Leibler divergence between them. Second, we develop a reassignment technique for differentiating the informative state a locus has attained from the random state in the initial phase. Third, we present a modified mutation rule which by incorporating related operations can guarantee the convergence of the MAGA to the global optimum and can speed up the convergence process. Experimental results show that the MAGA outperforms existing methods in terms of modularity for both bipartite and unipartite networks.
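
    The locus-informativeness measure described above, the Kullback-Leibler divergence between a locus's distribution over the current population and the uniform distribution, can be sketched as follows; the toy population and label encoding are illustrative.

    ```python
    import numpy as np
    from scipy.stats import entropy

    def locus_informativeness(population, locus, n_labels):
        """Kullback-Leibler divergence between a locus's label distribution over the
        current population and the uniform distribution over possible labels."""
        values = [individual[locus] for individual in population]
        counts = np.bincount(values, minlength=n_labels).astype(float)
        p = counts / counts.sum()
        q = np.full(n_labels, 1.0 / n_labels)
        return entropy(p, q)                 # sum_i p_i * log(p_i / q_i)

    # Toy population: six individuals, each a vector of community labels per node.
    population = [[0, 1, 1], [0, 1, 2], [0, 0, 1], [0, 1, 1], [0, 2, 1], [0, 1, 1]]
    print([round(locus_informativeness(population, l, n_labels=3), 3) for l in range(3)])
    ```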

  11. Metasecretome-selective phage display approach for mining the functional potential of a rumen microbial community.

    PubMed

    Ciric, Milica; Moon, Christina D; Leahy, Sinead C; Creevey, Christopher J; Altermann, Eric; Attwood, Graeme T; Rakonjac, Jasna; Gagic, Dragana

    2014-05-12

    In silico, secretome proteins can be predicted from completely sequenced genomes using various available algorithms that identify membrane-targeting sequences. For metasecretome (collection of surface, secreted and transmembrane proteins from environmental microbial communities) this approach is impractical, considering that the metasecretome open reading frames (ORFs) comprise only 10% to 30% of total metagenome, and are poorly represented in the dataset due to overall low coverage of metagenomic gene pool, even in large-scale projects. By combining secretome-selective phage display and next-generation sequencing, we focused the sequence analysis of complex rumen microbial community on the metasecretome component of the metagenome. This approach achieved high enrichment (29 fold) of secreted fibrolytic enzymes from the plant-adherent microbial community of the bovine rumen. In particular, we identified hundreds of heretofore rare modules belonging to cellulosomes, cell-surface complexes specialised for recognition and degradation of the plant fibre. As a method, metasecretome phage display combined with next-generation sequencing has a power to sample the diversity of low-abundance surface and secreted proteins that would otherwise require exceptionally large metagenomic sequencing projects. As a resource, metasecretome display library backed by the dataset obtained by next-generation sequencing is ready for i) affinity selection by standard phage display methodology and ii) easy purification of displayed proteins as part of the virion for individual functional analysis.

  12. Peak-locking centroid bias in Shack-Hartmann wavefront sensing

    NASA Astrophysics Data System (ADS)

    Anugu, Narsireddy; Garcia, Paulo J. V.; Correia, Carlos M.

    2018-05-01

    Shack-Hartmann wavefront sensing relies on accurate spot centre measurement. Several algorithms were developed with this aim, mostly focused on precision, i.e. minimizing random errors. In the solar and extended scene community, the importance of the accuracy (bias error due to peak-locking, quantization, or sampling) of the centroid determination was identified and solutions proposed. But these solutions only allow partial bias corrections. To date, no systematic study of the bias error was conducted. This article bridges the gap by quantifying the bias error for different correlation peak-finding algorithms and types of sub-aperture images and by proposing a practical solution to minimize its effects. Four classes of sub-aperture images (point source, elongated laser guide star, crowded field, and solar extended scene) together with five types of peak-finding algorithms (1D parabola, the centre of gravity, Gaussian, 2D quadratic polynomial, and pyramid) are considered, in a variety of signal-to-noise conditions. The best performing peak-finding algorithm depends on the sub-aperture image type, but none is satisfactory to both bias and random errors. A practical solution is proposed that relies on the antisymmetric response of the bias to the sub-pixel position of the true centre. The solution decreases the bias by a factor of ˜7 to values of ≲ 0.02 pix. The computational cost is typically twice of current cross-correlation algorithms.
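
    For reference, the 1D parabola peak-finder mentioned above reduces to a three-point vertex formula around the correlation maximum; the sketch below shows that sub-pixel interpolation step (the test signal is an arbitrary Gaussian, not one of the paper's sub-aperture image classes).

    ```python
    import numpy as np

    def parabola_subpixel_peak(corr):
        """Sub-pixel peak position from a 1D correlation curve via the standard
        three-point parabola fit around the maximum."""
        i = int(np.argmax(corr))
        if i == 0 or i == len(corr) - 1:
            return float(i)                            # cannot interpolate at the edges
        y0, y1, y2 = corr[i - 1], corr[i], corr[i + 1]
        return i + 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2)

    # Toy signal: a Gaussian peak whose true centre lies between pixel positions.
    x = np.arange(32)
    corr = np.exp(-0.5 * ((x - 14.3) / 2.0) ** 2)
    print(parabola_subpixel_peak(corr))                # estimate of the true centre 14.3
    ```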

  13. Efficient discovery of overlapping communities in massive networks

    PubMed Central

    Gopalan, Prem K.; Blei, David M.

    2013-01-01

    Detecting overlapping communities is essential to analyzing and exploring natural networks such as social networks, biological networks, and citation networks. However, most existing approaches do not scale to the size of networks that we regularly observe in the real world. In this paper, we develop a scalable approach to community detection that discovers overlapping communities in massive real-world networks. Our approach is based on a Bayesian model of networks that allows nodes to participate in multiple communities, and a corresponding algorithm that naturally interleaves subsampling from the network and updating an estimate of its communities. We demonstrate how we can discover the hidden community structure of several real-world networks, including 3.7 million US patents, 575,000 physics articles from the arXiv preprint server, and 875,000 connected Web pages from the Internet. Furthermore, we demonstrate on large simulated networks that our algorithm accurately discovers the true community structure. This paper opens the door to using sophisticated statistical models to analyze massive networks. PMID:23950224

  14. Rating Movies and Rating the Raters Who Rate Them

    PubMed Central

    Zhou, Hua; Lange, Kenneth

    2010-01-01

    The movie distribution company Netflix has generated considerable buzz in the statistics community by offering a million dollar prize for improvements to its movie rating system. Among the statisticians and computer scientists who have disclosed their techniques, the emphasis has been on machine learning approaches. This article has the modest goal of discussing a simple model for movie rating and other forms of democratic rating. Because the model involves a large number of parameters, it is nontrivial to carry out maximum likelihood estimation. Here we derive a straightforward EM algorithm from the perspective of the more general MM algorithm. The algorithm is capable of finding the global maximum on a likelihood landscape littered with inferior modes. We apply two variants of the model to a dataset from the MovieLens archive and compare their results. Our model identifies quirky raters, redefines the raw rankings, and permits imputation of missing ratings. The model is intended to stimulate discussion and development of better theory rather than to win the prize. It has the added benefit of introducing readers to some of the issues connected with analyzing high-dimensional data. PMID:20802818

  15. Rating Movies and Rating the Raters Who Rate Them.

    PubMed

    Zhou, Hua; Lange, Kenneth

    2009-11-01

    The movie distribution company Netflix has generated considerable buzz in the statistics community by offering a million dollar prize for improvements to its movie rating system. Among the statisticians and computer scientists who have disclosed their techniques, the emphasis has been on machine learning approaches. This article has the modest goal of discussing a simple model for movie rating and other forms of democratic rating. Because the model involves a large number of parameters, it is nontrivial to carry out maximum likelihood estimation. Here we derive a straightforward EM algorithm from the perspective of the more general MM algorithm. The algorithm is capable of finding the global maximum on a likelihood landscape littered with inferior modes. We apply two variants of the model to a dataset from the MovieLens archive and compare their results. Our model identifies quirky raters, redefines the raw rankings, and permits imputation of missing ratings. The model is intended to stimulate discussion and development of better theory rather than to win the prize. It has the added benefit of introducing readers to some of the issues connected with analyzing high-dimensional data.

  16. The Joint Polar Satellite System (JPSS) Program's Algorithm Change Process (ACP): Past, Present and Future

    NASA Technical Reports Server (NTRS)

    Griffin, Ashley

    2017-01-01

    The Joint Polar Satellite System (JPSS) Program Office is the supporting organization for the Suomi National Polar Orbiting Partnership (S-NPP) and JPSS-1 satellites. S-NPP carries the following sensors: VIIRS, CrIS, ATMS, OMPS, and CERES, instruments that ultimately produce over 25 data products that cover the Earth's weather, oceans, and atmosphere. A team of scientists and engineers from all over the United States document, monitor and fix errors in operational software code or documentation with the algorithm change process (ACP) to ensure the success of the S-NPP and JPSS-1 missions by maintaining the quality and accuracy of the data products the scientific community relies on. This poster will outline the program's algorithm change process (ACP), identify the various users and scientific applications of our operational data products and highlight changes that have been made to the ACP to accommodate operating system upgrades to the JPSS program's Interface Data Processing Segment (IDPS), so that the program is ready for the transition to the 2017 JPSS-1 satellite mission and beyond.

  17. Misdiagnosis of HIV infection during a South African community-based survey: implications for rapid HIV testing

    PubMed Central

    Kufa, Tendesayi; Kharsany, Ayesha BM; Cawood, Cherie; Khanyile, David; Lewis, Lara; Grobler, Anneke; Chipeta, Zawadi; Bere, Alfred; Glenshaw, Mary; Puren, Adrian

    2017-01-01

    Introduction: We describe the overall accuracy and performance of a serial rapid HIV testing algorithm used in community-based HIV testing in the context of a population-based household survey conducted in two sub-districts of uMgungundlovu district, KwaZulu-Natal, South Africa, against reference fourth-generation HIV-1/2 antibody and p24 antigen combination immunoassays. We discuss implications of the findings on rapid HIV testing programmes. Methods: Cross-sectional design: Following enrolment into the survey, questionnaires were administered to eligible and consenting participants in order to obtain demographic and HIV-related data. Peripheral blood samples were collected for HIV-related testing. Participants were offered community-based HIV testing in the home by trained field workers using a serial algorithm with two rapid diagnostic tests (RDTs) in series. In the laboratory, reference HIV testing was conducted using two fourth-generation immunoassays with all positives in the confirmatory test considered true positives. Accuracy, sensitivity, specificity, positive predictive value, negative predictive value and false-positive and false-negative rates were determined. Results: Of 10,236 individuals enrolled in the survey, 3740 were tested in the home (median age 24 years (interquartile range 19–31 years), 42.1% males and HIV positivity on RDT algorithm 8.0%). From those tested, 3729 (99.7%) had a definitive RDT result as well as a laboratory immunoassay result. The overall accuracy of the RDT when compared to the fourth-generation immunoassays was 98.8% (95% confidence interval (CI) 98.5–99.2). The sensitivity, specificity, positive predictive value and negative predictive value were 91.1% (95% CI 87.5–93.7), 99.9% (95% CI 99.8–100), 99.3% (95% CI 97.4–99.8) and 99.1% (95% CI 98.8–99.4) respectively. The false-positive and false-negative rates were 0.06% (95% CI 0.01–0.24) and 8.9% (95% CI 6.3–12.53). Compared to true positives, false negatives were more likely to be recently infected on limited antigen avidity assay and to report antiretroviral therapy (ART) use. Conclusions: The overall accuracy of the RDT algorithm was high. However, there were few false positives, and the sensitivity was lower than expected with high false negatives, despite implementation of quality assurance measures. False negatives were associated with recent (early) infection and ART exposure. The RDT algorithm was able to correctly identify the majority of HIV infections in community-based HIV testing. Messaging on the potential for false positives and false negatives should be included in these programmes. PMID:28872274
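
    The accuracy measures quoted above follow directly from the 2x2 table of rapid-test results against the reference immunoassay; a minimal sketch is given below with illustrative counts that are roughly consistent with, but not identical to, the reported rates.

    ```python
    def rdt_performance(tp, fp, fn, tn):
        """Point estimates of the accuracy measures reported above from a 2x2 table
        of rapid-test results against the reference immunoassay."""
        return {
            "accuracy":            (tp + tn) / (tp + fp + fn + tn),
            "sensitivity":         tp / (tp + fn),
            "specificity":         tn / (tn + fp),
            "ppv":                 tp / (tp + fp),
            "npv":                 tn / (tn + fn),
            "false_negative_rate": fn / (tp + fn),
            "false_positive_rate": fp / (fp + tn),
        }

    # Illustrative counts only, roughly in line with the reported rates (not the exact table).
    print(rdt_performance(tp=272, fp=2, fn=27, tn=3428))
    ```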

  18. Link prediction based on local community properties

    NASA Astrophysics Data System (ADS)

    Yang, Xu-Hua; Zhang, Hai-Feng; Ling, Fei; Cheng, Zhi; Weng, Guo-Qing; Huang, Yu-Jiao

    2016-09-01

    The link prediction algorithm is one of the key technologies for revealing the inherent rules of network evolution. This paper proposes a novel link prediction algorithm based on the properties of the local community, which is composed of the common neighbor nodes of any two nodes in the network and the links between these nodes. By referring to the node degree and the condition of assortativity or disassortativity in a network, we comprehensively consider the effect of the shortest path and edge clustering coefficient within the local community on node similarity. We numerically show that the proposed method provides good link prediction results.
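
    A loose sketch of a local-community similarity score is shown below with networkx: the local community is the set of common neighbours of the candidate pair, and its size and internal density contribute to the score. The exact combination (and the paper's additional use of degree and assortativity conditions) is not reproduced; the weighting here is an assumption.

    ```python
    import networkx as nx

    def local_community_similarity(G, u, v):
        """Similarity of a candidate link (u, v) from its local community: the common
        neighbours of u and v plus the links among them.  The score combines the
        community's size and internal density; the weighting is an assumption."""
        common = set(nx.common_neighbors(G, u, v))
        if not common:
            return 0.0
        density = nx.density(G.subgraph(common)) if len(common) > 1 else 0.0
        return len(common) * (1.0 + density)

    G = nx.karate_club_graph()
    candidates = [(u, v) for u in G for v in G if u < v and not G.has_edge(u, v)]
    top = sorted(candidates, key=lambda e: local_community_similarity(G, *e), reverse=True)
    print(top[:5])
    ```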

  19. The integrated care pathway for post stroke patients (iCaPPS): a shared care approach between stakeholders in areas with limited access to specialist stroke care services.

    PubMed

    Abdul Aziz, Aznida Firzah; Mohd Nordin, Nor Azlin; Ali, Mohd Fairuz; Abd Aziz, Noor Azah; Sulong, Saperi; Aljunid, Syed Mohamed

    2017-01-13

    Lack of intersectoral collaboration within public health sectors compounds efforts to promote effective multidisciplinary post stroke care after discharge following the acute phase. A coordinated, primary care-led care pathway to manage post stroke patients residing at home in the community was designed by an expert panel of specialist stroke care providers to help overcome fragmented post stroke care in areas where access is limited or lacking. Expert panel discussions comprising Family Medicine Specialists, Neurologists, Rehabilitation Physicians and Therapists, and Nurse Managers from the Ministry of Health and academia were conducted. In Phase One, experts charted current care processes in public healthcare facilities, from acute stroke till discharge, and also patients who presented late with stroke symptoms to public primary care health centres. In Phase Two, a modified Delphi technique was employed to obtain consensus on recommendations, based on current evidence and best care practices. Care algorithms were designed around existing work schedules at public health centres. Indications for patients eligible for monitoring by primary care at public health centres were identified. Gaps in transfer of care occurred either at post discharge from acute care or for primary care patients diagnosed at or beyond the subacute phase at health centres. Essential information required during transfer of care from tertiary care to primary care providers was identified. Care algorithms including appropriate tools were summarised to guide primary care teams to identify patients requiring further multidisciplinary interventions. Shared care approaches with the Specialist Stroke care team were outlined. Components of the iCaPPS were developed simultaneously: (i) iCaPPS-Rehab© for rehabilitation of stroke patients at community level; (ii) iCaPPS-Swallow©, which guided the primary care team to screen and manage stroke related swallowing problems. A coordinated post stroke care monitoring service for patients at community level is achievable using the iCaPPS and its components as a guide. The iCaPPS may be used for post stroke care monitoring of patients in similar fragmented healthcare delivery systems or areas with limited access to specialist stroke care services. No.: ACTRN12616001322426 (Registration Date: 21st September 2016).

  20. Accuracy assessment of vegetation community maps generated by aerial photography interpretation: perspective from the tropical savanna, Australia

    NASA Astrophysics Data System (ADS)

    Lewis, Donna L.; Phinn, Stuart

    2011-01-01

    Aerial photography interpretation is the most common mapping technique in the world. However, unlike an algorithm-based classification of satellite imagery, accuracy of aerial photography interpretation generated maps is rarely assessed. Vegetation communities covering an area of 530 km2 on Bullo River Station, Northern Territory, Australia, were mapped using an interpretation of 1:50,000 color aerial photography. Manual stereoscopic line-work was delineated at 1:10,000 and thematic maps generated at 1:25,000 and 1:100,000. Multivariate and intuitive analysis techniques were employed to identify 22 vegetation communities within the study area. The accuracy assessment was based on 50% of a field dataset collected over a 4 year period (2006 to 2009) and the remaining 50% of sites were used for map attribution. The overall accuracy and Kappa coefficient for both thematic maps was 66.67% and 0.63, respectively, calculated from standard error matrices. Our findings highlight the need for appropriate scales of mapping and accuracy assessment of aerial photography interpretation generated vegetation community maps.
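
    Overall accuracy and the Kappa coefficient reported above come from a standard error matrix; the sketch below computes both from a toy matrix (not the Bullo River data).

    ```python
    import numpy as np

    def overall_accuracy_and_kappa(error_matrix):
        """Overall accuracy and Cohen's Kappa from an error (confusion) matrix whose
        rows are mapped classes and columns are reference (field) classes."""
        m = np.asarray(error_matrix, dtype=float)
        n = m.sum()
        p_o = np.trace(m) / n                                   # observed agreement
        p_e = (m.sum(axis=0) * m.sum(axis=1)).sum() / n ** 2    # chance agreement
        return p_o, (p_o - p_e) / (1.0 - p_e)

    # Toy three-class error matrix of assessment-site counts (not the Bullo River data).
    matrix = [[30, 5, 2],
              [4, 25, 6],
              [1, 7, 20]]
    print(overall_accuracy_and_kappa(matrix))
    ```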

  1. Data fusion for a vision-aided radiological detection system: Calibration algorithm performance

    NASA Astrophysics Data System (ADS)

    Stadnikia, Kelsey; Henderson, Kristofer; Martin, Allan; Riley, Phillip; Koppal, Sanjeev; Enqvist, Andreas

    2018-05-01

    In order to improve the ability to detect, locate, track and identify nuclear/radiological threats, the University of Florida nuclear detection community has teamed up with the 3D vision community to collaborate on a low cost data fusion system. The key is to develop an algorithm to fuse the data from multiple radiological and 3D vision sensors as one system. The system under development at the University of Florida is being assessed with various types of radiological detectors and widely available visual sensors. A series of experiments were devised utilizing two EJ-309 liquid organic scintillation detectors (one primary and one secondary), a Microsoft Kinect for Windows v2 sensor and a Velodyne HDL-32E High Definition LiDAR Sensor which is a highly sensitive vision sensor primarily used to generate data for self-driving cars. Each experiment consisted of 27 static measurements of a source arranged in a cube with three different distances in each dimension. The source used was Cf-252. The calibration algorithm developed is utilized to calibrate the relative 3D-location of the two different types of sensors without the need to measure it by hand, thus preventing operator manipulation and human errors. The algorithm can also account for the facility dependent deviation from ideal data fusion correlation. Use of the vision sensor to determine the location of a sensor would also limit the possible locations and it does not allow for room dependence (facility dependent deviation) to generate a detector pseudo-location to be used for data analysis later. Using manually measured source location data, our algorithm predicted the offset detector location within an average of 20 cm calibration-difference to its actual location. Calibration-difference is the Euclidean distance from the algorithm predicted detector location to the measured detector location. The Kinect vision sensor data produced an average calibration-difference of 35 cm and the HDL-32E produced an average calibration-difference of 22 cm. Using NaI and He-3 detectors in place of the EJ-309, the calibration-difference was 52 cm for NaI and 75 cm for He-3. The algorithm is not detector dependent; however, from these results it was determined that detector dependent adjustments are required.

  2. Using a Novel Evolutionary Algorithm to More Effectively Apply Community-Driven EcoHealth Interventions in Big Data with Application to Chagas Disease

    NASA Astrophysics Data System (ADS)

    Rizzo, D. M.; Hanley, J.; Monroy, C.; Rodas, A.; Stevens, L.; Dorn, P.

    2016-12-01

    Chagas disease is a deadly, neglected tropical disease that is endemic to every country in Central and South America. The principal insect vector of Chagas disease in Central America is Triatoma dimidiata. EcoHealth interventions are an environmentally friendly alternative that use local materials to lower household infestation, reduce the risk of infestation, and improve the quality of life. Our collaborators from La Universidad de San Carlos de Guatemala along with Ministry of Health Officials reach out to communities with high infestation and teach the community EcoHealth interventions. The process of identifying which interventions have the potential to be most effective as well as the houses that are most at risk is both expensive and time consuming. In order to better identify the risk factors associated with household infestation of T. dimidiata, a number of studies have conducted socioeconomic and entomologic surveys that contain numerous potential risk factors consisting of both nominal and ordinal data. Univariate logistic regression is one of the more popular methods for determining which risk factors are most closely associated with infestation. However, this tool has limitations, especially with the large amount and type of "Big Data" associated with our study sites (e.g., 5 villages comprising socioeconomic, demographic, and entomologic data). The infestation of a household with T. dimidiata is a complex problem that is most likely not univariate in nature and is likely to contain higher order epistatic relationships that cannot be discovered using univariate logistic regression. Add to this the problems raised by using p-values in traditional statistics. Also, our T. dimidiata infestation dataset is too large to exhaustively search. Therefore, we use a novel evolutionary algorithm to efficiently search for higher order interactions in surveys associated with households infested with T. dimidiata. In this study, we use our novel evolutionary algorithm to efficiently search for higher order interactions in a T. dimidiata infestation dataset that contains 1,132 houses and 61 risk factors (both nominal and ordinal), with 16% of the data missing. Our goal is to determine the risk factors that are most commonly associated with infestation to more efficiently apply EcoHealth interventions.

  3. Modeling the heterogeneous traffic correlations in urban road systems using traffic-enhanced community detection approach

    NASA Astrophysics Data System (ADS)

    Lu, Feng; Liu, Kang; Duan, Yingying; Cheng, Shifen; Du, Fei

    2018-07-01

    A better characterization of the traffic influence among urban roads is crucial for traffic control and traffic forecasting. The existence of spatial heterogeneity imposes great influence on modeling the extent and degree of road traffic correlation, which is usually neglected by the traditional distance based method. In this paper, we propose a traffic-enhanced community detection approach to spatially reveal the traffic correlation in city road networks. First, the road network is modeled as a traffic-enhanced dual graph with the closeness between two road segments determined not only by their topological connection, but also by the traffic correlation between them. Then a flow-based community detection algorithm called Infomap is utilized to identify the road segment clusters. Evaluated by Moran's I, Calinski-Harabaz Index and the traffic interpolation application, we find that compared to the distance based method and the community based method, our proposed traffic-enhanced community based method behaves better in capturing the extent of traffic relevance as both the topological structure of the road network and the traffic correlations among urban roads are considered. It can be used in more traffic-related applications, such as traffic forecasting, traffic control and guidance.
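
    A minimal sketch of the traffic-enhanced dual graph is shown below with python-igraph: road segments become vertices, adjacent segments are linked, edge weights mix topological closeness with a traffic correlation, and Infomap partitions the result. The segment list, correlations, and mixing weight are hypothetical, not the paper's exact weighting scheme.

    ```python
    import igraph as ig

    # Dual-graph sketch: vertices are road segments; an edge joins segments that share
    # a junction, weighted by a mix of topological closeness and traffic correlation.
    segments = ["s0", "s1", "s2", "s3", "s4", "s5"]
    adjacent = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]
    traffic_corr = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.2,
                    (3, 4): 0.7, (4, 5): 0.9, (1, 4): 0.1}   # hypothetical correlations
    alpha = 0.5   # mixing weight between topology and traffic correlation (assumption)

    g = ig.Graph(edges=adjacent)
    g.vs["name"] = segments
    g.es["weight"] = [alpha * 1.0 + (1.0 - alpha) * traffic_corr[e] for e in adjacent]

    # Flow-based community detection (Infomap) on the traffic-enhanced dual graph.
    clusters = g.community_infomap(edge_weights="weight")
    for i, members in enumerate(clusters):
        print("cluster", i, [segments[v] for v in members])
    ```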

  4. Validation of an International Classification of Diseases, Ninth Revision Code Algorithm for Identifying Chiari Malformation Type 1 Surgery in Adults.

    PubMed

    Greenberg, Jacob K; Ladner, Travis R; Olsen, Margaret A; Shannon, Chevis N; Liu, Jingxia; Yarbrough, Chester K; Piccirillo, Jay F; Wellons, John C; Smyth, Matthew D; Park, Tae Sung; Limbrick, David D

    2015-08-01

    The use of administrative billing data may enable large-scale assessments of treatment outcomes for Chiari Malformation type I (CM-1). However, to utilize such data sets, validated International Classification of Diseases, Ninth Revision (ICD-9-CM) code algorithms for identifying CM-1 surgery are needed. To validate 2 ICD-9-CM code algorithms identifying patients undergoing CM-1 decompression surgery. We retrospectively analyzed the validity of 2 ICD-9-CM code algorithms for identifying adult CM-1 decompression surgery performed at 2 academic medical centers between 2001 and 2013. Algorithm 1 included any discharge diagnosis code of 348.4 (CM-1), as well as a procedure code of 01.24 (cranial decompression) or 03.09 (spinal decompression, or laminectomy). Algorithm 2 restricted this group to patients with a primary diagnosis of 348.4. The positive predictive value (PPV) and sensitivity of each algorithm were calculated. Among 340 first-time admissions identified by Algorithm 1, the overall PPV for CM-1 decompression was 65%. Among the 214 admissions identified by Algorithm 2, the overall PPV was 99.5%. The PPV for Algorithm 1 was lower in the Vanderbilt (59%) cohort, males (40%), and patients treated between 2009 and 2013 (57%), whereas the PPV of Algorithm 2 remained high (≥99%) across subgroups. The sensitivity of Algorithms 1 (86%) and 2 (83%) were above 75% in all subgroups. ICD-9-CM code Algorithm 2 has excellent PPV and good sensitivity to identify adult CM-1 decompression surgery. These results lay the foundation for studying CM-1 treatment outcomes by using large administrative databases.
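
    The two case definitions can be expressed as simple filters over a billing extract; the pandas sketch below applies both, with Algorithm 2 restricting the diagnosis to the primary position. The table layout and column names are assumptions, not an actual administrative data schema.

    ```python
    import pandas as pd

    DX = "348.4"                      # Chiari malformation type I
    PROCS = {"01.24", "03.09"}        # cranial decompression; spinal decompression/laminectomy

    def flag_cm1_surgery(admissions, primary_only=True):
        """Apply the code-based case definitions described above.  The table layout and
        column names (dx_codes, primary_dx, proc_codes) are assumptions about the extract."""
        has_proc = admissions["proc_codes"].apply(lambda codes: bool(PROCS & set(codes)))
        if primary_only:              # Algorithm 2: 348.4 as the primary diagnosis
            has_dx = admissions["primary_dx"] == DX
        else:                         # Algorithm 1: 348.4 anywhere among discharge diagnoses
            has_dx = admissions["dx_codes"].apply(lambda codes: DX in codes)
        return has_dx & has_proc

    admissions = pd.DataFrame({
        "admission_id": [101, 102, 103],
        "primary_dx":   ["348.4", "723.1", "348.4"],
        "dx_codes":     [["348.4", "401.9"], ["723.1", "348.4"], ["348.4"]],
        "proc_codes":   [["01.24"], ["03.09"], ["81.02"]],
    })
    print(flag_cm1_surgery(admissions, primary_only=False))   # Algorithm 1
    print(flag_cm1_surgery(admissions, primary_only=True))    # Algorithm 2
    ```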

  5. Variable selection and model choice in geoadditive regression models.

    PubMed

    Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard

    2009-06-01

    Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rivard, M.

    With the recent introduction of heterogeneity correction algorithms for brachytherapy, the AAPM community is still unclear on how to commission and implement these into clinical practice. The recently-published AAPM TG-186 report discusses important issues for clinical implementation of these algorithms. A charge of the AAPM-ESTRO-ABG Working Group on MBDCA in Brachytherapy (WGMBDCA) is the development of a set of well-defined test case plans, available as references in the software commissioning process to be performed by clinical end-users. In this practical medical physics course, specific examples on how to perform the commissioning process are presented, as well as descriptions of the clinical impact from recent literature reporting comparisons of TG-43 and heterogeneity-based dosimetry. Learning Objectives: Identify key clinical applications needing advanced dose calculation in brachytherapy. Review TG-186 and WGMBDCA guidelines, commission process, and dosimetry benchmarks. Evaluate clinical cases using commercially available systems and compare to TG-43 dosimetry.

  7. A community effort to assess and improve drug sensitivity prediction algorithms

    PubMed Central

    Costello, James C; Heiser, Laura M; Georgii, Elisabeth; Gönen, Mehmet; Menden, Michael P; Wang, Nicholas J; Bansal, Mukesh; Ammad-ud-din, Muhammad; Hintsanen, Petteri; Khan, Suleiman A; Mpindi, John-Patrick; Kallioniemi, Olli; Honkela, Antti; Aittokallio, Tero; Wennerberg, Krister; Collins, James J; Gallahan, Dan; Singer, Dinah; Saez-Rodriguez, Julio; Kaski, Samuel; Gray, Joe W; Stolovitzky, Gustavo

    2015-01-01

    Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we analyzed a total of 44 drug sensitivity prediction algorithms. The top-performing approaches modeled nonlinear relationships and incorporated biological pathway information. We found that gene expression microarrays consistently provided the best predictive power of the individual profiling data sets; however, performance was increased by including multiple, independent data sets. We discuss the innovations underlying the top-performing methodology, Bayesian multitask MKL, and we provide detailed descriptions of all methods. This study establishes benchmarks for drug sensitivity prediction and identifies approaches that can be leveraged for the development of new methods. PMID:24880487

  8. A community effort to assess and improve drug sensitivity prediction algorithms.

    PubMed

    Costello, James C; Heiser, Laura M; Georgii, Elisabeth; Gönen, Mehmet; Menden, Michael P; Wang, Nicholas J; Bansal, Mukesh; Ammad-ud-din, Muhammad; Hintsanen, Petteri; Khan, Suleiman A; Mpindi, John-Patrick; Kallioniemi, Olli; Honkela, Antti; Aittokallio, Tero; Wennerberg, Krister; Collins, James J; Gallahan, Dan; Singer, Dinah; Saez-Rodriguez, Julio; Kaski, Samuel; Gray, Joe W; Stolovitzky, Gustavo

    2014-12-01

    Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we analyzed a total of 44 drug sensitivity prediction algorithms. The top-performing approaches modeled nonlinear relationships and incorporated biological pathway information. We found that gene expression microarrays consistently provided the best predictive power of the individual profiling data sets; however, performance was increased by including multiple, independent data sets. We discuss the innovations underlying the top-performing methodology, Bayesian multitask MKL, and we provide detailed descriptions of all methods. This study establishes benchmarks for drug sensitivity prediction and identifies approaches that can be leveraged for the development of new methods.

  9. Improving the recommender algorithms with the detected communities in bipartite networks

    NASA Astrophysics Data System (ADS)

    Zhang, Peng; Wang, Duo; Xiao, Jinghua

    2017-04-01

    Recommender systems offer a powerful tool for addressing the information overload problem and have thus drawn wide attention from scholars and engineers. A key challenge is how to make recommendations more accurate and personalized. We notice that community structures widely exist in many real networks, which could significantly affect the recommendation results. By incorporating the information of detected communities in the recommendation algorithms, an improved recommendation approach for networks with communities is proposed. The approach is examined in both artificial and real networks; the results show that the improvements in accuracy and diversity can be 20% and 7%, respectively. This reveals that it is beneficial to classify the nodes based on their inherent properties in recommender systems.

  10. Fast Fragmentation of Networks Using Module-Based Attacks

    PubMed Central

    Requião da Cunha, Bruno; González-Avella, Juan Carlos; Gonçalves, Sebastián

    2015-01-01

    In the multidisciplinary field of Network Science, optimization of procedures for efficiently breaking complex networks is attracting much attention from a practical point of view. In this contribution, we present a module-based method to efficiently fragment complex networks. The procedure firstly identifies topological communities through which the network can be represented using a well established heuristic algorithm of community finding. Then only the nodes that participate of inter-community links are removed in descending order of their betweenness centrality. We illustrate the method by applying it to a variety of examples in the social, infrastructure, and biological fields. It is shown that the module-based approach always outperforms targeted attacks to vertices based on node degree or betweenness centrality rankings, with gains in efficiency strongly related to the modularity of the network. Remarkably, in the US power grid case, by deleting 3% of the nodes, the proposed method breaks the original network in fragments which are twenty times smaller in size than the fragments left by betweenness-based attack. PMID:26569610
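
    The module-based attack described above is straightforward to prototype: detect communities, restrict attention to nodes carrying inter-community links, and remove them in descending order of betweenness centrality while tracking the largest remaining component. The networkx sketch below uses a greedy modularity heuristic as a stand-in for the community-finding algorithm used in the paper.

    ```python
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    def module_based_attack(G):
        """Detect communities, keep only nodes incident to inter-community links, and
        remove them in descending betweenness-centrality order, tracking the size of
        the largest remaining connected component."""
        H = G.copy()
        membership = {}
        for cid, comm in enumerate(greedy_modularity_communities(H)):
            for node in comm:
                membership[node] = cid
        bridge_nodes = set()
        for u, v in H.edges():
            if membership[u] != membership[v]:
                bridge_nodes.update((u, v))
        bc = nx.betweenness_centrality(H)
        order = sorted(bridge_nodes, key=bc.get, reverse=True)
        giant_sizes = []
        for node in order:
            H.remove_node(node)
            giant_sizes.append(max((len(c) for c in nx.connected_components(H)), default=0))
        return order, giant_sizes

    removed, giant = module_based_attack(nx.karate_club_graph())
    print(list(zip(removed, giant))[:5])
    ```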

  11. How artificial intelligence tools can be used to assess individual patient risk in cardiovascular disease: problems with the current methods.

    PubMed

    Grossi, Enzo

    2006-05-03

    In recent years a number of algorithms for cardiovascular risk assessment have been proposed to the medical community. These algorithms consider a number of variables and express their results as the percentage risk of developing a major fatal or non-fatal cardiovascular event in the following 10 to 20 years. The author has identified three major pitfalls of these algorithms, linked to the limitation of the classical statistical approach in dealing with this kind of non-linear and complex information. The pitfalls are the inability to capture the disease complexity, the inability to capture process dynamics, and the wide confidence interval of individual risk assessment. Artificial Intelligence tools can provide a potential advantage in trying to overcome these limitations. The theoretical background and some application examples related to artificial neural networks and fuzzy logic have been reviewed and discussed. The use of predictive algorithms to assess individual absolute risk of future cardiovascular events is currently hampered by methodological and mathematical flaws. The use of newer approaches, such as fuzzy logic and artificial neural networks, linked to artificial intelligence, seems to better address both the challenge of increasing complexity resulting from a correlation between predisposing factors, data on the occurrence of cardiovascular events, and the prediction of future events on an individual level.

  12. Hybrid feature selection algorithm using symmetrical uncertainty and a harmony search algorithm

    NASA Astrophysics Data System (ADS)

    Salameh Shreem, Salam; Abdullah, Salwani; Nazri, Mohd Zakree Ahmad

    2016-04-01

    Microarray technology can be used as an efficient diagnostic system to recognise diseases such as tumours or to discriminate between different types of cancers in normal tissues. This technology has received increasing attention from the bioinformatics community because of its potential in designing powerful decision-making tools for cancer diagnosis. However, the presence of thousands or tens of thousands of genes affects the predictive accuracy of this technology from the perspective of classification. Thus, a key issue in microarray data is identifying or selecting the smallest possible set of genes from the input data that can achieve good predictive accuracy for classification. In this work, we propose a two-stage gene selection algorithm for microarray datasets called the symmetrical uncertainty filter and harmony search algorithm wrapper (SU-HSA). Experimental results show that SU-HSA is better than HSA in isolation for all datasets in terms of accuracy, and selects fewer genes on 6 out of 10 instances. Furthermore, the comparison with state-of-the-art methods shows that our proposed approach is able to obtain 5 (out of 10) new best results in terms of the number of selected genes, and competitive results in terms of classification accuracy.
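    A sketch of the filter stage only (ranking genes by symmetrical uncertainty against the class label); the harmony-search wrapper stage is omitted, and the discretized toy data are an assumption of the example.

```python
# Symmetrical uncertainty filter: SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)).
import numpy as np
from sklearn.metrics import mutual_info_score

def symmetrical_uncertainty(x, y):
    """SU between two discrete variables; I(X; X) equals the entropy H(X)."""
    mi = mutual_info_score(x, y)
    hx = mutual_info_score(x, x)
    hy = mutual_info_score(y, y)
    return 2.0 * mi / (hx + hy) if (hx + hy) > 0 else 0.0

def su_filter(X, y, top_k=50):
    """Rank features (genes) by SU against the class label, keep the top_k."""
    scores = [symmetrical_uncertainty(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:top_k]

# Toy usage with random discretized expression values.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 200))
y = rng.integers(0, 2, size=100)
print(su_filter(X, y, top_k=10))
```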

  13. A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data

    PubMed Central

    Goldstein, Markus; Uchida, Seiichi

    2016-01-01

    Anomaly detection is the process of identifying unexpected items or events in datasets that differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied to unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection and fraud detection, as well as in the life science and medical domains. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides anomaly detection performance, the computational effort, the impact of parameter settings, and the global/local anomaly detection behavior are outlined. In conclusion, we give advice on algorithm selection for typical real-world tasks. PMID:27093601
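    A small illustration of this kind of comparison, assuming scikit-learn detectors and synthetic data in place of the paper's 19 algorithms and 10 datasets; labels are used only for scoring, never for fitting.

```python
# Compare a few unsupervised detectors by ROC AUC on labeled toy data.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(500, 5)),   # normal points
               rng.normal(4, 1, size=(25, 5))])   # anomalies
y = np.r_[np.zeros(500), np.ones(25)]             # 1 = anomaly (evaluation only)

detectors = {
    "IsolationForest": IsolationForest(random_state=0),
    "LOF": LocalOutlierFactor(n_neighbors=20),
    "OneClassSVM": OneClassSVM(nu=0.05),
}
for name, det in detectors.items():
    if name == "LOF":
        det.fit(X)
        scores = -det.negative_outlier_factor_    # higher = more anomalous
    else:
        scores = -det.fit(X).score_samples(X)
    print(name, round(roc_auc_score(y, scores), 3))
```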

  14. An effective trust-based recommendation method using a novel graph clustering algorithm

    NASA Astrophysics Data System (ADS)

    Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin

    2015-10-01

    Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues, such as the data sparsity and cold start problems, caused by the small number of available ratings relative to the unknowns that need to be predicted. Incorporating trust information into collaborative filtering systems is an attractive approach to resolving these problems. In this paper, we present a model-based collaborative filtering method that applies a novel graph clustering algorithm and also considers trust statements. In the proposed method, the problem space is first represented as a graph, and a sparsest-subgraph-finding algorithm is then applied to the graph to find the initial cluster centers. Next, the proposed graph clustering algorithm is performed to obtain the appropriate user/item clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.

  15. Survey of Non-Rigid Registration Tools in Medicine.

    PubMed

    Keszei, András P; Berkels, Benjamin; Deserno, Thomas M

    2017-02-01

    We catalogue available software solutions for non-rigid image registration to support scientists in selecting suitable tools for specific medical registration purposes. Registration tools were identified using a non-systematic search in PubMed, Web of Science, IEEE Xplore® Digital Library, and Google Scholar, and through references in identified sources (n = 22). Exclusions are due to unavailability or inappropriateness. The remaining (n = 18) tools were classified by (i) access and technology, (ii) interfaces and application, (iii) living community, (iv) supported file formats, and (v) types of registration methodologies, emphasizing the similarity measures implemented. Out of the 18 tools, (i) 12 are open source, 8 are released under a permissive free license, which imposes the least restrictions on the use and further development of the tool, and 8 provide graphical processing unit (GPU) support; (ii) 7 are built on software platforms and 5 were developed for brain image registration; (iii) 6 are under active development but only 3 have had their last update in 2015 or 2016; (iv) 16 support the Analyze format, while 7 file formats can be read with only one of the tools; and (v) 6 provide multiple registration methods and 6 provide landmark-based registration methods. Based on open source, licensing, GPU support, active community, several file formats, algorithms, and similarity measures, the tools Elastix and Plastimatch are chosen for the ITK platform and without platform requirements, respectively. Researchers in medical image analysis already have a large choice of registration tools freely available. However, the most recently published algorithms may not yet be included in the tools.

  16. Careful Selection of Reference Genes Is Required for Reliable Performance of RT-qPCR in Human Normal and Cancer Cell Lines

    PubMed Central

    Jacob, Francis; Guertler, Rea; Naim, Stephanie; Nixdorf, Sheri; Fedier, André; Hacker, Neville F.; Heinzelmann-Schwarz, Viola

    2013-01-01

    Reverse transcription quantitative polymerase chain reaction (RT-qPCR) is a standard technique in most laboratories. The selection of suitable reference genes is essential for data normalization and remains critical. Our aim was to 1) review the literature since implementation of the MIQE guidelines in order to identify the degree of acceptance; 2) compare various algorithms in their expression stability; and 3) identify a set of suitable and most reliable reference genes for a variety of human cancer cell lines. A PubMed database review was performed and publications since 2009 were selected. Twelve putative reference genes were profiled in normal and various cancer cell lines (n = 25) using 2-step RT-qPCR. Investigated reference genes were ranked according to their expression stability by five algorithms (geNorm, NormFinder, BestKeeper, comparative ΔCt, and RefFinder). Our review revealed 37 publications, with two thirds using patient samples and one third using cell lines. qPCR efficiency was given in 68.4% of all publications, but only 28.9% of all studies provided RNA/cDNA amounts and standard curves. The geNorm and NormFinder algorithms were used in combination in 60.5% of studies. In our selection of 25 cancer cell lines, we identified HSPCB, RRN18S, and RPS13 as the most stably expressed reference genes. In the subset of ovarian cancer cell lines, the reference genes were PPIA, RPS13 and SDHA, clearly demonstrating the necessity to select genes depending on the research focus. Moreover, a cohort of at least three suitable reference genes needs to be established in advance of the experiments, according to the guidelines. For establishing a set of reference genes for normalization, we recommend using ideally three reference genes selected by at least three stability algorithms. The unfortunate lack of compliance with the MIQE guidelines reflects that these need to be further established in the research community. PMID:23554992

  17. A Novel Patient Recruitment Strategy: Patient Selection Directly from the Community through Linkage to Clinical Data.

    PubMed

    Zimmerman, Lindsay P; Goel, Satyender; Sathar, Shazia; Gladfelter, Charon E; Onate, Alejandra; Kane, Lindsey L; Sital, Shelly; Phua, Jasmin; Davis, Paris; Margellos-Anast, Helen; Meltzer, David O; Polonsky, Tamar S; Shah, Raj C; Trick, William E; Ahmad, Faraz S; Kho, Abel N

    2018-01-01

    This article describes our methods in developing a novel strategy for recruitment of underrepresented, community-based participants for pragmatic research studies leveraging routinely collected electronic health record (EHR) data. We designed a new approach for recruiting eligible patients from the community, while also leveraging affiliated health systems to extract clinical data for community participants. The strategy involves methods for data collection, linkage, and tracking. In this workflow, potential participants are identified in the community and surveyed regarding eligibility. These data are then encrypted and deidentified via a hashing algorithm for linkage of the community participant back to a record at a clinical site. The linkage allows for eligibility verification and automated follow-up. Longitudinal data are collected by querying the EHR data and surveying the community participant directly. We discuss this strategy within the context of two national research projects, a clinical trial and an observational cohort study. The community-based recruitment strategy is a novel, low-touch clinical trial enrollment method to engage a diverse set of participants. Direct outreach to community participants, while utilizing EHR data for clinical information and follow-up, allows for efficient recruitment and follow-up strategies. This new strategy links data reported from community participants to clinical data in the EHR and allows for eligibility verification and automated follow-up. The workflow has the potential to improve recruitment efficiency and engage traditionally underrepresented individuals in research.
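    A toy sketch of hash-based linkage of the kind described above; the choice of fields (name and date of birth), the normalization, and the salted SHA-256 construction are assumptions for illustration, not the study's actual protocol.

```python
# Deterministic, salted hashing of normalized identifiers for record linkage.
import hashlib

def linkage_hash(first_name, last_name, dob, salt="site-specific-secret"):
    """Hash normalized identifiers so sites can link records without sharing them."""
    normalized = f"{first_name.strip().lower()}|{last_name.strip().lower()}|{dob}"
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()

# The same person surveyed in the community and seen in the clinic hashes
# to the same token, so records can be linked for eligibility and follow-up.
community_token = linkage_hash("Ana", "Diaz", "1970-03-02")
clinic_token = linkage_hash(" ana ", "DIAZ", "1970-03-02")
print(community_token == clinic_token)  # True
```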

  18. The Soil Moisture Active Passive Mission (SMAP) Science Data Products: Results of Testing with Field Experiment and Algorithm Testbed Simulation Environment Data

    NASA Technical Reports Server (NTRS)

    Entekhabi, Dara; Njoku, Eni E.; O'Neill, Peggy E.; Kellogg, Kent H.; Entin, Jared K.

    2010-01-01

    Talk outline: (1) derivation of SMAP basic and applied science requirements from the NRC Earth Science Decadal Survey applications; (2) data products and latencies; (3) algorithm highlights; (4) SMAP Algorithm Testbed; (5) SMAP Working Groups and community engagement.

  19. Continued research on selected parameters to minimize community annoyance from airplane noise

    NASA Technical Reports Server (NTRS)

    Frair, L.

    1981-01-01

    Results from continued research on selected parameters to minimize community annoyance from airport noise are reported. First, a review of the initial work on this problem is presented. Then the research focus is expanded by considering multiobjective optimization approaches for this problem. A multiobjective optimization algorithm review from the open literature is presented. This is followed by the multiobjective mathematical formulation for the problem of interest. A discussion of the appropriate solution algorithm for the multiobjective formulation is conducted. Alternate formulations and associated solution algorithms are discussed and evaluated for this airport noise problem. Selected solution algorithms that have been implemented are then used to produce computational results for example airports. These computations involved finding the optimal operating scenario for a moderate size airport and a series of sensitivity analyses for a smaller example airport.

  20. HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kannan, Ramakrishnan; Sukumar, Sreenivas R.; Ballard, Grey M.

    NMF is a useful tool for many applications in different domains, such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). As opposed to previous implementations, our algorithm is also flexible: it performs well for both dense and sparse matrices, and allows the user to choose any one of multiple algorithms for solving the updates to the low-rank factors W and H within the alternating iterations.
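    For reference, a minimal single-node sketch of NMF by alternating non-negative least squares; the distributed-memory, MPI-based aspects of the paper's algorithm are not represented here.

```python
# Alternating non-negative least squares NMF: A (m x n) ~ W (m x r) @ H (r x n).
import numpy as np
from scipy.optimize import nnls

def anls_nmf(A, rank, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iters):
        # Solve columns of H with W fixed, then rows of W with H fixed.
        H = np.column_stack([nnls(W, A[:, j])[0] for j in range(n)])
        W = np.column_stack([nnls(H.T, A[i, :])[0] for i in range(m)]).T
    return W, H

A = np.abs(np.random.default_rng(1).random((30, 20)))
W, H = anls_nmf(A, rank=4)
print("relative error:", np.linalg.norm(A - W @ H) / np.linalg.norm(A))
```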

  1. Mapping the Evolution of Scientific Fields

    PubMed Central

    Herrera, Mark; Roberts, David C.; Gulbahce, Natali

    2010-01-01

    Despite the apparent cross-disciplinary interactions among scientific fields, a formal description of their evolution is lacking. Here we describe a novel approach to study the dynamics and evolution of scientific fields using a network-based analysis. We build an idea network consisting of American Physical Society Physics and Astronomy Classification Scheme (PACS) numbers as nodes representing scientific concepts. Two PACS numbers are linked if there exist publications that reference them simultaneously. We locate scientific fields using a community finding algorithm, and describe the time evolution of these fields over the course of 1985–2006. The communities we identify map to known scientific fields, and their age depends on their size and activity. We expect our approach to quantifying the evolution of ideas to be relevant for making predictions about the future of science and thus help to guide its development. PMID:20463949

  2. Mapping the evolution of scientific fields.

    PubMed

    Herrera, Mark; Roberts, David C; Gulbahce, Natali

    2010-05-04

    Despite the apparent cross-disciplinary interactions among scientific fields, a formal description of their evolution is lacking. Here we describe a novel approach to study the dynamics and evolution of scientific fields using a network-based analysis. We build an idea network consisting of American Physical Society Physics and Astronomy Classification Scheme (PACS) numbers as nodes representing scientific concepts. Two PACS numbers are linked if there exist publications that reference them simultaneously. We locate scientific fields using a community finding algorithm, and describe the time evolution of these fields over the course of 1985-2006. The communities we identify map to known scientific fields, and their age depends on their size and activity. We expect our approach to quantifying the evolution of ideas to be relevant for making predictions about the future of science and thus help to guide its development.

  3. Linear triangular optimization technique and pricing scheme in residential energy management systems

    NASA Astrophysics Data System (ADS)

    Anees, Amir; Hussain, Iqtadar; AlKhaldi, Ali Hussain; Aslam, Muhammad

    2018-06-01

    This paper presents a new linear optimization algorithm for power scheduling of electric appliances. The proposed system is applied in a smart home community, in which the community controller acts as a virtual distribution company for the end consumers. We also present a pricing scheme between the community controller and its residential users based on real-time pricing and likely block rates. The results of the proposed optimization algorithm demonstrate that, by applying the proposed technique, end users can not only minimise their consumption cost but also reduce the peak-to-average power ratio, which is beneficial for the utilities as well.
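    A toy linear-programming sketch of price-based appliance scheduling (not the paper's formulation): minimize cost over a horizon of hourly prices subject to a total-energy requirement and a per-hour power cap. The prices, energy requirement, and cap are invented for the example.

```python
# Schedule an appliance's hourly energy use to minimize cost under given prices.
import numpy as np
from scipy.optimize import linprog

prices = np.array([0.10, 0.08, 0.05, 0.05, 0.12, 0.20])  # $/kWh per hour
total_energy = 3.0   # kWh the appliance must consume over the horizon
max_per_hour = 1.0   # kW power cap per hour

res = linprog(c=prices,
              A_eq=np.ones((1, len(prices))), b_eq=[total_energy],
              bounds=[(0, max_per_hour)] * len(prices),
              method="highs")
print(res.x, res.fun)  # cheapest feasible hourly schedule and its total cost
```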

  4. A New Biogeochemical Computational Framework Integrated within the Community Land Model

    NASA Astrophysics Data System (ADS)

    Fang, Y.; Li, H.; Liu, C.; Huang, M.; Leung, L.

    2012-12-01

    Terrestrial biogeochemical processes, particularly carbon cycle dynamics, have been shown to significantly influence regional and global climate changes. Modeling terrestrial biogeochemical processes within the land component of Earth System Models such as the Community Land model (CLM), however, faces three major challenges: 1) extensive efforts in modifying modeling structures and rewriting computer programs to incorporate biogeochemical processes with increasing complexity, 2) expensive computational cost to solve the governing equations due to numerical stiffness inherited from large variations in the rates of biogeochemical processes, and 3) lack of an efficient framework to systematically evaluate various mathematical representations of biogeochemical processes. To address these challenges, we introduce a new computational framework to incorporate biogeochemical processes into CLM, which consists of a new biogeochemical module with a generic algorithm and reaction database. New and updated biogeochemical processes can be incorporated into CLM without significant code modification. To address the stiffness issue, algorithms and criteria will be developed to identify fast processes, which will be replaced with algebraic equations and decoupled from slow processes. This framework can serve as a generic and user-friendly platform to test out different mechanistic process representations and datasets and gain new insight on the behavior of the terrestrial ecosystems in response to climate change in a systematic way.

  5. Benchmarking Gas Path Diagnostic Methods: A Public Approach

    NASA Technical Reports Server (NTRS)

    Simon, Donald L.; Bird, Jeff; Davison, Craig; Volponi, Al; Iverson, R. Eugene

    2008-01-01

    Recent technology reviews have identified the need for objective assessments of engine health management (EHM) technology. The need is two-fold: technology developers require relevant data and problems to design and validate new algorithms and techniques while engine system integrators and operators need practical tools to direct development and then evaluate the effectiveness of proposed solutions. This paper presents a publicly available gas path diagnostic benchmark problem that has been developed by the Propulsion and Power Systems Panel of The Technical Cooperation Program (TTCP) to help address these needs. The problem is coded in MATLAB (The MathWorks, Inc.) and coupled with a non-linear turbofan engine simulation to produce "snap-shot" measurements, with relevant noise levels, as if collected from a fleet of engines over their lifetime of use. Each engine within the fleet will experience unique operating and deterioration profiles, and may encounter randomly occurring relevant gas path faults including sensor, actuator and component faults. The challenge to the EHM community is to develop gas path diagnostic algorithms to reliably perform fault detection and isolation. An example solution to the benchmark problem is provided along with associated evaluation metrics. A plan is presented to disseminate this benchmark problem to the engine health management technical community and invite technology solutions.

  6. Efficient Screening of Climate Model Sensitivity to a Large Number of Perturbed Input Parameters [plus supporting information]

    DOE PAGES

    Covey, Curt; Lucas, Donald D.; Tannahill, John; ...

    2013-07-01

    Modern climate models contain numerous input parameters, each with a range of possible values. Since the volume of parameter space increases exponentially with the number of parameters N, it is generally impossible to directly evaluate a model throughout this space even if just 2-3 values are chosen for each parameter. Sensitivity screening algorithms, however, can identify input parameters having relatively little effect on a variety of output fields, either individually or in nonlinear combination. This can aid both model development and the uncertainty quantification (UQ) process. Here we report results from a parameter sensitivity screening algorithm hitherto untested in climate modeling, the Morris one-at-a-time (MOAT) method. This algorithm drastically reduces the computational cost of estimating sensitivities in a high-dimensional parameter space because the sample size grows linearly rather than exponentially with N. It nevertheless samples over much of the N-dimensional volume and allows assessment of parameter interactions, unlike traditional elementary one-at-a-time (EOAT) parameter variation. We applied both EOAT and MOAT to the Community Atmosphere Model (CAM), assessing CAM’s behavior as a function of 27 uncertain input parameters related to the boundary layer, clouds, and other subgrid scale processes. For radiation balance at the top of the atmosphere, EOAT and MOAT rank most input parameters similarly, but MOAT identifies a sensitivity that EOAT underplays for two convection parameters that operate nonlinearly in the model. MOAT’s ranking of input parameters is robust to modest algorithmic variations, and it is qualitatively consistent with model development experience. Supporting information is provided at the end of the full text of the article.
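    A bare-bones sketch of Morris elementary-effects screening on a toy function; in practice a dedicated package (e.g. SALib) and the actual climate model would be used, and the trajectory design here is simplified.

```python
# Morris one-at-a-time screening: mean absolute elementary effect per parameter.
import numpy as np

def morris_elementary_effects(model, n_params, n_trajectories=20, delta=0.25, seed=0):
    """Estimate each parameter's mean |elementary effect| on the unit hypercube."""
    rng = np.random.default_rng(seed)
    effects = [[] for _ in range(n_params)]
    for _ in range(n_trajectories):
        x = rng.random(n_params)
        y = model(x)
        for i in rng.permutation(n_params):   # perturb one parameter at a time
            step = delta if x[i] + delta <= 1.0 else -delta
            x_new = x.copy()
            x_new[i] += step
            y_new = model(x_new)
            effects[i].append(abs((y_new - y) / step))
            x, y = x_new, y_new
    return np.array([np.mean(e) for e in effects])

# Toy model: parameters 0 and 1 interact, parameter 2 has no effect.
model = lambda p: p[0] * p[1] + 0.1 * p[0]
print(morris_elementary_effects(model, n_params=3))
```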

  7. Bank-firm credit network in Japan: an analysis of a bipartite network.

    PubMed

    Marotta, Luca; Miccichè, Salvatore; Fujiwara, Yoshi; Iyetomi, Hiroshi; Aoyama, Hideaki; Gallegati, Mauro; Mantegna, Rosario N

    2015-01-01

    We investigate the networked nature of the Japanese credit market. Our investigation is performed with tools of network science. In our investigation we perform community detection with an algorithm which is identifying communities composed of both banks and firms. We show that the communities obtained by directly working on the bipartite network carry information about the networked nature of the Japanese credit market. Our analysis is performed for each calendar year during the time period from 1980 to 2011. To investigate the time evolution of the networked structure of the credit market we introduce a new statistical method to track the time evolution of detected communities. We then characterize the time evolution of communities by detecting for each time evolving set of communities the over-expression of attributes of firms and banks. Specifically, we consider as attributes the economic sector and the geographical location of firms and the type of banks. In our 32-year-long analysis we detect a persistence of the over-expression of attributes of communities of banks and firms together with a slow dynamic of changes from some specific attributes to new ones. Our empirical observations show that the credit market in Japan is a networked market where the type of banks, geographical location of firms and banks, and economic sector of the firm play a role in shaping the credit relationships between banks and firms.

  8. Bank-Firm Credit Network in Japan: An Analysis of a Bipartite Network

    PubMed Central

    Marotta, Luca; Miccichè, Salvatore; Fujiwara, Yoshi; Iyetomi, Hiroshi; Aoyama, Hideaki; Gallegati, Mauro; Mantegna, Rosario N.

    2015-01-01

    We investigate the networked nature of the Japanese credit market. Our investigation is performed with tools of network science. In our investigation we perform community detection with an algorithm which is identifying communities composed of both banks and firms. We show that the communities obtained by directly working on the bipartite network carry information about the networked nature of the Japanese credit market. Our analysis is performed for each calendar year during the time period from 1980 to 2011. To investigate the time evolution of the networked structure of the credit market we introduce a new statistical method to track the time evolution of detected communities. We then characterize the time evolution of communities by detecting for each time evolving set of communities the over-expression of attributes of firms and banks. Specifically, we consider as attributes the economic sector and the geographical location of firms and the type of banks. In our 32-year-long analysis we detect a persistence of the over-expression of attributes of communities of banks and firms together with a slow dynamic of changes from some specific attributes to new ones. Our empirical observations show that the credit market in Japan is a networked market where the type of banks, geographical location of firms and banks, and economic sector of the firm play a role in shaping the credit relationships between banks and firms. PMID:25933413

  9. Weighted community detection and data clustering using message passing

    NASA Astrophysics Data System (ADS)

    Shi, Cheng; Liu, Yanchen; Zhang, Pan

    2018-03-01

    Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for an unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of spin-glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which heavily reduce the computation complexity in dense networks but give almost the same performance as belief propagation.

  10. Approximate Computing Techniques for Iterative Graph Algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Panyala, Ajay R.; Subasi, Omer; Halappanavar, Mahantesh

    Approximate computing enables processing of large-scale graphs by trading off quality for performance. Approximate computing techniques have become critical not only due to the emergence of parallel architectures but also due to the availability of large-scale datasets enabling data-driven discovery. Using two prototypical graph algorithms, PageRank and community detection, we present several approximate computing heuristics to scale the performance with minimal loss of accuracy. We present several heuristics, including loop perforation, data caching, incomplete graph coloring and synchronization, and evaluate their efficiency. We demonstrate performance improvements of up to 83% for PageRank and up to 450x for community detection, with low impact on accuracy for both algorithms. We expect the proposed approximate techniques will enable scalable graph analytics on data of importance to several applications in science, and their subsequent adoption to scale similar graph algorithms.
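    An illustrative sketch of one of these heuristics, loop perforation, applied to a plain power-iteration PageRank: a fraction of per-node updates is randomly skipped each sweep. The graph, perforation rate, and sweep count are assumptions of the example, not the paper's implementation.

```python
# Loop-perforated PageRank: skip a fraction of node updates per sweep.
import networkx as nx
import numpy as np

def perforated_pagerank(G, alpha=0.85, sweeps=50, perforation=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = G.number_of_nodes()
    rank = {v: 1.0 / n for v in G}
    for _ in range(sweeps):
        new = {}
        for v in G:
            if rng.random() < perforation:
                new[v] = rank[v]  # perforated: reuse the stale value this sweep
                continue
            s = sum(rank[u] / G.degree(u) for u in G.neighbors(v))
            new[v] = (1 - alpha) / n + alpha * s
        rank = new
    return rank

G = nx.barabasi_albert_graph(300, 3, seed=1)
approx = perforated_pagerank(G)
exact = nx.pagerank(G, alpha=0.85)
print("max abs error vs. exact:", max(abs(approx[v] - exact[v]) for v in G))
```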

  11. The Malaria System MicroApp: A New, Mobile Device-Based Tool for Malaria Diagnosis.

    PubMed

    Oliveira, Allisson Dantas; Prats, Clara; Espasa, Mateu; Zarzuela Serrat, Francesc; Montañola Sales, Cristina; Silgado, Aroa; Codina, Daniel Lopez; Arruda, Mercia Eliane; I Prat, Jordi Gomez; Albuquerque, Jones

    2017-04-25

    Malaria is a public health problem that affects remote areas worldwide. Climate change has contributed to the problem by allowing for the survival of Anopheles in previously uninhabited areas. As such, several groups have made developing news systems for the automated diagnosis of malaria a priority. The objective of this study was to develop a new, automated, mobile device-based diagnostic system for malaria. The system uses Giemsa-stained peripheral blood samples combined with light microscopy to identify the Plasmodium falciparum species in the ring stage of development. The system uses image processing and artificial intelligence techniques as well as a known face detection algorithm to identify Plasmodium parasites. The algorithm is based on integral image and haar-like features concepts, and makes use of weak classifiers with adaptive boosting learning. The search scope of the learning algorithm is reduced in the preprocessing step by removing the background around blood cells. As a proof of concept experiment, the tool was used on 555 malaria-positive and 777 malaria-negative previously-made slides. The accuracy of the system was, on average, 91%, meaning that for every 100 parasite-infected samples, 91 were identified correctly. Accessibility barriers of low-resource countries can be addressed with low-cost diagnostic tools. Our system, developed for mobile devices (mobile phones and tablets), addresses this by enabling access to health centers in remote communities, and importantly, not depending on extensive malaria expertise or expensive diagnostic detection equipment. ©Allisson Dantas Oliveira, Clara Prats, Mateu Espasa, Francesc Zarzuela Serrat, Cristina Montañola Sales, Aroa Silgado, Daniel Lopez Codina, Mercia Eliane Arruda, Jordi Gomez i Prat, Jones Albuquerque. Originally published in JMIR Research Protocols (http://www.researchprotocols.org), 25.04.2017.

  12. Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods

    PubMed Central

    Dröge, J.; Gregor, I.; McHardy, A. C.

    2015-01-01

    Motivation: Metagenomics characterizes microbial communities by random shotgun sequencing of DNA isolated directly from an environment of interest. An essential step in computational metagenome analysis is taxonomic sequence assignment, which allows identifying the sequenced community members and reconstructing taxonomic bins with sequence data for the individual taxa. For the massive datasets generated by next-generation sequencing technologies, this cannot be performed with de-novo phylogenetic inference methods. We describe an algorithm and the accompanying software, taxator-tk, which performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities. Results: Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with 10 CPU cores and microbial RefSeq as the genomic reference data. Availability and implementation: Taxator-tk source and binary program files are publicly available at http://algbio.cs.uni-duesseldorf.de/software/. Contact: Alice.McHardy@uni-duesseldorf.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25388150

  13. The efficiency of average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling in identifying homogeneous precipitation catchments

    NASA Astrophysics Data System (ADS)

    Chuan, Zun Liang; Ismail, Noriszura; Shinyie, Wendy Ling; Lit Ken, Tan; Fam, Soo-Fen; Senawi, Azlyna; Yusoff, Wan Nur Syahidah Wan

    2018-04-01

    Due to the limited availability of historical precipitation records, agglomerative hierarchical clustering algorithms are widely used to extrapolate information from gauged to ungauged precipitation catchments, yielding more reliable projections of extreme hydro-meteorological events such as extreme precipitation. However, accurately identifying the optimal number of homogeneous precipitation catchments from the dendrogram produced by agglomerative hierarchical algorithms is highly subjective. The main objective of this study is to propose an efficient regionalization algorithm to identify homogeneous precipitation catchments for non-stationary precipitation time series. The homogeneous precipitation catchments are identified using an average linkage hierarchical clustering algorithm combined with multi-scale bootstrap resampling, with the uncentered correlation coefficient as the similarity measure. The regionalized homogeneous precipitation catchments are then consolidated using the k-sample Anderson-Darling non-parametric test. The analysis shows that the proposed regionalization algorithm performs better than the agglomerative hierarchical clustering algorithms proposed in previous studies.
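    A simplified sketch of the clustering step, assuming average linkage with an uncentered-correlation distance over synthetic catchment series; the multi-scale bootstrap resampling and the Anderson-Darling consolidation step are omitted.

```python
# Average-linkage clustering of precipitation series with an
# uncentered-correlation distance (1 - uncentered correlation).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def uncentered_corr_distance(X):
    """Pairwise distance 1 - uncentered correlation between rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    sim = (X @ X.T) / (norms * norms.T)
    np.fill_diagonal(sim, 1.0)
    return 1.0 - sim

rng = np.random.default_rng(0)
X = rng.random((12, 120))  # 12 catchments x 120 monthly precipitation values
D = uncentered_corr_distance(X)
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=4, criterion="maxclust")  # assume 4 homogeneous regions
print(labels)
```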

  14. Comparative analysis on the selection of number of clusters in community detection

    NASA Astrophysics Data System (ADS)

    Kawamoto, Tatsuro; Kabashima, Yoshiyuki

    2018-02-01

    We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.

  15. Automated Recognition of 3D Features in GPIR Images

    NASA Technical Reports Server (NTRS)

    Park, Han; Stough, Timothy; Fijany, Amir

    2007-01-01

    A method of automated recognition of three-dimensional (3D) features in images generated by ground-penetrating imaging radar (GPIR) is undergoing development. GPIR 3D images can be analyzed to detect and identify such subsurface features as pipes and other utility conduits. Until now, much of the analysis of GPIR images has been performed manually by expert operators who must visually identify and track each feature. The present method is intended to satisfy a need for more efficient and accurate analysis by means of algorithms that can automatically identify and track subsurface features, with minimal supervision by human operators. In this method, data from multiple sources (for example, data on different features extracted by different algorithms) are fused together for identifying subsurface objects. The algorithms of this method can be classified in several different ways. In one classification, the algorithms fall into three classes: (1) image-processing algorithms, (2) feature- extraction algorithms, and (3) a multiaxis data-fusion/pattern-recognition algorithm that includes a combination of machine-learning, pattern-recognition, and object-linking algorithms. The image-processing class includes preprocessing algorithms for reducing noise and enhancing target features for pattern recognition. The feature-extraction algorithms operate on preprocessed data to extract such specific features in images as two-dimensional (2D) slices of a pipe. Then the multiaxis data-fusion/ pattern-recognition algorithm identifies, classifies, and reconstructs 3D objects from the extracted features. In this process, multiple 2D features extracted by use of different algorithms and representing views along different directions are used to identify and reconstruct 3D objects. In object linking, which is an essential part of this process, features identified in successive 2D slices and located within a threshold radius of identical features in adjacent slices are linked in a directed-graph data structure. Relative to past approaches, this multiaxis approach offers the advantages of more reliable detections, better discrimination of objects, and provision of redundant information, which can be helpful in filling gaps in feature recognition by one of the component algorithms. The image-processing class also includes postprocessing algorithms that enhance identified features to prepare them for further scrutiny by human analysts (see figure). Enhancement of images as a postprocessing step is a significant departure from traditional practice, in which enhancement of images is a preprocessing step.

  16. Community detection, link prediction, and layer interdependence in multilayer networks.

    PubMed

    De Bacco, Caterina; Power, Eleanor A; Larremore, Daniel B; Moore, Cristopher

    2017-04-01

    Complex systems are often characterized by distinct types of interactions between the same entities. These can be described as a multilayer network where each layer represents one type of interaction. These layers may be interdependent in complicated ways, revealing different kinds of structure in the network. In this work we present a generative model, and an efficient expectation-maximization algorithm, which allows us to perform inference tasks such as community detection and link prediction in this setting. Our model assumes overlapping communities that are common between the layers, while allowing these communities to affect each layer in a different way, including arbitrary mixtures of assortative, disassortative, or directed structure. It also gives us a mathematically principled way to define the interdependence between layers, by measuring how much information about one layer helps us predict links in another layer. In particular, this allows us to bundle layers together to compress redundant information and identify small groups of layers which suffice to predict the remaining layers accurately. We illustrate these findings by analyzing synthetic data and two real multilayer networks, one representing social support relationships among villagers in South India and the other representing shared genetic substring material between genes of the malaria parasite.

  17. Community detection, link prediction, and layer interdependence in multilayer networks

    NASA Astrophysics Data System (ADS)

    De Bacco, Caterina; Power, Eleanor A.; Larremore, Daniel B.; Moore, Cristopher

    2017-04-01

    Complex systems are often characterized by distinct types of interactions between the same entities. These can be described as a multilayer network where each layer represents one type of interaction. These layers may be interdependent in complicated ways, revealing different kinds of structure in the network. In this work we present a generative model, and an efficient expectation-maximization algorithm, which allows us to perform inference tasks such as community detection and link prediction in this setting. Our model assumes overlapping communities that are common between the layers, while allowing these communities to affect each layer in a different way, including arbitrary mixtures of assortative, disassortative, or directed structure. It also gives us a mathematically principled way to define the interdependence between layers, by measuring how much information about one layer helps us predict links in another layer. In particular, this allows us to bundle layers together to compress redundant information and identify small groups of layers which suffice to predict the remaining layers accurately. We illustrate these findings by analyzing synthetic data and two real multilayer networks, one representing social support relationships among villagers in South India and the other representing shared genetic substring material between genes of the malaria parasite.

  18. Weighted compactness function based label propagation algorithm for community detection

    NASA Astrophysics Data System (ADS)

    Zhang, Weitong; Zhang, Rui; Shang, Ronghua; Jiao, Licheng

    2018-02-01

    Community detection in complex networks aims to identify community structures whose internal connections are relatively dense and whose external connections are relatively sparse, according to the topological relationships among nodes in the network. In this paper, we propose a compactness function that incorporates node weights and use it as the objective function for node label propagation. First, according to node degree, we find the sets of core nodes which have great influence on the network. The more connections there are between the core nodes and the other nodes, the more information these core nodes receive and transmit. Then, according to the similarity between nodes and the core node sets and the node degrees, we assign weights to the nodes in the network. The labels of highly influential nodes therefore take priority in the label propagation process, which effectively improves the accuracy of label propagation. The compactness function between nodes and communities in this paper is based on node influence. It combines the connections between nodes and communities with the degree to which a node belongs to its neighboring communities, based on the calculated node weights. The function effectively uses the information of nodes and connections in the network. The experimental results show that the proposed algorithm achieves good results on artificial networks and large-scale real networks compared with the eight comparison algorithms.
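    A compact sketch of degree-weighted label propagation in this spirit; the paper's compactness function and core-node weighting are more involved, so the degree-based vote weighting below is only an illustration.

```python
# Label propagation where each neighbor's vote is weighted by its degree.
import random
import networkx as nx

def weighted_label_propagation(G, iters=20, seed=0):
    rng = random.Random(seed)
    labels = {v: v for v in G.nodes()}
    nodes = list(G.nodes())
    for _ in range(iters):
        rng.shuffle(nodes)
        for v in nodes:
            if not G[v]:
                continue
            score = {}
            for u in G.neighbors(v):
                # High-degree (influential) neighbors carry more weight.
                score[labels[u]] = score.get(labels[u], 0) + G.degree(u)
            labels[v] = max(score, key=score.get)
    return labels

G = nx.karate_club_graph()
labels = weighted_label_propagation(G)
print(len(set(labels.values())), "communities found")
```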

  19. Diagnosis of paediatric HIV infection in a primary health care setting with a clinical algorithm.

    PubMed Central

    Horwood, C.; Liebeschuetz, S.; Blaauw, D.; Cassol, S.; Qazi, S.

    2003-01-01

    OBJECTIVE: To determine the validity of an algorithm used by primary care health workers to identify children with symptomatic human immunodeficiency virus (HIV) infection. This HIV algorithm is being implemented in South Africa as part of the Integrated Management of Childhood Illness (IMCI), a strategy that aims to improve childhood morbidity and mortality by improving care at the primary care level. As AIDS is a leading cause of death in children in southern Africa, diagnosis and management of symptomatic HIV infection was added to the existing IMCI algorithm. METHODS: In total, 690 children who attended the outpatients department in a district hospital in South Africa were assessed with the HIV algorithm and by a paediatrician. All children were then tested for HIV viral load. The validity of the algorithm in detecting symptomatic HIV was compared with clinical diagnosis by a paediatrician and the result of an HIV test. Detailed clinical data were used to improve the algorithm. FINDINGS: Overall, 198 (28.7%) enrolled children were infected with HIV. The paediatrician correctly identified 142 (71.7%) children infected with HIV, whereas the IMCI/HIV algorithm identified 111 (56.1%). Odds ratios were calculated to identify predictors of HIV infection and used to develop an improved HIV algorithm that is 67.2% sensitive and 81.5% specific in clinically detecting HIV infection. CONCLUSIONS: Children with symptomatic HIV infection can be identified effectively by primary level health workers through the use of an algorithm. The improved HIV algorithm developed in this study could be used by countries with high prevalences of HIV to enable IMCI practitioners to identify and care for HIV-infected children. PMID:14997238

  20. Chiari malformation Type I surgery in pediatric patients. Part 1: validation of an ICD-9-CM code search algorithm.

    PubMed

    Ladner, Travis R; Greenberg, Jacob K; Guerrero, Nicole; Olsen, Margaret A; Shannon, Chevis N; Yarbrough, Chester K; Piccirillo, Jay F; Anderson, Richard C E; Feldstein, Neil A; Wellons, John C; Smyth, Matthew D; Park, Tae Sung; Limbrick, David D

    2016-05-01

    OBJECTIVE Administrative billing data may facilitate large-scale assessments of treatment outcomes for pediatric Chiari malformation Type I (CM-I). Validated International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code algorithms for identifying CM-I surgery are critical prerequisites for such studies but are currently only available for adults. The objective of this study was to validate two ICD-9-CM code algorithms using hospital billing data to identify pediatric patients undergoing CM-I decompression surgery. METHODS The authors retrospectively analyzed the validity of two ICD-9-CM code algorithms for identifying pediatric CM-I decompression surgery performed at 3 academic medical centers between 2001 and 2013. Algorithm 1 included any discharge diagnosis code of 348.4 (CM-I), as well as a procedure code of 01.24 (cranial decompression) or 03.09 (spinal decompression or laminectomy). Algorithm 2 restricted this group to the subset of patients with a primary discharge diagnosis of 348.4. The positive predictive value (PPV) and sensitivity of each algorithm were calculated. RESULTS Among 625 first-time admissions identified by Algorithm 1, the overall PPV for CM-I decompression was 92%. Among the 581 admissions identified by Algorithm 2, the PPV was 97%. The PPV for Algorithm 1 was lower in one center (84%) compared with the other centers (93%-94%), whereas the PPV of Algorithm 2 remained high (96%-98%) across all subgroups. The sensitivity of Algorithms 1 (91%) and 2 (89%) was very good and remained so across subgroups (82%-97%). CONCLUSIONS An ICD-9-CM algorithm requiring a primary diagnosis of CM-I has excellent PPV and very good sensitivity for identifying CM-I decompression surgery in pediatric patients. These results establish a basis for utilizing administrative billing data to assess pediatric CM-I treatment outcomes.
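    A minimal sketch of Algorithm 1-style case finding on admission records, using the diagnosis and procedure codes named above (348.4 with 01.24 or 03.09); the record structure is an assumption of the example.

```python
# Flag admissions matching any diagnosis 348.4 plus procedure 01.24 or 03.09.
admissions = [
    {"id": 1, "dx": ["348.4", "784.0"], "px": ["01.24"]},
    {"id": 2, "dx": ["348.4"],          "px": ["03.09"]},
    {"id": 3, "dx": ["784.0"],          "px": ["01.24"]},
]

def algorithm_1(rec):
    return "348.4" in rec["dx"] and any(p in {"01.24", "03.09"} for p in rec["px"])

flagged = [rec["id"] for rec in admissions if algorithm_1(rec)]
print(flagged)  # admissions identified as CM-I decompression candidates
```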

  1. Automatic Thesaurus Generation for an Electronic Community System.

    ERIC Educational Resources Information Center

    Chen, Hsinchun; And Others

    1995-01-01

    This research reports an algorithmic approach to the automatic generation of thesauri for electronic community systems. The techniques used include term filtering, automatic indexing, and cluster analysis. The Worm Community System, used by molecular biologists studying the nematode worm C. elegans, was used as the testbed for this research.…

  2. A network approach for identifying and delimiting biogeographical regions.

    PubMed

    Vilhena, Daril A; Antonelli, Alexandre

    2015-04-24

    Biogeographical regions (geographically distinct assemblages of species and communities) constitute a cornerstone for ecology, biogeography, evolution and conservation biology. Species turnover measures are often used to quantify spatial biodiversity patterns, but algorithms based on similarity can be sensitive to common sampling biases in species distribution data. Here we apply a community detection approach from network theory that incorporates complex, higher-order presence-absence patterns. We demonstrate the performance of the method by applying it to all amphibian species in the world (c. 6,100 species), all vascular plant species of the USA (c. 17,600) and a hypothetical data set containing a zone of biotic transition. In comparison with current methods, our approach tackles the challenges posed by transition zones and succeeds in retrieving a larger number of commonly recognized biogeographical regions. This method can be applied to generate objective, data-derived identification and delimitation of the world's biogeographical regions.

  3. How artificial intelligence tools can be used to assess individual patient risk in cardiovascular disease: problems with the current methods

    PubMed Central

    Grossi, Enzo

    2006-01-01

    Background: In recent years a number of algorithms for cardiovascular risk assessment have been proposed to the medical community. These algorithms consider a number of variables and express their results as the percentage risk of developing a major fatal or non-fatal cardiovascular event in the following 10 to 20 years. Discussion: The author has identified three major pitfalls of these algorithms, linked to the limitations of the classical statistical approach in dealing with this kind of nonlinear and complex information. The pitfalls are the inability to capture the disease complexity, the inability to capture process dynamics, and the wide confidence interval of individual risk assessment. Artificial intelligence tools can provide a potential advantage in trying to overcome these limitations. The theoretical background and some application examples related to artificial neural networks and fuzzy logic have been reviewed and discussed. Summary: The use of predictive algorithms to assess individual absolute risk of future cardiovascular events is currently hampered by methodological and mathematical flaws. The use of newer approaches linked to artificial intelligence, such as fuzzy logic and artificial neural networks, seems better able to address the challenge of increasing complexity resulting from the correlation between predisposing factors, data on the occurrence of cardiovascular events, and the prediction of future events on an individual level. PMID:16672045

  4. On Interestingness Measures for Mining Statistically Significant and Novel Clinical Associations from EMRs

    PubMed Central

    Abar, Orhan; Charnigo, Richard J.; Rayapati, Abner

    2017-01-01

    Association rule mining has received significant attention from both the data mining and machine learning communities. While data mining researchers focus more on designing efficient algorithms to mine rules from large datasets, the learning community has explored applications of rule mining to classification. A major problem with rule mining algorithms is the explosion of rules even for moderate sized datasets making it very difficult for end users to identify both statistically significant and potentially novel rules that could lead to interesting new insights and hypotheses. Researchers have proposed many domain independent interestingness measures using which, one can rank the rules and potentially glean useful rules from the top ranked ones. However, these measures have not been fully explored for rule mining in clinical datasets owing to the relatively large sizes of the datasets often encountered in healthcare and also due to limited access to domain experts for review/analysis. In this paper, using an electronic medical record (EMR) dataset of diagnoses and medications from over three million patient visits to the University of Kentucky medical center and affiliated clinics, we conduct a thorough evaluation of dozens of interestingness measures proposed in data mining literature, including some new composite measures. Using cumulative relevance metrics from information retrieval, we compare these interestingness measures against human judgments obtained from a practicing psychiatrist for association rules involving the depressive disorders class as the consequent. Our results not only surface new interesting associations for depressive disorders but also indicate classes of interestingness measures that weight rule novelty and statistical strength in contrasting ways, offering new insights for end users in identifying interesting rules. PMID:28736771
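    A small sketch of a few widely used, domain-independent interestingness measures (support, confidence, lift, conviction) for a single rule A -> B, computed on synthetic indicator data; the EMR-derived rules and composite measures from the study are not reproduced here.

```python
# Rule-interestingness measures for A -> B from transaction-level indicators.
import numpy as np

def rule_measures(a, b):
    """Support, confidence, lift, and conviction for the rule A -> B."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    support = np.mean(a & b)
    confidence = support / np.mean(a)
    lift = confidence / np.mean(b)
    conviction = (1 - np.mean(b)) / (1 - confidence) if confidence < 1 else np.inf
    return {"support": support, "confidence": confidence,
            "lift": lift, "conviction": conviction}

rng = np.random.default_rng(0)
has_a = rng.random(1000) < 0.3                                  # antecedent indicator
has_b = (rng.random(1000) < 0.1) | (has_a & (rng.random(1000) < 0.2))  # consequent
print(rule_measures(has_a, has_b))
```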

  5. An Evaluation of a Natural Language Processing Tool for Identifying and Encoding Allergy Information in Emergency Department Clinical Notes

    PubMed Central

    Goss, Foster R.; Plasek, Joseph M.; Lau, Jason J.; Seger, Diane L.; Chang, Frank Y.; Zhou, Li

    2014-01-01

    Emergency department (ED) visits due to allergic reactions are common. Allergy information is often recorded in free-text provider notes; however, this domain has not yet been widely studied by the natural language processing (NLP) community. We developed an allergy module built on the MTERMS NLP system to identify and encode food, drug, and environmental allergies and allergic reactions. The module included updates to our lexicon using standard terminologies, and novel disambiguation algorithms. We developed an annotation schema and annotated 400 ED notes that served as a gold standard for comparison to MTERMS output. MTERMS achieved an F-measure of 87.6% for the detection of allergen names and no known allergies, 90% for identifying true reactions in each allergy statement where true allergens were also identified, and 69% for linking reactions to their allergen. These preliminary results demonstrate the feasibility using NLP to extract and encode allergy information from clinical notes. PMID:25954363

  6. Clustering network layers with the strata multilayer stochastic block model.

    PubMed

    Stanley, Natalie; Shai, Saray; Taylor, Dane; Mucha, Peter J

    2016-01-01

    Multilayer networks are a useful data structure for simultaneously capturing multiple types of relationships between a set of nodes. In such networks, each relational definition gives rise to a layer. While each layer provides its own set of information, community structure across layers can be collectively utilized to discover and quantify underlying relational patterns between nodes. To concisely extract information from a multilayer network, we propose to identify and combine sets of layers with meaningful similarities in community structure. In this paper, we describe the "strata multilayer stochastic block model" (sMLSBM), a probabilistic model for multilayer community structure. The central extension of the model is that there exist groups of layers, called "strata", which are defined such that all layers in a given stratum have community structure described by a common stochastic block model (SBM). That is, layers in a stratum exhibit similar node-to-community assignments and SBM probability parameters. Fitting the sMLSBM to a multilayer network provides a joint clustering that yields node-to-community and layer-to-stratum assignments, which cooperatively aid one another during inference. We describe an algorithm for separating layers into their appropriate strata and an inference technique for estimating the SBM parameters for each stratum. We demonstrate our method using synthetic networks and a multilayer network inferred from data collected in the Human Microbiome Project.

  7. Clustering network layers with the strata multilayer stochastic block model

    PubMed Central

    Stanley, Natalie; Shai, Saray; Taylor, Dane; Mucha, Peter J.

    2016-01-01

    Multilayer networks are a useful data structure for simultaneously capturing multiple types of relationships between a set of nodes. In such networks, each relational definition gives rise to a layer. While each layer provides its own set of information, community structure across layers can be collectively utilized to discover and quantify underlying relational patterns between nodes. To concisely extract information from a multilayer network, we propose to identify and combine sets of layers with meaningful similarities in community structure. In this paper, we describe the “strata multilayer stochastic block model” (sMLSBM), a probabilistic model for multilayer community structure. The central extension of the model is that there exist groups of layers, called “strata”, which are defined such that all layers in a given stratum have community structure described by a common stochastic block model (SBM). That is, layers in a stratum exhibit similar node-to-community assignments and SBM probability parameters. Fitting the sMLSBM to a multilayer network provides a joint clustering that yields node-to-community and layer-to-stratum assignments, which cooperatively aid one another during inference. We describe an algorithm for separating layers into their appropriate strata and an inference technique for estimating the SBM parameters for each stratum. We demonstrate our method using synthetic networks and a multilayer network inferred from data collected in the Human Microbiome Project. PMID:28435844

  8. Two-pass imputation algorithm for missing value estimation in gene expression time series.

    PubMed

    Tsiporkova, Elena; Boeva, Veselka

    2007-10-01

    Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms, in particular for datasets with a higher level of missing entries, the neighborhood-wise and the position-wise algorithms. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former algorithm has appeared superior to the latter one. Motivated by these findings, indicating clearly the added value of the DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also provides for a choice between three different initial rough imputation methods.
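
    To make the general idea concrete, the toy sketch below computes a classic dynamic-programming DTW distance and fills the missing entries of a target profile from its DTW-nearest complete candidate; it is a simplification of the position-wise variant, and the function names are illustrative rather than those of the published DTWimpute software.

        import numpy as np

        def dtw_distance(a, b):
            """Dynamic-programming DTW distance between two 1-D expression profiles."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        def impute_from_nearest(target, candidates):
            """Copy values from the DTW-nearest complete profile into the NaN positions of target."""
            observed = ~np.isnan(target)
            nearest = min(candidates, key=lambda c: dtw_distance(target[observed], c[observed]))
            filled = target.copy()
            filled[~observed] = nearest[~observed]
            return filled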

  9. Network clustering and community detection using modulus of families of loops.

    PubMed

    Shakeri, Heman; Poggi-Corradini, Pietro; Albin, Nathan; Scoglio, Caterina

    2017-01-01

    We study the structure of loops in networks using the notion of modulus of loop families. We introduce an alternate measure of network clustering by quantifying the richness of families of (simple) loops. Modulus tries to minimize the expected overlap among loops by spreading the expected link usage optimally. We propose weighting networks using these expected link usages to improve classical community detection algorithms. We show that the proposed method enhances the performance of certain algorithms, such as spectral partitioning and modularity maximization heuristics, on standard benchmarks.

  10. In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

    PubMed

    Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset

    2017-01-06

    In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not on the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available to the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated in a highly customizable KNIME workflow using four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only in the actual numbers of reported protein groups but also in the actual composition of those groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependent on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended. Protein inference is one of the major challenges in MS-based proteomics today. Currently, there is a vast number of protein inference algorithms and implementations available to the proteomics community. Protein assembly impacts the final results of the research, the quantitation values, and the final claims in the research manuscript. Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods had never been performed. The Journal of Proteomics has previously published benchmark studies of other bioinformatics algorithms (PMID: 26585461; PMID: 22728601), making clear the importance of such studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Five different algorithms - ProteinProphet, MSBayesPro, ProteinLP, Fido and PIA - were evaluated using the highly customizable workflow on four public datasets with varying complexities. Three popular database search engines - Mascot, X!Tandem and MS-GF+ - and combinations thereof were evaluated for every protein inference tool. In total >186 protein lists were analyzed and carefully compared using three metrics for quality assessment of the protein inference results: 1) the number of reported proteins, 2) peptides per protein, and 3) the number of uniquely reported proteins per inference method. We also examined how many proteins were reported for each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that 1) PIA or Fido seems to be a good choice when studying the results of the analyzed workflow, regarding not only the reported proteins and the high-quality identifications but also the required runtime. 2) Merging the identifications of multiple search engines almost always gives more confident results and increases the number of peptides per protein group. 3) The use of databases containing not only the canonical but also known isoforms of proteins has a small impact on the number of reported proteins; depending on the question behind the study, the detection of specific isoforms can compensate for the slightly shorter parsimonious reports. 4) The current workflow can be easily extended to support new algorithms and search engine combinations. Copyright © 2016. Published by Elsevier B.V.

  11. Stakeholder-driven geospatial modeling for assessing tsunami vertical-evacuation strategies in the U.S. Pacific Northwest

    NASA Astrophysics Data System (ADS)

    Wood, N. J.; Schmidtlein, M.; Schelling, J.; Jones, J.; Ng, P.

    2012-12-01

    Recent tsunami disasters, such as the 2010 Chilean and 2011 Tohoku events, demonstrate the significant life loss that can occur from tsunamis. Many coastal communities in the world are threatened by near-field tsunami hazards that may inundate low-lying areas only minutes after a tsunami begins. Geospatial integration of demographic data and hazard zones has identified potential impacts on populations in communities susceptible to near-field tsunami threats. Pedestrian-evacuation models build on these geospatial analyses to determine if individuals in tsunami-prone areas will have sufficient time to reach high ground before tsunami-wave arrival. Areas where successful evacuations are unlikely may warrant vertical-evacuation (VE) strategies, such as berms or structures designed to aid evacuation. The decision of whether and where VE strategies are warranted is complex. Such decisions require an interdisciplinary understanding of tsunami hazards, land cover conditions, demography, community vulnerability, pedestrian-evacuation models, land-use and emergency-management policy, and decision science. Engagement with the at-risk population and local emergency managers in VE planning discussions is critical because resulting strategies include permanent structures within a community and their local ownership helps ensure long-term success. We present a summary of an interdisciplinary approach to assess VE options in communities along the southwest Washington coast (U.S.A.) that are threatened by near-field tsunami hazards generated by Cascadia subduction zone earthquakes. Pedestrian-evacuation models based on an anisotropic approach that uses path-distance algorithms were merged with population data to forecast the distribution of at-risk individuals within several communities as a function of travel time to safe locations. A series of community-based workshops, part of an effort known as "Project Safe Haven" coordinated with the State of Washington Emergency Management Division, helped identify potential VE options in these communities. Modeling the influence of the stakeholder-driven VE options identified changes in the type and distribution of at-risk individuals. Insights into VE use and performance as an aid to evacuations during the 2011 Tohoku tsunami helped to inform the meetings and the analysis. We developed geospatial tools to automate parts of the pedestrian-evacuation models to support the iterative process of developing VE options and forecasting changes in population exposure. Our summary presents the interdisciplinary effort to forecast population impacts from near-field tsunami threats and to develop effective VE strategies to minimize fatalities in future events.
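
    The geospatial travel-time modeling rests on path-distance calculations; the simplified sketch below propagates minimum cumulative travel time from designated safe cells across a cost grid with Dijkstra's algorithm. It ignores the anisotropic (slope- and direction-dependent) costs used in the actual study and uses 4-connected neighbors only.

        import heapq
        import numpy as np

        def travel_time_surface(cost, safe_cells):
            """Minimum cumulative travel time from every grid cell to the nearest safe cell."""
            t = np.full(cost.shape, np.inf)
            heap = []
            for rc in safe_cells:                       # safe_cells: list of (row, col) tuples
                t[rc] = 0.0
                heapq.heappush(heap, (0.0, rc))
            while heap:
                d, (r, c) = heapq.heappop(heap)
                if d > t[r, c]:
                    continue
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < cost.shape[0] and 0 <= nc < cost.shape[1]:
                        nd = d + cost[nr, nc]           # cost of stepping into the neighboring cell
                        if nd < t[nr, nc]:
                            t[nr, nc] = nd
                            heapq.heappush(heap, (nd, (nr, nc)))
            return t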

  12. Community Landscapes: An Integrative Approach to Determine Overlapping Network Module Hierarchy, Identify Key Nodes and Predict Network Dynamics

    PubMed Central

    Kovács, István A.; Palotai, Robin; Szalay, Máté S.; Csermely, Peter

    2010-01-01

    Background Network communities help the functional organization and evolution of complex networks. However, the development of a method, which is both fast and accurate, provides modular overlaps and partitions of a heterogeneous network, has proven to be rather difficult. Methodology/Principal Findings Here we introduce the novel concept of ModuLand, an integrative method family determining overlapping network modules as hills of an influence function-based, centrality-type community landscape, and including several widely used modularization methods as special cases. As various adaptations of the method family, we developed several algorithms, which provide an efficient analysis of weighted and directed networks, and (1) determine pervasively overlapping modules with high resolution; (2) uncover a detailed hierarchical network structure allowing an efficient, zoom-in analysis of large networks; (3) allow the determination of key network nodes and (4) help to predict network dynamics. Conclusions/Significance The concept opens a wide range of possibilities to develop new approaches and applications including network routing, classification, comparison and prediction. PMID:20824084

  13. The ESA Cloud CCI project: Generation of Multi Sensor consistent Cloud Properties with an Optimal Estimation Based Retrieval Algorithm

    NASA Astrophysics Data System (ADS)

    Jerg, M.; Stengel, M.; Hollmann, R.; Poulsen, C.

    2012-04-01

    The ultimate objective of the ESA Climate Change Initiative (CCI) Cloud project is to provide long-term coherent cloud property data sets exploiting and improving on the synergetic capabilities of past, existing, and upcoming European and American satellite missions. The synergetic approach allows not only for improved accuracy and for temporal and spatial sampling of retrieved cloud properties beyond what single instruments alone provide, but potentially also for improved (inter-)calibration and enhanced homogeneity and stability of the derived time series. Such advances are required by the scientific community to facilitate further progress in satellite-based climate monitoring, which leads to a better understanding of climate. Some of the primary objectives of ESA Cloud CCI are (1) the development of inter-calibrated radiance data sets, so-called Fundamental Climate Data Records, for ESA and non-ESA instruments through an international collaboration, (2) the development of an optimal estimation based retrieval framework for cloud related essential climate variables like cloud cover, cloud top height and temperature, liquid and ice water path, and (3) the development of two multi-annual global data sets for the mentioned cloud properties including uncertainty estimates. These two data sets are characterized by different combinations of satellite systems: the AVHRR heritage product comprising (A)ATSR, AVHRR and MODIS and the novel (A)ATSR - MERIS product which is based on a synergetic retrieval using both instruments. Both datasets cover the years 2007-2009 in the first project phase. ESA Cloud CCI will also carry out a comprehensive validation of the cloud property products and provide a common database, as in the framework of the Global Energy and Water Cycle Experiment (GEWEX). The presentation will give an overview of the ESA Cloud CCI project and its goals and approaches and then continue with results from the Round Robin algorithm comparison exercise carried out at the beginning of the project, which included three algorithms. The purpose of the exercise was to assess and compare existing cloud retrieval algorithms in order to choose one of them as the backbone of the retrieval system and to identify areas of potential improvement and general strengths and weaknesses of the algorithms. Furthermore, the presentation will elaborate on the optimal estimation algorithm subsequently chosen, which is presently being further developed and will be employed for the AVHRR heritage product. The algorithm's capabilities to coherently and simultaneously process all radiative input and yield retrieval parameters together with associated uncertainty estimates will be presented together with first results for the heritage product. In the course of the project the algorithm is being developed into a freely and publicly available community retrieval system for interested scientists.
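
    For readers unfamiliar with the retrieval framework, optimal estimation finds the state vector x (e.g., cloud-top pressure, optical thickness, effective radius) that minimizes the standard cost function below, balancing the fit to the measured radiances y against prior knowledge x_a; this is the generic form of the method, not a project-specific derivation:

        J(x) = [y - F(x)]^T S_y^{-1} [y - F(x)] + (x - x_a)^T S_a^{-1} (x - x_a)

    where F is the forward radiative-transfer model, S_y the measurement-error covariance, and S_a the a priori covariance. The curvature (Hessian) of J at its minimum is what such frameworks commonly use to derive the associated retrieval uncertainty estimates.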

  14. A systematic review of validated methods for identifying hypersensitivity reactions other than anaphylaxis (fever, rash, and lymphadenopathy), using administrative and claims data.

    PubMed

    Schneider, Gary; Kachroo, Sumesh; Jones, Natalie; Crean, Sheila; Rotella, Philip; Avetisyan, Ruzan; Reynolds, Matthew W

    2012-01-01

    The Food and Drug Administration's Mini-Sentinel pilot program aims to conduct active surveillance to refine safety signals that emerge for marketed medical products. A key facet of this surveillance is to develop and understand the validity of algorithms for identifying health outcomes of interest from administrative and claims data. This article summarizes the process and findings of the algorithm review of hypersensitivity reactions. PubMed and Iowa Drug Information Service searches were conducted to identify citations applicable to the hypersensitivity reactions of health outcomes of interest. Level 1 abstract reviews and Level 2 full-text reviews were conducted to find articles using administrative and claims data to identify hypersensitivity reactions and including validation estimates of the coding algorithms. We identified five studies that provided validated hypersensitivity-reaction algorithms. Algorithm positive predictive values (PPVs) for various definitions of hypersensitivity reactions ranged from 3% to 95%. PPVs were high (i.e. 90%-95%) when both exposures and diagnoses were very specific. PPV generally decreased when the definition of hypersensitivity was expanded, except in one study that used data mining methodology for algorithm development. The ability of coding algorithms to identify hypersensitivity reactions varied, with decreasing performance occurring with expanded outcome definitions. This examination of hypersensitivity-reaction coding algorithms provides an example of surveillance bias resulting from outcome definitions that include mild cases. Data mining may provide tools for algorithm development for hypersensitivity and other health outcomes. Research needs to be conducted on designing validation studies to test hypersensitivity-reaction algorithms and estimating their predictive power, sensitivity, and specificity. Copyright © 2012 John Wiley & Sons, Ltd.
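
    To make the reported figures concrete, here is how PPV and the related validation metrics follow from a 2x2 comparison of algorithm-flagged cases against a reference standard; the counts below are invented purely for illustration.

        # Toy confusion-matrix counts: algorithm-flagged cases vs. chart-review reference standard.
        tp, fp, fn, tn = 45, 5, 30, 920
        ppv = tp / (tp + fp)            # positive predictive value
        npv = tn / (tn + fn)            # negative predictive value
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        print(f"PPV={ppv:.2f}  NPV={npv:.2f}  Se={sensitivity:.2f}  Sp={specificity:.2f}")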

  15. Bambus 2: scaffolding metagenomes.

    PubMed

    Koren, Sergey; Treangen, Todd J; Pop, Mihai

    2011-11-01

    Sequencing projects increasingly target samples from non-clonal sources. In particular, metagenomics has enabled scientists to begin to characterize the structure of microbial communities. The software tools developed for assembling and analyzing sequencing data for clonal organisms are, however, unable to adequately process data derived from non-clonal sources. We present a new scaffolder, Bambus 2, to address some of the challenges encountered when analyzing metagenomes. Our approach relies on a combination of a novel method for detecting genomic repeats and algorithms that analyze assembly graphs to identify biologically meaningful genomic variants. We compare our software to current assemblers using simulated and real data. We demonstrate that the repeat detection algorithms have higher sensitivity than current approaches without sacrificing specificity. In metagenomic datasets, the scaffolder avoids false joins between distantly related organisms while obtaining long-range contiguity. Bambus 2 represents a first step toward automated metagenomic assembly. Bambus 2 is open source and available from http://amos.sf.net. mpop@umiacs.umd.edu. Supplementary data are available at Bioinformatics online.

  16. Bambus 2: scaffolding metagenomes

    PubMed Central

    Koren, Sergey; Treangen, Todd J.; Pop, Mihai

    2011-01-01

    Motivation: Sequencing projects increasingly target samples from non-clonal sources. In particular, metagenomics has enabled scientists to begin to characterize the structure of microbial communities. The software tools developed for assembling and analyzing sequencing data for clonal organisms are, however, unable to adequately process data derived from non-clonal sources. Results: We present a new scaffolder, Bambus 2, to address some of the challenges encountered when analyzing metagenomes. Our approach relies on a combination of a novel method for detecting genomic repeats and algorithms that analyze assembly graphs to identify biologically meaningful genomic variants. We compare our software to current assemblers using simulated and real data. We demonstrate that the repeat detection algorithms have higher sensitivity than current approaches without sacrificing specificity. In metagenomic datasets, the scaffolder avoids false joins between distantly related organisms while obtaining long-range contiguity. Bambus 2 represents a first step toward automated metagenomic assembly. Availability: Bambus 2 is open source and available from http://amos.sf.net. Contact: mpop@umiacs.umd.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21926123

  17. WE-F-201-03: Evaluate Clinical Cases Using Commercially Available Systems and Compare to TG-43 Dosimetry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beaulieu, L.

    With the recent introduction of heterogeneity correction algorithms for brachytherapy, the AAPM community is still unclear on how to commission and implement these into clinical practice. The recently-published AAPM TG-186 report discusses important issues for clinical implementation of these algorithms. A charge of the AAPM-ESTRO-ABG Working Group on MBDCA in Brachytherapy (WGMBDCA) is the development of a set of well-defined test case plans, available as references in the software commissioning process to be performed by clinical end-users. In this practical medical physics course, specific examples on how to perform the commissioning process are presented, as well as descriptions of the clinical impact from recent literature reporting comparisons of TG-43 and heterogeneity-based dosimetry. Learning Objectives: Identify key clinical applications needing advanced dose calculation in brachytherapy. Review TG-186 and WGMBDCA guidelines, commissioning process, and dosimetry benchmarks. Evaluate clinical cases using commercially available systems and compare to TG-43 dosimetry.

  18. Improvement of the SEP protocol based on community structure of node degree

    NASA Astrophysics Data System (ADS)

    Li, Donglin; Wei, Suyuan

    2017-05-01

    Analyzing the Stable Election Protocol (SEP) in wireless sensor networks, and aiming at the problems of inhomogeneous cluster-head distribution, unreasonable cluster-head selection, and single-hop transmission in SEP, a SEP protocol based on community structure of node degree (SEP-CSND) is proposed. In this algorithm, network nodes are deployed using a grid deployment model, and connections between nodes are established by setting a communication threshold. The community structure is constructed from node degree, and cluster heads are then elected within the community structure. On the basis of SEP, the node's residual energy and node degree are added to cluster-head election. Information is transmitted between network nodes in multi-hop mode. Simulation experiments showed that, compared with the classical LEACH and SEP, this algorithm balances the energy consumption of the entire network and significantly prolongs network lifetime.
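
    The abstract does not give the exact election formula, so the sketch below only illustrates the general pattern it describes: a SEP/LEACH-style probabilistic threshold scaled by each node's normalized residual energy and node degree. The 50/50 weighting and data layout are assumptions made for this example.

        import random

        def elect_cluster_heads(nodes, p, current_round):
            """nodes: list of dicts with 'energy', 'initial_energy' and 'degree' keys."""
            max_degree = max(n['degree'] for n in nodes) or 1
            heads = []
            for n in nodes:
                threshold = p / (1 - p * (current_round % round(1 / p)))   # classic SEP/LEACH term
                weight = 0.5 * (n['energy'] / n['initial_energy']) + 0.5 * (n['degree'] / max_degree)
                if random.random() < threshold * weight:
                    heads.append(n)
            return heads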

  19. Behavior Based Social Dimensions Extraction for Multi-Label Classification

    PubMed Central

    Li, Le; Xu, Junyi; Xiao, Weidong; Ge, Bin

    2016-01-01

    Classification based on social dimensions is commonly used to handle the multi-label classification task in heterogeneous networks. However, traditional methods, which mostly rely on the community detection algorithms to extract the latent social dimensions, produce unsatisfactory performance when community detection algorithms fail. In this paper, we propose a novel behavior based social dimensions extraction method to improve the classification performance in multi-label heterogeneous networks. In our method, nodes’ behavior features, instead of community memberships, are used to extract social dimensions. By introducing Latent Dirichlet Allocation (LDA) to model the network generation process, nodes’ connection behaviors with different communities can be extracted accurately, which are applied as latent social dimensions for classification. Experiments on various public datasets reveal that the proposed method can obtain satisfactory classification results in comparison to other state-of-the-art methods on smaller social dimensions. PMID:27049849
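
    A minimal sketch of the general pipeline described above, with toy data and an off-the-shelf classifier standing in for the paper's exact setup: LDA turns node-behavior counts into latent dimensions, which then feed a multi-label classifier.

        import numpy as np
        from sklearn.decomposition import LatentDirichletAllocation
        from sklearn.linear_model import LogisticRegression
        from sklearn.multiclass import OneVsRestClassifier

        rng = np.random.default_rng(0)
        behavior_counts = rng.integers(0, 5, size=(200, 40))   # nodes x behavior features (toy data)
        labels = rng.integers(0, 2, size=(200, 3))              # multi-label targets (toy data)

        # Latent "social dimensions" from connection-behavior counts, then one classifier per label.
        social_dims = LatentDirichletAllocation(n_components=10, random_state=0).fit_transform(behavior_counts)
        classifier = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(social_dims, labels)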

  20. An algorithm for modularization of MAPK and calcium signaling pathways: comparative analysis among different species.

    PubMed

    Nayak, Losiana; De, Rajat K

    2007-12-01

    Signaling pathways are large, complex biochemical networks. It is difficult to analyze the underlying mechanism of such networks as a whole. In the present article, we have proposed an algorithm for modularization of signal transduction pathways. Unlike studying a signaling pathway as a whole, this enables one to study the individual modules (smaller, less complex units) easily and hence to study the entire pathway better. A comparative study of modules belonging to different species (for the same signaling pathway) has been made, which gives an overall idea of how the calcium and MAPK signaling pathways have developed over the chosen set of species. The superior performance, in terms of biological significance, of the proposed algorithm over an existing community-finding algorithm of Newman [Newman MEJ. Modularity and community structure in networks. Proc Natl Acad Sci USA 2006;103(23):8577-82] has been demonstrated using the aforesaid pathways of H. sapiens.

  1. The optimal sequence and selection of screening test items to predict fall risk in older disabled women: the Women's Health and Aging Study.

    PubMed

    Lamb, Sarah E; McCabe, Chris; Becker, Clemens; Fried, Linda P; Guralnik, Jack M

    2008-10-01

    Falls are a major cause of disability, dependence, and death in older people. Brief screening algorithms may be helpful in identifying risk and leading to more detailed assessment. Our aim was to determine the most effective sequence of falls screening test items from a wide selection of recommended items including self-report and performance tests, and to compare performance with other published guidelines. Data were from a prospective, age-stratified, cohort study. Participants were 1002 community-dwelling women aged 65 years old or older, experiencing at least some mild disability. Assessments of fall risk factors were conducted in participants' homes. Fall outcomes were collected at 6 monthly intervals. Algorithms were built for prediction of any fall over a 12-month period using tree classification with cross-set validation. Algorithms using performance tests provided the best prediction of fall events, and achieved moderate to strong performance when compared to commonly accepted benchmarks. The items selected by the best performing algorithm were the number of falls in the last year and, in selected subpopulations, frequency of difficulty balancing while walking, a 4 m walking speed test, body mass index, and a test of knee extensor strength. The algorithm performed better than that from the American Geriatric Society/British Geriatric Society/American Academy of Orthopaedic Surgeons and other guidance, although these findings should be treated with caution. Suggestions are made on the type, number, and sequence of tests that could be used to maximize estimation of the probability of falling in older disabled women.
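
    As a generic illustration of the modeling approach only (tree classification assessed with cross-validation), not of the study's actual screening items or data:

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(1)
        X = rng.random((300, 5))            # e.g. prior falls, walking speed, BMI, strength (toy data)
        y = rng.integers(0, 2, 300)         # fell within 12 months (toy labels)

        tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20)
        print(cross_val_score(tree, X, y, cv=5, scoring='roc_auc').mean())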

  2. Use of chronic disease management algorithms in Australian community pharmacies.

    PubMed

    Morrissey, Hana; Ball, Patrick; Jackson, David; Pilloto, Louis; Nielsen, Sharon

    2015-01-01

    In Australia, standardized chronic disease management algorithms are available for medical practitioners, nursing practitioners and nurses through a range of sources including prescribing software, manuals and government and not-for-profit non-government organizations. There is currently no standardized algorithm for pharmacist intervention in the management of chronic diseases. To investigate whether a collaborative community pharmacist and doctor model of care in chronic disease management could improve patient outcomes through ongoing monitoring of disease biochemical markers, robust self-management skills and better medication adherence. This project was a pragmatic pilot study, measuring the effect of the intervention by comparing patient health outcomes at baseline and at the end of the study, to support future definitive studies. Algorithms for selected chronic conditions were designed, based on the World Health Organisation STEPS™ process and the Central Australia Rural Practitioners' Association Standard Treatment Manual. They were evaluated in community pharmacies in 8 small inland Australian towns, most having only one pharmacy, in order to avoid competition issues. The algorithms were reviewed by the Murrumbidgee Medicare Local Ltd, New South Wales, Australia, Quality Use of Medicines committee. They constitute a pharmacist-driven, doctor/pharmacist collaboration primary care model. The pharmacy owners volunteered to take part in the study and patients were purposefully recruited by in-store invitation. Pharmacists at six of the 9 sites (67%) were fully capable of delivering the algorithm (each of these sites had 3 pharmacists); one site (11%), with 2 pharmacists, found it too difficult and withdrew from the study; and 2 sites (22%, with one pharmacist at each site) stated that they were personally capable of delivering the algorithm but unable to do so due to workflow demands. This primary care model can form the basis of workable collaboration between doctors and pharmacists ensuring continuity of care for patients. It has potential for rural and remote areas of Australia where this continuity of care may be problematic. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. New algorithms for identifying the flavour of [Formula: see text] mesons using pions and protons.

    PubMed

    Aaij, R; Adeva, B; Adinolfi, M; Ajaltouni, Z; Akar, S; Albrecht, J; Alessio, F; Alexander, M; Ali, S; Alkhazov, G; Alvarez Cartelle, P; Alves, A A; Amato, S; Amerio, S; Amhis, Y; An, L; Anderlini, L; Andreassi, G; Andreotti, M; Andrews, J E; Appleby, R B; Archilli, F; d'Argent, P; Arnau Romeu, J; Artamonov, A; Artuso, M; Aslanides, E; Auriemma, G; Baalouch, M; Babuschkin, I; Bachmann, S; Back, J J; Badalov, A; Baesso, C; Baker, S; Baldini, W; Barlow, R J; Barschel, C; Barsuk, S; Barter, W; Baszczyk, M; Batozskaya, V; Batsukh, B; Battista, V; Bay, A; Beaucourt, L; Beddow, J; Bedeschi, F; Bediaga, I; Bel, L J; Bellee, V; Belloli, N; Belous, K; Belyaev, I; Ben-Haim, E; Bencivenni, G; Benson, S; Benton, J; Berezhnoy, A; Bernet, R; Bertolin, A; Betti, F; Bettler, M-O; van Beuzekom, M; Bezshyiko, Ia; Bifani, S; Billoir, P; Bird, T; Birnkraut, A; Bitadze, A; Bizzeti, A; Blake, T; Blanc, F; Blouw, J; Blusk, S; Bocci, V; Boettcher, T; Bondar, A; Bondar, N; Bonivento, W; Bordyuzhin, I; Borgheresi, A; Borghi, S; Borisyak, M; Borsato, M; Bossu, F; Boubdir, M; Bowcock, T J V; Bowen, E; Bozzi, C; Braun, S; Britsch, M; Britton, T; Brodzicka, J; Buchanan, E; Burr, C; Bursche, A; Buytaert, J; Cadeddu, S; Calabrese, R; Calvi, M; Calvo Gomez, M; Camboni, A; Campana, P; Campora Perez, D; Campora Perez, D H; Capriotti, L; Carbone, A; Carboni, G; Cardinale, R; Cardini, A; Carniti, P; Carson, L; Carvalho Akiba, K; Casse, G; Cassina, L; Castillo Garcia, L; Cattaneo, M; Cauet, Ch; Cavallero, G; Cenci, R; Charles, M; Charpentier, Ph; Chatzikonstantinidis, G; Chefdeville, M; Chen, S; Cheung, S F; Chobanova, V; Chrzaszcz, M; Cid Vidal, X; Ciezarek, G; Clarke, P E L; Clemencic, M; Cliff, H V; Closier, J; Coco, V; Cogan, J; Cogneras, E; Cogoni, V; Cojocariu, L; Collins, P; Comerma-Montells, A; Contu, A; Cook, A; Coombs, G; Coquereau, S; Corti, G; Corvo, M; Costa Sobral, C M; Couturier, B; Cowan, G A; Craik, D C; Crocombe, A; Cruz Torres, M; Cunliffe, S; Currie, R; D'Ambrosio, C; Da Cunha Marinho, F; Dall'Occo, E; Dalseno, J; David, P N Y; Davis, A; De Aguiar Francisco, O; De Bruyn, K; De Capua, S; De Cian, M; De Miranda, J M; De Paula, L; De Serio, M; De Simone, P; Dean, C T; Decamp, D; Deckenhoff, M; Del Buono, L; Demmer, M; Dendek, A; Derkach, D; Deschamps, O; Dettori, F; Dey, B; Di Canto, A; Dijkstra, H; Dordei, F; Dorigo, M; Dosil Suárez, A; Dovbnya, A; Dreimanis, K; Dufour, L; Dujany, G; Dungs, K; Durante, P; Dzhelyadin, R; Dziurda, A; Dzyuba, A; Déléage, N; Easo, S; Ebert, M; Egede, U; Egorychev, V; Eidelman, S; Eisenhardt, S; Eitschberger, U; Ekelhof, R; Eklund, L; Elsasser, Ch; Ely, S; Esen, S; Evans, H M; Evans, T; Falabella, A; Farley, N; Farry, S; Fay, R; Fazzini, D; Ferguson, D; Fernandez Prieto, A; Ferrari, F; Ferreira Rodrigues, F; Ferro-Luzzi, M; Filippov, S; Fini, R A; Fiore, M; Fiorini, M; Firlej, M; Fitzpatrick, C; Fiutowski, T; Fleuret, F; Fohl, K; Fontana, M; Fontanelli, F; Forshaw, D C; Forty, R; Franco Lima, V; Frank, M; Frei, C; Fu, J; Furfaro, E; Färber, C; Gallas Torreira, A; Galli, D; Gallorini, S; Gambetta, S; Gandelman, M; Gandini, P; Gao, Y; Garcia Martin, L M; García Pardiñas, J; Garra Tico, J; Garrido, L; Garsed, P J; Gascon, D; Gaspar, C; Gavardi, L; Gazzoni, G; Gerick, D; Gersabeck, E; Gersabeck, M; Gershon, T; Ghez, Ph; Gianì, S; Gibson, V; Girard, O G; Giubega, L; Gizdov, K; Gligorov, V V; Golubkov, D; Golutvin, A; Gomes, A; Gorelov, I V; Gotti, C; Grabalosa Gándara, M; Graciani Diaz, R; Granado Cardoso, L A; Graugés, E; Graverini, E; Graziani, G; Grecu, A; Griffith, P; Grillo, L; 
Gruberg Cazon, B R; Grünberg, O; Gushchin, E; Guz, Yu; Gys, T; Göbel, C; Hadavizadeh, T; Hadjivasiliou, C; Haefeli, G; Haen, C; Haines, S C; Hall, S; Hamilton, B; Han, X; Hansmann-Menzemer, S; Harnew, N; Harnew, S T; Harrison, J; Hatch, M; He, J; Head, T; Heister, A; Hennessy, K; Henrard, P; Henry, L; Hernando Morata, J A; van Herwijnen, E; Heß, M; Hicheur, A; Hill, D; Hombach, C; Hopchev, P H; Hulsbergen, W; Humair, T; Hushchyn, M; Hussain, N; Hutchcroft, D; Idzik, M; Ilten, P; Jacobsson, R; Jaeger, A; Jalocha, J; Jans, E; Jawahery, A; Jiang, F; John, M; Johnson, D; Jones, C R; Joram, C; Jost, B; Jurik, N; Kandybei, S; Kanso, W; Karacson, M; Kariuki, J M; Karodia, S; Kecke, M; Kelsey, M; Kenyon, I R; Kenzie, M; Ketel, T; Khairullin, E; Khanji, B; Khurewathanakul, C; Kirn, T; Klaver, S; Klimaszewski, K; Koliiev, S; Kolpin, M; Komarov, I; Koopman, R F; Koppenburg, P; Kosmyntseva, A; Kozeiha, M; Kravchuk, L; Kreplin, K; Kreps, M; Krokovny, P; Kruse, F; Krzemien, W; Kucewicz, W; Kucharczyk, M; Kudryavtsev, V; Kuonen, A K; Kurek, K; Kvaratskheliya, T; Lacarrere, D; Lafferty, G; Lai, A; Lambert, D; Lanfranchi, G; Langenbruch, C; Latham, T; Lazzeroni, C; Le Gac, R; van Leerdam, J; Lees, J-P; Leflat, A; Lefrançois, J; Lefèvre, R; Lemaitre, F; Lemos Cid, E; Leroy, O; Lesiak, T; Leverington, B; Li, Y; Likhomanenko, T; Lindner, R; Linn, C; Lionetto, F; Liu, B; Liu, X; Loh, D; Longstaff, I; Lopes, J H; Lucchesi, D; Lucio Martinez, M; Luo, H; Lupato, A; Luppi, E; Lupton, O; Lusiani, A; Lyu, X; Machefert, F; Maciuc, F; Maev, O; Maguire, K; Malde, S; Malinin, A; Maltsev, T; Manca, G; Mancinelli, G; Manning, P; Maratas, J; Marchand, J F; Marconi, U; Marin Benito, C; Marino, P; Marks, J; Martellotti, G; Martin, M; Martinelli, M; Martinez Santos, D; Martinez Vidal, F; Martins Tostes, D; Massacrier, L M; Massafferri, A; Matev, R; Mathad, A; Mathe, Z; Matteuzzi, C; Mauri, A; Maurin, B; Mazurov, A; McCann, M; McCarthy, J; McNab, A; McNulty, R; Meadows, B; Meier, F; Meissner, M; Melnychuk, D; Merk, M; Merli, A; Michielin, E; Milanes, D A; Minard, M-N; Mitzel, D S; Mogini, A; Molina Rodriguez, J; Monroy, I A; Monteil, S; Morandin, M; Morawski, P; Mordà, A; Morello, M J; Moron, J; Morris, A B; Mountain, R; Muheim, F; Mulder, M; Mussini, M; Müller, D; Müller, J; Müller, K; Müller, V; Naik, P; Nakada, T; Nandakumar, R; Nandi, A; Nasteva, I; Needham, M; Neri, N; Neubert, S; Neufeld, N; Neuner, M; Nguyen, A D; Nguyen, T D; Nguyen-Mau, C; Nieswand, S; Niet, R; Nikitin, N; Nikodem, T; Novoselov, A; O'Hanlon, D P; Oblakowska-Mucha, A; Obraztsov, V; Ogilvy, S; Oldeman, R; Onderwater, C J G; Otalora Goicochea, J M; Otto, A; Owen, P; Oyanguren, A; Pais, P R; Palano, A; Palombo, F; Palutan, M; Panman, J; Papanestis, A; Pappagallo, M; Pappalardo, L L; Parker, W; Parkes, C; Passaleva, G; Pastore, A; Patel, G D; Patel, M; Patrignani, C; Pearce, A; Pellegrino, A; Penso, G; Pepe Altarelli, M; Perazzini, S; Perret, P; Pescatore, L; Petridis, K; Petrolini, A; Petrov, A; Petruzzo, M; Picatoste Olloqui, E; Pietrzyk, B; Pikies, M; Pinci, D; Pistone, A; Piucci, A; Playfer, S; Plo Casasus, M; Poikela, T; Polci, F; Poluektov, A; Polyakov, I; Polycarpo, E; Pomery, G J; Popov, A; Popov, D; Popovici, B; Poslavskii, S; Potterat, C; Price, E; Price, J D; Prisciandaro, J; Pritchard, A; Prouve, C; Pugatch, V; Puig Navarro, A; Punzi, G; Qian, W; Quagliani, R; Rachwal, B; Rademacker, J H; Rama, M; Ramos Pernas, M; Rangel, M S; Raniuk, I; Ratnikov, F; Raven, G; Redi, F; Reichert, S; Dos Reis, A C; Remon Alepuz, C; Renaudin, V; Ricciardi, S; 
Richards, S; Rihl, M; Rinnert, K; Rives Molina, V; Robbe, P; Rodrigues, A B; Rodrigues, E; Rodriguez Lopez, J A; Rodriguez Perez, P; Rogozhnikov, A; Roiser, S; Rollings, A; Romanovskiy, V; Romero Vidal, A; Ronayne, J W; Rotondo, M; Rudolph, M S; Ruf, T; Ruiz Valls, P; Saborido Silva, J J; Sadykhov, E; Sagidova, N; Saitta, B; Salustino Guimaraes, V; Sanchez Mayordomo, C; Sanmartin Sedes, B; Santacesaria, R; Santamarina Rios, C; Santimaria, M; Santovetti, E; Sarti, A; Satriano, C; Satta, A; Saunders, D M; Savrina, D; Schael, S; Schellenberg, M; Schiller, M; Schindler, H; Schlupp, M; Schmelling, M; Schmelzer, T; Schmidt, B; Schneider, O; Schopper, A; Schubert, K; Schubiger, M; Schune, M-H; Schwemmer, R; Sciascia, B; Sciubba, A; Semennikov, A; Sergi, A; Serra, N; Serrano, J; Sestini, L; Seyfert, P; Shapkin, M; Shapoval, I; Shcheglov, Y; Shears, T; Shekhtman, L; Shevchenko, V; Shires, A; Siddi, B G; Silva Coutinho, R; Silva de Oliveira, L; Simi, G; Simone, S; Sirendi, M; Skidmore, N; Skwarnicki, T; Smith, E; Smith, I T; Smith, J; Smith, M; Snoek, H; Sokoloff, M D; Soler, F J P; Souza De Paula, B; Spaan, B; Spradlin, P; Sridharan, S; Stagni, F; Stahl, M; Stahl, S; Stefko, P; Stefkova, S; Steinkamp, O; Stemmle, S; Stenyakin, O; Stevenson, S; Stoica, S; Stone, S; Storaci, B; Stracka, S; Straticiuc, M; Straumann, U; Sun, L; Sutcliffe, W; Swientek, K; Syropoulos, V; Szczekowski, M; Szumlak, T; T'Jampens, S; Tayduganov, A; Tekampe, T; Teklishyn, M; Tellarini, G; Teubert, F; Thomas, E; van Tilburg, J; Tilley, M J; Tisserand, V; Tobin, M; Tolk, S; Tomassetti, L; Tonelli, D; Topp-Joergensen, S; Toriello, F; Tournefier, E; Tourneur, S; Trabelsi, K; Traill, M; Tran, M T; Tresch, M; Trisovic, A; Tsaregorodtsev, A; Tsopelas, P; Tully, A; Tuning, N; Ukleja, A; Ustyuzhanin, A; Uwer, U; Vacca, C; Vagnoni, V; Valassi, A; Valat, S; Valenti, G; Vallier, A; Vazquez Gomez, R; Vazquez Regueiro, P; Vecchi, S; van Veghel, M; Velthuis, J J; Veltri, M; Veneziano, G; Venkateswaran, A; Vernet, M; Vesterinen, M; Viaud, B; Vieira, D; Vieites Diaz, M; Vilasis-Cardona, X; Volkov, V; Vollhardt, A; Voneki, B; Vorobyev, A; Vorobyev, V; Voß, C; de Vries, J A; Vázquez Sierra, C; Waldi, R; Wallace, C; Wallace, R; Walsh, J; Wang, J; Ward, D R; Wark, H M; Watson, N K; Websdale, D; Weiden, A; Whitehead, M; Wicht, J; Wilkinson, G; Wilkinson, M; Williams, M; Williams, M P; Williams, M; Williams, T; Wilson, F F; Wimberley, J; Wishahi, J; Wislicki, W; Witek, M; Wormser, G; Wotton, S A; Wraight, K; Wyllie, K; Xie, Y; Xu, Z; Yang, Z; Yin, H; Yu, J; Yuan, X; Yushchenko, O; Zarebski, K A; Zavertyaev, M; Zhang, L; Zhang, Y; Zhelezov, A; Zheng, Y; Zhokhov, A; Zhu, X; Zhukov, V; Zucchelli, S

    2017-01-01

    Two new algorithms for use in the analysis of [Formula: see text] collisions are developed to identify the flavour of [Formula: see text] mesons at production, using pions and protons from the hadronization process. The algorithms are optimized and calibrated on data, using [Formula: see text] decays from [Formula: see text] collision data collected by LHCb at centre-of-mass energies of 7 and 8 TeV. The tagging power of the new pion algorithm is 60% greater than that of the previously available one; the algorithm using protons to identify the flavour of a [Formula: see text] meson is the first of its kind.
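
    For context, the "tagging power" quoted above is conventionally defined in flavour-tagging analyses (this is the standard definition rather than anything specific to this paper) as

        \varepsilon_{eff} = \varepsilon_{tag} \, (1 - 2\omega)^2

    where \varepsilon_{tag} is the fraction of candidates for which the algorithm makes a tag decision and \omega is the mistag probability.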

  4. Validation of two algorithms for managing children with a non-blanching rash.

    PubMed

    Riordan, F Andrew I; Jones, Laura; Clark, Julia

    2016-08-01

    Paediatricians are concerned that children who present with a non-blanching rash (NBR) may have meningococcal disease (MCD). Two algorithms have been devised to help identify which children with an NBR have MCD. To evaluate the NBR algorithms' ability to identify children with MCD. The Newcastle-Birmingham-Liverpool (NBL) algorithm was applied retrospectively to three cohorts of children who had presented with NBRs. This algorithm was also piloted in four hospitals, and then used prospectively for 12 months in one hospital. The National Institute for Health and Care Excellence (NICE) algorithm was validated retrospectively using data from all cohorts. The cohorts included 625 children, 145 (23%) of whom had confirmed or probable MCD. Paediatricians empirically treated 324 (52%) children with antibiotics. The NBL algorithm identified all children with MCD and suggested treatment for a further 86 children (sensitivity 100%, specificity 82%). One child with MCD did not receive immediate antibiotic treatment, despite this being suggested by the algorithm. The NICE algorithm suggested 382 children (61%) who should be treated with antibiotics. This included 141 of the 145 children with MCD (sensitivity 97%, specificity 50%). These algorithms may help paediatricians identify children with MCD who present with NBRs. The NBL algorithm may be more specific than the NICE algorithm as it includes fewer features suggesting MCD. The only significant delay in treatment of MCD occurred when the algorithms were not followed. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  5. A Systematic Review of Validated Methods for Identifying Cerebrovascular Accident or Transient Ischemic Attack Using Administrative Data

    PubMed Central

    Andrade, Susan E.; Harrold, Leslie R.; Tjia, Jennifer; Cutrona, Sarah L.; Saczynski, Jane S.; Dodd, Katherine S.; Goldberg, Robert J.; Gurwitz, Jerry H.

    2012-01-01

    Purpose To perform a systematic review of the validity of algorithms for identifying cerebrovascular accidents (CVAs) or transient ischemic attacks (TIAs) using administrative and claims data. Methods PubMed and Iowa Drug Information Service (IDIS) searches of the English language literature were performed to identify studies published between 1990 and 2010 that evaluated the validity of algorithms for identifying CVAs (ischemic and hemorrhagic strokes, intracranial hemorrhage and subarachnoid hemorrhage) and/or TIAs in administrative data. Two study investigators independently reviewed the abstracts and articles to determine relevant studies according to pre-specified criteria. Results A total of 35 articles met the criteria for evaluation. Of these, 26 articles provided data to evaluate the validity of stroke, 7 reported the validity of TIA, 5 reported the validity of intracranial bleeds (intracerebral hemorrhage and subarachnoid hemorrhage), and 10 studies reported the validity of algorithms to identify the composite endpoints of stroke/TIA or cerebrovascular disease. Positive predictive values (PPVs) varied depending on the specific outcomes and algorithms evaluated. Specific algorithms to evaluate the presence of stroke and intracranial bleeds were found to have high PPVs (80% or greater). Algorithms to evaluate TIAs in adult populations were generally found to have PPVs of 70% or greater. Conclusions The algorithms and definitions to identify CVAs and TIAs using administrative and claims data differ greatly in the published literature. The choice of the algorithm employed should be determined by the stroke subtype of interest. PMID:22262598

  6. Automatable algorithms to identify nonmedical opioid use using electronic data: a systematic review.

    PubMed

    Canan, Chelsea; Polinski, Jennifer M; Alexander, G Caleb; Kowal, Mary K; Brennan, Troyen A; Shrank, William H

    2017-11-01

    Improved methods to identify nonmedical opioid use can help direct health care resources to individuals who need them. Automated algorithms that use large databases of electronic health care claims or records for surveillance are a potential means to achieve this goal. In this systematic review, we reviewed the utility, attempts at validation, and application of such algorithms to detect nonmedical opioid use. We searched PubMed and Embase for articles describing automatable algorithms that used electronic health care claims or records to identify patients or prescribers with likely nonmedical opioid use. We assessed algorithm development, validation, and performance characteristics and the settings where they were applied. Study variability precluded a meta-analysis. Of 15 included algorithms, 10 targeted patients, 2 targeted providers, 2 targeted both, and 1 identified medications with high abuse potential. Most patient-focused algorithms (67%) used prescription drug claims and/or medical claims, with diagnosis codes of substance abuse and/or dependence as the reference standard. Eleven algorithms were developed via regression modeling. Four used natural language processing, data mining, audit analysis, or factor analysis. Automated algorithms can facilitate population-level surveillance. However, there is no true gold standard for determining nonmedical opioid use. Users must recognize the implications of identifying false positives and, conversely, false negatives. Few algorithms have been applied in real-world settings. Automated algorithms may facilitate identification of patients and/or providers most likely to need more intensive screening and/or intervention for nonmedical opioid use. Additional implementation research in real-world settings would clarify their utility. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  7. Variation in Strategy Use Across Measures of Verbal Working Memory

    PubMed Central

    Morrison, Alexandra B; Rosenbaum, Gail M.; Fair, Damien; Chein, Jason M.

    2016-01-01

    The working memory (WM) literature contains a number of tasks that vary on dimensions such as when or how memory items are reported. In addition to the ways in which WM tasks are designed to differ, tasks may also diverge according to the strategies participants use during task performance. The present study included seven tasks from the WM literature, each requiring short-term retention of verbal items. Following completion of a small number of trials from each task, individuals completed a self-report questionnaire to identify their primary strategy. Results indicated substantial variation across individuals for a given task, and within the same individual across tasks. Moreover, while direct comparisons between tasks showed that some tasks evinced similar patterns of strategy use despite differing task demands, others showed markedly different patterns of self-reported strategy use. A community detection algorithm aimed at identifying groups of individuals based on their profile of strategic choices revealed unique communities of individuals who are dependent on specific strategies under varying demands. Together, the findings suggest that researchers using common working memory paradigms should very carefully consider the implications of variation in strategy use when interpreting their findings. PMID:27038310

  8. Identifying Psoriasis and Psoriatic Arthritis Patients in Retrospective Databases When Diagnosis Codes Are Not Available: A Validation Study Comparing Medication/Prescriber Visit-Based Algorithms with Diagnosis Codes.

    PubMed

    Dobson-Belaire, Wendy; Goodfield, Jason; Borrelli, Richard; Liu, Fei Fei; Khan, Zeba M

    2018-01-01

    Using diagnosis code-based algorithms is the primary method of identifying patient cohorts for retrospective studies; nevertheless, many databases lack reliable diagnosis code information. To develop precise algorithms based on medication claims/prescriber visits (MCs/PVs) to identify psoriasis (PsO) patients and psoriatic patients with arthritic conditions (PsO-AC), a proxy for psoriatic arthritis, in Canadian databases lacking diagnosis codes. Algorithms were developed using medications with narrow indication profiles in combination with prescriber specialty to define PsO and PsO-AC. For a 3-year study period from July 1, 2009, algorithms were validated using the PharMetrics Plus database, which contains both adjudicated medication claims and diagnosis codes. Positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity of the developed algorithms were assessed using diagnosis code as the reference standard. Chosen algorithms were then applied to Canadian drug databases to profile the algorithm-identified PsO and PsO-AC cohorts. In the selected database, 183,328 patients were identified for validation. The highest PPVs for PsO (85%) and PsO-AC (65%) occurred when a predictive algorithm of two or more MCs/PVs was compared with the reference standard of one or more diagnosis codes. NPV and specificity were high (99%-100%), whereas sensitivity was low (≤30%). Reducing the number of MCs/PVs or increasing diagnosis claims decreased the algorithms' PPVs. We have developed an MC/PV-based algorithm to identify PsO patients with a high degree of accuracy, but accuracy for PsO-AC requires further investigation. Such methods allow researchers to conduct retrospective studies in databases in which diagnosis codes are absent. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.

  9. Sinogram-based adaptive iterative reconstruction for sparse view x-ray computed tomography

    NASA Astrophysics Data System (ADS)

    Trinca, D.; Zhong, Y.; Wang, Y.-Z.; Mamyrbayev, T.; Libin, E.

    2016-10-01

    With the availability of more powerful computing processors, iterative reconstruction algorithms have recently been successfully implemented as an approach to achieving significant dose reduction in X-ray CT. In this paper, we propose an adaptive iterative reconstruction algorithm for X-ray CT, that is shown to provide results comparable to those obtained by proprietary algorithms, both in terms of reconstruction accuracy and execution time. The proposed algorithm is thus provided for free to the scientific community, for regular use, and for possible further optimization.
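
    The specific adaptive algorithm is not spelled out in the abstract; the sketch below shows only the generic skeleton of one widely used family of iterative reconstruction (a Landweber/SIRT-style update with a nonnegativity constraint), using a dense system matrix for simplicity.

        import numpy as np

        def landweber_reconstruct(A, b, n_iter=100):
            """Iteratively solve A x = b (A: system matrix, b: measured sinogram values)."""
            step = 1.0 / (np.linalg.norm(A, 2) ** 2)      # conservative step from the spectral norm
            x = np.zeros(A.shape[1])
            for _ in range(n_iter):
                x = x + step * A.T @ (b - A @ x)          # gradient step on the data-fit term
                x = np.clip(x, 0.0, None)                 # attenuation values cannot be negative
            return x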

  10. Bio-inspired optimization algorithms for optical parameter extraction of dielectric materials: A comparative study

    NASA Astrophysics Data System (ADS)

    Ghulam Saber, Md; Arif Shahriar, Kh; Ahmed, Ashik; Hasan Sagor, Rakibul

    2016-10-01

    Particle swarm optimization (PSO) and invasive weed optimization (IWO) algorithms are used for extracting the modeling parameters of materials useful to the optics and photonics research community. To the best of our knowledge, these two bio-inspired algorithms are used here for the first time in this particular field. The algorithms are used for modeling graphene oxide, and the performances of the two are compared. Two objective functions are used for different boundary values. Root mean square (RMS) deviation is determined and compared.
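
    A compact, generic PSO loop of the kind used for such parameter extraction is sketched below; the swarm settings and the idea of minimizing an RMS-deviation objective are assumptions for illustration, not the paper's exact configuration.

        import numpy as np

        def pso_minimize(objective, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5):
            """Minimize objective(params) within per-parameter (low, high) bounds."""
            lo, hi = np.array(bounds, dtype=float).T
            dim = len(bounds)
            pos = lo + np.random.rand(n_particles, dim) * (hi - lo)
            vel = np.zeros_like(pos)
            pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
            gbest = pbest[pbest_val.argmin()].copy()
            for _ in range(n_iter):
                r1, r2 = np.random.rand(2, n_particles, dim)
                vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
                pos = np.clip(pos + vel, lo, hi)
                vals = np.array([objective(p) for p in pos])
                better = vals < pbest_val
                pbest[better], pbest_val[better] = pos[better], vals[better]
                gbest = pbest[pbest_val.argmin()].copy()
            return gbest, pbest_val.min()

    Here, objective could for instance be the RMS deviation between a measured optical-response curve and a dispersion model evaluated at the candidate parameters.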

  11. Place-classification analysis of community vulnerability to near-field tsunami threats in the U.S. Pacific Northwest (Invited)

    NASA Astrophysics Data System (ADS)

    Wood, N. J.; Jones, J.; Spielman, S.

    2013-12-01

    Near-field tsunami hazards are credible threats to many coastal communities throughout the world. Along the U.S. Pacific Northwest coast, low-lying areas could be inundated by a series of catastrophic tsunami waves that begin to arrive in a matter of minutes following a Cascadia subduction zone (CSZ) earthquake. This presentation summarizes analytical efforts to classify communities with similar characteristics of community vulnerability to tsunami hazards. This work builds on past State-focused inventories of community exposure to CSZ-related tsunami hazards in northern California, Oregon, and Washington. Attributes used in the classification, or cluster analysis, include demography of residents, spatial extent of the developed footprint based on mid-resolution land cover data, distribution of the local workforce, and the number and type of public venues, dependent-care facilities, and community-support businesses. Population distributions are also characterized as a function of travel time to safety, based on anisotropic, path-distance, geospatial modeling. We used an unsupervised, model-based clustering algorithm and a v-fold cross-validation procedure (v=50) to identify the appropriate number of community types. We selected class solutions that provided the appropriate balance between parsimony and model fit. The goal of the vulnerability classification is to provide emergency managers with a general sense of the types of communities in tsunami hazard zones based on similar characteristics instead of only providing an exhaustive list of attributes for individual communities. This classification scheme can then be used to target and prioritize risk-reduction efforts that address common issues across multiple communities. The presentation will include a discussion of the utility of proposed place classifications to support regional preparedness and outreach efforts.
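
    The sketch below illustrates model-based clustering with an explicit parsimony/fit trade-off; it uses BIC over Gaussian mixtures as a stand-in for the study's cross-validated, model-based procedure, and the attribute matrix is a placeholder.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(2)
        X = rng.random((250, 6))   # one row of vulnerability attributes per community (toy data)

        fits = [GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X) for k in range(2, 9)]
        best = min(fits, key=lambda m: m.bic(X))      # balance model fit against parsimony
        community_types = best.predict(X)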

  12. Development of Methods for Cross-Sectional HIV Incidence Estimation in a Large, Community Randomized Trial

    PubMed Central

    Donnell, Deborah; Komárek, Arnošt; Omelka, Marek; Mullis, Caroline E.; Szekeres, Greg; Piwowar-Manning, Estelle; Fiamma, Agnes; Gray, Ronald H.; Lutalo, Tom; Morrison, Charles S.; Salata, Robert A.; Chipato, Tsungai; Celum, Connie; Kahle, Erin M.; Taha, Taha E.; Kumwenda, Newton I.; Karim, Quarraisha Abdool; Naranbhai, Vivek; Lingappa, Jairam R.; Sweat, Michael D.; Coates, Thomas; Eshleman, Susan H.

    2013-01-01

    Background Accurate methods of HIV incidence determination are critically needed to monitor the epidemic and determine the population level impact of prevention trials. One such trial, Project Accept, a Phase III, community-randomized trial, evaluated the impact of enhanced, community-based voluntary counseling and testing on population-level HIV incidence. The primary endpoint of the trial was based on a single, cross-sectional, post-intervention HIV incidence assessment. Methods and Findings Test performance of HIV incidence determination was evaluated for 403 multi-assay algorithms [MAAs] that included the BED capture immunoassay [BED-CEIA] alone, an avidity assay alone, and combinations of these assays at different cutoff values with and without CD4 and viral load testing on samples from seven African cohorts (5,325 samples from 3,436 individuals with known duration of HIV infection [1 month to >10 years]). The mean window period (average time individuals appear positive for a given algorithm) and performance in estimating an incidence estimate (in terms of bias and variance) of these MAAs were evaluated in three simulated epidemic scenarios (stable, emerging and waning). The power of different test methods to detect a 35% reduction in incidence in the matched communities of Project Accept was also assessed. A MAA was identified that included BED-CEIA, the avidity assay, CD4 cell count, and viral load that had a window period of 259 days, accurately estimated HIV incidence in all three epidemic settings and provided sufficient power to detect an intervention effect in Project Accept. Conclusions In a Southern African setting, HIV incidence estimates and intervention effects can be accurately estimated from cross-sectional surveys using a MAA. The improved accuracy in cross-sectional incidence testing that a MAA provides is a powerful tool for HIV surveillance and program evaluation. PMID:24236054
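
    The cross-sectional logic behind such multi-assay algorithms can be summarized by a common simplified estimator,

        I \approx \frac{N_{recent}}{N_{negative} \cdot \Omega}

    where N_{recent} is the number of HIV-positive individuals classified as recently infected by the MAA, N_{negative} is the number testing HIV-negative, and \Omega is the MAA's mean window period (about 259 days above, expressed in years for an annual incidence). This is the generic form of the estimator, independent of the specific assays combined.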

  13. Development and validation of an algorithm for identifying urinary retention in a cohort of patients with epilepsy in a large US administrative claims database.

    PubMed

    Quinlan, Scott C; Cheng, Wendy Y; Ishihara, Lianna; Irizarry, Michael C; Holick, Crystal N; Duh, Mei Sheng

    2016-04-01

    The aim of this study was to develop and validate an insurance claims-based algorithm for identifying urinary retention (UR) in epilepsy patients receiving antiepileptic drugs to facilitate safety monitoring. Data from the HealthCore Integrated Research Database(SM) in 2008-2011 (retrospective) and 2012-2013 (prospective) were used to identify epilepsy patients with UR. During the retrospective phase, three algorithms identified potential UR: (i) UR diagnosis code with a catheterization procedure code; (ii) UR diagnosis code alone; or (iii) diagnosis with UR-related symptoms. Medical records for 50 randomly selected patients satisfying ≥1 algorithm were reviewed by urologists to ascertain UR status. Positive predictive value (PPV) and 95% confidence intervals (CI) were calculated for the three component algorithms and the overall algorithm (defined as satisfying ≥1 component algorithms). Algorithms were refined using urologist review notes. In the prospective phase, the UR algorithm was refined using medical records for an additional 150 cases. In the retrospective phase, the PPV of the overall algorithm was 72.0% (95%CI: 57.5-83.8%). Algorithm 3 performed poorly and was dropped. Algorithm 1 was unchanged; urinary incontinence and cystitis were added as exclusionary diagnoses to Algorithm 2. The PPV for the modified overall algorithm was 89.2% (74.6-97.0%). In the prospective phase, the PPV for the modified overall algorithm was 76.0% (68.4-82.6%). Upon adding overactive bladder, nocturia and urinary frequency as exclusionary diagnoses, the PPV for the final overall algorithm was 81.9% (73.7-88.4%). The current UR algorithm yielded a PPV > 80% and could be used for more accurate identification of UR among epilepsy patients in a large claims database. Copyright © 2016 John Wiley & Sons, Ltd.

  14. Predictors and patterns of problematic Internet game use using a decision tree model

    PubMed Central

    Rho, Mi Jung; Jeong, Jo-Eun; Chun, Ji-Won; Cho, Hyun; Jung, Dong Jin; Choi, In Young; Kim, Dai-Jin

    2016-01-01

    Background and aims Problematic Internet game use is an important social issue that increases social expenditures for both individuals and nations. This study identified predictors and patterns of problematic Internet game use. Methods Data were collected from online surveys between November 26 and December 26, 2014. We identified 3,881 Internet game users from a total of 5,003 respondents. A total of 511 participants were assigned to the problematic Internet game user group according to the Diagnostic and Statistical Manual of Mental Disorders Internet gaming disorder criteria. From the remaining 3,370 participants, we used propensity score matching to develop a normal comparison group of 511 participants. In all, 1,022 participants were analyzed using the chi-square automatic interaction detector (CHAID) algorithm. Results According to the CHAID algorithm, six important predictors were found: gaming costs (50%), average weekday gaming time (23%), offline Internet gaming community meeting attendance (13%), average weekend and holiday gaming time (7%), marital status (4%), and self-perceptions of addiction to Internet game use (3%). In addition, three patterns out of six classification rules were explored: cost-consuming, socializing, and solitary gamers. Conclusion This study provides direction for future work on the screening of problematic Internet game use in adults. PMID:27499227
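
    CHAID is not available in the common Python libraries, so the sketch below substitutes an ordinary CART decision tree purely to illustrate how predictor-importance rankings of the kind reported above can be derived; the feature names mirror the abstract, while the data are random placeholders rather than the study data.

```python
# CHAID itself is not in scikit-learn; this sketch uses a CART tree only to show how
# predictor-importance ranks can be obtained. Data below are random placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
features = ["gaming_cost", "weekday_hours", "offline_meetings",
            "weekend_hours", "married", "self_perceived_addiction"]
X = rng.random((1022, len(features)))          # 511 cases + 511 matched controls
y = np.repeat([1, 0], 511)                     # 1 = problematic user, 0 = control

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
for name, importance in sorted(zip(features, tree.feature_importances_),
                               key=lambda t: -t[1]):
    print(f"{name}: {importance:.2f}")
```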

  15. Super-Resolution Imaging Strategies for Cell Biologists Using a Spinning Disk Microscope

    PubMed Central

    Hosny, Neveen A.; Song, Mingying; Connelly, John T.; Ameer-Beg, Simon; Knight, Martin M.; Wheeler, Ann P.

    2013-01-01

    In this study we use a spinning disk confocal microscope (SD) to generate super-resolution images of multiple cellular features from any plane in the cell. We obtain super-resolution images by using stochastic intensity fluctuations of biological probes, combining Photoactivation Light-Microscopy (PALM)/Stochastic Optical Reconstruction Microscopy (STORM) methodologies. We compared different image analysis algorithms for processing super-resolution data to identify the most suitable for analysis of particular cell structures. SOFI was chosen for X and Y and was able to achieve a resolution of ca. 80 nm; however, higher resolution (>30 nm) was possible, dependent on the super-resolution image analysis algorithm used. Our method uses low laser power and fluorescent probes which are available either commercially or through the scientific community, and therefore it is gentle enough for biological imaging. Through comparative studies with structured illumination microscopy (SIM) and widefield epifluorescence imaging we identified that our methodology was advantageous for imaging cellular structures which are not immediately at the cell-substrate interface, which include the nuclear architecture and mitochondria. We have shown that it was possible to obtain two-colour images, which highlights the potential this technique has for high-content screening, imaging of multiple epitopes and live cell imaging. PMID:24130668
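
    At second order and zero time lag, SOFI reduces to the per-pixel temporal variance of the intensity fluctuations, which is the essence of the fluctuation-based approach described above. The sketch below shows that minimal computation on a synthetic image stack; it is not the authors' acquisition or analysis pipeline.

```python
# Minimal sketch of second-order SOFI at zero time lag, which is just the per-pixel
# temporal variance of the blinking signal. Synthetic data; not the authors' pipeline.
import numpy as np

rng = np.random.default_rng(1)
stack = rng.poisson(lam=5.0, size=(500, 64, 64)).astype(float)  # frames x H x W

mean_image = stack.mean(axis=0)        # conventional widefield-like average
sofi2_image = stack.var(axis=0)        # second-order auto-cumulant at lag 0

print(mean_image.shape, sofi2_image.shape)
```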

  16. Predictors and patterns of problematic Internet game use using a decision tree model.

    PubMed

    Rho, Mi Jung; Jeong, Jo-Eun; Chun, Ji-Won; Cho, Hyun; Jung, Dong Jin; Choi, In Young; Kim, Dai-Jin

    2016-09-01

    Background and aims Problematic Internet game use is an important social issue that increases social expenditures for both individuals and nations. This study identified predictors and patterns of problematic Internet game use. Methods Data were collected from online surveys between November 26 and December 26, 2014. We identified 3,881 Internet game users from a total of 5,003 respondents. A total of 511 participants were assigned to the problematic Internet game user group according to the Diagnostic and Statistical Manual of Mental Disorders Internet gaming disorder criteria. From the remaining 3,370 participants, we used propensity score matching to develop a normal comparison group of 511 participants. In all, 1,022 participants were analyzed using the chi-square automatic interaction detector (CHAID) algorithm. Results According to the CHAID algorithm, six important predictors were found: gaming costs (50%), average weekday gaming time (23%), offline Internet gaming community meeting attendance (13%), average weekend and holiday gaming time (7%), marital status (4%), and self-perceptions of addiction to Internet game use (3%). In addition, three patterns out of six classification rules were explored: cost-consuming, socializing, and solitary gamers. Conclusion This study provides direction for future work on the screening of problematic Internet game use in adults.

  17. Feature Selection Method Based on Neighborhood Relationships: Applications in EEG Signal Identification and Chinese Character Recognition

    PubMed Central

    Zhao, Yu-Xiang; Chou, Chien-Hsing

    2016-01-01

    In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors. Therefore, the proposed NRFS algorithm was designed for solving this problem. By applying the NRFS algorithm, unselected feature vectors have a high priority of being added into the feature subset if the neighboring feature vectors have been selected. In addition, selected feature vectors have a high priority of being eliminated if the neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two other feature selection algorithms. The experimental results indicated that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and identify the crucial character regions for recognizing Chinese characters. PMID:27314346

  18. The DataBridge: A System For Optimizing The Use Of Dark Data From The Long Tail Of Science

    NASA Astrophysics Data System (ADS)

    Lander, H.; Rajasekar, A.

    2015-12-01

    The DataBridge is a National Science Foundation-funded collaborative project (OCI-1247652, OCI-1247602, OCI-1247663) designed to assist in the discovery of dark data sets from the long tail of science. The DataBridge aims to build queryable communities of datasets using sociometric network analysis. This approach is being tested to evaluate the ability to leverage various forms of metadata to facilitate discovery of new knowledge. Each dataset in the DataBridge has an associated name space used as a first-level partitioning. In addition to testing known algorithms for SNA community building, the DataBridge project has built a message-based platform that allows users to provide their own algorithms for each of the stages in the community building process. The stages are: Signature Generation (SG): An SG algorithm creates a metadata signature for a dataset. Signature algorithms might use text metadata provided by the dataset creator or derive metadata. Relevance Algorithm (RA): An RA compares a pair of datasets and produces a similarity value between 0 and 1 for the two datasets. Sociometric Network Analysis (SNA): The SNA will operate on a similarity matrix produced by an RA to partition all of the datasets in the name space into a set of clusters. These clusters represent communities of closely related datasets. The DataBridge also includes a web application that produces a visual representation of the clustering. Future work includes a more complete application that will allow different types of searching of the network of datasets. The DataBridge approach is relevant to geoscience research and informatics. In this presentation we will outline the project, illustrate the deployment of the approach, and discuss other potential applications and next steps for the research, such as applying this approach to models. In addition, we will explore the relevance of DataBridge to other geoscience projects such as various EarthCube Building Blocks and DIBBS projects.
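
    A toy rendition of the three pipeline stages (SG, RA, SNA) is sketched below. The concrete choices here, TF-IDF text signatures, cosine similarity as the relevance score, and greedy modularity clustering, are illustrative stand-ins rather than the DataBridge project's own algorithms, and the dataset descriptions are invented.

```python
# Hedged sketch of the three-stage pipeline described above (signature generation,
# relevance scoring, community building). TF-IDF, cosine similarity and greedy
# modularity clustering are stand-ins, not the DataBridge algorithms themselves.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

dataset_metadata = {                       # invented dataset descriptions
    "ds1": "coastal ocean temperature buoy observations",
    "ds2": "sea surface temperature satellite retrievals",
    "ds3": "seismic waveform archive for coastal subduction zones",
    "ds4": "earthquake catalogue and seismic focal mechanisms",
}

names = list(dataset_metadata)
signatures = TfidfVectorizer().fit_transform(dataset_metadata.values())   # SG stage
similarity = cosine_similarity(signatures)                                # RA stage

graph = nx.Graph()
graph.add_nodes_from(names)
for i, a in enumerate(names):
    for j in range(i + 1, len(names)):
        if similarity[i, j] > 0:
            graph.add_edge(a, names[j], weight=float(similarity[i, j]))

print(list(greedy_modularity_communities(graph, weight="weight")))        # SNA stage
```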

  19. Development of an algorithm to identify fall-related injuries and costs in Medicare data.

    PubMed

    Kim, Sung-Bou; Zingmond, David S; Keeler, Emmett B; Jennings, Lee A; Wenger, Neil S; Reuben, David B; Ganz, David A

    2016-12-01

    Identifying fall-related injuries and costs using healthcare claims data is cost-effective and easier to implement than using medical records or patient self-report to track falls. We developed a comprehensive four-step algorithm for identifying episodes of care for fall-related injuries and associated costs, using fee-for-service Medicare and Medicare Advantage health plan claims data for 2,011 patients from 5 medical groups between 2005 and 2009. First, as a preparatory step, we identified care received in acute inpatient and skilled nursing facility settings, in addition to emergency department visits. Second, based on diagnosis and procedure codes, we identified all fall-related claim records. Third, with these records, we identified six types of encounters for fall-related injuries, with different levels of injury and care. In the final step, we used these encounters to identify episodes of care for fall-related injuries. To illustrate the algorithm, we present a representative example of a fall episode and examine descriptive statistics of injuries and costs for such episodes. Altogether, we found that the results support the use of our algorithm for identifying episodes of care for fall-related injuries. When we decomposed an episode, we found that the details present a realistic and coherent story of fall-related injuries and healthcare services. Variation of episode characteristics across medical groups supported the use of a complex algorithm approach, and descriptive statistics on the proportion, duration, and cost of episodes by healthcare services and injuries verified that our results are consistent with other studies. This algorithm can be used to identify and analyze various types of fall-related outcomes including episodes of care, injuries, and associated costs. Furthermore, the algorithm can be applied and adopted in other fall-related studies with relative ease.

  20. The 10/66 Dementia Research Group's fully operationalised DSM-IV dementia computerized diagnostic algorithm, compared with the 10/66 dementia algorithm and a clinician diagnosis: a population validation study

    PubMed Central

    Prince, Martin J; de Rodriguez, Juan Llibre; Noriega, L; Lopez, A; Acosta, Daisy; Albanese, Emiliano; Arizaga, Raul; Copeland, John RM; Dewey, Michael; Ferri, Cleusa P; Guerra, Mariella; Huang, Yueqin; Jacob, KS; Krishnamoorthy, ES; McKeigue, Paul; Sousa, Renata; Stewart, Robert J; Salas, Aquiles; Sosa, Ana Luisa; Uwakwa, Richard

    2008-01-01

    Background The criterion for dementia implicit in DSM-IV is widely used in research but not fully operationalised. The 10/66 Dementia Research Group sought to do this using assessments from their one-phase dementia diagnostic research interview, and to validate the resulting algorithm in a population-based study in Cuba. Methods The criterion was operationalised as a computerised algorithm, applying clinical principles, based upon the 10/66 cognitive tests, clinical interview and informant reports; the Community Screening Instrument for Dementia, the CERAD 10 word list learning and animal naming tests, the Geriatric Mental State, and the History and Aetiology Schedule – Dementia Diagnosis and Subtype. This was validated in Cuba against a local clinician DSM-IV diagnosis and the 10/66 dementia diagnosis (originally calibrated probabilistically against clinician DSM-IV diagnoses in the 10/66 pilot study). Results The DSM-IV sub-criteria were plausibly distributed among clinically diagnosed dementia cases and controls. The clinician diagnoses agreed better with 10/66 dementia diagnosis than with the more conservative computerized DSM-IV algorithm. The DSM-IV algorithm was particularly likely to miss less severe dementia cases. Those with a 10/66 dementia diagnosis who did not meet the DSM-IV criterion were less cognitively and functionally impaired compared with the DSM-IV-confirmed cases, but still grossly impaired compared with those free of dementia. Conclusion The DSM-IV criterion, strictly applied, defines a narrow category of unambiguous dementia characterized by marked impairment. It may be specific but incompletely sensitive to clinically relevant cases. The 10/66 dementia diagnosis defines a broader category that may be more sensitive, identifying genuine cases beyond those defined by our DSM-IV algorithm, with relevance to the estimation of the population burden of this disorder. PMID:18577205

  1. Diagnostic accuracy of administrative data algorithms in the diagnosis of osteoarthritis: a systematic review.

    PubMed

    Shrestha, Swastina; Dave, Amish J; Losina, Elena; Katz, Jeffrey N

    2016-07-07

    Administrative health care data are frequently used to study disease burden and treatment outcomes in many conditions, including osteoarthritis (OA). OA is a chronic condition with significant disease burden affecting over 27 million adults in the US. There are few studies examining the performance of administrative data algorithms to diagnose OA. The purpose of this study is to perform a systematic review of administrative data algorithms for OA diagnosis and to evaluate the diagnostic characteristics of algorithms based on restrictiveness and reference standards. Two reviewers independently screened English-language articles published in Medline, Embase, PubMed, and Cochrane databases that used administrative data to identify OA cases. Each algorithm was classified as restrictive or less restrictive based on the number and type of administrative codes required to satisfy the case definition. We recorded sensitivity and specificity of algorithms and calculated positive likelihood ratio (LR+) and positive predictive value (PPV) based on assumed OA prevalence of 0.1, 0.25, and 0.50. The search identified 7 studies that used 13 algorithms. Of these 13 algorithms, 5 were classified as restrictive and 8 as less restrictive. Restrictive algorithms had lower median sensitivity and higher median specificity compared to less restrictive algorithms when reference standards were self-report and American College of Rheumatology (ACR) criteria. The algorithms compared to a reference standard of physician diagnosis had higher sensitivity and specificity than those compared to self-reported diagnosis or ACR criteria. Restrictive algorithms are more specific for OA diagnosis and can be used to identify cases when false positives have higher costs (e.g., interventional studies). Less restrictive algorithms are more sensitive and suited for studies that attempt to identify all cases (e.g., screening programs).
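
    The two summary measures used in the review can be reproduced directly from sensitivity, specificity and an assumed prevalence, as in the worked example below; the sensitivity and specificity values are hypothetical and are not drawn from any of the 13 reviewed algorithms.

```python
# Worked example of the review's two summary measures: positive likelihood ratio and
# PPV at an assumed prevalence. Sensitivity/specificity values below are invented.
def lr_positive(sensitivity, specificity):
    return sensitivity / (1 - specificity)

def ppv(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)

sens, spec = 0.60, 0.95          # a hypothetical "restrictive" algorithm
for prev in (0.10, 0.25, 0.50):  # the assumed OA prevalences used in the review
    print(f"prevalence={prev:.2f}  LR+={lr_positive(sens, spec):.1f}  "
          f"PPV={ppv(sens, spec, prev):.2f}")
```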

  2. Why should we publish Linked Data?

    NASA Astrophysics Data System (ADS)

    Blower, Jon; Riechert, Maik; Koubarakis, Manolis; Pace, Nino

    2016-04-01

    We use the Web every day to access information from all kinds of different sources. But the complexity and diversity of scientific data mean that discovering, accessing and interpreting data remains a large challenge to researchers, decision-makers and other users. Different sources of useful information on data, algorithms, instruments and publications are scattered around the Web. How can we link all these things together to help users to better understand and exploit earth science data? How can we combine scientific data with other relevant data sources, when standards for describing and sharing data vary so widely between communities? "Linked Data" is a term that describes a set of standards and "best practices" for sharing data on the Web (http://www.w3.org/standards/semanticweb/data). These principles can be summarised as follows: 1. Create unique and persistent identifiers for the important "things" in a community (e.g. datasets, publications, algorithms, instruments). 2. Allow users to "look up" these identifiers on the web to find out more information about them. 3. Make this information machine-readable in a community-neutral format (such as RDF, Resource Description Framework). 4. Within this information, embed links to other things and concepts and say how these are related. 5. Optionally, provide web service interfaces to allow the user to perform sophisticated queries over this information (using a language such as SPARQL). The promise of Linked Data is that, through these techniques, data will be more discoverable, more comprehensible and more usable by different communities, not just the community that produced the data. As a result, many data providers (particularly public-sector institutions) are now publishing data in this way. However, this area is still in its infancy in terms of real-world applications. Data users need guidance and tools to help them use Linked Data. Data providers need reassurance that the investments they are making in publishing Linked Data will result in tangible user benefits. This presentation will address a number of these issues, using real-world experience gathered from four recent European projects: MELODIES (http://melodiesproject.eu), LEO (http://linkedeodata.eu), CHARMe (http://linkedeodata.eu) and TELEIOS (http://www.earthobservatory.eu). These projects have all applied Linked Data techniques in practical, real-world situations involving the use of diverse data (including earth science data) by both industrial and academic users. Specifically, we will: • Identify a set of practical and valuable uses for Linked Data, focusing on areas where Linked Data fills gaps left by other technologies. These uses include: enabling the discovery of earth science data using mass-market search engines, helping users to understand data and its uses, combining data from multiple sources and enabling the annotation of data by users. • Enumerate some common challenges faced by developers of data-driven services who wish to use Linked Data in their applications. • Describe a new suite of tools for managing, processing and visualising Linked Data in earth science applications (including geospatial Linked Data).
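
    As a concrete illustration of principles 1-4, the sketch below publishes a minimal machine-readable description of a dataset with a typed link to a related publication, using the rdflib toolkit as an assumed implementation choice; the URIs and identifiers are invented placeholders and are unrelated to the projects named above.

```python
# Minimal Linked Data sketch (assumed toolkit: rdflib): mint a persistent identifier for
# a dataset, attach community-neutral RDF metadata, and link it to a related publication.
# All URIs below are invented placeholders.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCAT, DCTERMS, RDF

EX = Namespace("http://example.org/id/")          # placeholder identifier base

g = Graph()
dataset = EX["dataset/sst-2015"]
paper = EX["publication/doi-placeholder"]

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Sea surface temperature retrievals, 2015")))
g.add((dataset, DCTERMS.references, paper))        # typed link to a related "thing"

print(g.serialize(format="turtle"))
```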

  3. A multifactorial obesity model developed from nationwide public health exposome data and modern computational analyses.

    PubMed

    Gittner, LisaAnn S; Kilbourne, Barbara J; Vadapalli, Ravi; Khan, Hafiz M K; Langston, Michael A

    Obesity is both multifactorial and multimodal, making it difficult to identify, unravel and distinguish causative and contributing factors. The lack of a clear model of aetiology hampers the design and evaluation of interventions to prevent and reduce obesity. Using modern graph-theoretical algorithms, we are able to coalesce and analyse thousands of inter-dependent variables and interpret their putative relationships to obesity. Our modelling is different from traditional approaches; we make no a priori assumptions about the population, and model instead based on the actual characteristics of a population. Paracliques, noise-resistant collections of highly-correlated variables, are differentially distilled from data taken over counties associated with low versus high obesity rates. Factor analysis is then applied and a model is developed. Latent variables concentrated around social deprivation, community infrastructure and climate, and especially heat stress were connected to obesity. Infrastructure, environment and community organisation differed in counties with low versus high obesity rates. Clear connections of community infrastructure with obesity in our results lead us to conclude that community level interventions are critical. This effort suggests that it might be useful to study and plan interventions around community organisation and structure, rather than just the individual, to combat the nation's obesity epidemic. Copyright © 2017 Asia Oceania Association for the Study of Obesity. Published by Elsevier Ltd. All rights reserved.

  4. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations: CESM/CAM EVALUATION BY DECISION TREES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soner Yorgun, M.; Rood, Richard B.

    An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.

  5. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations: CESM/CAM EVALUATION BY DECISION TREES

    DOE PAGES

    Soner Yorgun, M.; Rood, Richard B.

    2016-11-11

    An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.

  6. An algorithm to identify functional groups in organic molecules.

    PubMed

    Ertl, Peter

    2017-06-07

    The concept of functional groups forms a basis of organic chemistry, medicinal chemistry, toxicity assessment, spectroscopy and also chemical nomenclature. All current software systems to identify functional groups are based on a predefined list of substructures. We are not aware of any program that can identify all functional groups in a molecule automatically. The algorithm presented in this article is an attempt to solve this scientific challenge. An algorithm to identify functional groups in a molecule based on iterative marching through its atoms is described. The procedure is illustrated by extracting functional groups from the bioactive portion of the ChEMBL database, resulting in identification of 3080 unique functional groups. A new algorithm to identify all functional groups in organic molecules is presented. The algorithm is relatively simple and full details with examples are provided, therefore implementation in any cheminformatics toolkit should be relatively easy. The new method allows the analysis of functional groups in large chemical databases in a way that was not possible using previous approaches.
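
    A very rough approximation of the "mark atoms, then merge connected marked atoms" idea is sketched below using RDKit; it marks heteroatoms and atoms involved in non-aromatic multiple bonds and groups adjacent marked atoms, which is only loosely in the spirit of the published procedure and omits most of its rules.

```python
# Crude sketch of the "mark atoms, then merge connected marked atoms" idea; NOT a
# faithful implementation of Ertl's published algorithm. Requires RDKit.
from rdkit import Chem

def crude_functional_groups(smiles):
    mol = Chem.MolFromSmiles(smiles)
    marked = set()
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() not in (1, 6):             # mark heteroatoms
            marked.add(atom.GetIdx())
    for bond in mol.GetBonds():
        if bond.GetBondTypeAsDouble() > 1.0 and not bond.GetIsAromatic():
            marked.add(bond.GetBeginAtomIdx())            # atoms in multiple bonds
            marked.add(bond.GetEndAtomIdx())
    groups, seen = [], set()
    for idx in marked:                                    # merge bonded marked atoms
        if idx in seen:
            continue
        stack, group = [idx], set()
        while stack:
            cur = stack.pop()
            if cur in group:
                continue
            group.add(cur)
            for nbr in mol.GetAtomWithIdx(cur).GetNeighbors():
                if nbr.GetIdx() in marked and nbr.GetIdx() not in group:
                    stack.append(nbr.GetIdx())
        seen |= group
        groups.append(sorted(group))
    return groups

print(crude_functional_groups("CC(=O)Oc1ccccc1C(=O)O"))   # aspirin
```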

  7. A systematic review of validated methods for identifying anaphylaxis, including anaphylactic shock and angioneurotic edema, using administrative and claims data.

    PubMed

    Schneider, Gary; Kachroo, Sumesh; Jones, Natalie; Crean, Sheila; Rotella, Philip; Avetisyan, Ruzan; Reynolds, Matthew W

    2012-01-01

    The Food and Drug Administration's Mini-Sentinel pilot program initially aims to conduct active surveillance to refine safety signals that emerge for marketed medical products. A key facet of this surveillance is to develop and understand the validity of algorithms for identifying health outcomes of interest from administrative and claims data. This article summarizes the process and findings of the algorithm review of anaphylaxis. PubMed and Iowa Drug Information Service searches were conducted to identify citations applicable to the anaphylaxis health outcome of interest. Level 1 abstract reviews and Level 2 full-text reviews were conducted to find articles using administrative and claims data to identify anaphylaxis and including validation estimates of the coding algorithms. Our search revealed limited literature focusing on anaphylaxis that provided administrative and claims data-based algorithms and validation estimates. Only four studies identified via literature searches provided validated algorithms; however, two additional studies were identified by Mini-Sentinel collaborators and were incorporated. The International Classification of Diseases, Ninth Revision, codes varied, as did the positive predictive value, depending on the cohort characteristics and the specific codes used to identify anaphylaxis. Research needs to be conducted on designing validation studies to test anaphylaxis algorithms and estimating their predictive power, sensitivity, and specificity. Copyright © 2012 John Wiley & Sons, Ltd.

  8. Multilabel user classification using the community structure of online networks

    PubMed Central

    Papadopoulos, Symeon; Kompatsiaris, Yiannis

    2017-01-01

    We study the problem of semi-supervised, multi-label user classification of networked data in the online social platform setting. We propose a framework that combines unsupervised community extraction and supervised, community-based feature weighting before training a classifier. We introduce Approximate Regularized Commute-Time Embedding (ARCTE), an algorithm that projects the users of a social graph onto a latent space, but instead of packing the global structure into a matrix of predefined rank, as many spectral and neural representation learning methods do, it extracts local communities for all users in the graph in order to learn a sparse embedding. To this end, we employ an improvement of personalized PageRank algorithms for searching locally in each user’s graph structure. Then, we perform supervised community feature weighting in order to boost the importance of highly predictive communities. We assess our method performance on the problem of user classification by performing an extensive comparative study among various recent methods based on graph embeddings. The comparison shows that ARCTE significantly outperforms the competition in almost all cases, achieving up to 35% relative improvement compared to the second best competing method in terms of F1-score. PMID:28278242
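
    The local exploration step described above can be illustrated with a generic personalized PageRank seeded at a single user, from which a sparse local-community vector could be derived; the networkx call below is a stand-in, not the authors' ARCTE implementation, and the karate-club graph is only a placeholder for a social graph.

```python
# Sketch of local exploration via personalized PageRank seeded at one user. Generic
# networkx illustration, not the ARCTE implementation; stand-in graph.
import networkx as nx

graph = nx.karate_club_graph()                 # placeholder social graph
seed_user = 0

scores = nx.pagerank(graph, alpha=0.85, personalization={seed_user: 1.0})
top_community = sorted(scores, key=scores.get, reverse=True)[:10]
print(top_community)
```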

  9. Multilabel user classification using the community structure of online networks.

    PubMed

    Rizos, Georgios; Papadopoulos, Symeon; Kompatsiaris, Yiannis

    2017-01-01

    We study the problem of semi-supervised, multi-label user classification of networked data in the online social platform setting. We propose a framework that combines unsupervised community extraction and supervised, community-based feature weighting before training a classifier. We introduce Approximate Regularized Commute-Time Embedding (ARCTE), an algorithm that projects the users of a social graph onto a latent space, but instead of packing the global structure into a matrix of predefined rank, as many spectral and neural representation learning methods do, it extracts local communities for all users in the graph in order to learn a sparse embedding. To this end, we employ an improvement of personalized PageRank algorithms for searching locally in each user's graph structure. Then, we perform supervised community feature weighting in order to boost the importance of highly predictive communities. We assess our method performance on the problem of user classification by performing an extensive comparative study among various recent methods based on graph embeddings. The comparison shows that ARCTE significantly outperforms the competition in almost all cases, achieving up to 35% relative improvement compared to the second best competing method in terms of F1-score.

  10. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records.

    PubMed

    Peissig, Peggy L; Rasmussen, Luke V; Berg, Richard L; Linneman, James G; McCarty, Catherine A; Waudby, Carol; Chen, Lin; Denny, Joshua C; Wilke, Russell A; Pathak, Jyotishman; Carrell, David; Kho, Abel N; Starren, Justin B

    2012-01-01

    There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries.

  11. Bouc-Wen hysteresis model identification using Modified Firefly Algorithm

    NASA Astrophysics Data System (ADS)

    Zaman, Mohammad Asif; Sikder, Urmita

    2015-12-01

    The parameters of the Bouc-Wen hysteresis model are identified using a Modified Firefly Algorithm. The proposed algorithm uses dynamic process control parameters to improve its performance. The algorithm is used to find the model parameter values that result in the least error between a set of given data points and points obtained from the Bouc-Wen model. The performance of the algorithm is compared with the performance of the conventional Firefly Algorithm, Genetic Algorithm and Differential Evolution algorithm in terms of convergence rate and accuracy. Compared to the other three optimization algorithms, the proposed algorithm is found to have a good convergence rate with a high degree of accuracy in identifying Bouc-Wen model parameters. Finally, the proposed method is used to find the Bouc-Wen model parameters from experimental data. The obtained model is found to be in good agreement with measured data.
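
    A sketch of the identification setup is given below: simulate the Bouc-Wen hysteretic variable under an imposed displacement history, score candidate parameters by summed squared error against "measured" data, and hand the objective to a global optimizer. SciPy's differential evolution is used here purely as a stand-in; the Modified Firefly Algorithm itself is not reproduced, and all signals and parameter values are synthetic.

```python
# Fitness-function sketch for Bouc-Wen parameter identification. The optimizer is
# SciPy's differential evolution as a stand-in for the paper's Modified Firefly
# Algorithm; displacement history and "true" parameters are synthetic.
import numpy as np
from scipy.optimize import differential_evolution

t = np.linspace(0, 10, 501)
x = np.sin(2 * np.pi * 0.5 * t)                     # imposed displacement
dt = t[1] - t[0]
dx = np.gradient(x, dt)

def simulate_z(params):
    A, beta, gamma, n = params
    z = np.zeros_like(x)
    for k in range(1, len(x)):                      # explicit Euler integration
        dz = A * dx[k-1] - beta * abs(dx[k-1]) * abs(z[k-1])**(n - 1) * z[k-1] \
             - gamma * dx[k-1] * abs(z[k-1])**n
        z[k] = z[k-1] + dz * dt
    return z

measured = simulate_z((1.0, 0.5, 0.5, 1.5))         # synthetic "measurements"

def sse(params):
    return np.sum((simulate_z(params) - measured) ** 2)

result = differential_evolution(sse, bounds=[(0.1, 2), (0, 2), (0, 2), (1, 3)],
                                seed=0, maxiter=50, tol=1e-6)
print(result.x)
```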

  12. Improving the Efficiency and Effectiveness of Community Detection via Prior-Induced Equivalent Super-Network.

    PubMed

    Yang, Liang; Jin, Di; He, Dongxiao; Fu, Huazhu; Cao, Xiaochun; Fogelman-Soulie, Francoise

    2017-03-29

    Due to the importance of community structure in understanding networks and a surge of interest in community detectability, improving community identification performance with pairwise prior information has become a hot topic. However, most existing semi-supervised community detection algorithms focus only on improving accuracy and ignore the impact of priors on speeding up detection. Moreover, they typically require tuning additional parameters and cannot guarantee that pairwise constraints are satisfied. To address these drawbacks, we propose a general, high-speed, effective and parameter-free semi-supervised community detection framework. By constructing indivisible super-nodes from the connected subgraphs of the must-link constraints and by forming weighted super-edges based on network topology and cannot-link constraints, our new framework transforms the original network into an equivalent but much smaller Super-Network. The Super-Network perfectly ensures the must-link constraints and effectively encodes the cannot-link constraints. Furthermore, the time complexity of the super-network construction process is linear in the original network size, which makes it efficient. Meanwhile, since the constructed super-network is much smaller than the original one, any existing community detection algorithm runs much faster when used within our framework. Finally, the overall process introduces no additional parameters, making it more practical.
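
    A hedged sketch of the super-network construction is given below: must-link connected components are collapsed into super-nodes, original edge weights are summed between super-nodes, and super-edges that would violate a cannot-link constraint are dropped. How cannot-links should influence super-edge weights follows the spirit of the description above rather than the paper's precise rule.

```python
# Sketch of collapsing must-link components into super-nodes and aggregating edges.
# The cannot-link handling (simply dropping violating super-edges) is an assumption.
import networkx as nx

def build_super_network(graph, must_link, cannot_link):
    ml = nx.Graph()
    ml.add_nodes_from(graph)
    ml.add_edges_from(must_link)
    node_to_super = {}
    for sid, comp in enumerate(nx.connected_components(ml)):
        for node in comp:
            node_to_super[node] = sid

    super_graph = nx.Graph()
    super_graph.add_nodes_from(set(node_to_super.values()))
    for u, v in graph.edges():
        su, sv = node_to_super[u], node_to_super[v]
        if su != sv:
            w = super_graph[su][sv]["weight"] + 1 if super_graph.has_edge(su, sv) else 1
            super_graph.add_edge(su, sv, weight=w)

    for u, v in cannot_link:                      # forbid merging constrained pairs
        su, sv = node_to_super[u], node_to_super[v]
        if super_graph.has_edge(su, sv):
            super_graph.remove_edge(su, sv)
    return super_graph, node_to_super

g = nx.karate_club_graph()
sg, mapping = build_super_network(g, must_link=[(0, 1), (1, 2)], cannot_link=[(0, 33)])
print(sg.number_of_nodes(), sg.number_of_edges())
```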

  13. Finding Statistically Significant Communities in Networks

    PubMed Central

    Lancichinetti, Andrea; Radicchi, Filippo; Ramasco, José J.; Fortunato, Santo

    2011-01-01

    Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a big need for multi-purpose techniques, able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable to detect clusters in networks accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure of partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method has a comparable performance as the best existing algorithms on artificial benchmark graphs. Several applications on real networks are shown as well. OSLOM is implemented in a freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks. PMID:21559480

  14. Classifying epileptic EEG signals with delay permutation entropy and Multi-Scale K-means.

    PubMed

    Zhu, Guohun; Li, Yan; Wen, Peng Paul; Wang, Shuaifang

    2015-01-01

    Most epileptic EEG classification algorithms are supervised and require large training datasets, which hinders their use in real-time applications. This chapter proposes an unsupervised Multi-Scale K-means (MSK-means) algorithm to distinguish epileptic EEG signals and identify epileptic zones. The random initialization of the K-means algorithm can lead to wrong clusters. Based on the characteristics of EEGs, the MSK-means algorithm initializes the coarse-scale centroid of a cluster with a suitable scale factor. In this chapter, the MSK-means algorithm is proved theoretically superior to the K-means algorithm in efficiency. In addition, three classifiers, the K-means, MSK-means and support vector machine (SVM), are used to identify seizures and localize the epileptogenic zone using delay permutation entropy features. The experimental results demonstrate that identifying seizures with the MSK-means algorithm and delay permutation entropy achieves 4.7% higher accuracy than K-means and 0.7% higher accuracy than the SVM.
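
    The delay permutation entropy feature named above can be computed in a few lines, as in the sketch below; the signals are synthetic and the sketch is not the chapter's full feature-extraction or clustering pipeline.

```python
# Minimal sketch of (delay) permutation entropy on synthetic signals; a regular
# sinusoid scores lower than white noise. Not the chapter's exact pipeline.
import math
from collections import Counter

import numpy as np

def permutation_entropy(signal, order=3, delay=1):
    """Normalized permutation entropy of a 1-D signal (0 = regular, 1 = random)."""
    patterns = Counter()
    n = len(signal) - (order - 1) * delay
    for i in range(n):
        window = signal[i:i + order * delay:delay]
        patterns[tuple(np.argsort(window))] += 1
    probs = np.array(list(patterns.values()), dtype=float) / n
    entropy = -np.sum(probs * np.log2(probs))
    return entropy / math.log2(math.factorial(order))

rng = np.random.default_rng(0)
print(permutation_entropy(np.sin(np.linspace(0, 20 * np.pi, 2000)), order=3, delay=5))
print(permutation_entropy(rng.normal(size=2000), order=3, delay=5))
```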

  15. A Stochastic Model for Detecting Overlapping and Hierarchical Community Structure

    PubMed Central

    Cao, Xiaochun; Wang, Xiao; Jin, Di; Guo, Xiaojie; Tang, Xianchao

    2015-01-01

    Community detection is a fundamental problem in the analysis of complex networks. Recently, many researchers have concentrated on the detection of overlapping communities, where a vertex may belong to more than one community. However, most current methods require the number (or the size) of the communities as a priori information, which is usually unavailable in real-world networks. Thus, a practical algorithm should not only find the overlapping community structure, but also automatically determine the number of communities. Furthermore, it is preferable if this method is able to reveal the hierarchical structure of networks as well. In this work, we first propose a generative model that employs a nonnegative matrix factorization (NMF) formulation with an l2,1-norm regularization term, balanced by a resolution parameter. The NMF naturally provides overlapping community structure by assigning soft membership variables to each vertex; the l2,1 regularization term is a group-sparsity technique that can automatically determine the number of communities by penalizing too many nonempty communities; and the resolution parameter enables us to explore the hierarchical structure of networks. Thereafter, we derive the multiplicative update rule to learn the model parameters, and offer the proof of its correctness. Finally, we test our approach on a variety of synthetic and real-world networks, and compare it with some state-of-the-art algorithms. The results validate the superior performance of our new method. PMID:25822148
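
    A generic multiplicative-update sketch for NMF with a row-wise l2,1 penalty is given below (rows of H playing the role of candidate communities), to illustrate how group sparsity can empty out superfluous communities; it is a textbook-style update for that stated objective, not the authors' exact model, update rule or proof.

```python
# Generic NMF with a row-wise l2,1 penalty on H: near-zero rows of H behave like
# pruned communities. Illustrative only; not the paper's formulation.
import numpy as np

def group_sparse_nmf(A, k, lam=0.1, iters=500, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        W *= (A @ H.T) / (W @ H @ H.T + eps)
        row_norms = np.linalg.norm(H, axis=1, keepdims=True) + eps
        H *= (W.T @ A) / (W.T @ W @ H + lam * H / row_norms + eps)
    return W, H

A = (np.random.default_rng(1).random((30, 30)) < 0.2).astype(float)
A = np.triu(A, 1)
A = A + A.T                                          # symmetric toy "adjacency" matrix
W, H = group_sparse_nmf(A, k=8, lam=0.5)
print(np.linalg.norm(H, axis=1).round(3))            # near-zero rows ~ pruned communities
```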

  16. Self-reported transient ischemic attack and stroke symptoms: methods and baseline prevalence. The ARIC Study, 1987-1989.

    PubMed

    Toole, J F; Lefkowitz, D S; Chambless, L E; Wijnberg, L; Paton, C C; Heiss, G

    1996-11-01

    As part of the Atherosclerosis Risk in Communities (ARIC) Study assessment of the etiology and sequelae of atherosclerosis, a standardized questionnaire on transient ischemic attack (TIA) and nonfatal stroke and a computerized diagnostic algorithm simulating clinical reasoning were developed and tested at the four ARIC field centers: Forsyth County, North Carolina; Minneapolis, Minnesota; Jackson, Mississippi; and Washington County, Maryland. The diagnostic algorithm used participant responses to a series of questions about six neurologic trigger symptoms to identify symptoms of TIA or stroke and their vascular distribution. Among 12,205 ARIC participants reporting their lifetime occurrence of one or more symptoms probably due to cerebrovascular causes, nearly half (47%) reported the sudden onset of at least one symptom sometime prior to their ARIC examination. Of those with at least one symptom, only 12.9% were classified by the computer algorithm as having symptoms of TIA or stroke. Dizziness/loss of balance was the most frequently reported symptom (36%); 1.2% of these persons were classified by the algorithm as having a TIA/stroke event. Positive symptoms of speech dysfunction were classified most often (77.%) as being symptoms of TIA or stroke. Symptoms suggesting TIA were reported more frequently than symptoms suggesting stroke by both sexes. TIA or stroke-like phenomena were more frequent (p < 0.001) in females (7%) than in males (5%) and increased with age in both sexes (p = 0.13 for females; p = 0.02 for males). In Forsyth County, TIA and stroke symptoms were greater in African Americans than in Caucasians (p = 0.05, controlling for sex). The association of algorithmically defined symptoms of TIA or stroke with traditional cerebrovascular risk factors is the subject of a companion paper.

  17. Management of hypertension at the community level in sub-Saharan Africa (SSA): towards a rational use of available resources.

    PubMed

    Twagirumukiza, M; Van Bortel, L M

    2011-01-01

    Hypertension is emerging in many developing nations as a leading cause of cardiovascular mortality, morbidity and disability in adults. In sub-Saharan African (SSA) countries it has specific features, such as onset in young and active adults, severe complications dominated by heart failure, and occurrence in limited-resource settings in which an individual's access to treatment (affordability) is very limited. Within this context of constrained economic conditions, the greatest gains for SSA in controlling the hypertension epidemic lie in its prevention. Attempts should be made to detect hypertensive patients early, before irreversible organ damage becomes apparent, and to provide them with the best possible and affordable non-pharmacological and pharmacological treatment. Therefore, efforts should be made for detection and early management at the community level. In this context, a standardized algorithm of management can help in the rational use of available resources. Although many international and regional guidelines have been published, they cannot be applied to SSA settings because the economy of the countries and affordability of the patients do not allow access to the advocated treatment. In addition, none of them suggests a clear algorithm of management for limited-resource settings at the community level. In line with available data and analysing existing guidelines, a practical algorithm for management of hypertension at the community level, including treatment affordability, has been suggested in the present work.

  18. Community detection in complex networks using deep auto-encoded extreme learning machine

    NASA Astrophysics Data System (ADS)

    Wang, Feifan; Zhang, Baihai; Chai, Senchun; Xia, Yuanqing

    2018-06-01

    Community detection has long been a fascinating topic in complex networks since the community structure usually unveils valuable information of interest. The prevalence and evolution of deep learning and neural networks have been pushing forward advances in various research fields and also provide numerous useful, off-the-shelf techniques. In this paper, we put cascaded stacked autoencoders and the unsupervised extreme learning machine (ELM) together in a two-level embedding process and propose a novel community detection algorithm. Extensive comparison experiments on both synthetic and real-world networks demonstrate the advantages of the proposed algorithm. On one hand, it outperforms k-means clustering in terms of accuracy and stability, benefiting from the determinate dimensions of the ELM block and the integration of sparsity restrictions. On the other hand, it incurs lower complexity than the spectral clustering method because it reduces the time spent on the eigenvalue decomposition procedure.

  19. Evaluation of the performance of existing non-laboratory based cardiovascular risk assessment algorithms

    PubMed Central

    2013-01-01

    Background The high burden and rising incidence of cardiovascular disease (CVD) in resource constrained countries necessitates implementation of robust and pragmatic primary and secondary prevention strategies. Many current CVD management guidelines recommend absolute cardiovascular (CV) risk assessment as a clinically sound guide to preventive and treatment strategies. Development of non-laboratory based cardiovascular risk assessment algorithms enable absolute risk assessment in resource constrained countries. The objective of this review is to evaluate the performance of existing non-laboratory based CV risk assessment algorithms using the benchmarks for clinically useful CV risk assessment algorithms outlined by Cooney and colleagues. Methods A literature search to identify non-laboratory based risk prediction algorithms was performed in MEDLINE, CINAHL, Ovid Premier Nursing Journals Plus, and PubMed databases. The identified algorithms were evaluated using the benchmarks for clinically useful cardiovascular risk assessment algorithms outlined by Cooney and colleagues. Results Five non-laboratory based CV risk assessment algorithms were identified. The Gaziano and Framingham algorithms met the criteria for appropriateness of statistical methods used to derive the algorithms and endpoints. The Swedish Consultation, Framingham and Gaziano algorithms demonstrated good discrimination in derivation datasets. Only the Gaziano algorithm was externally validated where it had optimal discrimination. The Gaziano and WHO algorithms had chart formats which made them simple and user friendly for clinical application. Conclusion Both the Gaziano and Framingham non-laboratory based algorithms met most of the criteria outlined by Cooney and colleagues. External validation of the algorithms in diverse samples is needed to ascertain their performance and applicability to different populations and to enhance clinicians’ confidence in them. PMID:24373202

  20. Is Self-organization a Rational Expectation?

    NASA Astrophysics Data System (ADS)

    Luediger, Heinz

    Over decades and under varying names the study of biology-inspired algorithms applied to non-living systems has been the subject of a small and somewhat exotic research community. Only the recent coincidence of a growing inability to master the design, development and operation of increasingly intertwined systems and processes, and an accelerated trend towards a naïve if not romanticizing view of nature in the sciences, has led to the adoption of biology-inspired algorithmic research by a wider range of sciences. Adaptive systems, as we apparently observe in nature, are meanwhile viewed as a promising way out of the complexity trap and, propelled by a long list of ‘self’ catchwords, complexity research has become an influential stream in the science community. This paper presents four provocative theses that cast doubt on the strategic potential of complexity research and the viability of large scale deployment of biology-inspired algorithms in an expectation driven world.

  1. Spatial and Functional Organization of Pig Trade in Different European Production Systems: Implications for Disease Prevention and Control.

    PubMed

    Relun, Anne; Grosbois, Vladimir; Sánchez-Vizcaíno, José Manuel; Alexandrov, Tsviatko; Feliziani, Francesco; Waret-Szkuta, Agnès; Molia, Sophie; Etter, Eric Marcel Charles; Martínez-López, Beatriz

    2016-01-01

    Understanding the complexity of live pig trade organization is a key factor in predicting and controlling major infectious diseases, such as classical swine fever (CSF) or African swine fever (ASF). Whereas the organization of pig trade has been described in several European countries with indoor commercial production systems, little information is available on this organization in other systems, such as outdoor or small-scale systems. The objective of this study was to describe and compare the spatial and functional organization of live pig trade in different European countries and different production systems. Data on premise characteristics and pig movements between premises were collected during 2011 from Bulgaria, France, Italy, and Spain, whose swine industries are representative of most of the production systems in Europe (i.e., commercial vs. small-scale and outdoor vs. indoor). Trade communities were identified in each country using the Walktrap algorithm. Several descriptive and network metrics were generated at country and community levels. Pig trade organization showed heterogeneous spatial and functional organization. Trade communities mostly composed of indoor commercial premises were identified in western France, northern Italy, northern Spain, and north-western Bulgaria. They covered large distances, overlapped in space, demonstrated both scale-free and small-world properties, with a role of trade operators and multipliers as key premises. Trade communities involving outdoor commercial premises were identified in western Spain, south-western and central France. They were more spatially clustered, demonstrated scale-free properties, with multipliers as key premises. Small-scale communities involved the majority of premises in Bulgaria and in central and Southern Italy. They were spatially clustered and had scale-free properties, with key premises usually being commercial production premises. These results indicate that a disease might spread very differently according to the production system and that key premises could be targeted to more cost-effectively control diseases. This study provides useful epidemiological information and parameters that could be used to design risk-based surveillance strategies or to more accurately model the risk of introduction or spread of devastating swine diseases, such as ASF, CSF, or foot-and-mouth disease.
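
    Walktrap community extraction of the kind used above is available off the shelf, for example in python-igraph, as sketched below on a stand-in graph; the premises-movement networks themselves are of course not reproduced here.

```python
# Sketch of Walktrap community detection with python-igraph on a placeholder graph.
import igraph as ig

g = ig.Graph.Famous("Zachary")                 # placeholder for a premises-movement graph
dendrogram = g.community_walktrap(steps=4)     # random-walk-based community detection
clustering = dendrogram.as_clustering()
print(clustering.membership)
print("modularity:", round(clustering.modularity, 3))
```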

  2. Spatial and Functional Organization of Pig Trade in Different European Production Systems: Implications for Disease Prevention and Control

    PubMed Central

    Relun, Anne; Grosbois, Vladimir; Sánchez-Vizcaíno, José Manuel; Alexandrov, Tsviatko; Feliziani, Francesco; Waret-Szkuta, Agnès; Molia, Sophie; Etter, Eric Marcel Charles; Martínez-López, Beatriz

    2016-01-01

    Understanding the complexity of live pig trade organization is a key factor in predicting and controlling major infectious diseases, such as classical swine fever (CSF) or African swine fever (ASF). Whereas the organization of pig trade has been described in several European countries with indoor commercial production systems, little information is available on this organization in other systems, such as outdoor or small-scale systems. The objective of this study was to describe and compare the spatial and functional organization of live pig trade in different European countries and different production systems. Data on premise characteristics and pig movements between premises were collected during 2011 from Bulgaria, France, Italy, and Spain, whose swine industries are representative of most of the production systems in Europe (i.e., commercial vs. small-scale and outdoor vs. indoor). Trade communities were identified in each country using the Walktrap algorithm. Several descriptive and network metrics were generated at country and community levels. Pig trade organization showed heterogeneous spatial and functional organization. Trade communities mostly composed of indoor commercial premises were identified in western France, northern Italy, northern Spain, and north-western Bulgaria. They covered large distances, overlapped in space, demonstrated both scale-free and small-world properties, with a role of trade operators and multipliers as key premises. Trade communities involving outdoor commercial premises were identified in western Spain, south-western and central France. They were more spatially clustered, demonstrated scale-free properties, with multipliers as key premises. Small-scale communities involved the majority of premises in Bulgaria and in central and Southern Italy. They were spatially clustered and had scale-free properties, with key premises usually being commercial production premises. These results indicate that a disease might spread very differently according to the production system and that key premises could be targeted to more cost-effectively control diseases. This study provides useful epidemiological information and parameters that could be used to design risk-based surveillance strategies or to more accurately model the risk of introduction or spread of devastating swine diseases, such as ASF, CSF, or foot-and-mouth disease. PMID:26870738

  3. Yeast species diversity in apple juice for cider production evidenced by culture-based method.

    PubMed

    Lorenzini, Marilinda; Simonato, Barbara; Zapparoli, Giacomo

    2018-05-07

    Identification of yeasts isolated from apple juices of two cider houses (one located in a plain area and one in an alpine area) was carried out by a culture-based method. Wallerstein Laboratory Nutrient Agar was used as the medium for isolation and preliminary yeast identification. A total of 20 species of yeasts belonging to ten different genera were identified using both the BLAST algorithm for pairwise sequence comparison and phylogenetic approaches. A wide variety of non-Saccharomyces species was found. Interestingly, Candida railenensis, Candida cylindracea, Hanseniaspora meyeri, Hanseniaspora pseudoguilliermondii, and Metschnikowia sinensis were recovered for the first time in the yeast community of an apple environment. Phylogenetic analysis revealed a better resolution in identifying Metschnikowia and Moesziomyces isolates than comparative analysis using the GenBank or YeastIP gene databases. This study provides important data on the yeast microbiota of apple juice and revealed differences between the two geographical cider production areas in terms of species composition.

  4. Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation

    NASA Technical Reports Server (NTRS)

    Sharma, Manali; Das, Kamalika; Bilgic, Mustafa; Matthews, Bryan; Nielsen, David Lynn; Oza, Nikunj C.

    2016-01-01

    A major focus of the commercial aviation community is discovery of unknown safety events in flight operations data. Data-driven unsupervised anomaly detection methods are better at capturing unknown safety events compared to rule-based methods which only look for known violations. However, not all statistical anomalies that are discovered by these unsupervised anomaly detection methods are operationally significant (e.g., represent a safety concern). Subject Matter Experts (SMEs) have to spend significant time reviewing these statistical anomalies individually to identify a few operationally significant ones. In this paper we propose an active learning algorithm that incorporates SME feedback in the form of rationales to build a classifier that can distinguish between uninteresting and operationally significant anomalies. Experimental evaluation on real aviation data shows that our approach improves detection of operationally significant events by as much as 75% compared to the state-of-the-art. The learnt classifier also generalizes well to additional validation data sets.

  5. Rutgers University Subcontract B611610 Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soundarajan, Sucheta; Eliassi-Rad, Tina; Gallagher, Brian

    Given an incomplete (i.e., partially-observed) network, which nodes should we actively probe in order to achieve the highest accuracy for a given network feature? For example, consider a cyber-network administrator who observes only a portion of the network at time t and wants to accurately identify the most important (e.g., highest PageRank) nodes in the complete network. She has a limited budget for probing the network. Of all the nodes she has observed, which should she probe in order to most accurately identify the important nodes? We propose a novel and scalable algorithm, MaxOutProbe, and evaluate it w.r.t. four network features (largest connected component, PageRank, core-periphery, and community detection), five network sampling strategies, and seven network datasets from different domains. Across a range of conditions, MaxOutProbe demonstrates consistently high performance relative to several baseline strategies.

  6. Microaneurysm detection with radon transform-based classification on retina images.

    PubMed

    Giancardo, L; Meriaudeau, F; Karnowski, T P; Li, Y; Tobin, K W; Chaum, E

    2011-01-01

    The creation of an automatic diabetic retinopathy screening system using retina cameras is currently receiving considerable interest in the medical imaging community. The detection of microaneurysms is a key element in this effort. In this work, we propose a new microaneurysm segmentation technique based on a novel application of the radon transform, which is able to identify these lesions without any previous knowledge of the retina's morphological features and with minimal image preprocessing. The algorithm has been evaluated on the Retinopathy Online Challenge public dataset, and its performance is comparable to the best current techniques. The performance is particularly good at low false positive ratios, which makes it an ideal candidate for diabetic retinopathy screening systems.
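
    To make the core intuition concrete, the following hedged sketch computes the Radon transform of small candidate patches and uses the spread of the per-angle peak projections as a crude roundness cue (round lesions project similarly at all angles, elongated vessels do not); this is only an illustration of the idea, not the published segmentation pipeline.

    ```python
    # Roundness cue from the Radon transform of a candidate patch (toy example).
    import numpy as np
    from skimage.transform import radon

    def isotropy_cue(patch, n_angles=36):
        """Small spread across angles suggests a roughly circular structure."""
        theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
        sinogram = radon(patch, theta=theta, circle=False)   # (positions, angles)
        per_angle_peak = sinogram.max(axis=0)                # strongest line integral per angle
        return per_angle_peak.std()

    # toy patches: a small bright disc (lesion-like) vs a bright bar (vessel-like)
    yy, xx = np.mgrid[:32, :32]
    disc = ((xx - 16) ** 2 + (yy - 16) ** 2 < 16).astype(float)
    bar = np.zeros((32, 32)); bar[14:18, :] = 1.0

    print("disc cue (small):", isotropy_cue(disc))
    print("bar cue (large): ", isotropy_cue(bar))
    ```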

  7. Improved particle swarm optimization algorithm for android medical care IOT using modified parameters.

    PubMed

    Sung, Wen-Tsai; Chiang, Yen-Chun

    2012-12-01

    This study examines a wireless sensor network with real-time remote identification using the Android-based health care Internet of Things (HCIOT) platform in community healthcare. An improved particle swarm optimization (PSO) method is proposed to efficiently enhance the precision of physiological multi-sensor data fusion measurements in the Internet of Things (IoT) system. The improved PSO (IPSO) combines an inertia weight factor design with a shrinkage factor adjustment to improve the algorithm's data fusion performance. The Android platform is employed to build multi-physiological signal processing and timely medical care analysis. Wireless sensor network signal transmission and Internet links allow community or family members to access timely medical care network services.
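
    The sketch below shows a generic particle swarm optimizer with a linearly decreasing inertia weight and a constriction (shrinkage) factor, the two ingredients highlighted in the abstract; the objective, bounds and parameter values are illustrative stand-ins rather than the paper's exact IPSO settings.

    ```python
    # Generic PSO sketch with inertia weight and constriction factor (toy settings).
    import numpy as np

    def ipso(objective, dim=4, n_particles=30, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.uniform(-5, 5, (n_particles, dim))       # positions
        v = np.zeros_like(x)                             # velocities
        pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
        gbest = pbest[pbest_val.argmin()].copy()

        c1 = c2 = 2.05
        phi = c1 + c2
        chi = 2.0 / abs(2.0 - phi - np.sqrt(phi ** 2 - 4.0 * phi))   # constriction factor

        for t in range(iters):
            w = 0.9 - 0.5 * t / iters                    # inertia weight: 0.9 -> 0.4
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            v = chi * (w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x))
            x = x + v
            vals = np.array([objective(p) for p in x])
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = x[improved], vals[improved]
            gbest = pbest[pbest_val.argmin()].copy()
        return gbest, pbest_val.min()

    # toy use: fuse four noisy sensor readings by minimising squared error to each
    readings = np.array([36.4, 36.6, 36.5, 36.7])
    best, err = ipso(lambda p: np.sum((p - readings) ** 2))
    print(best, err)
    ```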

  8. Whole earth modeling: developing and disseminating scientific software for computational geophysics.

    NASA Astrophysics Data System (ADS)

    Kellogg, L. H.

    2016-12-01

    Historically, a great deal of specialized scientific software for modeling and data analysis has been developed by individual researchers or small groups of scientists working on their own specific research problems. As the magnitude of available data and computer power has increased, so has the complexity of scientific problems addressed by computational methods, creating both a need to sustain existing scientific software and to expand its development to take advantage of new algorithms, new software approaches, and new computational hardware. To that end, communities like the Computational Infrastructure for Geodynamics (CIG) have been established to support the use of best practices in scientific computing for solid earth geophysics research and teaching. Working as a scientific community enables computational geophysicists to take advantage of technological developments, improve the accuracy and performance of software, build on prior software development, and collaborate more readily. The CIG community, and others, have adopted an open-source development model, in which code is developed and disseminated by the community in an open fashion, using version control and software repositories like Git. One emerging issue is how to adequately identify and credit the intellectual contributions involved in creating open-source scientific software. The traditional method of disseminating scientific ideas, peer-reviewed publication, was not designed for reviewing or crediting scientific software, although emerging publication strategies such as software journals are attempting to address this need. We are piloting an integrated approach in which authors are identified and credited as scientific software is developed and run. Successful software citation requires integration with the scholarly publication and indexing mechanisms as well, to assign credit, ensure discoverability, and provide provenance for software.

  9. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets

    PubMed Central

    Wernisch, Lorenz

    2017-01-01

    Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting, Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to the TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. PMID:29036190
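
    A loose, hypothetical approximation of the context-dependent idea (not the actual Clusternomics model) is sketched below: each dataset is clustered separately with a Dirichlet-process-style mixture, and global clusters are read off as the observed combinations of local assignments.

    ```python
    # Rough approximation of context-dependent integrative clustering on toy data.
    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(1)
    # two toy "omics" datasets over the same 200 samples with different structure
    expr = np.concatenate([rng.normal(0, 1, (100, 5)), rng.normal(3, 1, (100, 5))])
    meth = np.concatenate([rng.normal(0, 1, (150, 4)), rng.normal(4, 1, (50, 4))])

    def local_clusters(X, max_k=5):
        bgm = BayesianGaussianMixture(
            n_components=max_k,
            weight_concentration_prior_type="dirichlet_process",
            random_state=0,
        ).fit(X)
        return bgm.predict(X)

    local = np.column_stack([local_clusters(expr), local_clusters(meth)])
    # global clusters = distinct combinations of local assignments
    _, global_labels = np.unique(local, axis=0, return_inverse=True)
    print("local clusters per dataset:", [len(set(local[:, j])) for j in range(2)])
    print("global (combined) clusters:", len(set(global_labels)))
    ```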

  10. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.

    PubMed

    Gabasova, Evelina; Reid, John; Wernisch, Lorenz

    2017-10-01

    Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting, Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to the TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.

  11. The Barcode of Life Data Portal: Bridging the Biodiversity Informatics Divide for DNA Barcoding

    PubMed Central

    Sarkar, Indra Neil; Trizna, Michael

    2011-01-01

    With the volume of molecular sequence data that is systematically being generated globally, there is a need for centralized resources for data exploration and analytics. DNA Barcode initiatives are on track to generate a compendium of molecular sequence–based signatures for identifying animals and plants. To date, the data exploration and analytic tools for these data have only been available in a boutique form, often representing a frustrating hurdle for researchers who may not have the resources to install or implement algorithms described by the analytic community. The Barcode of Life Data Portal (BDP) is a first step towards integrating the latest biodiversity informatics innovations with molecular sequence data from DNA barcoding. Through establishment of community-driven standards, based on discussion with the Data Analysis Working Group (DAWG) of the Consortium for the Barcode of Life (CBOL), the BDP provides an infrastructure for incorporation of existing and next-generation DNA barcode analytic applications in an open forum. PMID:21818249

  12. Collective credit allocation in science

    PubMed Central

    Shen, Hua-Wei; Barabási, Albert-László

    2014-01-01

    Collaboration among researchers is an essential component of the modern scientific enterprise, playing a particularly important role in multidisciplinary research. However, we continue to wrestle with allocating credit to the coauthors of publications with multiple authors, because the relative contribution of each author is difficult to determine. At the same time, the scientific community runs an informal field-dependent credit allocation process that assigns credit in a collective fashion to each work. Here we develop a credit allocation algorithm that captures the coauthors’ contribution to a publication as perceived by the scientific community, reproducing the informal collective credit allocation of science. We validate the method by identifying the authors of Nobel-winning papers that are credited for the discovery, independent of their positions in the author list. The method can also compare the relative impact of researchers working in the same field, even if they did not publish together. The ability to accurately measure the relative credit of researchers could affect many aspects of credit allocation in science, potentially impacting hiring, funding, and promotion decisions. PMID:25114238

  13. Demonstrating the suitability of genetic algorithms for driving microbial ecosystems in desirable directions.

    PubMed

    Vandecasteele, Frederik P J; Hess, Thomas F; Crawford, Ronald L

    2007-07-01

    The functioning of natural microbial ecosystems is determined by biotic interactions, which are in turn influenced by abiotic environmental conditions. Direct experimental manipulation of such conditions can be used to purposefully drive ecosystems toward exhibiting desirable functions. When a set of environmental conditions can be manipulated to be present at a discrete number of levels, finding the right combination of conditions to obtain the optimal desired effect becomes a typical combinatorial optimisation problem. Genetic algorithms are a class of robust and flexible search and optimisation techniques from the field of computer science that may be very suitable for such a task. To verify this idea, datasets containing growth levels of the total microbial community of four different natural microbial ecosystems in response to all possible combinations of a set of five chemical supplements were obtained. Subsequently, the ability of a genetic algorithm to search this parameter space for combinations of supplements driving the microbial communities to high levels of growth was compared to that of a random search, a local search, and a hill-climbing algorithm, three intuitive alternative optimisation approaches. The results indicate that a genetic algorithm is very suitable for driving microbial ecosystems in desirable directions, which opens opportunities for both fundamental ecological research and industrial applications.
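
    A minimal genetic-algorithm sketch for this combinatorial setting is shown below; the five-supplement binary encoding matches the description above, but the growth response function is a made-up stand-in for the experimentally measured datasets.

    ```python
    # Genetic algorithm over binary supplement combinations (toy growth response).
    import random

    random.seed(0)
    N_SUPPLEMENTS = 5

    def growth(combo):                        # hypothetical response surface
        base = sum(combo)                     # each supplement helps a little...
        penalty = 2 if combo[1] and combo[3] else 0   # ...but two of them clash
        return base - penalty

    def evolve(pop_size=12, generations=15, p_mut=0.1):
        pop = [[random.randint(0, 1) for _ in range(N_SUPPLEMENTS)]
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=growth, reverse=True)
            parents = pop[: pop_size // 2]            # truncation selection
            children = []
            while len(children) < pop_size - len(parents):
                a, b = random.sample(parents, 2)
                cut = random.randint(1, N_SUPPLEMENTS - 1)
                child = a[:cut] + b[cut:]             # one-point crossover
                child = [1 - g if random.random() < p_mut else g for g in child]
                children.append(child)
            pop = parents + children
        best = max(pop, key=growth)
        return best, growth(best)

    print(evolve())   # best supplement combination found and its toy growth value
    ```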

  14. School-Based Screening for Suicide Risk: Balancing Costs and Benefits

    PubMed Central

    Wilcox, Holly; Huo, Yanling; Turner, J. Blake; Fisher, Prudence; Shaffer, David

    2010-01-01

    Objectives. We examined the effects of a scoring algorithm change on the burden and sensitivity of a screen for adolescent suicide risk. Methods. The Columbia Suicide Screen was used to screen 641 high school students for high suicide risk (recent ideation or lifetime attempt and depression, or anxiety, or substance use), determined by subsequent blind assessment with the Diagnostic Interview Schedule for Children. We compared the accuracy of different screen algorithms in identifying high-risk cases. Results. A screen algorithm comprising recent ideation or lifetime attempt or depression, anxiety, or substance-use problems set at moderate-severity level classed 35% of students as positive and identified 96% of high-risk students. Increasing the algorithm's threshold reduced the proportion identified to 24% and identified 92% of high-risk cases. Asking only about recent suicidal ideation or lifetime suicide attempt identified 17% of the students and 89% of high-risk cases. The proportion of nonsuicidal diagnosis–bearing students found with the 3 algorithms was 62%, 34%, and 12%, respectively. Conclusions. The Columbia Suicide Screen threshold can be altered to reduce the screen-positive population, saving costs and time while identifying almost all students at high risk for suicide. PMID:20634467

  15. Validation of classification algorithms for childhood diabetes identified from administrative data.

    PubMed

    Vanderloo, Saskia E; Johnson, Jeffrey A; Reimer, Kim; McCrea, Patrick; Nuernberger, Kimberly; Krueger, Hans; Aydede, Sema K; Collet, Jean-Paul; Amed, Shazhan

    2012-05-01

    Type 1 diabetes is the most common form of diabetes among children; however, the proportion of cases of childhood type 2 diabetes is increasing. In Canada, the National Diabetes Surveillance System (NDSS) uses administrative health data to describe trends in the epidemiology of diabetes, but does not specify diabetes type. The objective of this study was to validate algorithms to classify diabetes type in children <20 yr identified using the NDSS methodology. We applied the NDSS case definition to children living in British Columbia between 1 April 1996 and 31 March 2007. Through an iterative process, four potential classification algorithms were developed based on demographic characteristics and drug-utilization patterns. Each algorithm was then validated against a gold standard clinical database. Algorithms based primarily on an age rule (i.e., age <10 at diagnosis categorized as type 1 diabetes) were most sensitive in the identification of type 1 diabetes; algorithms with restrictions on drug utilization (i.e., no prescriptions for insulin ± glucose monitoring strips categorized as type 2 diabetes) were most sensitive for identifying type 2 diabetes. One algorithm was identified as having the optimal balance of sensitivity (Sn) and specificity (Sp) for the identification of both type 1 (Sn: 98.6%; Sp: 78.2%; PPV: 97.8%) and type 2 diabetes (Sn: 83.2%; Sp: 97.5%; PPV: 73.7%). Demographic characteristics in combination with drug-utilization patterns can be used to differentiate diabetes type among cases of pediatric diabetes identified within administrative health databases. Validation of similar algorithms in other regions is warranted. © 2011 John Wiley & Sons A/S.
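
    The following toy rule illustrates the style of algorithm validated here (an age rule plus a drug-utilization rule); the exact thresholds and the full validated logic are not reproduced, so treat it as a hypothetical example only.

    ```python
    # Hypothetical rule-based classifier in the spirit of the validated algorithms.
    def classify_diabetes_type(age_at_diagnosis, insulin_rx, monitoring_strips_rx):
        """Return 'type 1' or 'type 2' for a case meeting the surveillance definition."""
        if age_at_diagnosis < 10:
            return "type 1"                  # age rule: young onset classed as type 1
        if not insulin_rx and not monitoring_strips_rx:
            return "type 2"                  # no insulin or strips classed as type 2
        return "type 1"

    print(classify_diabetes_type(8, True, True))     # -> type 1
    print(classify_diabetes_type(16, False, False))  # -> type 2
    ```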

  16. A systematic review of validated methods to capture myopericarditis using administrative or claims data.

    PubMed

    Idowu, Rachel T; Carnahan, Ryan; Sathe, Nila A; McPheeters, Melissa L

    2013-12-30

    To identify algorithms that can capture incident cases of myocarditis and pericarditis in administrative and claims databases; these algorithms can eventually be used to identify cardiac inflammatory adverse events following vaccine administration. We searched MEDLINE from 1991 to September 2012 using controlled vocabulary and key terms related to myocarditis. We also searched the reference lists of included studies. Two investigators independently assessed the full text of studies against pre-determined inclusion criteria. Two reviewers independently extracted data regarding participant and algorithm characteristics as well as study conduct. Nine publications (including one study reported in two publications) met criteria for inclusion. Two studies performed medical record review in order to confirm that these coding algorithms actually captured patients with the disease of interest. One of these studies identified five potential cases, none of which were confirmed as acute myocarditis upon review. The other study, which employed a search algorithm based on diagnostic surveillance (using ICD-9 codes 420.90, 420.99, 422.90, 422.91 and 429.0) and sentinel reporting, identified 59 clinically confirmed cases of myopericarditis among 492,671 United States military service personnel who received smallpox vaccine between 2002 and 2003. Neither study provided algorithm validation statistics (positive predictive value, sensitivity, or specificity). A validated search algorithm is currently unavailable for identifying incident cases of pericarditis or myocarditis. Several authors have published unvalidated ICD-9-based search algorithms that appear to capture myocarditis events occurring in the context of other underlying cardiac or autoimmune conditions. Copyright © 2013. Published by Elsevier Ltd.

  17. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records

    PubMed Central

    Rasmussen, Luke V; Berg, Richard L; Linneman, James G; McCarty, Catherine A; Waudby, Carol; Chen, Lin; Denny, Joshua C; Wilke, Russell A; Pathak, Jyotishman; Carrell, David; Kho, Abel N; Starren, Justin B

    2012-01-01

    Objective There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. Materials and methods We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. Results An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. Discussion A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. Conclusion We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries. PMID:22319176

  18. Comparing the Healthy Nose and Nasopharynx Microbiota Reveals Continuity As Well As Niche-Specificity

    PubMed Central

    De Boeck, Ilke; Wittouck, Stijn; Wuyts, Sander; Oerlemans, Eline F. M.; van den Broek, Marianne F. L.; Vandenheuvel, Dieter; Vanderveken, Olivier; Lebeer, Sarah

    2017-01-01

    To improve our understanding of upper respiratory tract (URT) diseases and the underlying microbial pathogenesis, a better characterization of the healthy URT microbiome is crucial. In this first large-scale study, we obtained more insight into the URT microbiome of healthy adults. To this end, we collected paired nasal and nasopharyngeal swabs from 100 healthy participants in a citizen-science project. High-throughput 16S rRNA gene V4 amplicon sequencing was performed and samples were processed using the Divisive Amplicon Denoising Algorithm 2 (DADA2). This allowed us to identify the bacterial richness and diversity of the samples in terms of amplicon sequence variants (ASVs), with special attention to intragenus variation. We found both niches to have a low overall species richness and uneven distribution. Moreover, based on hierarchical clustering, nasopharyngeal samples could be grouped into several bacterial community types at the genus level, of which four were supported to some extent by prediction strength evaluation: one intermixed type with a higher bacterial diversity, in which Staphylococcus, Corynebacterium, and Dolosigranulum appeared to be the main bacterial members in different relative abundances, and three types dominated by either Moraxella, Streptococcus, or Fusobacterium. Some of these bacterial community types, such as Streptococcus and Fusobacterium, were nasopharynx-specific and never occurred in the nose. No clear association between the nasopharyngeal bacterial profiles at genus level and the variables age, gender, blood type, season of sampling, or common respiratory allergies was found in this study population, except for smoking, which showed a positive association with Corynebacterium and Staphylococcus. Based on the fine-scale resolution of the ASVs, both known commensal and potentially pathogenic bacteria were found within several genera – particularly in Streptococcus and Moraxella – in our healthy study population. Of interest, the nasopharynx hosted more potentially pathogenic species than the nose. To our knowledge, this is the first large-scale study using the DADA2 algorithm to investigate the microbiota in the “healthy” adult nose and nasopharynx. These results contribute to a better understanding of the composition and diversity of the healthy microbiome in the URT and the differences between these important URT niches. Trial Registration: Ethical Committee of Antwerp University Hospital, B300201524257, registered 23 March 2015, ClinicalTrials.gov Identifier: NCT02933983. PMID:29238339

  19. A physarum-inspired prize-collecting steiner tree approach to identify subnetworks for drug repositioning.

    PubMed

    Sun, Yahui; Hameed, Pathima Nusrath; Verspoor, Karin; Halgamuge, Saman

    2016-12-05

    Drug repositioning can reduce the time, costs and risks of drug development by identifying new therapeutic effects for known drugs. It is challenging to reposition drugs as pharmacological data is large and complex. Subnetwork identification has already been used to simplify the visualization and interpretation of biological data, but it has not been applied to drug repositioning so far. In this paper, we fill this gap by proposing a new Physarum-inspired Prize-Collecting Steiner Tree algorithm to identify subnetworks for drug repositioning. Drug Similarity Networks (DSN) are generated using the chemical, therapeutic, protein, and phenotype features of drugs. In DSNs, vertex prizes and edge costs represent the similarities and dissimilarities between drugs respectively, and terminals represent drugs in the cardiovascular class, as defined in the Anatomical Therapeutic Chemical classification system. A new Physarum-inspired Prize-Collecting Steiner Tree algorithm is proposed in this paper to identify subnetworks. We apply both the proposed algorithm and the widely-used GW algorithm to identify subnetworks in our 18 generated DSNs. In these DSNs, our proposed algorithm identifies subnetworks with an average Rand Index of 81.1%, while the GW algorithm can only identify subnetworks with an average Rand Index of 64.1%. We select 9 subnetworks with high Rand Index to find drug repositioning opportunities. 10 frequently occurring drugs in these subnetworks are identified as candidates to be repositioned for cardiovascular diseases. We find evidence to support previous discoveries that nitroglycerin, theophylline and acarbose may be able to be repositioned for cardiovascular diseases. Moreover, we identify seven previously unknown drug candidates that also may interact with the biological cardiovascular system. These discoveries show our proposed Prize-Collecting Steiner Tree approach as a promising strategy for drug repositioning.
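
    As a hedged sketch of the subnetwork-identification step, the example below connects terminal (cardiovascular) drugs through low-cost edges of a toy drug similarity network using the ordinary Steiner tree approximation shipped with networkx; the paper's Physarum-inspired prize-collecting variant additionally trades edge costs against vertex prizes, which is not reproduced here. The non-terminal node names are hypothetical.

    ```python
    # Steiner-tree subnetwork over a toy drug similarity network.
    import networkx as nx
    from networkx.algorithms.approximation import steiner_tree

    G = nx.Graph()
    # edge weights play the role of dissimilarity (cost) between drugs
    G.add_weighted_edges_from([
        ("nitroglycerin", "theophylline", 0.3),
        ("theophylline", "acarbose", 0.9),
        ("nitroglycerin", "drugX", 0.2),
        ("drugX", "acarbose", 0.2),
        ("drugX", "drugY", 0.9),
        ("drugY", "acarbose", 0.8),
    ])

    terminals = ["nitroglycerin", "theophylline", "acarbose"]  # known cardiovascular drugs
    T = steiner_tree(G, terminals, weight="weight")
    # non-terminal nodes pulled into the tree (here the hypothetical drugX) are
    # read as repositioning candidates
    print(sorted(T.nodes()))
    ```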

  20. Mapping-by-sequencing in complex polyploid genomes using genic sequence capture: a case study to map yellow rust resistance in hexaploid wheat.

    PubMed

    Gardiner, Laura-Jayne; Bansept-Basler, Pauline; Olohan, Lisa; Joynson, Ryan; Brenchley, Rachel; Hall, Neil; O'Sullivan, Donal M; Hall, Anthony

    2016-08-01

    Previously we extended the utility of mapping-by-sequencing by combining it with sequence capture and mapping sequence data to pseudo-chromosomes that were organized using wheat-Brachypodium synteny. This, with a bespoke haplotyping algorithm, enabled us to map the flowering time locus in the diploid wheat Triticum monococcum L., identifying a set of deleted genes (Gardiner et al., 2014). Here, we develop this combination of gene enrichment and sliding window mapping-by-synteny analysis to map the Yr6 locus for yellow stripe rust resistance in hexaploid wheat. A 110 MB NimbleGen capture probe set was used to enrich and sequence a doubled haploid mapping population of hexaploid wheat derived from an Avalon and Cadenza cross. The Yr6 locus was identified by mapping to the POPSEQ chromosomal pseudomolecules using a bespoke pipeline and algorithm (Chapman et al., 2015). Furthermore, the same locus was identified using newly developed pseudo-chromosome sequences as a mapping reference, which are based on the genic sequence used for sequence enrichment. The pseudo-chromosomes allow us to demonstrate the application of mapping-by-sequencing even to poorly defined polyploid genomes where chromosomes are incomplete and sub-genome assemblies are collapsed. This analysis uniquely enabled us to: compare wheat genome annotations; identify the Yr6 locus, defining a smaller genic region than was previously possible; associate the interval with one wheat sub-genome; and increase the density of associated SNP markers. Finally, we built the pipeline in iPlant, making it a user-friendly community resource for phenotype mapping. © 2016 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.

  1. Detecting causal drivers and empirical prediction of the Indian Summer Monsoon

    NASA Astrophysics Data System (ADS)

    Di Capua, G.; Vellore, R.; Raghavan, K.; Coumou, D.

    2017-12-01

    The Indian summer monsoon (ISM) is crucial for the economy, society and natural ecosystems on the Indian peninsula. Predicting the total seasonal rainfall at several months' lead time would help to plan effective water management strategies, improve flood or drought protection programs and prevent humanitarian crises. However, the complexity and strong internal variability of the ISM circulation system make skillful seasonal forecasting challenging. Moreover, novel tools are needed to adequately identify the low-frequency and remote processes which influence ISM behavior. We applied a Response-Guided Causal Precursor Detection (RGCPD) scheme, a novel empirical prediction method which unites a response-guided community detection scheme with a causal discovery algorithm (CEN). These tools allow us to assess causal pathways between different components of the ISM circulation system and with far-away regions in the tropics, mid-latitudes or Arctic. The scheme has successfully been used to identify causal precursors of the stratospheric polar vortex, enabling skillful predictions at (sub)seasonal timescales (Kretschmer et al. 2016, J.Clim., Kretschmer et al. 2017, GRL). We analyze observed ISM monthly rainfall over the monsoon trough region. Applying causal discovery techniques, we identify several causal precursor communities in the fields of 2m-temperature, sea level pressure and snow depth over Eurasia. Specifically, our results suggest that surface temperature conditions in both tropical and Arctic regions contribute to ISM variability. A linear regression prediction model based on the identified set of communities has good hindcast skill at 4-5 month lead times. Further, we separate El Nino, La Nina and ENSO-neutral years from each other and find that the causal precursors differ depending on ENSO state. The ENSO-state dependent causal precursors give even higher skill, especially for La Nina years when the ISM is relatively strong. These findings are promising and might ultimately contribute both to improved understanding of the ISM circulation system and to improved seasonal ISM forecasts.
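
    A toy version of the final prediction step is sketched below: once causal precursor indices have been identified, a linear regression on their earlier values is used to hindcast seasonal rainfall; the data, number of precursors and lead time are synthetic.

    ```python
    # Linear-regression hindcast from (synthetic) causal precursor indices.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(2)
    years = 40
    precursors = rng.normal(size=(years, 3))          # e.g. SST, SLP, snow-depth indices
    rainfall = 1.5 * precursors[:, 0] - 0.8 * precursors[:, 2] + rng.normal(0, 0.5, years)

    train, test = slice(0, 30), slice(30, years)
    model = LinearRegression().fit(precursors[train], rainfall[train])
    skill = np.corrcoef(model.predict(precursors[test]), rainfall[test])[0, 1]
    print("hindcast correlation skill:", round(skill, 2))
    ```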

  2. Metaphor Identification in Large Texts Corpora

    PubMed Central

    Neuman, Yair; Assaf, Dan; Cohen, Yohai; Last, Mark; Argamon, Shlomo; Howard, Newton; Frieder, Ophir

    2013-01-01

    Identifying metaphorical language-use (e.g., sweet child) is one of the challenges facing natural language processing. This paper describes three novel algorithms for automatic metaphor identification. The algorithms are variations of the same core algorithm. We evaluate the algorithms on two corpora of Reuters and the New York Times articles. The paper presents the most comprehensive study of metaphor identification in terms of scope of metaphorical phrases and annotated corpora size. Algorithms’ performance in identifying linguistic phrases as metaphorical or literal has been compared to human judgment. Overall, the algorithms outperform the state-of-the-art algorithm with 71% precision and 27% averaged improvement in prediction over the base-rate of metaphors in the corpus. PMID:23658625

  3. Concerted Earth Observation and Prediction of Water and Energy Cycles in the Third Pole Environment (CEOP-TPE)

    NASA Astrophysics Data System (ADS)

    Su, Bob; Ma, Yaoming; Menenti, Massimo; Wen, Jun; Sobrino, Jose; He, Yanbo; Li, Zhao-Liang; Tang, Bohui; Sneeuw, Nico; Zhong, Lei; Zeng, Yijian; van der Veld, Rogier; Chen, Xuelong; Zheng, Donghai; Huang, Ying; Lv, Shaoning; Wang, Lichun

    2016-08-01

    The achievements made in Dragon III in 2014-2016 are listed below:
    1. Maintaining the Tibetan Plateau Soil Moisture and Soil Temperature Observatory (Tibet-Obs) [1-3] and developing a method and data product by blending SM products over the Tibetan Plateau and evaluating other available SM products [4].
    2. Developing a new algorithm for representing the effective soil temperature in microwave radiometry [5-7].
    3. Developing data sets to study the regional and plateau scale land-atmosphere interactions in TPE [8-11].
    4. Identifying and developing improved land surface processes [12-15].
    5. Developing a method for the quantification of water cycle components based on earth observation data and a comparison to reanalysis data [16-17].
    6. Investigating and revealing the mechanism of surface and tropospheric heating on the Tibetan Plateau [18].
    7. Proposing a validation framework for the generation of climate data records [19].
    8. Graduating seven young scientists with their doctorates during the last two years of the Dragon III programme.
    9. Making the datasets and algorithms accessible to the scientific community.

  4. Joint Multi-Leaf Segmentation, Alignment, and Tracking for Fluorescence Plant Videos.

    PubMed

    Yin, Xi; Liu, Xiaoming; Chen, Jin; Kramer, David M

    2018-06-01

    This paper proposes a novel framework for fluorescence plant video processing. The plant research community is interested in the leaf-level photosynthetic analysis within a plant. A prerequisite for such analysis is to segment all leaves, estimate their structures, and track them over time. We identify this as a joint multi-leaf segmentation, alignment, and tracking problem. First, leaf segmentation and alignment are applied on the last frame of a plant video to find a number of well-aligned leaf candidates. Second, leaf tracking is applied on the remaining frames with leaf candidate transformation from the previous frame. We form two optimization problems with shared terms in their objective functions for leaf alignment and tracking respectively. A quantitative evaluation framework is formulated to evaluate the performance of our algorithm with four metrics. Two models are learned to predict the alignment accuracy and detect tracking failure respectively in order to provide guidance for subsequent plant biology analysis. The limitation of our algorithm is also studied. Experimental results show the effectiveness, efficiency, and robustness of the proposed method.

  5. The applications of machine learning algorithms in the modeling of estrogen-like chemicals.

    PubMed

    Liu, Huanxiang; Yao, Xiaojun; Gramatica, Paola

    2009-06-01

    Increasing concern is being shown by the scientific community, government regulators, and the public about endocrine-disrupting chemicals that, in the environment, are adversely affecting human and wildlife health through a variety of mechanisms, mainly estrogen receptor-mediated mechanisms of toxicity. Because of the large number of such chemicals in the environment, there is a great need for an effective means of rapidly assessing endocrine-disrupting activity in the toxicology assessment process. When faced with the challenging task of screening large libraries of molecules for biological activity, the benefits of computational predictive models based on quantitative structure-activity relationships to identify possible estrogens become immediately obvious. Recently, in order to improve the accuracy of prediction, some machine learning techniques were introduced to build more effective predictive models. In this review we will focus our attention on some recent advances in the use of these methods in modeling estrogen-like chemicals. The advantages and disadvantages of the machine learning algorithms used in solving this problem, the importance of the validation and performance assessment of the built models as well as their applicability domains will be discussed.

  6. Identifying Physician-Recognized Depression from Administrative Data: Consequences for Quality Measurement

    PubMed Central

    Spettell, Claire M; Wall, Terry C; Allison, Jeroan; Calhoun, Jaimee; Kobylinski, Richard; Fargason, Rachel; Kiefe, Catarina I

    2003-01-01

    Background Multiple factors limit identification of patients with depression from administrative data. However, administrative data drives many quality measurement systems, including the Health Plan Employer Data and Information Set (HEDIS®). Methods We investigated two algorithms for identification of physician-recognized depression. The study sample was drawn from primary care physician member panels of a large managed care organization. All members were continuously enrolled between January 1 and December 31, 1997. Algorithm 1 required at least two criteria in any combination: (1) an outpatient diagnosis of depression or (2) a pharmacy claim for an antidepressant. Algorithm 2 included the same criteria as algorithm 1, but required a diagnosis of depression for all patients. With algorithm 1, we identified the medical records of a stratified, random subset of patients with and without depression (n=465). We also identified patients of primary care physicians with a minimum of 10 depressed members by algorithm 1 (n=32,819) and algorithm 2 (n=6,837). Results The sensitivity, specificity, and positive predictive values were: Algorithm 1: 95 percent, 65 percent, 49 percent; Algorithm 2: 52 percent, 88 percent, 60 percent. Compared to algorithm 1, profiles from algorithm 2 revealed higher rates of follow-up visits (43 percent, 55 percent) and appropriate antidepressant dosage acutely (82 percent, 90 percent) and chronically (83 percent, 91 percent) (p<0.05 for all). Conclusions Both algorithms had high false positive rates. Denominator construction (algorithm 1 versus 2) contributed significantly to variability in measured quality. Our findings raise concern about interpreting depression quality reports based upon administrative data. PMID:12968818

  7. Algorithmic detectability threshold of the stochastic block model

    NASA Astrophysics Data System (ADS)

    Kawamoto, Tatsuro

    2018-03-01

    The assumption that the values of model parameters are known or correctly learned, i.e., the Nishimori condition, is one of the requirements for the detectability analysis of the stochastic block model in statistical inference. In practice, however, there is no example demonstrating that we can know the model parameters beforehand, and there is no guarantee that the model parameters can be learned accurately. In this study, we consider the expectation-maximization (EM) algorithm with belief propagation (BP) and derive its algorithmic detectability threshold. Our analysis is not restricted to the community structure but includes general modular structures. Because the algorithm cannot always learn the planted model parameters correctly, the algorithmic detectability threshold is qualitatively different from the one with the Nishimori condition.

  8. Cloud computing-based TagSNP selection algorithm for human genome data.

    PubMed

    Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

    2015-01-05

    Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. In chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-paralleled combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used.

  9. Robust crop and weed segmentation under uncontrolled outdoor illumination.

    PubMed

    Jeon, Hong Y; Tian, Lei F; Zhu, Heping

    2011-01-01

    An image processing algorithm for detecting individual weeds was developed and evaluated. The weed detection processes included normalized excess green conversion, statistical threshold value estimation, adaptive image segmentation, median filtering, morphological feature calculation and an Artificial Neural Network (ANN). The developed algorithm was validated for its ability to identify and detect weeds and crop plants under uncontrolled outdoor illumination. A field robot implementing machine vision captured field images under outdoor illumination, and the image processing algorithm processed them automatically without manual adjustment. The errors of the algorithm, when processing 666 field images, ranged from 2.1 to 2.9%. The ANN correctly detected 72.6% of crop plants among the identified plants, and considered the rest as weeds. However, the ANN identification rates for crop plants improved up to 95.1% after addressing the error sources in the algorithm. The developed weed detection and image processing algorithm provides a novel method to identify plants against a soil background under uncontrolled outdoor illumination and to differentiate weeds from crop plants. Thus, the proposed machine vision and processing algorithm may be useful for outdoor applications, including plant-specific direct applications (PSDA).
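
    A condensed sketch of the first stages described above follows: normalized excess green conversion and threshold-based vegetation/soil segmentation; Otsu's method stands in for the paper's statistical threshold estimation, the image is a random stand-in, and the ANN stage is omitted.

    ```python
    # Normalized excess green index plus Otsu thresholding for vegetation masks.
    import numpy as np
    from skimage.filters import threshold_otsu

    def excess_green(rgb):
        """rgb: float array in [0, 1], shape (H, W, 3); returns the ExG index."""
        s = rgb.sum(axis=2) + 1e-8
        r, g, b = (rgb[..., i] / s for i in range(3))   # chromaticity coordinates
        return 2 * g - r - b

    rgb = np.random.rand(120, 160, 3)                   # stand-in for a field image
    exg = excess_green(rgb)
    mask = exg > threshold_otsu(exg)                    # True where vegetation is likely
    print("vegetation fraction:", round(mask.mean(), 3))
    ```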

  10. Cloud Computing-Based TagSNP Selection Algorithm for Human Genome Data

    PubMed Central

    Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

    2015-01-01

    Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. In chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-paralleled combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used. PMID:25569088

  11. Constructing Temporally Extended Actions through Incremental Community Detection

    PubMed Central

    Li, Ge

    2018-01-01

    Hierarchical reinforcement learning works on temporally extended actions or skills to facilitate learning. How to automatically form such abstraction is challenging, and many efforts tackle this issue in the options framework. While various approaches exist to construct options from different perspectives, few of them concentrate on options' adaptability during learning. This paper presents an algorithm to create options and enhance their quality online. Both aspects operate on detected communities of the learning environment's state transition graph. We first construct options from initial samples as the basis of online learning. Then a rule-based community revision algorithm is proposed to update graph partitions, based on which existing options can be continuously tuned. Experimental results in two problems indicate that options from initial samples may perform poorly in more complex environments, and our presented strategy can effectively improve options and get better results compared with flat reinforcement learning. PMID:29849543
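
    The sketch below illustrates the first step in a hedged form: communities detected in a toy state-transition graph are treated as option initiation sets, with states that border another community acting as natural subgoals; greedy modularity detection stands in for the paper's incremental detection and rule-based revision.

    ```python
    # Communities of a toy state-transition graph as candidate options.
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    # toy transition graph of a two-room environment (edges = observed moves)
    G = nx.Graph()
    G.add_edges_from([(0, 1), (1, 2), (2, 0),          # room A
                      (3, 4), (4, 5), (5, 3),          # room B
                      (2, 3)])                         # doorway connecting the rooms

    communities = list(greedy_modularity_communities(G))
    for idx, comm in enumerate(communities):
        # states with neighbours outside the community are natural option subgoals
        exits = {u for u in comm for v in G.neighbors(u) if v not in comm}
        print(f"option {idx}: states={sorted(comm)}, subgoal states={sorted(exits)}")
    ```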

  12. A Review of Power Distribution Test Feeders in the United States and the Need for Synthetic Representative Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Postigo Marcos, Fernando E.; Domingo, Carlos Mateo; San Roman, Tomas Gomez

    Under the increasing penetration of distributed energy resources and new smart network technologies, distribution utilities face new challenges and opportunities to ensure reliable operations, manage service quality, and reduce operational and investment costs. Simultaneously, the research community is developing algorithms for advanced controls and distribution automation that can help to address some of these challenges. However, there is a shortage of realistic test systems that are publicly available for development, testing, and evaluation of such new algorithms. Concerns around revealing critical infrastructure details and customer privacy have severely limited the number of actual networks published and available for testing. In recent decades, several distribution test feeders and US-featured representative networks have been published, but the scale, complexity, and control data vary widely. This paper presents a first-of-a-kind structured literature review of published distribution test networks with a special emphasis on classifying their main characteristics and identifying the types of studies for which they have been used. As a result, this both aids researchers in choosing suitable test networks for their needs and highlights the opportunities and directions for further test system development. In particular, we highlight the need for building large-scale synthetic networks to overcome the identified drawbacks of current distribution test feeders.

  13. A Review of Power Distribution Test Feeders in the United States and the Need for Synthetic Representative Networks

    DOE PAGES

    Postigo Marcos, Fernando E.; Domingo, Carlos Mateo; San Roman, Tomas Gomez; ...

    2017-11-18

    Under the increasing penetration of distributed energy resources and new smart network technologies, distribution utilities face new challenges and opportunities to ensure reliable operations, manage service quality, and reduce operational and investment costs. Simultaneously, the research community is developing algorithms for advanced controls and distribution automation that can help to address some of these challenges. However, there is a shortage of realistic test systems that are publicly available for development, testing, and evaluation of such new algorithms. Concerns around revealing critical infrastructure details and customer privacy have severely limited the number of actual networks published and available for testing. In recent decades, several distribution test feeders and US-featured representative networks have been published, but the scale, complexity, and control data vary widely. This paper presents a first-of-a-kind structured literature review of published distribution test networks with a special emphasis on classifying their main characteristics and identifying the types of studies for which they have been used. As a result, this both aids researchers in choosing suitable test networks for their needs and highlights the opportunities and directions for further test system development. In particular, we highlight the need for building large-scale synthetic networks to overcome the identified drawbacks of current distribution test feeders.

  14. A mathematical programming approach for sequential clustering of dynamic networks

    NASA Astrophysics Data System (ADS)

    Silva, Jonathan C.; Bennett, Laura; Papageorgiou, Lazaros G.; Tsoka, Sophia

    2016-02-01

    A common analysis performed on dynamic networks is community structure detection, a challenging problem that aims to track the temporal evolution of network modules. An emerging area in this field is evolutionary clustering, where the community structure of a network snapshot is identified by taking into account both its current state as well as previous time points. Based on this concept, we have developed a mixed integer non-linear programming (MINLP) model, SeqMod, that sequentially clusters each snapshot of a dynamic network. The modularity metric is used to determine the quality of community structure of the current snapshot, and the historical cost is accounted for by optimising the number of node pairs co-clustered at the previous time point that remain so in the current snapshot partition. Our method is tested on social networks of interactions among high school students, college students and members of the Brazilian Congress. We show that, for an adequate parameter setting, our algorithm detects the classes that these students belong to more accurately than partitioning each time step individually or partitioning the aggregated snapshots. Our method also detects drastic discontinuities in interaction patterns across network snapshots. Finally, we present comparative results with similar community detection methods for time-dependent networks from the literature. Overall, we illustrate the applicability of mathematical programming as a flexible, adaptable and systematic approach for these community detection problems. Contribution to the Topical Issue "Temporal Network Theory and Applications", edited by Petter Holme.
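
    The toy sketch below shows the two quantities such a sequential clustering trades off for one snapshot: the modularity of a candidate partition on the current snapshot and the number of node pairs that remain co-clustered relative to the previous snapshot's partition; the graph and partitions are illustrative only, not the MINLP formulation.

    ```python
    # Snapshot modularity and preserved co-clustered pairs on toy partitions.
    import itertools
    import networkx as nx
    from networkx.algorithms.community import modularity

    def preserved_pairs(previous, current):
        """Count node pairs co-clustered at t-1 that remain co-clustered at t."""
        def pair_set(partition):
            return {frozenset(p) for comm in partition
                    for p in itertools.combinations(comm, 2)}
        return len(pair_set(previous) & pair_set(current))

    G_t = nx.karate_club_graph()                       # current snapshot (toy)
    partition_prev = [set(range(0, 17)), set(range(17, 34))]
    partition_curr = [set(range(0, 15)), set(range(15, 34))]

    print("snapshot modularity:", round(modularity(G_t, partition_curr), 3))
    print("historical co-clustered pairs kept:",
          preserved_pairs(partition_prev, partition_curr))
    ```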

  15. Identifying patients with ischemic heart disease in an electronic medical record.

    PubMed

    Ivers, Noah; Pylypenko, Bogdan; Tu, Karen

    2011-01-01

    Increasing utilization of electronic medical records (EMRs) presents an opportunity to efficiently measure quality indicators in primary care. Achieving this goal requires the development of accurate patient-disease registries. This study aimed to develop and validate an algorithm for identifying patients with ischemic heart disease (IHD) within the EMR. An algorithm was developed to search the unstructured text within the medical history fields in the EMR for IHD-related terminology. This algorithm was applied to a 5% random sample of adult patient charts (n = 969) drawn from a convenience sample of 17 Ontario family physicians. The accuracy of the algorithm for identifying patients with IHD was compared to the results of 3 trained chart abstractors. The manual chart abstraction identified 87 patients with IHD in the random sample (prevalence = 8.98%). The accuracy of the algorithm for identifying patients with IHD was as follows: sensitivity = 72.4% (95% confidence interval [CI]: 61.8-81.5); specificity = 99.3% (95% CI: 98.5-99.8); positive predictive value = 91.3% (95% CI: 82.0-96.7); negative predictive value = 97.3% (95% CI: 96.1-98.3); and kappa = 0.79 (95% CI: 0.72-0.86). Patients with IHD can be accurately identified by applying a search algorithm to the medical history fields in the EMR of primary care providers who were not using standardized approaches to code diagnoses. The accuracy compares favorably to other methods for identifying patients with IHD. The results of this study may aid policy makers, researchers, and clinicians in developing registries and examining quality indicators for IHD in primary care.
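
    An illustrative keyword search in the spirit of the algorithm described (not the validated algorithm itself) is shown below; the term list is a hypothetical example.

    ```python
    # Keyword flag for IHD in free-text medical history fields (illustrative terms).
    import re

    IHD_TERMS = [
        r"\bischemic heart disease\b", r"\bIHD\b", r"\bcoronary artery disease\b",
        r"\bCAD\b", r"\bmyocardial infarction\b", r"\bMI\b", r"\bangina\b",
        r"\bCABG\b", r"\bstent\b",
    ]
    IHD_PATTERN = re.compile("|".join(IHD_TERMS), flags=re.IGNORECASE)

    def flag_ihd(medical_history: str) -> bool:
        """Return True if any IHD-related term appears in the free text."""
        return bool(IHD_PATTERN.search(medical_history))

    print(flag_ihd("Hx: CABG 2004, stable angina, on ASA"))    # True
    print(flag_ihd("Hx: asthma, seasonal allergies"))          # False
    ```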

  16. Efficient Record Linkage Algorithms Using Complete Linkage Clustering.

    PubMed

    Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar

    2016-01-01

    Data from different agencies often include records for the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. Many available record linkage algorithms are prone to either time inefficiency or low accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy while consuming reasonable run times.
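
    A compact sketch of the clustering step follows: records are compared with a simple string distance and grouped by complete-linkage hierarchical clustering, so every record pair inside a cluster lies within the distance threshold; the blocking and duplicate-elimination steps from the paper are omitted and the records are synthetic.

    ```python
    # Complete-linkage clustering of toy records for record linkage.
    from difflib import SequenceMatcher
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import squareform

    records = ["john a smith 1970-01-02", "jon a smith 1970-01-02",
               "mary jones 1985-05-30", "mary t jones 1985-05-30",
               "robert brown 1960-11-11"]

    n = len(records)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            sim = SequenceMatcher(None, records[i], records[j]).ratio()
            dist[i, j] = dist[j, i] = 1.0 - sim

    Z = linkage(squareform(dist), method="complete")    # complete-linkage dendrogram
    labels = fcluster(Z, t=0.25, criterion="distance")  # cut: max pairwise distance 0.25
    print(labels)   # records sharing a label are linked to the same individual
    ```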

  17. Efficient Record Linkage Algorithms Using Complete Linkage Clustering

    PubMed Central

    Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar

    2016-01-01

    Data from different agencies often include records for the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. Many available record linkage algorithms are prone to either time inefficiency or low accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy while consuming reasonable run times. PMID:27124604

  18. Validity of administrative database code algorithms to identify vascular access placement, surgical revisions, and secondary patency.

    PubMed

    Al-Jaishi, Ahmed A; Moist, Louise M; Oliver, Matthew J; Nash, Danielle M; Fleet, Jamie L; Garg, Amit X; Lok, Charmaine E

    2018-03-01

    We assessed the validity of physician billing codes and hospital admission codes (International Classification of Diseases, 10th revision) for identifying vascular access placement, secondary patency, and surgical revisions in administrative data. We included adults (≥18 years) with a vascular access placed between 1 April 2004 and 31 March 2013 at the University Health Network, Toronto. Our reference standard was a prospective vascular access database (VASPRO) that contains information on vascular access type and dates of placement, dates for failure, and any revisions. We used VASPRO to assess the validity of different administrative coding algorithms by calculating the sensitivity, specificity, and positive predictive values of vascular access events. The sensitivity (95% confidence interval) of the best performing algorithm to identify arteriovenous access placement was 86% (83%, 89%) and specificity was 92% (89%, 93%). The corresponding numbers to identify catheter insertion were 84% (82%, 86%) and 84% (80%, 87%), respectively. The sensitivity of the best performing coding algorithm to identify arteriovenous access surgical revisions was 81% (67%, 90%) and specificity was 89% (87%, 90%). The algorithm capturing arteriovenous access placement and catheter insertion had a positive predictive value greater than 90%, while the arteriovenous access surgical revisions algorithm had a positive predictive value of 20%. The duration of arteriovenous access secondary patency was on average 578 (553, 603) days in VASPRO and 555 (530, 580) days in administrative databases. Administrative data algorithms have fair to good operating characteristics for identifying vascular access placement and arteriovenous access secondary patency. The low positive predictive value of the surgical revisions algorithm suggests that administrative data should only be used to rule out the occurrence of such events.

  19. System and method for resolving gamma-ray spectra

    DOEpatents

    Gentile, Charles A.; Perry, Jason; Langish, Stephen W.; Silber, Kenneth; Davis, William M.; Mastrovito, Dana

    2010-05-04

    A system for identifying radionuclide emissions is described. The system includes at least one processor for processing output signals from a radionuclide detecting device, at least one training algorithm run by the at least one processor for analyzing data derived from at least one set of known sample data from the output signals, at least one classification algorithm derived from the training algorithm for classifying unknown sample data, wherein the at least one training algorithm analyzes the at least one sample data set to derive at least one rule used by said classification algorithm for identifying at least one radionuclide emission detected by the detecting device.

  20. When drug discovery meets web search: Learning to Rank for ligand-based virtual screening.

    PubMed

    Zhang, Wei; Ji, Lijuan; Chen, Yanan; Tang, Kailin; Wang, Haiping; Zhu, Ruixin; Jia, Wei; Cao, Zhiwei; Liu, Qi

    2015-01-01

    The rapid increase in the emergence of novel chemical substances creates a substantial demand for more sophisticated computational methodologies for drug discovery. In this study, the idea of Learning to Rank from web search was applied to drug virtual screening, offering two unique capabilities: 1) identifying compounds for novel targets when not enough training data are available for those targets, and 2) integrating heterogeneous data when compound affinities are measured on different platforms. A standard pipeline was designed to carry out Learning to Rank in virtual screening. Six Learning to Rank algorithms were investigated on two public datasets collected from the Binding Database and the newly published Community Structure-Activity Resource benchmark dataset. The results demonstrate that Learning to Rank is an efficient computational strategy for drug virtual screening, particularly due to its novel use in cross-target virtual screening and heterogeneous data integration. To the best of our knowledge, this is the first application of Learning to Rank to virtual screening. The experiment workflow and algorithm assessment designed in this study will provide a standard protocol for other similar studies. All the datasets as well as the implementations of the Learning to Rank algorithms are available at http://www.tongji.edu.cn/~qiliu/lor_vs.html. Graphical Abstract: The analogy between web search and ligand-based drug discovery.
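
    As a concrete, simplified illustration of pairwise Learning to Rank (one common family of such algorithms, not necessarily one of the six evaluated in the paper), the sketch below trains a linear scoring function so that, within each target ("query"), a compound with higher measured affinity receives a higher score. All names and the update rule are illustrative assumptions.

```python
# Illustrative pairwise Learning-to-Rank sketch (not the paper's pipeline).
# Within each target ("query"), pairs are formed so that a compound with higher
# measured affinity should receive a higher score from the linear model.
import numpy as np

def train_pairwise_ranker(X, y, query_ids, lr=0.1, epochs=100):
    """X: (n, d) feature array; y: affinities; query_ids: target id per compound."""
    X, y, query_ids = np.asarray(X, float), np.asarray(y, float), np.asarray(query_ids)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for q in np.unique(query_ids):
            idx = np.where(query_ids == q)[0]
            for i in idx:
                for j in idx:
                    if y[i] > y[j]:                          # compound i should outrank j
                        margin = X[i] @ w - X[j] @ w
                        p = 1.0 / (1.0 + np.exp(-margin))    # model's P(i ranked above j)
                        w += lr * (1.0 - p) * (X[i] - X[j])  # logistic pairwise update
    return w

def rank_compounds(X, w):
    return np.argsort(-(np.asarray(X, float) @ w))           # best-scored compounds first
```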

  1. Core-periphery structure requires something else in the network

    NASA Astrophysics Data System (ADS)

    Kojaku, Sadamori; Masuda, Naoki

    2018-04-01

    A network with core-periphery structure consists of densely interconnected core nodes and sparsely interconnected peripheral nodes; unlike in a community structure, which is a different meso-scale structure of networks, core nodes may also be connected to peripheral nodes. Although core-periphery structure sounds reasonable, we argue that it is merely accounted for by heterogeneous degree distributions if one partitions a network into a single core block and a single periphery block, as the famous Borgatti–Everett algorithm and many succeeding algorithms assume. In other words, there is a strong tendency for high-degree and low-degree nodes to be judged core and peripheral nodes, respectively. To discuss core-periphery structure beyond the expectation of the node’s degree (as described by the configuration model), we propose that one needs to assume at least one block of nodes apart from the focal core-periphery structure, such as a different core-periphery pair, a community, or nodes not belonging to any meso-scale structure. We propose a scalable algorithm to detect pairs of core and periphery in networks, controlling for the effect of the node’s degree. We illustrate our algorithm using various empirical networks.

  2. Classifying spatially heterogeneous wetland communities using machine learning algorithms and spectral and textural features.

    PubMed

    Szantoi, Zoltan; Escobedo, Francisco J; Abd-Elrahman, Amr; Pearlstine, Leonard; Dewitt, Bon; Smith, Scot

    2015-05-01

    Mapping of wetlands (marsh vs. swamp vs. upland) is a common remote sensing application. Yet, discriminating between similar freshwater communities such as graminoid/sedge from remotely sensed imagery is more difficult. Most of this activity has been performed using medium to low resolution imagery. There are only a few studies using high spatial resolution imagery and machine learning image classification algorithms for mapping heterogeneous wetland plant communities. This study addresses this void by analyzing whether machine learning classifiers such as decision trees (DT) and artificial neural networks (ANN) can accurately classify graminoid/sedge communities using high resolution aerial imagery and image texture data in the Everglades National Park, Florida. In addition to spectral bands, the normalized difference vegetation index, and first- and second-order texture features derived from the near-infrared band were analyzed. Classifier accuracies were assessed using confusion tables and the calculated kappa coefficients of the resulting maps. The results indicated that an ANN (multilayer perceptron based on backpropagation) algorithm produced a statistically significantly higher accuracy (82.04%) than the DT (QUEST) algorithm (80.48%) or the maximum likelihood classifier (80.56%) (α<0.05). Findings show that using multiple window sizes provided the best results. First-order texture features also provided computational advantages and results that were not significantly different from those using second-order texture features.

  3. Community detection for networks with unipartite and bipartite structure

    NASA Astrophysics Data System (ADS)

    Chang, Chang; Tang, Chao

    2014-09-01

    Finding community structures in networks is important in network science, technology, and applications. To date, most algorithms that aim to find community structures only focus either on unipartite or bipartite networks. A unipartite network consists of one set of nodes and a bipartite network consists of two nonoverlapping sets of nodes with only links joining the nodes in different sets. However, a third type of network exists, defined here as the mixture network. Just like a bipartite network, a mixture network also consists of two sets of nodes, but some nodes may simultaneously belong to two sets, which breaks the nonoverlapping restriction of a bipartite network. The mixture network can be considered as a general case, with unipartite and bipartite networks viewed as its limiting cases. A mixture network can represent not only all the unipartite and bipartite networks, but also a wide range of real-world networks that cannot be properly represented as either unipartite or bipartite networks in fields such as biology and social science. Based on this observation, we first propose a probabilistic model that can find modules in unipartite, bipartite, and mixture networks in a unified framework based on the link community model for a unipartite undirected network [B Ball et al (2011 Phys. Rev. E 84 036103)]. We test our algorithm on synthetic networks (both overlapping and nonoverlapping communities) and apply it to two real-world networks: a southern women bipartite network and a human transcriptional regulatory mixture network. The results suggest that our model performs well for all three types of networks, is competitive with other algorithms for unipartite or bipartite networks, and is applicable to real-world networks.

  4. A systematic review of interventions to improve diabetes care in socially disadvantaged populations.

    PubMed

    Glazier, Richard H; Bajcar, Jana; Kennie, Natalie R; Willson, Kristie

    2006-07-01

    To identify and synthesize evidence about the effectiveness of patient, provider, and health system interventions to improve diabetes care among socially disadvantaged populations. Included studies targeted interventions toward socially disadvantaged adults with type 1 or type 2 diabetes; were conducted in industrialized countries; measured outcomes of self-management, provider management, or clinical outcomes; and were randomized controlled trials, controlled trials, or before-and-after studies with a contemporaneous control group. Seven databases were searched for articles published in any language between January 1986 and December 2004. Twenty-six intervention features were identified and analyzed in terms of their association with successful or unsuccessful interventions. Eleven of 17 studies that met inclusion criteria had positive results. Features that appeared to have the most consistent positive effects included cultural tailoring of the intervention, community educators or lay people leading the intervention, one-on-one interventions with individualized assessment and reassessment, incorporating treatment algorithms, focusing on behavior-related tasks, providing feedback, and high-intensity interventions (>10 contact times) delivered over a long duration (≥6 months). Interventions that were consistently associated with the largest negative outcomes included those that used mainly didactic teaching or that focused only on diabetes knowledge. This systematic review provides evidence for the effectiveness of interventions to improve diabetes care among socially disadvantaged populations and identifies key intervention features that may predict success. These types of interventions would require additional resources for needs assessment, leader training, community and family outreach, and follow-up.

  5. Dynamic Systems for Individual Tracking via Heterogeneous Information Integration and Crowd Source Distributed Simulation

    DTIC Science & Technology

    2015-12-04

    51   6.6   Power Consumption: Communications ...simulations executing on mobile computing platforms, an area not widely studied to date in the distributed simulation research community. A...simulation community. These initial studies focused on two conservative synchronization algorithms widely used in the distributed simulation field

  6. SENSITIVITY OF OZONE AND AEROSOL PREDICTIONS TO THE TRANSPORT ALGORITHMS IN THE MODELS-3 COMMUNITY MULTI-SCALE AIR QUALITY (CMAQ) MODELING SYSTEM

    EPA Science Inventory

    EPA's Models-3 CMAQ system is intended to provide a community modeling paradigm that allows continuous improvement of the one-atmosphere modeling capability in a unified fashion. CMAQ's modular design promotes incorporation of several sets of science process modules representing ...

  7. Sampling Approaches for Multi-Domain Internet Performance Measurement Infrastructures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Calyam, Prasad

    2014-09-15

    The next generation of high-performance networks being developed in DOE communities is critical for supporting current and emerging data-intensive science applications. The goal of this project is to investigate multi-domain network status sampling techniques and tools to measure/analyze performance, and thereby provide “network awareness” to end-users and network operators in DOE communities. We leverage the infrastructure and datasets available through perfSONAR, which is a multi-domain measurement framework that has been widely deployed in high-performance computing and networking communities; the DOE community is a core developer and the largest adopter of perfSONAR. Our investigations include development of semantic scheduling algorithms, measurement federation policies, and tools to sample multi-domain and multi-layer network status within perfSONAR deployments. We validate our algorithms and policies with end-to-end measurement analysis tools for various monitoring objectives such as network weather forecasting, anomaly detection, and fault diagnosis. In addition, we develop a multi-domain architecture for an enterprise-specific perfSONAR deployment that can implement monitoring-objective-based sampling and that adheres to any domain-specific measurement policies.

  8. MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun

    Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solve alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of sizes spanning from a few hundred million to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.
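
    To make the alternating-update structure concrete, the sketch below shows the classical multiplicative-update rule (one of the algorithms the framework is described as supporting), run serially with NumPy; it illustrates the update form only and is not the distributed MPI-FAUN implementation.

```python
# Illustrative serial multiplicative-update NMF (one update rule the framework
# is described as supporting); not the distributed MPI-FAUN implementation.
import numpy as np

def nmf_multiplicative(A, rank, iters=200, eps=1e-9, seed=0):
    """A: non-negative (m, n) array; returns W (m, rank), H (rank, n) with A ≈ W @ H."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    W, H = rng.random((m, rank)), rng.random((rank, n))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H
```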

  9. MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization

    DOE PAGES

    Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun

    2017-10-30

    Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solve alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of sizes spanning from a few hundred million to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.

  10. LensFlow: A Convolutional Neural Network in Search of Strong Gravitational Lenses

    NASA Astrophysics Data System (ADS)

    Pourrahmani, Milad; Nayyeri, Hooshang; Cooray, Asantha

    2018-03-01

    In this work, we present our machine learning classification algorithm for identifying strong gravitational lenses from wide-area surveys using convolutional neural networks: LensFlow. We train and test the algorithm using a wide variety of strong gravitational lens configurations from simulations of lensing events. Images are processed through multiple convolutional layers that extract feature maps necessary to assign a lens probability to each image. LensFlow provides a ranking scheme for all sources that could be used to identify potential gravitational lens candidates by significantly reducing the number of images that have to be visually inspected. We apply our algorithm to the HST/ACS i-band observations of the COSMOS field and present our sample of identified lensing candidates. The developed machine learning algorithm is more computationally efficient than, and complementary to, classical lens identification algorithms and is ideal for discovering such events across wide areas from current and future surveys such as LSST and WFIRST.
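
    The described architecture (stacked convolutional layers feeding a classifier that outputs a lens probability, with sources then ranked by that probability) can be sketched as below. This PyTorch snippet is an illustrative stand-in with assumed layer sizes and input dimensions; it is not the published LensFlow network.

```python
# Illustrative PyTorch sketch of a small CNN that maps an image cutout to a lens
# probability; the layer sizes and 64x64 input are assumptions, not LensFlow itself.
import torch
import torch.nn as nn

class TinyLensClassifier(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 64x64 -> 32x32 feature maps
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 32x32 -> 16x16 feature maps
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, 1),                      # single logit
        )

    def forward(self, x):
        return torch.sigmoid(self.head(self.features(x)))  # lens probability

# Scoring a batch of 64x64 cutouts and ranking candidates by probability:
model = TinyLensClassifier()
cutouts = torch.randn(8, 1, 64, 64)                # stand-in for survey cutouts
probs = model(cutouts).squeeze(1)
ranking = torch.argsort(probs, descending=True)    # inspect highest-probability sources first
```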

  11. Data Products From Particle Detectors On-Board NOAA's Newest Space Weather Monitor

    NASA Astrophysics Data System (ADS)

    Kress, B. T.; Rodriguez, J. V.; Onsager, T. G.

    2017-12-01

    NOAA's newest Geostationary Operational Environmental Satellite, GOES-16, was launched on 19 November 2016. Instrumentation on-board GOES-16 includes the new Space Environment In-Situ Suite (SEISS), which has been collecting data since 8 January 2017. SEISS is composed of five magnetospheric particle sensor units: an electrostatic analyzer for measuring 30 eV - 30 keV ions and electrons (MPS-LO), a high energy particle sensor (MPS-HI) that measures keV to MeV electrons and protons, east and west facing Solar and Galactic Proton Sensor (SGPS) units with 13 differential channels between 1-500 MeV, and an Energetic Heavy Ion Sensor (EHIS) that measures 30 species of heavy ions (He-Ni) in five energy bands in the 10-200 MeV/nuc range. Measurement of low energy magnetospheric particles by MPS-LO and heavy ions by EHIS are new capabilities not previously flown on the GOES system. Real-time data from GOES-16 will support space weather monitoring and first-principles space weather modeling by NOAA's Space Weather Prediction Center (SWPC). Space weather level 2+ data products under development at NOAA's National Centers for Environmental Information (NCEI) include the Solar Energetic Particle (SEP) Event Detection algorithm. Legacy components of the SEP event detection algorithm (currently produced by SWPC) include the Solar Radiation Storm Scales. New components will include, e.g., event fluences. New level 2+ data products also include the SEP event Linear Energy Transfer (LET) Algorithm, for transforming energy spectra from EHIS into LET spectra, and the Density and Temperature Moments and Spacecraft Charging algorithm. The moments and charging algorithm identifies electron and ion signatures of spacecraft surface (frame) charging in the MPS-LO fluxes. Densities and temperatures from MPS-LO will also be used to support a magnetopause crossing detection algorithm. The new data products will provide real-time indicators of potential radiation hazards for the satellite community and data for future studies of space weather effects. This presentation will include an overview of these algorithms and examples of their performance during recent co-rotation interaction region (CIR) associated radiation belt enhancements and a solar particle event on 14-15 July 2017.

  12. Testing the accuracy of redshift-space group-finding algorithms

    NASA Astrophysics Data System (ADS)

    Frederic, James J.

    1995-04-01

    Using simulated redshift surveys generated from a high-resolution N-body cosmological structure simulation, we study algorithms used to identify groups of galaxies in redshift space. Two algorithms are investigated; both are friends-of-friends schemes with variable linking lengths in the radial and transverse dimensions. The chief difference between the algorithms is in the redshift linking length. The algorithm proposed by Huchra & Geller (1982) uses a generous linking length designed to find 'fingers of god,' while that of Nolthenius & White (1987) uses a smaller linking length to minimize contamination by projection. We find that neither of the algorithms studied is intrinsically superior to the other; rather, the ideal algorithm as well as the ideal algorithm parameters depends on the purpose for which groups are to be studied. The Huchra & Geller algorithm misses few real groups, at the cost of including some spurious groups and members, while the Nolthenius & White algorithm misses high velocity dispersion groups and members but is less likely to include interlopers in its group assignments. Adjusting the parameters of either algorithm results in a trade-off between group accuracy and completeness. In a companion paper we investigate the accuracy of virial mass estimates and clustering properties of groups identified using these algorithms.
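
    The friends-of-friends idea common to both schemes can be sketched as follows: two galaxies are linked as "friends" when their transverse and line-of-sight separations both fall within the respective linking lengths, and groups are the resulting connected sets. The code below is an illustrative, brute-force sketch with assumed coordinate conventions, not either paper's implementation; the two algorithms correspond roughly to choosing a larger or smaller radial linking length.

```python
# Illustrative friends-of-friends group finder with separate transverse and
# radial (redshift) linking lengths; not the Huchra-Geller or Nolthenius-White code.
def find_groups(galaxies, d_perp_max, d_los_max):
    """galaxies: list of (transverse_x, transverse_y, line_of_sight_distance)."""
    parent = list(range(len(galaxies)))

    def find(i):                       # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i in range(len(galaxies)):
        for j in range(i + 1, len(galaxies)):
            xi, yi, si = galaxies[i]
            xj, yj, sj = galaxies[j]
            d_perp = ((xi - xj) ** 2 + (yi - yj) ** 2) ** 0.5
            d_los = abs(si - sj)
            # Two galaxies are "friends" if both separations are within the links;
            # a generous d_los_max keeps fingers of god intact, a smaller one
            # reduces contamination by interlopers.
            if d_perp <= d_perp_max and d_los <= d_los_max:
                union(i, j)

    groups = {}
    for i in range(len(galaxies)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= 2]
```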

  13. Statistically significant relational data mining :

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann

    This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

  14. RNA design rules from a massive open laboratory

    PubMed Central

    Lee, Jeehyung; Kladwang, Wipapat; Lee, Minjae; Cantu, Daniel; Azizyan, Martin; Kim, Hanjoo; Limpaecher, Alex; Gaikwad, Snehal; Yoon, Sungroh; Treuille, Adrien; Das, Rhiju

    2014-01-01

    Self-assembling RNA molecules present compelling substrates for the rational interrogation and control of living systems. However, imperfect in silico models—even at the secondary structure level—hinder the design of new RNAs that function properly when synthesized. Here, we present a unique and potentially general approach to such empirical problems: the Massive Open Laboratory. The EteRNA project connects 37,000 enthusiasts to RNA design puzzles through an online interface. Uniquely, EteRNA participants not only manipulate simulated molecules but also control a remote experimental pipeline for high-throughput RNA synthesis and structure mapping. We show herein that the EteRNA community leveraged dozens of cycles of continuous wet laboratory feedback to learn strategies for solving in vitro RNA design problems on which automated methods fail. The top strategies—including several previously unrecognized negative design rules—were distilled by machine learning into an algorithm, EteRNABot. Over a rigorous 1-y testing phase, both the EteRNA community and EteRNABot significantly outperformed prior algorithms in a dozen RNA secondary structure design tests, including the creation of dendrimer-like structures and scaffolds for small molecule sensors. These results show that an online community can carry out large-scale experiments, hypothesis generation, and algorithm design to create practical advances in empirical science. PMID:24469816

  15. The Langley Parameterized Shortwave Algorithm (LPSA) for Surface Radiation Budget Studies. 1.0

    NASA Technical Reports Server (NTRS)

    Gupta, Shashi K.; Kratz, David P.; Stackhouse, Paul W., Jr.; Wilber, Anne C.

    2001-01-01

    An efficient algorithm was developed during the late 1980's and early 1990's by W. F. Staylor at NASA/LaRC for the purpose of deriving shortwave surface radiation budget parameters on a global scale. While the algorithm produced results in good agreement with observations, the lack of proper documentation resulted in a weak acceptance by the science community. The primary purpose of this report is to develop detailed documentation of the algorithm. In the process, the algorithm was modified whenever discrepancies were found between the algorithm and its referenced literature sources. In some instances, assumptions made in the algorithm could not be justified and were replaced with those that were justifiable. The algorithm uses satellite and operational meteorological data for inputs. Most of the original data sources have been replaced by more recent, higher quality data sources, and fluxes are now computed on a higher spatial resolution. Many more changes to the basic radiation scheme and meteorological inputs have been proposed to improve the algorithm and make the product more useful for new research projects. Because of the many changes already in place and more planned for the future, the algorithm has been renamed the Langley Parameterized Shortwave Algorithm (LPSA).

  16. Finding Frequent Closed Itemsets in Sliding Window in Linear Time

    NASA Astrophysics Data System (ADS)

    Chen, Junbo; Zhou, Bo; Chen, Lu; Wang, Xinyu; Ding, Yiqun

    One of the most well-studied problems in data mining is computing the collection of frequent itemsets in large transactional databases. Since the introduction of the famous Apriori algorithm [14], many others have been proposed to find the frequent itemsets. Among such algorithms, the approach of mining closed itemsets has raised much interest in the data mining community. The algorithms taking this approach include TITANIC [8], CLOSET+ [6], DCI-Closed [4], FCI-Stream [3], GC-Tree [15], TGC-Tree [16], etc. Among these algorithms, FCI-Stream, GC-Tree and TGC-Tree are online algorithms that work under sliding-window environments. According to the performance evaluation in [16], GC-Tree [15] is the fastest one. In this paper, an improved algorithm based on GC-Tree is proposed, whose computational complexity is proved to be a linear combination of the average transaction size and the average closed itemset size. The algorithm is based on the essential theorem presented in Sect. 4.2. Empirically, the new algorithm is several orders of magnitude faster than the state-of-the-art algorithm, GC-Tree.

  17. InSAR Scientific Computing Environment

    NASA Technical Reports Server (NTRS)

    Rosen, Paul A.; Sacco, Gian Franco; Gurrola, Eric M.; Zebker, Howard A.

    2011-01-01

    This computing environment is the next generation of geodetic image processing technology for repeat-pass Interferometric Synthetic Aperture Radar (InSAR) sensors, identified by the community as a needed capability to provide flexibility and extensibility in reducing measurements from radar satellites and aircraft to new geophysical products. This software allows users of interferometric radar data the flexibility to process from Level 0 to Level 4 products using a variety of algorithms and for a range of available sensors. There are many radar satellites in orbit today delivering to the science community data of unprecedented quantity and quality, making possible large-scale studies in climate research, natural hazards, and the Earth's ecosystem. The proposed DESDynI mission, now under consideration by NASA for launch later in this decade, would provide time series and multi-image measurements that permit 4D models of Earth surface processes so that, for example, climate-induced changes over time would become apparent and quantifiable. This advanced data processing technology, applied to a global data set such as from the proposed DESDynI mission, enables a new class of analyses at time and spatial scales unavailable using current approaches. This software implements an accurate, extensible, and modular processing system designed to realize the full potential of InSAR data from future missions such as the proposed DESDynI, existing radar satellite data, as well as data from the NASA UAVSAR (Uninhabited Aerial Vehicle Synthetic Aperture Radar) and other airborne platforms. The processing approach has been rethought in order to enable multi-scene analysis by adding new algorithms and data interfaces, to permit user-reconfigurable operation and extensibility, and to capitalize on codes already developed by NASA and the science community. The framework incorporates modern programming methods based on recent research, including object-oriented scripts controlling legacy and new codes, abstraction and generalization of the data model for efficient manipulation of objects among modules, and well-designed module interfaces suitable for command-line execution or GUI programming. The framework is designed to allow user contributions to promote maximum utility and sophistication of the code, creating an open-source community that could extend the framework into the indefinite future.

  18. Identification of periods of clear sky irradiance in time series of GHI measurements

    DOE PAGES

    Reno, Matthew J.; Hansen, Clifford W.

    2016-01-18

    In this study, we present a simple algorithm for identifying periods of time with broadband global horizontal irradiance (GHI) similar to that occurring during clear sky conditions from a time series of GHI measurements. Other available methods to identify these periods do so by identifying periods with clear sky conditions using additional measurements, such as direct or diffuse irradiance. Our algorithm compares characteristics of the time series of measured GHI with the output of a clear sky model without requiring additional measurements. We validate our algorithm using data from several locations by comparing our results with those obtained from a clear sky detection algorithm, and with satellite and ground-based sky imagery.
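
    The general idea, comparing statistics of the measured GHI time series with a clear-sky model over short sliding windows, can be sketched as follows. The particular statistics and threshold values below are simplified assumptions chosen for illustration and are not the published criteria.

```python
# Simplified sketch of flagging clear-sky-like periods by comparing measured GHI
# with a clear-sky model in sliding windows; thresholds are illustrative only.
import numpy as np

def clear_periods(ghi, ghi_clear, window=10, mean_tol=75.0, max_tol=75.0, slope_tol=8.0):
    """ghi, ghi_clear: 1-D arrays on the same time grid; returns a boolean mask."""
    ghi, ghi_clear = np.asarray(ghi, float), np.asarray(ghi_clear, float)
    clear = np.zeros(len(ghi), dtype=bool)
    for start in range(0, len(ghi) - window + 1):
        sl = slice(start, start + window)
        meas, model = ghi[sl], ghi_clear[sl]
        ok = (
            abs(meas.mean() - model.mean()) < mean_tol and      # similar mean level
            abs(meas.max() - model.max()) < max_tol and         # similar peak value
            np.std(np.diff(meas) - np.diff(model)) < slope_tol  # similar smoothness
        )
        if ok:
            clear[sl] = True   # every sample in a passing window is flagged clear
    return clear
```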

  19. Identification of periods of clear sky irradiance in time series of GHI measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reno, Matthew J.; Hansen, Clifford W.

    In this study, we present a simple algorithm for identifying periods of time with broadband global horizontal irradiance (GHI) similar to that occurring during clear sky conditions from a time series of GHI measurements. Other available methods to identify these periods do so by identifying periods with clear sky conditions using additional measurements, such as direct or diffuse irradiance. Our algorithm compares characteristics of the time series of measured GHI with the output of a clear sky model without requiring additional measurements. We validate our algorithm using data from several locations by comparing our results with those obtained from a clear sky detection algorithm, and with satellite and ground-based sky imagery.

  20. Using entropy to cut complex time series

    NASA Astrophysics Data System (ADS)

    Mertens, David; Poncela Casasnovas, Julia; Spring, Bonnie; Amaral, L. A. N.

    2013-03-01

    Using techniques from statistical physics, physicists have modeled and analyzed human phenomena varying from academic citation rates to disease spreading to vehicular traffic jams. The last decade's explosion of digital information and the growing ubiquity of smartphones have led to a wealth of human self-reported data. This wealth of data comes at a cost, including non-uniform sampling and statistically significant but physically insignificant correlations. In this talk I present our work using entropy to identify stationary sub-sequences of self-reported human weight from a weight management web site. Our entropic approach, inspired by the Infomap network community detection algorithm, is far less biased by rare fluctuations than more traditional time series segmentation techniques. Supported by the Howard Hughes Medical Institute.

  1. Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deveci, Mehmet; Trott, Christian Robert; Rajamanickam, Sivasankaran

    Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
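
    To make the accumulator discussion concrete, the sketch below performs a row-by-row sparse matrix-matrix multiply in CSR form using a per-row dictionary (hash-map) accumulator, one of the accumulator types such frameworks typically compare. It is an illustrative serial sketch, not the kkSpGEMM kernel.

```python
# Illustrative row-by-row SpGEMM, C = A * B, with a dictionary accumulator per row.
# A and B are in CSR form: (indptr, indices, data). Not the kkSpGEMM implementation.
def spgemm(a_indptr, a_indices, a_data, b_indptr, b_indices, b_data):
    c_indptr, c_indices, c_data = [0], [], []
    n_rows = len(a_indptr) - 1
    for i in range(n_rows):
        acc = {}                                   # hash-map accumulator for row i of C
        for k in range(a_indptr[i], a_indptr[i + 1]):
            col_a, val_a = a_indices[k], a_data[k]
            for l in range(b_indptr[col_a], b_indptr[col_a + 1]):
                col_b = b_indices[l]
                acc[col_b] = acc.get(col_b, 0.0) + val_a * b_data[l]
        for col in sorted(acc):                    # emit row i of C in column order
            c_indices.append(col)
            c_data.append(acc[col])
        c_indptr.append(len(c_indices))
    return c_indptr, c_indices, c_data
```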

  2. Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian Robert

    Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.

  3. Use of administrative and electronic health record data for development of automated algorithms for childhood diabetes case ascertainment and type classification: the SEARCH for Diabetes in Youth Study

    PubMed Central

    Zhong, Victor W.; Pfaff, Emily R.; Beavers, Daniel P.; Thomas, Joan; Jaacks, Lindsay M.; Bowlby, Deborah A.; Carey, Timothy S.; Lawrence, Jean M.; Dabelea, Dana; Hamman, Richard F.; Pihoker, Catherine; Saydah, Sharon H.; Mayer-Davis, Elizabeth J.

    2014-01-01

    Background The performance of automated algorithms for childhood diabetes case ascertainment and type classification may differ by demographic characteristics. Objective This study evaluated the potential of administrative and electronic health record (EHR) data from a large academic care delivery system to conduct diabetes case ascertainment in youth according to type, age and race/ethnicity. Subjects 57,767 children aged <20 years as of December 31, 2011 seen at University of North Carolina Health Care System in 2011 were included. Methods Using an initial algorithm including billing data, patient problem lists, laboratory test results and diabetes related medications between July 1, 2008 and December 31, 2011, presumptive cases were identified and validated by chart review. More refined algorithms were evaluated by type (type 1 versus type 2), age (<10 versus ≥10 years) and race/ethnicity (non-Hispanic white versus “other”). Sensitivity, specificity and positive predictive value were calculated and compared. Results The best algorithm for ascertainment of diabetes cases overall was billing data. The best type 1 algorithm was the ratio of the number of type 1 billing codes to the sum of type 1 and type 2 billing codes ≥0.5. A useful algorithm to ascertain type 2 youth with “other” race/ethnicity was identified. Considerable age and racial/ethnic differences were present in type-non-specific and type 2 algorithms. Conclusions Administrative and EHR data may be used to identify cases of childhood diabetes (any type), and to identify type 1 cases. The performance of type 2 case ascertainment algorithms differed substantially by race/ethnicity. PMID:24913103
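
    The best-performing type 1 rule reported above is simple enough to express directly; the sketch below is an illustrative rendering of that ratio rule (type 1 assigned when type 1 codes make up at least half of all diabetes billing codes), not the study's actual implementation.

```python
# Illustrative rendering of the ratio rule described above: classify a case as
# type 1 when type 1 billing codes make up at least half of all diabetes codes.
def classify_diabetes_type(n_type1_codes, n_type2_codes):
    total = n_type1_codes + n_type2_codes
    if total == 0:
        return "no diabetes billing codes"
    ratio = n_type1_codes / total
    return "type 1" if ratio >= 0.5 else "type 2"
```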

  4. Chemodynamical Clustering Applied to APOGEE Data: Rediscovering Globular Clusters

    NASA Astrophysics Data System (ADS)

    Chen, Boquan; D’Onghia, Elena; Pardy, Stephen A.; Pasquali, Anna; Bertelli Motta, Clio; Hanlon, Bret; Grebel, Eva K.

    2018-06-01

    We have developed a novel technique based on a clustering algorithm that searches for kinematically and chemically clustered stars in the APOGEE DR12 Cannon data. As compared to classical chemical tagging, the kinematic information included in our methodology allows us to identify stars that are members of known globular clusters with greater confidence. We apply our algorithm to the entire APOGEE catalog of 150,615 stars whose chemical abundances are derived by the Cannon. Our methodology found anticorrelations between the elements Al and Mg, Na and O, and C and N previously identified in the optical spectra in globular clusters, even though we omit these elements in our algorithm. Our algorithm identifies globular clusters without a priori knowledge of their locations in the sky. Thus, not only does this technique promise to discover new globular clusters, but it also allows us to identify candidate streams of kinematically and chemically clustered stars in the Milky Way.

  5. An Automated Summarization Assessment Algorithm for Identifying Summarizing Strategies

    PubMed Central

    Abdi, Asad; Idris, Norisma; Alguliyev, Rasim M.; Aliguliyev, Ramiz M.

    2016-01-01

    Background Summarization is a process to select important information from a source text. Summarizing strategies are the core cognitive processes in summarization activity. Since summarization can be important as a tool to improve comprehension, it has attracted interest of teachers for teaching summary writing through direct instruction. To do this, they need to review and assess the students' summaries and these tasks are very time-consuming. Thus, a computer-assisted assessment can be used to help teachers to conduct this task more effectively. Design/Results This paper aims to propose an algorithm based on the combination of semantic relations between words and their syntactic composition to identify summarizing strategies employed by students in summary writing. An innovative aspect of our algorithm lies in its ability to identify summarizing strategies at the syntactic and semantic levels. The efficiency of the algorithm is measured in terms of Precision, Recall and F-measure. We then implemented the algorithm for the automated summarization assessment system that can be used to identify the summarizing strategies used by students in summary writing. PMID:26735139

  6. Robust Crop and Weed Segmentation under Uncontrolled Outdoor Illumination

    PubMed Central

    Jeon, Hong Y.; Tian, Lei F.; Zhu, Heping

    2011-01-01

    An image processing algorithm for detecting individual weeds was developed and evaluated. The weed detection process included normalized excessive green conversion, statistical threshold value estimation, adaptive image segmentation, median filtering, morphological feature calculation and an Artificial Neural Network (ANN). The developed algorithm was validated for its ability to identify and detect weeds and crop plants under uncontrolled outdoor illumination. A field robot implementing machine vision captured field images under outdoor illumination, and the image processing algorithm processed them automatically without manual adjustment. The errors of the algorithm, when processing 666 field images, ranged from 2.1 to 2.9%. The ANN correctly detected 72.6% of crop plants among the identified plants, and considered the rest as weeds. However, the ANN identification rates for crop plants improved to up to 95.1% after addressing the error sources in the algorithm. The developed weed detection and image processing algorithm provides a novel method to identify plants against a soil background under uncontrolled outdoor illumination, and to differentiate weeds from crop plants. Thus, the proposed new machine vision and processing algorithm may be useful for outdoor applications including plant specific direct applications (PSDA). PMID:22163954
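
    As an illustration of the first two steps mentioned above (a greenness index followed by a statistical threshold), the sketch below computes a commonly used excess-green index on chromaticity-normalized channels and thresholds it. The exact normalization, threshold estimation, and adaptive segmentation used in the paper differ, so this is a simplified stand-in.

```python
# Illustrative vegetation segmentation via an excess-green index and a simple
# statistical threshold; the paper's normalization and thresholding differ.
import numpy as np

def excess_green(rgb):
    """rgb: HxWx3 float array in [0, 1]; returns an HxW vegetation index."""
    total = rgb.sum(axis=2) + 1e-9
    r, g, b = (rgb[..., c] / total for c in range(3))   # chromaticity coordinates
    return 2.0 * g - r - b                               # commonly used ExG form

def segment_plants(rgb, offset=0.0):
    exg = excess_green(rgb)
    threshold = exg.mean() + offset * exg.std()          # simple statistical threshold
    return exg > threshold                               # boolean plant/soil mask
```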

  7. Evaluation of the CDC proposed laboratory HIV testing algorithm among men who have sex with men (MSM) from five US metropolitan statistical areas using specimens collected in 2011.

    PubMed

    Masciotra, Silvina; Smith, Amanda J; Youngpairoj, Ae S; Sprinkle, Patrick; Miles, Isa; Sionean, Catlainn; Paz-Bailey, Gabriela; Johnson, Jeffrey A; Owen, S Michele

    2013-12-01

    Until recently most testing algorithms in the United States (US) utilized Western blot (WB) as the supplemental test. CDC has proposed an algorithm for HIV diagnosis which includes an initial screen with a Combo Antigen/Antibody 4th generation-immunoassay (IA), followed by an HIV-1/2 discriminatory IA of initially reactive-IA specimens. Discordant results in the proposed algorithm are resolved by nucleic acid-amplification testing (NAAT). Evaluate the results obtained with the CDC proposed laboratory-based algorithm using specimens from men who have sex with men (MSM) obtained in five metropolitan statistical areas (MSAs). Specimens from 992 MSM from five MSAs participating in the CDC's National HIV Behavioral Surveillance System in 2011 were tested at local facilities and CDC. The five MSAs utilized algorithms of various screening assays and specimen types, and WB as the supplemental test. At the CDC, serum/plasma specimens were screened with 4th generation-IA and the Multispot HIV-1/HIV-2 discriminatory assay was used as the supplemental test. NAAT was used to resolve discordant results and to further identify acute HIV infections from all screened-non-reactive missed by the proposed algorithm. Performance of the proposed algorithm was compared to site-specific WB-based algorithms. The proposed algorithm detected 254 infections. The WB-based algorithms detected 19 fewer infections; 4 by oral fluid (OF) rapid testing and 15 by WB supplemental testing (12 OF and 3 blood). One acute infection was identified by NAAT from all screened-non-reactive specimens. The proposed algorithm identified more infections than the WB-based algorithms in a high-risk MSM population. OF testing was associated with most of the discordant results between algorithms. HIV testing with the proposed algorithm can increase diagnosis of infected individuals, including early infections. Published by Elsevier B.V.

  8. Sentiment analysis enhancement with target variable in Kumar’s Algorithm

    NASA Astrophysics Data System (ADS)

    Arman, A. A.; Kawi, A. B.; Hurriyati, R.

    2016-04-01

    Sentiment analysis (also known as opinion mining) refers to the use of text analysis and computational linguistics to identify and extract subjective information from source materials. Sentiment analysis is widely applied to reviews and discussions taking place in social media for many purposes, ranging from marketing and customer service to gauging public opinion of public policy. One popular algorithm for implementing sentiment analysis is the Kumar algorithm, developed by Kumar and Sebastian. The Kumar algorithm can identify the sentiment score of a statement, sentence or tweet, but cannot determine the object or target to which the analysed sentiment relates. This research addresses that limitation by adding to the existing Kumar algorithm a component that represents the object or target. The result of this research is a modified algorithm that can give a sentiment score with respect to a given object or target.
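
    The proposed extension, attributing a sentiment score to a specific object or target, can be illustrated with a toy lexicon-based scorer in which only opinion words near the target term contribute to the score. The lexicon, window size, and function below are illustrative assumptions, not the modified Kumar algorithm itself.

```python
# Toy illustration of target-aware sentiment scoring: only opinion words within a
# small window around the target term contribute. Not the modified Kumar algorithm.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "slow": -1, "terrible": -2}

def target_sentiment(text, target, window=3):
    tokens = text.lower().split()
    positions = [i for i, tok in enumerate(tokens) if tok == target.lower()]
    score = 0
    for pos in positions:
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        score += sum(LEXICON.get(tok, 0) for tok in tokens[lo:hi])
    return score

# Example: the negative score is attributed to "battery", not the whole sentence.
print(target_sentiment("the screen is great but the battery is terrible", "battery"))
```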

  9. Distinctive Behaviors of Druggable Proteins in Cellular Networks

    PubMed Central

    Workman, Paul; Al-Lazikani, Bissan

    2015-01-01

    The interaction environment of a protein in a cellular network is important in defining the role that the protein plays in the system as a whole, and thus its potential suitability as a drug target. Despite the importance of the network environment, it is neglected during target selection for drug discovery. Here, we present the first systematic, comprehensive computational analysis of topological, community and graphical network parameters of the human interactome and identify discriminatory network patterns that strongly distinguish drug targets from the interactome as a whole. Importantly, we identify striking differences in the network behavior of targets of cancer drugs versus targets from other therapeutic areas and explore how they may relate to successful drug combinations to overcome acquired resistance to cancer drugs. We develop, computationally validate and provide the first public domain predictive algorithm for identifying druggable neighborhoods based on network parameters. We also make available full predictions for 13,345 proteins to aid target selection for drug discovery. All target predictions are available through canSAR.icr.ac.uk. Underlying data and tools are available at https://cansar.icr.ac.uk/cansar/publications/druggable_network_neighbourhoods/. PMID:26699810

  10. Social Circles Detection from Ego Network and Profile Information

    DTIC Science & Technology

    2014-12-19

    The algorithm used to infer k-clique communities is exponential, which makes this technique unfeasible when treating egonets with a large number of users...atic when considering RBMs. This inconvenience was solved by implementing a sparsity treatment with the RBM algorithm. (ii) The ground truth was

  11. Fall 2014 SEI Research Review Edge-Enabled Tactical Systems (EETS)

    DTIC Science & Technology

    2014-10-29

    Effective communication and reasoning despite connectivity issues • More generally, how to make programming distributed algorithms with extensible...distributed collaboration in VREP simulations for 5-12 quadcopters and ground robots • Open-source middleware and algorithms released to the community...Integration into CMU Drone-RK quadcopter and Platypus autonomous boat platforms • Presentations at DARPA (CODE), AFRL C4I Workshop, and AFRL Eglin

  12. Communication Avoiding and Overlapping for Numerical Linear Algebra

    DTIC Science & Technology

    2012-05-08

    linear algebra problems to future exascale systems, communication cost must be avoided or overlapped. Communication-avoiding 2.5D algorithms improve scalability by reducing...will continue to grow relative to the cost of computation. With exascale computing as the long-term goal, the community needs to develop techniques

  13. Ligated Metal Clusters - Structures, Energy and Reactivity

    DTIC Science & Technology

    2016-04-01

    projection superposition approximation (PSA) algorithm through a more careful consideration of how to calculate cross sections for elongated molecules...superposition approximation (PSA) is now complete. We have made it available free of charge to the scientific community on a dedicated website at UCSB. We...by AFOSR. We continued to improve the projection superposition approximation (PSA) algorithm through a more careful consideration of how to calculate

  14. A simple algorithm for the identification of clinical COPD phenotypes.

    PubMed

    Burgel, Pierre-Régis; Paillasseur, Jean-Louis; Janssens, Wim; Piquet, Jacques; Ter Riet, Gerben; Garcia-Aymerich, Judith; Cosio, Borja; Bakke, Per; Puhan, Milo A; Langhammer, Arnulf; Alfageme, Inmaculada; Almagro, Pere; Ancochea, Julio; Celli, Bartolome R; Casanova, Ciro; de-Torres, Juan P; Decramer, Marc; Echazarreta, Andrés; Esteban, Cristobal; Gomez Punter, Rosa Mar; Han, MeiLan K; Johannessen, Ane; Kaiser, Bernhard; Lamprecht, Bernd; Lange, Peter; Leivseth, Linda; Marin, Jose M; Martin, Francis; Martinez-Camblor, Pablo; Miravitlles, Marc; Oga, Toru; Sofia Ramírez, Ana; Sin, Don D; Sobradillo, Patricia; Soler-Cataluña, Juan J; Turner, Alice M; Verdu Rivera, Francisco Javier; Soriano, Joan B; Roche, Nicolas

    2017-11-01

    This study aimed to identify simple rules for allocating chronic obstructive pulmonary disease (COPD) patients to clinical phenotypes identified by cluster analyses. Data from 2409 COPD patients of French/Belgian COPD cohorts were analysed using cluster analysis, resulting in the identification of subgroups, for which clinical relevance was determined by comparing 3-year all-cause mortality. Classification and regression trees (CARTs) were used to develop an algorithm for allocating patients to these subgroups. This algorithm was tested in 3651 patients from the COPD Cohorts Collaborative International Assessment (3CIA) initiative. Cluster analysis identified five subgroups of COPD patients with different clinical characteristics (especially regarding severity of respiratory disease and the presence of cardiovascular comorbidities and diabetes). The CART-based algorithm indicated that the variables relevant for patient grouping differed markedly between patients with isolated respiratory disease (FEV1, dyspnoea grade) and those with multi-morbidity (dyspnoea grade, age, FEV1 and body mass index). Application of this algorithm to the 3CIA cohorts confirmed that it identified subgroups of patients with different clinical characteristics, mortality rates (median, from 4% to 27%) and age at death (median, from 68 to 76 years). A simple algorithm, integrating respiratory characteristics and comorbidities, allowed the identification of clinically relevant COPD phenotypes. Copyright ©ERS 2017.

  15. Scalable Parallel Density-based Clustering and Applications

    NASA Astrophysics Data System (ADS)

    Patwary, Mostofa Ali

    2014-04-01

    Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.
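
    The link between DBSCAN and connected components mentioned above can be sketched serially as follows: core points within eps of one another are merged with union-find (exactly a connected-components computation), and border points attach to the component of a neighboring core point. This is an illustrative serial sketch of the structural idea, not the parallel implementation.

```python
# Serial sketch of DBSCAN phrased as connected components over core points,
# the structural idea exploited by the parallel algorithm; not the parallel code.
import numpy as np

def dbscan_union_find(points, eps, min_pts):
    points = np.asarray(points, float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbors = [np.where(dist[i] <= eps)[0] for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]

    parent = list(range(n))
    def find(i):                                  # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Connected components over core points: merge cores that are eps-neighbors.
    for i in range(n):
        if core[i]:
            for j in neighbors[i]:
                if core[j]:
                    parent[find(i)] = find(j)

    # Border points join the component of any core neighbor; the rest is noise (-1).
    labels = [-1] * n
    roots = {}
    for i in range(n):
        anchor = i if core[i] else next((j for j in neighbors[i] if core[j]), None)
        if anchor is not None:
            labels[i] = roots.setdefault(find(anchor), len(roots))
    return labels
```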

  16. Solving the infeasible trust-region problem using approximations.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Renaud, John E.; Perez, Victor M.; Eldred, Michael Scott

    2004-07-01

    The use of optimization in engineering design has fueled the development of algorithms for specific engineering needs. When the simulations are expensive to evaluate or the outputs present some noise, the direct use of nonlinear optimizers is not advisable, since the optimization process will be expensive and may result in premature convergence. The use of approximations for both cases is an alternative investigated by many researchers including the authors. When approximations are present, model management is required for proper convergence of the algorithm. In nonlinear programming, the use of trust regions for globalization of a local algorithm has been proven effective. The same approach has been used to manage the local move limits in sequential approximate optimization frameworks, as in Alexandrov et al., Giunta and Eldred, Perez et al., Rodriguez et al., etc. The experience in the mathematical community has shown that more effective algorithms can be obtained by the specific inclusion of the constraints (SQP-type algorithms) rather than by using a penalty function as in the augmented Lagrangian formulation. The local problem bounded by the trust region, however, may have no feasible solution when explicit constraints are present. In order to remedy this problem the mathematical community has developed different versions of a composite-step approach. This approach consists of a normal step to reduce the amount of constraint violation and a tangential step to minimize the objective function while maintaining the level of constraint violation attained at the normal step. Two of the authors have developed a different approach for a sequential approximate optimization framework using homotopy ideas to relax the constraints. This algorithm, called interior-point trust-region sequential approximate optimization (IPTRSAO), presents some similarities to the normal-tangential two-step algorithms. In this paper, a description of the similarities is presented and an expansion of the two-step algorithm is presented for the case of approximations.
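
    Schematically, the composite-step approach referred to above splits the trial step inside the trust region into a normal and a tangential component; the notation below is ours, chosen for illustration, and not taken verbatim from the cited papers.

```latex
% Illustrative composite-step decomposition inside a trust region of radius \Delta_k.
% c(x) denotes the constraints, J_k their Jacobian at x_k, and m_k the local model
% of the objective; \theta \in (0,1) reserves part of the radius for the normal step.
\begin{align*}
  n_k &= \arg\min_{n} \ \tfrac12\,\lVert c(x_k) + J_k\, n \rVert^2
        && \text{s.t. } \lVert n \rVert \le \theta\,\Delta_k
        && \text{(normal step: reduce infeasibility)} \\
  t_k &= \arg\min_{t} \ m_k(x_k + n_k + t)
        && \text{s.t. } J_k t = 0,\ \lVert n_k + t \rVert \le \Delta_k
        && \text{(tangential step: reduce the objective)} \\
  s_k &= n_k + t_k
        && && \text{(full trial step)}
\end{align*}
```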

  17. Use of electronic data and existing screening tools to identify clinically significant obstructive sleep apnea.

    PubMed

    Severson, Carl A; Pendharkar, Sachin R; Ronksley, Paul E; Tsai, Willis H

    2015-01-01

    To assess the ability of electronic health data and existing screening tools to identify clinically significant obstructive sleep apnea (OSA), as defined by symptomatic or severe OSA. The present retrospective cohort study of 1041 patients referred for sleep diagnostic testing was undertaken at a tertiary sleep centre in Calgary, Alberta. A diagnosis of clinically significant OSA or an alternative sleep diagnosis was assigned to each patient through blinded independent chart review by two sleep physicians. Predictive variables were identified from online questionnaire data, and diagnostic algorithms were developed. The performance of electronically derived algorithms for identifying patients with clinically significant OSA was determined. Diagnostic performance of these algorithms was compared with versions of the STOP-Bang questionnaire and adjusted neck circumference score (ANC) derived from electronic data. Electronic questionnaire data were highly sensitive (>95%) at identifying clinically significant OSA, but not specific. Sleep diagnostic testing-determined respiratory disturbance index was very specific (specificity ≥95%) for clinically relevant disease, but not sensitive (<35%). Derived algorithms had similar accuracy to the STOP-Bang or ANC, but required fewer questions and calculations. These data suggest that a two-step process using a small number of clinical variables (maximizing sensitivity) and objective diagnostic testing (maximizing specificity) is required to identify clinically significant OSA. When used in an online setting, simple algorithms can identify clinically relevant OSA with similar performance to existing decision rules such as the STOP-Bang or ANC.

  18. Network immunization under limited budget using graph spectra

    NASA Astrophysics Data System (ADS)

    Zahedi, R.; Khansari, M.

    2016-03-01

    In this paper, we propose a new algorithm that minimizes the worst expected growth of an epidemic by reducing the size of the largest connected component (LCC) of the underlying contact network. The proposed algorithm is applicable to any level of available resources and, unlike the greedy approaches of most immunization strategies, selects nodes simultaneously. In each iteration, the proposed method partitions the LCC into two groups, which are the best candidates for communities in that component that the available resources are sufficient to separate. Using Laplacian spectral partitioning, the proposed method performs community detection inference with a time complexity that rivals that of the best previous methods. Experiments show that our method outperforms targeted immunization approaches in both real and synthetic networks.
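
    The spectral bipartition step can be illustrated as follows: compute the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian) of the largest connected component and split its nodes by sign. The sketch below uses NetworkX and a dense eigendecomposition for clarity, and it omits the budget-aware separator selection of the proposed method.

```python
# Illustrative Laplacian spectral bipartition of a graph's largest connected
# component via the Fiedler vector; budget-aware separator choice is omitted.
import numpy as np
import networkx as nx

def fiedler_split(graph):
    lcc_nodes = max(nx.connected_components(graph), key=len)
    lcc = graph.subgraph(lcc_nodes)
    nodes = list(lcc.nodes())
    L = nx.laplacian_matrix(lcc, nodelist=nodes).toarray().astype(float)
    eigvals, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                       # eigenvector of 2nd-smallest eigenvalue
    part_a = [n for n, v in zip(nodes, fiedler) if v >= 0]
    part_b = [n for n, v in zip(nodes, fiedler) if v < 0]
    return part_a, part_b
```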

  19. Sensitivity and specificity of administrative mortality data for identifying prescription opioid–related deaths

    PubMed Central

    Gladstone, Emilie; Smolina, Kate; Morgan, Steven G.; Fernandes, Kimberly A.; Martins, Diana; Gomes, Tara

    2016-01-01

    Background: Comprehensive systems for surveilling prescription opioid–related harms provide clear evidence that deaths from prescription opioids have increased dramatically in the United States. However, these harms are not systematically monitored in Canada. In light of a growing public health crisis, accessible, nationwide data sources to examine prescription opioid–related harms in Canada are needed. We sought to examine the performance of 5 algorithms to identify prescription opioid–related deaths from vital statistics data against data abstracted from the Office of the Chief Coroner of Ontario as a gold standard. Methods: We identified all prescription opioid–related deaths from Ontario coroners’ data that occurred between Jan. 31, 2003, and Dec. 31, 2010. We then used 5 different algorithms to identify prescription opioid–related deaths from vital statistics death data in 2010. We selected the algorithm with the highest sensitivity and a positive predictive value of more than 80% as the optimal algorithm for identifying prescription opioid–related deaths. Results: Four of the 5 algorithms had positive predictive values of more than 80%. The algorithm with the highest sensitivity (75%) in 2010 improved slightly in its predictive performance from 2003 to 2010. Interpretation: In the absence of specific systems for monitoring prescription opioid–related deaths in Canada, readily available national vital statistics data can be used to study prescription opioid–related mortality with considerable accuracy. Despite some limitations, these data may facilitate the implementation of national surveillance and monitoring strategies. PMID:26622006

  20. Sensitivity and specificity of administrative mortality data for identifying prescription opioid-related deaths.

    PubMed

    Gladstone, Emilie; Smolina, Kate; Morgan, Steven G; Fernandes, Kimberly A; Martins, Diana; Gomes, Tara

    2016-03-01

    Comprehensive systems for surveilling prescription opioid-related harms provide clear evidence that deaths from prescription opioids have increased dramatically in the United States. However, these harms are not systematically monitored in Canada. In light of a growing public health crisis, accessible, nationwide data sources to examine prescription opioid-related harms in Canada are needed. We sought to examine the performance of 5 algorithms to identify prescription opioid-related deaths from vital statistics data against data abstracted from the Office of the Chief Coroner of Ontario as a gold standard. We identified all prescription opioid-related deaths from Ontario coroners' data that occurred between Jan. 31, 2003, and Dec. 31, 2010. We then used 5 different algorithms to identify prescription opioid-related deaths from vital statistics death data in 2010. We selected the algorithm with the highest sensitivity and a positive predictive value of more than 80% as the optimal algorithm for identifying prescription opioid-related deaths. Four of the 5 algorithms had positive predictive values of more than 80%. The algorithm with the highest sensitivity (75%) in 2010 improved slightly in its predictive performance from 2003 to 2010. In the absence of specific systems for monitoring prescription opioid-related deaths in Canada, readily available national vital statistics data can be used to study prescription opioid-related mortality with considerable accuracy. Despite some limitations, these data may facilitate the implementation of national surveillance and monitoring strategies. © 2016 Canadian Medical Association or its licensors.
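
    Validating a candidate case-finding algorithm against a gold standard, as done in this study, reduces to counting the four cells of a confusion matrix; the sketch below does this for hypothetical record identifiers.

```python
def evaluate_algorithm(flagged, gold_positive, all_ids):
    """Compare deaths flagged by a candidate vital-statistics algorithm
    with a coroner-derived gold standard (illustrative only)."""
    flagged, gold_positive = set(flagged), set(gold_positive)
    tp = len(flagged & gold_positive)
    fp = len(flagged - gold_positive)
    fn = len(gold_positive - flagged)
    tn = len(set(all_ids) - flagged - gold_positive)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, specificity, ppv

# Hypothetical IDs: pick the algorithm with the highest sensitivity
# among those whose PPV exceeds 0.80, as in the study design.
all_ids = range(1, 1001)
gold = {2, 5, 11, 40, 77, 123, 400}
algo_a = {2, 5, 11, 40, 77, 900}
print(evaluate_algorithm(algo_a, gold, all_ids))
```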

  1. An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China.

    PubMed

    Zou, Hui; Zou, Zhihong; Wang, Xiaojing

    2015-11-12

    The growing volume and complexity of data produced in uncertain environments is today's reality. In order to identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm adopts a varying-weights K-means clustering scheme to analyze water monitoring data, with the indicator weights selected by a modified indicator-weight self-adjustment algorithm based on K-means, named MIWAS-K-means. The new clustering algorithm also avoids cases in which the iteration margin cannot be calculated. With this fast clustering analysis, the quality of water samples can be identified. The algorithm is applied to water quality analysis of Haihe River (China) data obtained by the monitoring network over a period of eight years (2006-2013), with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex, high-dimensional data matrices.
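
    A generic feature-weighted K-means, in which each water-quality indicator carries a fixed weight in the distance computation, can be sketched as below; the weights, toy data, and convergence settings are assumptions, and the MIWAS weight-selection step itself is not reproduced.

```python
import numpy as np

def weighted_kmeans(X, k, w, n_iter=100, seed=0):
    """K-means in which each indicator (column) carries a weight, so that
    more informative indicators dominate the distance. A generic sketch."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Weighted squared Euclidean distance to each centre.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2 * w).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy monitoring data: 4 indicators, weights emphasising the first two.
X = np.vstack([np.random.default_rng(1).normal(m, 0.3, (50, 4))
               for m in (0.0, 2.0, 4.0)])
labels, centers = weighted_kmeans(X, k=3, w=np.array([0.4, 0.3, 0.2, 0.1]))
print(np.bincount(labels))
```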

  2. 78 FR 57639 - Request for Comments on Pediatric Planned Procedure Algorithm

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-09-19

    ... Comments on Pediatric Planned Procedure Algorithm AGENCY: Agency for Healthcare Research and Quality (AHRQ), HHS. ACTION: Notice of request for comments on pediatric planned procedure algorithm from the members... Quality (AHRQ) is requesting comments from the public on an algorithm for identifying pediatric planned...

  3. Abbreviation definition identification based on automatic precision estimates.

    PubMed

    Sohn, Sunghwan; Comeau, Donald C; Kim, Won; Wilbur, W John

    2008-09-25

    The rapid growth of biomedical literature presents challenges for automatic text processing, and one of the challenges is abbreviation identification. The presence of unrecognized abbreviations in text hinders indexing algorithms and adversely affects information retrieval and extraction. Automatic abbreviation definition identification can help resolve these issues. However, abbreviations and their definitions identified by an automatic process are of uncertain validity. Due to the size of databases such as MEDLINE only a small fraction of abbreviation-definition pairs can be examined manually. An automatic way to estimate the accuracy of abbreviation-definition pairs extracted from text is needed. In this paper we propose an abbreviation definition identification algorithm that employs a variety of strategies to identify the most probable abbreviation definition. In addition our algorithm produces an accuracy estimate, pseudo-precision, for each strategy without using a human-judged gold standard. The pseudo-precisions determine the order in which the algorithm applies the strategies in seeking to identify the definition of an abbreviation. On the Medstract corpus our algorithm produced 97% precision and 85% recall which is higher than previously reported results. We also annotated 1250 randomly selected MEDLINE records as a gold standard. On this set we achieved 96.5% precision and 83.2% recall. This compares favourably with the well known Schwartz and Hearst algorithm. We developed an algorithm for abbreviation identification that uses a variety of strategies to identify the most probable definition for an abbreviation and also produces an estimated accuracy of the result. This process is purely automatic.
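
    A toy version of the parenthesis-and-scan idea used by algorithms of this family (in the spirit of the Schwartz and Hearst method cited above, not a reimplementation of the authors' strategies or pseudo-precision estimates) might look like this:

```python
import re

def find_abbrev_definitions(sentence):
    """For each parenthesised short form, scan the preceding words for a
    span whose characters cover the letters of the short form."""
    pairs = []
    for m in re.finditer(r"\(([A-Za-z][\w-]{1,9})\)", sentence):
        short = m.group(1)
        words = sentence[:m.start()].rstrip().split()
        window = words[-min(len(short) + 5, 2 * len(short)):]
        candidate = " ".join(window)
        if _matches(short, candidate):
            pairs.append((short, candidate))
    return pairs

def _matches(short, candidate):
    """Check that the short form's characters appear, in order and from the
    right, in the candidate, with its first character starting a word."""
    s, c = short.lower(), candidate.lower()
    i, j = len(s) - 1, len(c) - 1
    while i >= 0:
        while j >= 0 and c[j] != s[i]:
            j -= 1
        if j < 0:
            return False
        if i == 0 and j > 0 and c[j - 1] not in " -":
            return False
        i -= 1
        j -= 1
    return True

print(find_abbrev_definitions(
    "Patients with obstructive sleep apnea (OSA) were screened."))
```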

  4. Metabolic Network Modeling of Microbial Communities

    PubMed Central

    Biggs, Matthew B.; Medlock, Gregory L.; Kolling, Glynis L.

    2015-01-01

    Genome-scale metabolic network reconstructions and constraint-based analysis are powerful methods that have the potential to make functional predictions about microbial communities. Current use of genome-scale metabolic networks to characterize the metabolic functions of microbial communities includes species compartmentalization, separating species-level and community-level objectives, dynamic analysis, the “enzyme-soup” approach, multi-scale modeling, and others. There are many challenges inherent to the field, including a need for tools that accurately assign high-level omics signals to individual community members, new automated reconstruction methods that rival manual curation, and novel algorithms for integrating omics data and engineering communities. As technologies and modeling frameworks improve, we expect that there will be proportional advances in the fields of ecology, health science, and microbial community engineering. PMID:26109480

  5. An administrative data validation study of the accuracy of algorithms for identifying rheumatoid arthritis: the influence of the reference standard on algorithm performance.

    PubMed

    Widdifield, Jessica; Bombardier, Claire; Bernatsky, Sasha; Paterson, J Michael; Green, Diane; Young, Jacqueline; Ivers, Noah; Butt, Debra A; Jaakkimainen, R Liisa; Thorne, J Carter; Tu, Karen

    2014-06-23

    We have previously validated administrative data algorithms to identify patients with rheumatoid arthritis (RA) using rheumatology clinic records as the reference standard. Here we reassessed the accuracy of the algorithms using primary care records as the reference standard. We performed a retrospective chart abstraction study using a random sample of 7500 adult patients under the care of 83 family physicians contributing to the Electronic Medical Record Administrative data Linked Database (EMRALD) in Ontario, Canada. Using physician-reported diagnoses as the reference standard, we computed and compared the sensitivity, specificity, and predictive values for over 100 administrative data algorithms for RA case ascertainment. We identified 69 patients with RA for a lifetime RA prevalence of 0.9%. All algorithms had excellent specificity (>97%). However, sensitivity varied (75-90%) among physician billing algorithms. Despite the low prevalence of RA, most algorithms had adequate positive predictive value (PPV; 51-83%). The algorithm of "[1 hospitalization RA diagnosis code] or [3 physician RA diagnosis codes with ≥1 by a specialist over 2 years]" had a sensitivity of 78% (95% CI 69-88), specificity of 100% (95% CI 100-100), PPV of 78% (95% CI 69-88) and NPV of 100% (95% CI 100-100). Administrative data algorithms for detecting RA patients achieved a high degree of accuracy amongst the general population. However, results varied slightly from our previous report, which can be attributed to differences in the reference standards with respect to disease prevalence, spectrum of disease, and type of comparator group.
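
    The best-performing case definition quoted above can be expressed as a simple rule over a patient's claims history; the record layout below is a hypothetical simplification.

```python
from datetime import date, timedelta

def meets_ra_definition(hospital_dx, physician_dx):
    """Illustrative check of the rule: one hospitalization RA code, OR three
    physician RA codes with at least one by a specialist within two years.

    hospital_dx  : list of dates of hospitalization RA diagnosis codes
    physician_dx : list of (date, is_specialist) physician RA claims
    """
    if hospital_dx:
        return True
    claims = sorted(physician_dx)
    for i in range(len(claims)):
        window = [c for c in claims
                  if timedelta(0) <= c[0] - claims[i][0] <= timedelta(days=730)]
        if len(window) >= 3 and any(spec for _, spec in window):
            return True
    return False

print(meets_ra_definition(
    hospital_dx=[],
    physician_dx=[(date(2012, 1, 5), False),
                  (date(2012, 6, 1), True),
                  (date(2013, 2, 1), False)]))
```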

  6. Identifying Optimal Measurement Subspace for the Ensemble Kalman Filter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Ning; Huang, Zhenyu; Welch, Greg

    2012-05-24

    To reduce the computational load of the ensemble Kalman filter while maintaining its efficacy, an optimization algorithm based on the generalized eigenvalue decomposition method is proposed for identifying the most informative measurement subspace. When the number of measurements is large, the proposed algorithm can be used to make an effective tradeoff between computational complexity and estimation accuracy. This algorithm also can be extended to other Kalman filters for measurement subspace selection.

  7. A View from Above Without Leaving the Ground

    NASA Technical Reports Server (NTRS)

    2004-01-01

    In order to deliver accurate geospatial data and imagery to the remote sensing community, NASA is constantly developing new image-processing algorithms while refining existing ones for technical improvement. For 8 years, the NASA Regional Applications Center at Florida International University has served as a test bed for implementing and validating many of these algorithms, helping the Space Program to fulfill its strategic and educational goals in the area of remote sensing. The algorithms in return have helped the NASA Regional Applications Center develop comprehensive semantic database systems for data management, as well as new tools for disseminating geospatial information via the Internet.

  8. Examining urban brownfields through the public health "macroscope".

    PubMed

    Litt, Jill S; Tran, Nga L; Burke, Thomas A

    2002-04-01

    Efforts to cope with the legacy of our industrial cities--blight, poverty, environmental degradation, ailing communities--have galvanized action across the public and private sectors to move vacant industrial land, also referred to as brownfields, to productive use; to curb sprawling development outside urban areas; and to reinvigorate urban communities. Such efforts, however, may be proceeding without thorough investigations into the environmental health and safety risks associated with industrial brownfields properties and the needs of affected neighborhoods. We describe an approach to characterize vacant and underused industrial and commercial properties in Southeast Baltimore and the health and well being of communities living near these properties. The screening algorithm developed to score and rank properties in Southeast Baltimore (n= 182) showed that these sites are not benign. The historical data revealed a range of hazardous operations, including metal smelting, oil refining, warehousing, and transportation, as well as paints, plastics, and metals manufacturing. The data also identified hazardous substances linked to these properties, including heavy metals, solvents, polycyclic aromatic hydrocarbons, plasticizers, and insecticides, all of which are suspected or recognized toxicants and many of which are persistent in the environment. The health analysis revealed disparities across Southeast Baltimore communities, including excess deaths from respiratory illness (lung cancer, chronic obstructive pulmonary disease, influenza, and pneumonia), total cancers, and a "leading cause of death" index and a spatial and statistical relationship between environmentally degraded brownfields areas and at-risk communities. Brownfields redevelopment is a key component of our national efforts to address environmental justice and health disparities across urban communities and is critical to urban revitalization. Incorporating public health into brownfields-related cleanup and land-use decisions will increase the odds for successful neighborhood redevelopment and long-term public health benefits.

  9. Examining urban brownfields through the public health "macroscope".

    PubMed Central

    Litt, Jill S; Tran, Nga L; Burke, Thomas A

    2002-01-01

    Efforts to cope with the legacy of our industrial cities--blight, poverty, environmental degradation, ailing communities--have galvanized action across the public and private sectors to move vacant industrial land, also referred to as brownfields, to productive use; to curb sprawling development outside urban areas; and to reinvigorate urban communities. Such efforts, however, may be proceeding without thorough investigations into the environmental health and safety risks associated with industrial brownfields properties and the needs of affected neighborhoods. We describe an approach to characterize vacant and underused industrial and commercial properties in Southeast Baltimore and the health and well being of communities living near these properties. The screening algorithm developed to score and rank properties in Southeast Baltimore (n= 182) showed that these sites are not benign. The historical data revealed a range of hazardous operations, including metal smelting, oil refining, warehousing, and transportation, as well as paints, plastics, and metals manufacturing. The data also identified hazardous substances linked to these properties, including heavy metals, solvents, polycyclic aromatic hydrocarbons, plasticizers, and insecticides, all of which are suspected or recognized toxicants and many of which are persistent in the environment. The health analysis revealed disparities across Southeast Baltimore communities, including excess deaths from respiratory illness (lung cancer, chronic obstructive pulmonary disease, influenza, and pneumonia), total cancers, and a "leading cause of death" index and a spatial and statistical relationship between environmentally degraded brownfields areas and at-risk communities. Brownfields redevelopment is a key component of our national efforts to address environmental justice and health disparities across urban communities and is critical to urban revitalization. Incorporating public health into brownfields-related cleanup and land-use decisions will increase the odds for successful neighborhood redevelopment and long-term public health benefits. PMID:11929727

  10. Household surveillance of severe neonatal illness by community health workers in Mirzapur, Bangladesh: coverage and compliance with referral

    PubMed Central

    Darmstadt, Gary L; Arifeen, Shams El; Choi, Yoonjoung; Bari, Sanwarul; Rahman, Syed M; Mannan, Ishtiaq; Winch, Peter J; Ahmed, ASM Nawshad Uddin; Seraji, Habibur Rahman; Begum, Nazma; Black, Robert E; Santosham, Mathuram; Baqui, Abdullah H

    2010-01-01

    Background Effective and scalable community-based strategies are needed for identification and management of serious neonatal illness. Methods As part of a community-based, cluster-randomized controlled trial of the impact of a package of maternal-neonatal health care, community health workers (CHWs) were trained to conduct household surveillance and to identify and refer sick newborns according to a clinical algorithm. Assessments of newborns by CHWs at home were linked to hospital-based assessments by physicians, and factors impacting referral, referral compliance and outcome were evaluated. Results Seventy-three per cent (7310/10 006) of live-born neonates enrolled in the study were assessed by CHWs at least once; 54% were assessed within 2 days of birth, but only 15% were attended at delivery. Among assessments for which referral was recommended, compliance was verified in 54% (495/919). Referrals recommended to young neonates 0–6 days old were 30% less likely to be complied with compared to older neonates. Compliance was positively associated with having very severe disease and selected clinical signs, including respiratory rate ≥70/minute; weak, abnormal or absent cry; lethargic or less than normal movement; and feeding problem. Among 239 neonates who died, only 38% were assessed by a CHW before death. Conclusions Despite rigorous programmatic effort, reaching neonates within the first 2 days after birth remained a challenge, and parental compliance with referral recommendation was limited, particularly among young neonates. To optimize potential impact, community postnatal surveillance must be coupled with skilled attendance at delivery, and/or a worker skilled in recognition of neonatal illness must be placed in close proximity to the community to allow for rapid case management to avert early deaths. PMID:19917652

  11. Epidemic history of hepatitis C virus infection in two remote communities in Nigeria, West Africa.

    PubMed

    Forbi, Joseph C; Purdy, Michael A; Campo, David S; Vaughan, Gilberto; Dimitrova, Zoya E; Ganova-Raeva, Lilia M; Xia, Guo-Liang; Khudyakov, Yury E

    2012-07-01

    We investigated the molecular epidemiology and population dynamics of HCV infection among indigenes of two semi-isolated communities in North-Central Nigeria. Despite remoteness and isolation, ~15% of the population had serological or molecular markers of hepatitis C virus (HCV) infection. Phylogenetic analysis of the NS5b sequences obtained from 60 HCV-infected residents showed that HCV variants belonged to genotype 1 (n=51; 85%) and genotype 2 (n=9; 15%). All sequences were unique and intermixed in the phylogenetic tree with HCV sequences from people infected from other West African countries. The high-throughput 454 pyrosequencing of the HCV hypervariable region 1 and an empirical threshold error correction algorithm were used to evaluate intra-host heterogeneity of HCV strains of genotype 1 (n=43) and genotype 2 (n=6) from residents of the communities. Analysis revealed a rare detectable intermixing of HCV intra-host variants among residents. Identification of genetically close HCV variants among all known groups of relatives suggests a common intra-familial HCV transmission in the communities. Applying Bayesian coalescent analysis to the NS5b sequences, the most recent common ancestors for genotype 1 and 2 variants were estimated to have existed 675 and 286 years ago, respectively. Bayesian skyline plots suggest that HCV lineages of both genotypes identified in the Nigerian communities experienced epidemic growth for 200-300 years until the mid-20th century. The data suggest a massive introduction of numerous HCV variants to the communities during the 20th century in the background of a dynamic evolutionary history of the hepatitis C epidemic in Nigeria over the past three centuries.

  12. A systematic review of validated methods for identifying transfusion-related ABO incompatibility reactions using administrative and claims data.

    PubMed

    Carnahan, Ryan M; Kee, Vicki R

    2012-01-01

    This paper aimed to systematically review algorithms to identify transfusion-related ABO incompatibility reactions in administrative data, with a focus on studies that have examined the validity of the algorithms. A literature search was conducted using PubMed, Iowa Drug Information Service database, and Embase. A Google Scholar search was also conducted because of the difficulty identifying relevant studies. Reviews were conducted by two investigators to identify studies using data sources from the USA or Canada because these data sources were most likely to reflect the coding practices of Mini-Sentinel data sources. One study was found that validated International Classification of Diseases (ICD-9-CM) codes representing transfusion reactions. None of these cases were ABO incompatibility reactions. Several studies consistently used ICD-9-CM code 999.6, which represents ABO incompatibility reactions, and a technical report identified the ICD-10 code for these reactions. One study included the E-code E8760 for mismatched blood in transfusion in the algorithm. Another study reported finding no ABO incompatibility reaction codes in the Healthcare Cost and Utilization Project Nationwide Inpatient Sample database, which contains data of 2.23 million patients who received transfusions, raising questions about the sensitivity of administrative data for identifying such reactions. Two studies reported perfect specificity, with sensitivity ranging from 21% to 83%, for the code identifying allogeneic red blood cell transfusions in hospitalized patients. There is no information to assess the validity of algorithms to identify transfusion-related ABO incompatibility reactions. Further information on the validity of algorithms to identify transfusions would also be useful. Copyright © 2012 John Wiley & Sons, Ltd.

  13. Dereplication, Aggregation and Scoring Tool (DAS Tool) v1.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    SIEBER, CHRISTIAN

    Communities of uncultivated microbes are critical to ecosystem function and microorganism health, and a key objective of metagenomic studies is to analyze organism-specific metabolic pathways and reconstruct community interaction networks. This requires accurate assignment of genes to genomes, yet existing binning methods often fail to predict a reasonable number of genomes and report many bins of low quality and completeness. Furthermore, the performance of existing algorithms varies between samples and biotypes. Here, we present a dereplication, aggregation and scoring strategy, DAS Tool, that combines the strengths of a flexible set of established binning algorithms. DAS Tool applied to a constructed community generated more accurate bins than any automated method. Further, when applied to samples of different complexity, including soil, natural oil seeps, and the human gut, DAS Tool recovered substantially more near-complete genomes than any single binning method alone. Included were three genomes from a novel lineage. The ability to reconstruct many near-complete genomes from metagenomics data will greatly advance genome-centric analyses of ecosystems.
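
    The aggregate-and-select idea can be caricatured as follows: score every candidate bin from every binner, then greedily keep the highest-scoring bins whose contigs have not already been assigned. The scoring function and data layout here are assumptions for illustration, not DAS Tool's actual implementation.

```python
def select_bins(candidate_bins, penalty=0.5):
    """Greedily keep the highest-scoring, non-overlapping bins proposed by
    several binners. The score (completeness minus a contamination penalty)
    is only illustrative.

    candidate_bins: list of dicts with keys 'contigs', 'completeness',
                    'contamination' (fractions in [0, 1]).
    """
    scored = sorted(candidate_bins,
                    key=lambda b: b["completeness"] - penalty * b["contamination"],
                    reverse=True)
    used, selected = set(), []
    for b in scored:
        if not (set(b["contigs"]) & used):
            selected.append(b)
            used |= set(b["contigs"])
    return selected

bins = [
    {"contigs": {"c1", "c2", "c3"}, "completeness": 0.95, "contamination": 0.02},
    {"contigs": {"c2", "c4"}, "completeness": 0.80, "contamination": 0.01},
    {"contigs": {"c5", "c6"}, "completeness": 0.70, "contamination": 0.05},
]
print(len(select_bins(bins)))
```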

  14. NASA Ocean Altimeter Pathfinder Project. Report 2; Data Set Validation

    NASA Technical Reports Server (NTRS)

    Koblinsky, C. J.; Ray, Richard D.; Beckley, Brian D.; Bremmer, Anita; Tsaoussi, Lucia S.; Wang, Yan-Ming

    1999-01-01

    The NOAA/NASA Pathfinder program was created by the Earth Observing System (EOS) Program Office to determine how existing satellite-based data sets can be processed and used to study global change. The data sets are designed to be long time-series data processed with stable calibration and community consensus algorithms to better assist the research community. The Ocean Altimeter Pathfinder Project involves the reprocessing of all altimeter observations with a consistent set of improved algorithms, based on the results from TOPEX/POSEIDON (T/P), into easy-to-use data sets for the oceanographic community for climate research. Details are currently presented in two technical reports: Report #1, Data Processing Handbook, and Report #2, Data Set Validation. This report (Report #2) describes the validation of the data sets against a global network of high-quality tide gauge measurements and provides an estimate of the error budget. The first report describes the processing schemes used to produce the geodetically consistent data set comprising SEASAT, GEOSAT, ERS-1, TOPEX/POSEIDON, and ERS-2 satellite observations.

  15. Management, Analysis, and Visualization of Experimental and Observational Data – The Convergence of Data and Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bethel, E. Wes; Greenwald, Martin; Kleese van Dam, Kerstin

    Scientific user facilities—particle accelerators, telescopes, colliders, supercomputers, light sources, sequencing facilities, and more—operated by the U.S. Department of Energy (DOE) Office of Science (SC) generate ever increasing volumes of data at unprecedented rates from experiments, observations, and simulations. At the same time there is a growing community of experimentalists that require real-time data analysis feedback, to enable them to steer their complex experimental instruments to optimized scientific outcomes and new discoveries. Recent efforts in DOE-SC have focused on articulating the data-centric challenges and opportunities facing these science communities. Key challenges include difficulties coping with data size, rate, and complexity in the context of both real-time and post-experiment data analysis and interpretation. Solutions will require algorithmic and mathematical advances, as well as hardware and software infrastructures that adequately support data-intensive scientific workloads. This paper presents the summary findings of a workshop held by DOE-SC in September 2015, convened to identify the major challenges and the research that is needed to meet those challenges.

  16. Management, Analysis, and Visualization of Experimental and Observational Data -- The Convergence of Data and Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bethel, E. Wes; Greenwald, Martin; Kleese van Dam, Kerstin

    Scientific user facilities---particle accelerators, telescopes, colliders, supercomputers, light sources, sequencing facilities, and more---operated by the U.S. Department of Energy (DOE) Office of Science (SC) generate ever increasing volumes of data at unprecedented rates from experiments, observations, and simulations. At the same time there is a growing community of experimentalists that require real-time data analysis feedback, to enable them to steer their complex experimental instruments to optimized scientific outcomes and new discoveries. Recent efforts in DOE-SC have focused on articulating the data-centric challenges and opportunities facing these science communities. Key challenges include difficulties coping with data size, rate, and complexity in the context of both real-time and post-experiment data analysis and interpretation. Solutions will require algorithmic and mathematical advances, as well as hardware and software infrastructures that adequately support data-intensive scientific workloads. This paper presents the summary findings of a workshop held by DOE-SC in September 2015, convened to identify the major challenges and the research that is needed to meet those challenges.

  17. CHIMERA: Top-down model for hierarchical, overlapping and directed cluster structures in directed and weighted complex networks

    NASA Astrophysics Data System (ADS)

    Franke, R.

    2016-11-01

    In many networks discovered in biology, medicine, neuroscience and other disciplines, special properties such as a particular degree distribution and a hierarchical cluster structure (also called communities) can be observed as general organizing principles. Detecting the cluster structure of an unknown network promises to identify functional subdivisions, hierarchy and interactions on a mesoscale. Choosing an appropriate detection algorithm is not trivial because multiple network, cluster and algorithmic properties must be considered: edges can be weighted and/or directed, and clusters can overlap or build a hierarchy in several ways. Algorithms differ not only in runtime and memory requirements but also in the network and cluster properties they allow, and each is based on a specific definition of what a cluster is. On the one hand, a comprehensive network creation model is needed to build a large variety of benchmark networks with different reasonable structures for comparing algorithms. On the other hand, if a cluster structure is already known, it is desirable to separate the effects of this structure from other network properties; this can be done with null-model networks that mimic an observed cluster structure to improve statistics on other network features. A third important application is the general study of properties in networks with different cluster structures, possibly evolving over time. Good benchmark and creation models are currently available, but what is missing is a precise sandbox model that builds hierarchical, overlapping and directed clusters for undirected or directed, binary or weighted complex random networks on the basis of a sophisticated blueprint. This gap is closed by the model CHIMERA (Cluster Hierarchy Interconnection Model for Evaluation, Research and Analysis), which is introduced and described here for the first time.

  18. GDPC: Gravitation-based Density Peaks Clustering algorithm

    NASA Astrophysics Data System (ADS)

    Jiang, Jianhua; Hao, Dehao; Chen, Yujun; Parmar, Milan; Li, Keqin

    2018-07-01

    The Density Peaks Clustering algorithm, which we refer to as DPC, is a novel and efficient density-based clustering approach that was published in Science in 2014. DPC has the advantage of discovering clusters of varying sizes and densities, but is limited in detecting the number of clusters and identifying anomalies. We develop an enhanced algorithm with an alternative decision graph, based on gravitation theory and nearby distance, to identify centroids and anomalies accurately. We apply our method to several UCI and synthetic data sets and report comparative clustering performance using the F-measure and 2-dimensional visualization. We also compare our method with other clustering algorithms, such as K-Means, Affinity Propagation (AP) and DPC, presenting F-measure scores and clustering accuracies of our GDPC algorithm on different data sets. We show that GDPC has superior capability in: (1) detecting the number of clusters; (2) efficiently aggregating clusters of varying sizes and densities; and (3) identifying anomalies accurately.
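
    For orientation, a bare-bones density-peaks step (standard DPC-style quantities, not GDPC's gravitation-based decision graph) is sketched below; the cutoff distance, cluster count, and the simplified nearest-centre assignment are assumptions.

```python
import numpy as np

def density_peaks(X, dc=0.5, n_clusters=3):
    """Compute each point's local density and its distance to the nearest
    denser point, then take the points with the largest density*distance
    product as centroids (a simplified decision-graph step)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = (D < dc).sum(axis=1) - 1                     # local density
    delta = np.zeros(len(X))
    for i in range(len(X)):
        denser = np.where(rho > rho[i])[0]
        delta[i] = D[i].max() if len(denser) == 0 else D[i, denser].min()
    centers = np.argsort(rho * delta)[-n_clusters:]    # decision-graph peaks
    # Simplification: assign every point to its nearest centroid.
    labels = np.argmin(D[:, centers], axis=1)
    return labels, centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.2, (40, 2)) for m in ((0, 0), (2, 2), (0, 3))])
labels, centers = density_peaks(X)
print(np.bincount(labels), centers)
```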

  19. Examining Thematic Similarity, Difference, and Membership in Three Online Mental Health Communities from Reddit: A Text Mining and Visualization Approach.

    PubMed

    Park, Albert; Conway, Mike; Chen, Annie T

    2018-01-01

    Social media, including online health communities, have become popular platforms for individuals to discuss health challenges and exchange social support with others. These platforms can provide support for individuals who are concerned about social stigma and discrimination associated with their illness. Although mental health conditions can share similar symptoms and even co-occur, the extent to which discussion topics in online mental health communities are similar, different, or overlapping is unknown. Discovering the topical similarities and differences could potentially inform the design of related mental health communities and patient education programs. This study employs text mining, qualitative analysis, and visualization techniques to compare discussion topics in publicly accessible online mental health communities for three conditions: Anxiety, Depression and Post-Traumatic Stress Disorder. First, online discussion content for the three conditions was collected from three Reddit communities (r/Anxiety, r/Depression, and r/PTSD). Second, content was pre-processed, and then clustered using the k -means algorithm to identify themes that were commonly discussed by members. Third, we qualitatively examined the common themes to better understand them, as well as their similarities and differences. Fourth, we employed multiple visualization techniques to form a deeper understanding of the relationships among the identified themes for the three mental health conditions. The three mental health communities shared four themes: sharing of positive emotion, gratitude for receiving emotional support, and sleep- and work-related issues. Depression clusters tended to focus on self-expressed contextual aspects of depression, whereas the Anxiety Disorders and Post-Traumatic Stress Disorder clusters addressed more treatment- and medication-related issues. Visualizations showed that discussion topics from the Anxiety Disorders and Post-Traumatic Stress Disorder subreddits shared more similarities to one another than to the depression subreddit. We observed that the members of the three communities shared several overlapping concerns (i.e., sleep- and work-related problems) and discussion patterns (i.e., sharing of positive emotion and showing gratitude for receiving emotional support). We also highlighted that the discussions from the r/Anxiety and r/PTSD communities were more similar to one another than to discussions from the r/Depression community. The r/Anxiety and r/PTSD subreddit members are more likely to be individuals whose experiences with a condition are long-term, and who are interested in treatments and medications. The r/Depression subreddit members may be a comparatively diffuse group, many of whom are dealing with transient issues that cause depressed mood. The findings from this study could be used to inform the design of online mental health communities and patient education programs for these conditions. Moreover, we suggest that researchers employ multiple methods to fully understand the subtle differences when comparing similar discussions from online health communities.
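
    The clustering step of this pipeline (TF-IDF vectorization followed by k-means, with top-weighted terms inspected per cluster) can be sketched with scikit-learn as below; the example posts are invented stand-ins for the Reddit data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
import numpy as np

# Hypothetical posts standing in for the Reddit submissions.
posts = [
    "cant sleep again, anxious about work tomorrow",
    "so grateful for the support in this thread",
    "finally had a good day, feeling a little hopeful",
    "my medication change is making the nightmares worse",
    "lost my job and cant get out of bed",
    "thank you all for listening, it really helps",
]

# Vectorise the posts, then cluster with k-means to surface themes.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Inspect the top-weighted terms per cluster as a rough theme label.
terms = np.array(vectorizer.get_feature_names_out())
for c in range(3):
    top = km.cluster_centers_[c].argsort()[::-1][:3]
    print(f"cluster {c}:", ", ".join(terms[top]))
```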

  20. A genetic algorithm based global search strategy for population pharmacokinetic/pharmacodynamic model selection

    PubMed Central

    Sale, Mark; Sherer, Eric A

    2015-01-01

    The current algorithm for selecting a population pharmacokinetic/pharmacodynamic model is based on the well-established forward addition/backward elimination method. A central strength of this approach is the opportunity for a modeller to continuously examine the data and postulate new hypotheses to explain observed biases. This algorithm has served the modelling community well, but the model selection process has essentially remained unchanged for the last 30 years. During this time, more robust approaches to model selection have been made feasible by new technology and dramatic increases in computation speed. We review these methods, with emphasis on genetic algorithm approaches and discuss the role these methods may play in population pharmacokinetic/pharmacodynamic model selection. PMID:23772792

  1. Fast stochastic algorithm for simulating evolutionary population dynamics

    NASA Astrophysics Data System (ADS)

    Tsimring, Lev; Hasty, Jeff; Mather, William

    2012-02-01

    Evolution and co-evolution of ecological communities are stochastic processes often characterized by vastly different rates of reproduction and mutation and a coexistence of very large and very small sub-populations of co-evolving species. This creates serious difficulties for accurate statistical modeling of evolutionary dynamics. In this talk, we introduce a new exact algorithm for fast fully stochastic simulations of birth/death/mutation processes. It produces a significant speedup compared to the direct stochastic simulation algorithm in a typical case when the total population size is large and the mutation rates are much smaller than birth/death rates. We illustrate the performance of the algorithm on several representative examples: evolution on a smooth fitness landscape, NK model, and stochastic predator-prey system.
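
    For contrast with the accelerated method described in this talk, a direct (Gillespie-style) stochastic simulation of a two-type birth/death/mutation process is sketched below; the rates and population sizes are arbitrary toy values.

```python
import numpy as np

def gillespie_bdm(n0, birth, death, mu, t_max, seed=0):
    """Direct stochastic simulation of a two-type birth/death/mutation
    process (the baseline the abstract's faster algorithm is compared to).

    n0           : initial counts of the two types, e.g. [1000, 0]
    birth, death : per-capita birth and death rates per type
    mu           : mutation probability from type 0 to type 1 at birth
    """
    rng = np.random.default_rng(seed)
    n = np.array(n0, dtype=float)
    t = 0.0
    while t < t_max and n.sum() > 0:
        rates = np.concatenate([birth * n, death * n])   # 4 reaction channels
        total = rates.sum()
        t += rng.exponential(1.0 / total)
        r = rng.choice(4, p=rates / total)
        if r < 2:                       # birth of type r (possibly mutated)
            child = 1 if (r == 0 and rng.random() < mu) else r
            n[child] += 1
        else:                           # death of type r - 2
            n[r - 2] -= 1
    return t, n

print(gillespie_bdm([1000, 0], birth=np.array([1.0, 1.2]),
                    death=np.array([0.9, 0.9]), mu=1e-3, t_max=5.0))
```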

  2. Bibliography of spatial interferometry in optical astronomy

    NASA Technical Reports Server (NTRS)

    Gezari, Daniel Y.; Roddier, Francois; Roddier, Claude

    1990-01-01

    The Bibliography of Spatial Interferometry in Optical Astronomy is a guide to the published literature in applications of spatial interferometry techniques to astronomical observations, theory and instrumentation at visible and infrared wavelengths. The key words spatial and optical define the scope of this discipline, distinguishing it from spatial interferometry at radio wavelengths, interferometry in the frequency domain applied to spectroscopy, or more general electro-optics theoretical and laboratory research. The main bibliography is a listing of all technical articles published in the international scientific literature and presented at the major international meetings and workshops attended by the spatial interferometry community. Section B summarizes publications dealing with the basic theoretical concepts and algorithms proposed and applied to optical spatial interferometry and imaging through a turbulent atmosphere. The section on experimental techniques is divided into twelve categories, representing the most clearly identified major areas of experimental research work. Section D, Observations, identifies publications dealing specifically with observations of astronomical sources, in which optical spatial interferometry techniques have been applied.

  3. Challenges and Insights in Using HIPAA Privacy Rule for Clinical Text Annotation.

    PubMed

    Kayaalp, Mehmet; Browne, Allen C; Sagan, Pamela; McGee, Tyne; McDonald, Clement J

    2015-01-01

    The Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA) requires that clinical documents be stripped of personally identifying information before they can be released to researchers and others. We have been manually annotating clinical text since 2008 in order to test and evaluate an algorithmic clinical text de-identification tool, NLM Scrubber, which we have been developing in parallel. Although HIPAA provides some guidance about what must be de-identified, translating those guidelines into practice is not straightforward, especially when one deals with free text. As a result, we have changed our manual annotation labels and methods six times. This paper explains why we made those annotation choices, which have evolved over seven years of practice in this field. The aim of this paper is to start a community discussion towards developing standards for clinical text annotation, with the end goal of studying and comparing clinical text de-identification systems more accurately.

  4. Algorithm for automatic forced spirometry quality assessment: technological developments.

    PubMed

    Melia, Umberto; Burgos, Felip; Vallverdú, Montserrat; Velickovski, Filip; Lluch-Ariet, Magí; Roca, Josep; Caminal, Pere

    2014-01-01

    We hypothesized that the implementation of automatic real-time assessment of quality of forced spirometry (FS) may significantly enhance the potential for extensive deployment of a FS program in the community. Recent studies have demonstrated that the application of quality criteria defined by the ATS/ERS (American Thoracic Society/European Respiratory Society) in commercially available equipment with automatic quality assessment can be markedly improved. To this end, an algorithm for assessing quality of FS automatically was reported. The current research describes the mathematical developments of the algorithm. An innovative analysis of the shape of the spirometric curve, adding 23 new metrics to the traditional 4 recommended by ATS/ERS, was done. The algorithm was created through a two-step iterative process including: (1) an initial version using the standard FS curves recommended by the ATS; and, (2) a refined version using curves from patients. In each of these steps the results were assessed against one expert's opinion. Finally, an independent set of FS curves from 291 patients was used for validation purposes. The novel mathematical approach to characterize the FS curves led to appropriate FS classification with high specificity (95%) and sensitivity (96%). The results constitute the basis for a successful transfer of FS testing to non-specialized professionals in the community.

  5. Optimizing research in symptomatic uterine fibroids with development of a computable phenotype for use with electronic health records.

    PubMed

    Hoffman, Sarah R; Vines, Anissa I; Halladay, Jacqueline R; Pfaff, Emily; Schiff, Lauren; Westreich, Daniel; Sundaresan, Aditi; Johnson, La-Shell; Nicholson, Wanda K

    2018-06-01

    Women with symptomatic uterine fibroids can report a myriad of symptoms, including pain, bleeding, infertility, and psychosocial sequelae. Optimizing fibroid research requires the ability to enroll populations of women with image-confirmed symptomatic uterine fibroids. Our objective was to develop an electronic health record-based algorithm to identify women with symptomatic uterine fibroids for a comparative effectiveness study of medical or surgical treatments on quality-of-life measures. Using an iterative process and text-mining techniques, an effective computable phenotype algorithm, composed of demographics, and clinical and laboratory characteristics, was developed with reasonable performance. Such algorithms provide a feasible, efficient way to identify populations of women with symptomatic uterine fibroids for the conduct of large traditional or pragmatic trials and observational comparative effectiveness studies. Symptomatic uterine fibroids, due to menorrhagia, pelvic pain, bulk symptoms, or infertility, are a source of substantial morbidity for reproductive-age women. Comparing Treatment Options for Uterine Fibroids is a multisite registry study to compare the effectiveness of hormonal or surgical fibroid treatments on women's perceptions of their quality of life. Electronic health record-based algorithms are able to identify large numbers of women with fibroids, but additional work is needed to develop electronic health record algorithms that can identify women with symptomatic fibroids to optimize fibroid research. We sought to develop an efficient electronic health record-based algorithm that can identify women with symptomatic uterine fibroids in a large health care system for recruitment into large-scale observational and interventional research in fibroid management. We developed and assessed the accuracy of 3 algorithms to identify patients with symptomatic fibroids using an iterative approach. The data source was the Carolina Data Warehouse for Health, a repository for the health system's electronic health record data. In addition to International Classification of Diseases, Ninth Revision diagnosis and procedure codes and clinical characteristics, text data-mining software was used to derive information from imaging reports to confirm the presence of uterine fibroids. Results of each algorithm were compared with expert manual review to calculate the positive predictive values for each algorithm. Algorithm 1 was composed of the following criteria: (1) age 18-54 years; (2) either ≥1 International Classification of Diseases, Ninth Revision diagnosis codes for uterine fibroids or mention of fibroids using text-mined key words in imaging records or documents; and (3) no International Classification of Diseases, Ninth Revision or Current Procedural Terminology codes for hysterectomy and no reported history of hysterectomy. The positive predictive value was 47% (95% confidence interval 39-56%). Algorithm 2 required ≥2 International Classification of Diseases, Ninth Revision diagnosis codes for fibroids and positive text-mined key words and had a positive predictive value of 65% (95% confidence interval 50-79%). 
In algorithm 3, further refinements included ≥2 International Classification of Diseases, Ninth Revision diagnosis codes for fibroids on separate outpatient visit dates, the exclusion of women who had a positive pregnancy test within 3 months of their fibroid-related visit, and exclusion of incidentally detected fibroids during prenatal or emergency department visits. Algorithm 3 achieved a positive predictive value of 76% (95% confidence interval 71-81%). An electronic health record-based algorithm is capable of identifying cases of symptomatic uterine fibroids with moderate positive predictive value and may be an efficient approach for large-scale study recruitment. Copyright © 2018 Elsevier Inc. All rights reserved.
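
    Algorithm 1 above combines an age window, diagnosis codes or text-mined key words, and a hysterectomy exclusion; a schematic version is sketched below, with a hypothetical patient-record layout and a commonly used set of ICD-9 leiomyoma codes.

```python
import re

FIBROID_ICD9 = {"218.0", "218.1", "218.2", "218.9"}   # uterine leiomyoma codes
FIBROID_TERMS = re.compile(r"\b(fibroid|leiomyoma|myoma)\b", re.I)

def algorithm_1(patient):
    """Schematic version of Algorithm 1; the patient-record layout (keys
    and value types) is hypothetical."""
    age_ok = 18 <= patient["age"] <= 54
    has_dx = bool(FIBROID_ICD9 & set(patient["icd9_codes"]))
    text_hit = any(FIBROID_TERMS.search(doc) for doc in patient["imaging_reports"])
    no_hysterectomy = not patient["hysterectomy"]
    return age_ok and (has_dx or text_hit) and no_hysterectomy

print(algorithm_1({
    "age": 42,
    "icd9_codes": ["218.9", "401.9"],
    "imaging_reports": ["Pelvic US: multiple intramural fibroids."],
    "hysterectomy": False,
}))
```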

  6. Blooming Trees: Substructures and Surrounding Groups of Galaxy Clusters

    NASA Astrophysics Data System (ADS)

    Yu, Heng; Diaferio, Antonaldo; Serra, Ana Laura; Baldi, Marco

    2018-06-01

    We develop the Blooming Tree Algorithm, a new technique that uses spectroscopic redshift data alone to identify the substructures and the surrounding groups of galaxy clusters, along with their member galaxies. Based on the estimated binding energy of galaxy pairs, the algorithm builds a binary tree that hierarchically arranges all of the galaxies in the field of view. The algorithm searches for buds, corresponding to gravitational potential minima on the binary tree branches; for each bud, the algorithm combines the number of galaxies, their velocity dispersion, and their average pairwise distance into a parameter that discriminates between the buds that do not correspond to any substructure or group, and thus eventually die, and the buds that correspond to substructures and groups, and thus bloom into the identified structures. We test our new algorithm with a sample of 300 mock redshift surveys of clusters in different dynamical states; the clusters are extracted from a large cosmological N-body simulation of a ΛCDM model. We limit our analysis to substructures and surrounding groups identified in the simulation with mass larger than 10^13 h^-1 M_⊙. With mock redshift surveys with 200 galaxies within 6 h^-1 Mpc from the cluster center, the technique recovers 80% of the real substructures and 60% of the surrounding groups; in 57% of the identified structures, at least 60% of the member galaxies of the substructures and groups belong to the same real structure. These results improve by roughly a factor of two the performance of the best substructure identification algorithm currently available, the σ plateau algorithm, and suggest that our Blooming Tree Algorithm can be an invaluable tool for detecting substructures of galaxy clusters and investigating their complex dynamics.

  7. An Evaluation of Algorithms for Identifying Metastatic Breast, Lung, or Colorectal Cancer in Administrative Claims Data.

    PubMed

    Whyte, Joanna L; Engel-Nitz, Nicole M; Teitelbaum, April; Gomez Rey, Gabriel; Kallich, Joel D

    2015-07-01

    Administrative health care claims data are used for epidemiologic, health services, and outcomes cancer research and thus play a significant role in policy. Cancer stage, which is often a major driver of cost and clinical outcomes, is not typically included in claims data. Evaluate algorithms used in a dataset of cancer patients to identify patients with metastatic breast (BC), lung (LC), or colorectal (CRC) cancer using claims data. Clinical data on BC, LC, or CRC patients (between January 1, 2007 and March 31, 2010) were linked to a health care claims database. Inclusion required health plan enrollment ≥3 months before initial cancer diagnosis date. Algorithms were used in the claims database to identify patients' disease status, which was compared with physician-reported metastases. Generic and tumor-specific algorithms were evaluated using ICD-9 codes, varying diagnosis time frames, and including/excluding other tumors. Positive and negative predictive values, sensitivity, and specificity were assessed. The linked databases included 14,480 patients; of whom, 32%, 17%, and 14.2% had metastatic BC, LC, and CRC, respectively, at diagnosis and met inclusion criteria. Nontumor-specific algorithms had lower specificity than tumor-specific algorithms. Tumor-specific algorithms' sensitivity and specificity were 53% and 99% for BC, 55% and 85% for LC, and 59% and 98% for CRC, respectively. Algorithms to distinguish metastatic BC, LC, and CRC from locally advanced disease should use tumor-specific primary cancer codes with 2 claims for the specific primary cancer >30-42 days apart to reduce misclassification. These performed best overall in specificity, positive predictive values, and overall accuracy to identify metastatic cancer in a health care claims database.
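
    The recommended confirmation rule (two claims for the tumour-specific primary cancer code more than 30-42 days apart) can be expressed compactly; the ICD-9 codes and claim layout below are illustrative.

```python
from datetime import date

BREAST_PRIMARY_ICD9 = {"174.9"}   # illustrative primary breast cancer code

def has_confirmed_primary(claims, primary_codes, min_gap_days=42):
    """Require at least two claims carrying the tumour-specific primary
    cancer code, more than `min_gap_days` (30-42 days in the paper) apart.

    claims: iterable of (service_date, icd9_code) tuples.
    """
    dates = sorted(d for d, code in claims if code in primary_codes)
    return len(dates) >= 2 and (dates[-1] - dates[0]).days > min_gap_days

claims = [(date(2009, 3, 1), "174.9"),
          (date(2009, 3, 10), "V58.11"),
          (date(2009, 5, 2), "174.9")]
print(has_confirmed_primary(claims, BREAST_PRIMARY_ICD9))
```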

  8. A longitudinal study of adult-onset asthma incidence among HMO members

    PubMed Central

    Sama, Susan R; Hunt, Phillip R; Cirillo, CIH Priscilla; Marx, Arminda; Rosiello, Richard A; Henneberger, Paul K; Milton, Donald K

    2003-01-01

    Background HMO databases offer an opportunity for community based epidemiologic studies of asthma incidence, etiology and treatment. The incidence of asthma in HMO populations and the utility of HMO data, including use of computerized algorithms and manual review of medical charts for determining etiologic factors has not been fully explored. Methods We identified adult-onset asthma, using computerized record searches in a New England HMO. Monthly, our software applied exclusion and inclusion criteria to identify an "at-risk" population and "potential cases". Electronic and paper medical records from the past year were then reviewed for each potential case. Persons with other respiratory diseases or insignificant treatment for asthma were excluded. Confirmed adult-onset asthma (AOA) cases were defined as those potential cases with either new-onset asthma or reactivated mild intermittent asthma that had been quiescent for at least one year. We validated the methods by reviewing charts of selected subjects rejected by the algorithm. Results The algorithm was 93 to 99.3% sensitive and 99.6% specific. Sixty-three percent (n = 469) of potential cases were confirmed as AOA. Two thirds of confirmed cases were women with an average age of 34.8 (SD 11.8), and 45% had no evidence of previous asthma diagnosis. The annualized monthly rate of AOA ranged from 4.1 to 11.4 per 1000 at-risk members. Physicians most commonly attribute asthma to infection (59%) and allergy (14%). New-onset cases were more likely attributed to infection, while reactivated cases were more associated with allergies. Medical charts included a discussion of work exposures in relation to asthma in only 32 (7%) cases. Twenty-three of these (72%) indicated there was an association between asthma and workplace exposures for an overall rate of work-related asthma of 4.9%. Conclusion Computerized HMO records can be successfully used to identify AOA. Manual review of these records is important to confirm case status and is useful in evaluation of provider consideration of etiologies. We demonstrated that clinicians attribute most AOA to infection and tend to ignore the contribution of environmental and occupational exposures. PMID:12952547

  9. A longitudinal study of adult-onset asthma incidence among HMO members.

    PubMed

    Sama, Susan R; Hunt, Phillip R; Cirillo, C I H Priscilla; Marx, Arminda; Rosiello, Richard A; Henneberger, Paul K; Milton, Donald K

    2003-08-07

    HMO databases offer an opportunity for community based epidemiologic studies of asthma incidence, etiology and treatment. The incidence of asthma in HMO populations and the utility of HMO data, including use of computerized algorithms and manual review of medical charts for determining etiologic factors has not been fully explored. We identified adult-onset asthma, using computerized record searches in a New England HMO. Monthly, our software applied exclusion and inclusion criteria to identify an "at-risk" population and "potential cases". Electronic and paper medical records from the past year were then reviewed for each potential case. Persons with other respiratory diseases or insignificant treatment for asthma were excluded. Confirmed adult-onset asthma (AOA) cases were defined as those potential cases with either new-onset asthma or reactivated mild intermittent asthma that had been quiescent for at least one year. We validated the methods by reviewing charts of selected subjects rejected by the algorithm. The algorithm was 93 to 99.3% sensitive and 99.6% specific. Sixty-three percent (n = 469) of potential cases were confirmed as AOA. Two thirds of confirmed cases were women with an average age of 34.8 (SD 11.8), and 45% had no evidence of previous asthma diagnosis. The annualized monthly rate of AOA ranged from 4.1 to 11.4 per 1000 at-risk members. Physicians most commonly attribute asthma to infection (59%) and allergy (14%). New-onset cases were more likely attributed to infection, while reactivated cases were more associated with allergies. Medical charts included a discussion of work exposures in relation to asthma in only 32 (7%) cases. Twenty-three of these (72%) indicated there was an association between asthma and workplace exposures for an overall rate of work-related asthma of 4.9%. Computerized HMO records can be successfully used to identify AOA. Manual review of these records is important to confirm case status and is useful in evaluation of provider consideration of etiologies. We demonstrated that clinicians attribute most AOA to infection and tend to ignore the contribution of environmental and occupational exposures.

  10. ContextD: an algorithm to identify contextual properties of medical terms in a Dutch clinical corpus.

    PubMed

    Afzal, Zubair; Pons, Ewoud; Kang, Ning; Sturkenboom, Miriam C J M; Schuemie, Martijn J; Kors, Jan A

    2014-11-29

    In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. We created a Dutch clinical corpus containing four types of anonymized clinical documents: entries from general practitioners, specialists' letters, radiology reports, and discharge letters. Using a Dutch list of medical terms extracted from the Unified Medical Language System, we identified medical terms in the corpus with exact matching. The identified terms were annotated for negation, temporality, and experiencer properties. To adapt the ConText algorithm, we translated English trigger terms to Dutch and added several general and document specific enhancements, such as negation rules for general practitioners' entries and a regular expression based temporality module. The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained an F-score from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. The ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development.
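
    The trigger-based core of ConText-style algorithms can be sketched in a few lines: mark a concept as negated when a negation trigger appears within a small window of preceding tokens. The trigger list, window size, and examples below are assumptions; ContextD's full scope rules, temporality, and experiencer modules are not reproduced.

```python
import re

# A few example trigger terms (English plus Dutch); the real ContextD
# trigger list is far larger.
NEGATION_TRIGGERS = ["no", "without", "denies", "geen", "zonder"]

def is_negated(sentence, term, window=5):
    """Mark `term` as negated if a negation trigger occurs within a fixed
    window of words before it. Scope and termination rules are omitted."""
    tokens = re.findall(r"\w+", sentence.lower())
    try:
        idx = tokens.index(term.lower())
    except ValueError:
        return False
    pre = tokens[max(0, idx - window):idx]
    return any(t in NEGATION_TRIGGERS for t in pre)

print(is_negated("Patient denies chest pain or dyspnea.", "dyspnea"))
print(is_negated("Patiënt heeft geen koorts.", "koorts"))
```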

  11. XTALOPT: An open-source evolutionary algorithm for crystal structure prediction

    NASA Astrophysics Data System (ADS)

    Lonie, David C.; Zurek, Eva

    2011-02-01

    The implementation and testing of XTALOPT, an evolutionary algorithm for crystal structure prediction, is outlined. We present our new periodic displacement (ripple) operator which is ideally suited to extended systems. It is demonstrated that hybrid operators, which combine two pure operators, reduce the number of duplicate structures in the search. This allows for better exploration of the potential energy surface of the system in question, while simultaneously zooming in on the most promising regions. A continuous workflow, which makes better use of computational resources as compared to traditional generation based algorithms, is employed. Various parameters in XTALOPT are optimized using a novel benchmarking scheme. XTALOPT is available under the GNU Public License, has been interfaced with various codes commonly used to study extended systems, and has an easy to use, intuitive graphical interface. Program summary: Program title: XTALOPT. Catalogue identifier: AEGX_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGX_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: GPL v2.1 or later [1]. No. of lines in distributed program, including test data, etc.: 36 849. No. of bytes in distributed program, including test data, etc.: 1 149 399. Distribution format: tar.gz. Programming language: C++. Computer: PCs, workstations, or clusters. Operating system: Linux. Classification: 7.7. External routines: QT [2], OpenBabel [3], AVOGADRO [4], SPGLIB [8] and one of: VASP [5], PWSCF [6], GULP [7]. Nature of problem: Predicting the crystal structure of a system from its stoichiometry alone remains a grand challenge in computational materials science, chemistry, and physics. Solution method: Evolutionary algorithms are stochastic search techniques which use concepts from biological evolution in order to locate the global minimum on their potential energy surface. Our evolutionary algorithm, XTALOPT, is freely available to the scientific community for use and collaboration under the GNU Public License. Running time: User dependent. The program runs until stopped by the user.
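    The periodic displacement (ripple) operator is only named above; the short sketch below is a loose illustration of the idea under assumed conventions (an N x 3 array of fractional coordinates, and a sinusoidal shift of one coordinate whose phase depends on another coordinate). It is not the authors' implementation, and the parameter names are placeholders.

      import numpy as np

      def ripple(frac_coords, rho=0.3, mu=2, axis=2, wave_axis=0):
          """Loose illustration of a periodic-displacement ('ripple') move: every
          atom is shifted along `axis` by a sinusoidal wave whose phase depends on
          its fractional coordinate along `wave_axis`.
          rho: maximum displacement (fractional units); mu: wave number."""
          coords = np.asarray(frac_coords, dtype=float).copy()
          theta = np.random.uniform(0, 2 * np.pi)            # random phase
          wave = rho * np.cos(2 * np.pi * mu * coords[:, wave_axis] + theta)
          coords[:, axis] = (coords[:, axis] + wave) % 1.0    # wrap back into the cell
          return coords

      # Example: apply the move to an 8-atom random structure
      parent = np.random.rand(8, 3)
      child = ripple(parent)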

  12. An ant colony optimization based algorithm for identifying gene regulatory elements.

    PubMed

    Liu, Wei; Chen, Hanwu; Chen, Ling

    2013-08-01

    It is one of the most important tasks in bioinformatics to identify the regulatory elements in gene sequences. Most of the existing algorithms for identifying regulatory elements are inclined to converge into a local optimum, and have high time complexity. Ant Colony Optimization (ACO) is a meta-heuristic method based on swarm intelligence and is derived from a model inspired by the collective foraging behavior of real ants. Taking advantage of the ACO in traits such as self-organization and robustness, this paper designs and implements an ACO based algorithm named ACRI (ant-colony-regulatory-identification) for identifying all possible binding sites of transcription factor from the upstream of co-expressed genes. To accelerate the ants' searching process, a strategy of local optimization is presented to adjust the ants' start positions on the searched sequences. By exploiting the powerful optimization ability of ACO, the algorithm ACRI can not only improve precision of the results, but also achieve a very high speed. Experimental results on real world datasets show that ACRI can outperform other traditional algorithms in the respects of speed and quality of solutions. Copyright © 2013 Elsevier Ltd. All rights reserved.
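    To make the ant-colony idea concrete, here is a generic Python sketch (not the ACRI implementation): each ant picks one candidate motif start position per sequence with probability proportional to pheromone, solutions are scored by simple per-column base conservation, and good solutions reinforce the pheromone trail. The window length, colony size, evaporation rate, and scoring function are illustrative assumptions.

      import random

      def aco_motif_search(seqs, w=6, n_ants=20, n_iter=50, rho=0.1):
          """Generic ACO sketch: pheromone-guided choice of one motif start per
          sequence, scored by column conservation, with evaporation each round."""
          # pher[i][p]: desirability of starting the motif at position p of sequence i
          pher = [[1.0] * (len(s) - w + 1) for s in seqs]

          def score(starts):
              # sum over columns of the count of the most frequent base (higher = conserved)
              cols = zip(*[s[p:p + w] for s, p in zip(seqs, starts)])
              return sum(max(col.count(b) for b in set(col)) for col in cols)

          best, best_score = None, -1
          for _ in range(n_iter):
              for _ in range(n_ants):
                  starts = [random.choices(range(len(row)), weights=row)[0] for row in pher]
                  sc = score(starts)
                  if sc > best_score:
                      best, best_score = starts, sc
                  for i, p in enumerate(starts):               # pheromone reinforcement
                      pher[i][p] += sc / (w * len(seqs))
              pher = [[(1 - rho) * v for v in row] for row in pher]   # evaporation
          return best, best_score

      seqs = ["ACGTACGTGGA", "TTACGTGGCAT", "GGACGTGGTTT"]
      print(aco_motif_search(seqs, w=6))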

  13. A systematic review of validated methods for identifying acute respiratory failure using administrative and claims data.

    PubMed

    Jones, Natalie; Schneider, Gary; Kachroo, Sumesh; Rotella, Philip; Avetisyan, Ruzan; Reynolds, Matthew W

    2012-01-01

    The Food and Drug Administration's (FDA) Mini-Sentinel pilot program initially aims to conduct active surveillance to refine safety signals that emerge for marketed medical products. A key facet of this surveillance is to develop and understand the validity of algorithms for identifying health outcomes of interest (HOIs) from administrative and claims data. This paper summarizes the process and findings of the algorithm review of acute respiratory failure (ARF). PubMed and Iowa Drug Information Service searches were conducted to identify citations applicable to the anaphylaxis HOI. Level 1 abstract reviews and Level 2 full-text reviews were conducted to find articles using administrative and claims data to identify ARF, including validation estimates of the coding algorithms. Our search revealed a deficiency of literature focusing on ARF algorithms and validation estimates. Only two studies provided codes for ARF, each using related yet different ICD-9 codes (i.e., ICD-9 codes 518.8, "other diseases of lung," and 518.81, "acute respiratory failure"). Neither study provided validation estimates. Research needs to be conducted on designing validation studies to test ARF algorithms and estimating their predictive power, sensitivity, and specificity. Copyright © 2012 John Wiley & Sons, Ltd.
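    Below is a minimal sketch of the kind of code-based case-finding the reviewed algorithms perform, using the two ICD-9 codes reported above (518.8 and 518.81); the claim-record field names are hypothetical.

      # Flag claims whose diagnosis codes match the ICD-9 codes found in the review.
      ARF_CODES = {"518.8", "518.81"}

      def flag_arf(claims):
          """Return the claims carrying at least one ARF diagnosis code."""
          return [c for c in claims if ARF_CODES & set(c.get("dx_codes", []))]

      claims = [
          {"claim_id": 1, "dx_codes": ["518.81", "486"]},   # acute respiratory failure + pneumonia
          {"claim_id": 2, "dx_codes": ["491.21"]},          # COPD exacerbation only
      ]
      print(flag_arf(claims))   # only claim 1 is flagged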

  14. Seeking out SARI: an automated search of electronic health records.

    PubMed

    O'Horo, John C; Dziadzko, Mikhail; Sakusic, Amra; Ali, Rashid; Sohail, M Rizwan; Kor, Daryl J; Gajic, Ognjen

    2018-06-01

    The definition of severe acute respiratory infection (SARI) - a respiratory illness with fever and cough, occurring within the past 10 days and requiring hospital admission - has not been evaluated for critically ill patients. Using integrated electronic health records data, we developed an automated search algorithm to identify SARI cases in a large cohort of critical care patients and evaluate patient outcomes. We conducted a retrospective cohort study of all admissions to a medical intensive care unit from August 2009 through March 2016. Subsets were randomly selected for deriving and validating a search algorithm, which was compared with temporal trends in laboratory-confirmed influenza to ensure that SARI was correlated with influenza. The algorithm was applied to the cohort to identify clinical differences for patients with and without SARI. For identifying SARI, the algorithm (sensitivity, 86.9%; specificity, 95.6%) outperformed billing-based searching (sensitivity, 73.8%; specificity, 78.8%). Automated searching correlated with peaks in laboratory-confirmed influenza. Adjusted for severity of illness, SARI was associated with more hospital, intensive care unit and ventilator days but not with death or dismissal to home. The search algorithm accurately identified SARI for epidemiologic study and surveillance.

  15. Generalized ocean color inversion model for retrieving marine inherent optical properties.

    PubMed

    Werdell, P Jeremy; Franz, Bryan A; Bailey, Sean W; Feldman, Gene C; Boss, Emmanuel; Brando, Vittorio E; Dowell, Mark; Hirata, Takafumi; Lavender, Samantha J; Lee, ZhongPing; Loisel, Hubert; Maritorena, Stéphane; Mélin, Fréderic; Moore, Timothy S; Smyth, Timothy J; Antoine, David; Devred, Emmanuel; d'Andon, Odile Hembise Fanton; Mangin, Antoine

    2013-04-01

    Ocean color measured from satellites provides daily, global estimates of marine inherent optical properties (IOPs). Semi-analytical algorithms (SAAs) provide one mechanism for inverting the color of the water observed by the satellite into IOPs. While numerous SAAs exist, most are similarly constructed and few are appropriately parameterized for all water masses for all seasons. To initiate community-wide discussion of these limitations, NASA organized two workshops that deconstructed SAAs to identify similarities and uniqueness and to progress toward consensus on a unified SAA. This effort resulted in the development of the generalized IOP (GIOP) model software that allows for the construction of different SAAs at runtime by selection from an assortment of model parameterizations. As such, GIOP permits isolation and evaluation of specific modeling assumptions, construction of SAAs, development of regionally tuned SAAs, and execution of ensemble inversion modeling. Working groups associated with the workshops proposed a preliminary default configuration for GIOP (GIOP-DC), with alternative model parameterizations and features defined for subsequent evaluation. In this paper, we: (1) describe the theoretical basis of GIOP; (2) present GIOP-DC and verify its comparable performance to other popular SAAs using both in situ and synthetic data sets; and, (3) quantify the sensitivities of their output to their parameterization. We use the latter to develop a hierarchical sensitivity of SAAs to various model parameterizations, to identify components of SAAs that merit focus in future research, and to provide material for discussion on algorithm uncertainties and future ensemble applications.
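    As a rough illustration of the semi-analytical inversion idea (adjust a few IOP magnitudes until the modeled reflectance matches the observation), the sketch below fits a three-parameter IOP model through the widely used quadratic relation rrs = g1*u + g2*u^2 with u = bb/(a + bb). The spectral shapes and constants are illustrative placeholders and do not represent the GIOP-DC configuration.

      import numpy as np
      from scipy.optimize import least_squares

      wl = np.array([412., 443., 490., 510., 555.])          # wavelengths (nm)
      # Illustrative constants: pure-water absorption/backscatter and assumed
      # spectral shapes for phytoplankton and detrital/gelbstoff absorption.
      a_w  = np.array([0.0046, 0.0071, 0.0150, 0.0325, 0.0596])
      bb_w = np.array([0.0033, 0.0024, 0.0016, 0.0014, 0.0010])
      aph_star = np.array([0.90, 1.00, 0.75, 0.60, 0.30])     # normalized at 443 nm
      adg_star = np.exp(-0.018 * (wl - 443.0))
      bbp_star = (443.0 / wl) ** 1.0
      g1, g2 = 0.0949, 0.0794                                 # quadratic rrs coefficients

      def rrs_model(p):
          aph443, adg443, bbp443 = p
          a  = a_w  + aph443 * aph_star + adg443 * adg_star   # total absorption
          bb = bb_w + bbp443 * bbp_star                       # total backscattering
          u = bb / (a + bb)
          return g1 * u + g2 * u ** 2                         # subsurface reflectance

      def invert(rrs_obs):
          fit = least_squares(lambda p: rrs_model(p) - rrs_obs,
                              x0=[0.05, 0.01, 0.001], bounds=(0, np.inf))
          return dict(zip(["aph(443)", "adg(443)", "bbp(443)"], fit.x))

      rrs_obs = rrs_model([0.03, 0.015, 0.002])               # synthetic observation
      print(invert(rrs_obs))                                  # recovers the three magnitudes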

  16. Generalized Ocean Color Inversion Model for Retrieving Marine Inherent Optical Properties

    NASA Technical Reports Server (NTRS)

    Werdell, P. Jeremy; Franz, Bryan A.; Bailey, Sean W.; Feldman, Gene C.; Boss, Emmanuel; Brando, Vittorio E.; Dowell, Mark; Hirata, Takafumi; Lavender, Samantha J.; Lee, ZhongPing

    2013-01-01

    Ocean color measured from satellites provides daily, global estimates of marine inherent optical properties (IOPs). Semi-analytical algorithms (SAAs) provide one mechanism for inverting the color of the water observed by the satellite into IOPs. While numerous SAAs exist, most are similarly constructed and few are appropriately parameterized for all water masses for all seasons. To initiate community-wide discussion of these limitations, NASA organized two workshops that deconstructed SAAs to identify similarities and uniqueness and to progress toward consensus on a unified SAA. This effort resulted in the development of the generalized IOP (GIOP) model software that allows for the construction of different SAAs at runtime by selection from an assortment of model parameterizations. As such, GIOP permits isolation and evaluation of specific modeling assumptions, construction of SAAs, development of regionally tuned SAAs, and execution of ensemble inversion modeling. Working groups associated with the workshops proposed a preliminary default configuration for GIOP (GIOP-DC), with alternative model parameterizations and features defined for subsequent evaluation. In this paper, we: (1) describe the theoretical basis of GIOP; (2) present GIOP-DC and verify its comparable performance to other popular SAAs using both in situ and synthetic data sets; and, (3) quantify the sensitivities of their output to their parameterization. We use the latter to develop a hierarchical sensitivity of SAAs to various model parameterizations, to identify components of SAAs that merit focus in future research, and to provide material for discussion on algorithm uncertainties and future ensemble applications.

  17. Comparison of Two Sepsis Recognition Methods in a Pediatric Emergency Department

    PubMed Central

    Balamuth, Fran; Alpern, Elizabeth R.; Grundmeier, Robert W.; Chilutti, Marianne; Weiss, Scott L.; Fitzgerald, Julie C.; Hayes, Katie; Bilker, Warren; Lautenbach, Ebbing

    2015-01-01

    Objectives To compare the effectiveness of physician judgment and an electronic algorithmic alert to identify pediatric patients with severe sepsis/septic shock in a pediatric emergency department (ED). Methods This was an observational cohort study of patients older than 56 days with fever or hypothermia. All patients were evaluated for potential sepsis in real time by the ED clinical team. An electronic algorithmic alert was retrospectively applied to identify patients with potential sepsis independent of physician judgment. The primary outcome was the proportion of patients correctly identified with severe sepsis/septic shock defined by consensus criteria. Test characteristics were determined and receiver operating characteristic (ROC) curves were compared. Results Of 19,524 eligible patient visits, 88 patients developed consensus-confirmed severe sepsis or septic shock. Physician judgment identified 159, and the algorithmic alert identified 3,301 patients with potential sepsis. Physician judgment had sensitivity of 72.7% (95% CI = 72.1% to 73.4%) and specificity 99.5% (95% CI = 99.4% to 99.6%); the algorithmic alert had sensitivity 92.1% (95% CI = 91.7% to 92.4%), and specificity 83.4% (95% CI = 82.9% to 83.9%) for severe sepsis/septic shock. There was no significant difference in the area under the ROC curve for physician judgment (0.86, 95% CI = 0.81 to 0.91) or the algorithm (0.88, 95% CI = 0.85 to 0.91; p = 0.54). A combination method using either positive physician judgment or an algorithmic alert improved sensitivity to 96.6% and specificity to 83.3%. A sequential approach, in which positive identification by the algorithmic alert was then confirmed by physician judgment, achieved 68.2% sensitivity and 99.6% specificity. Positive and negative predictive values for physician judgment vs. algorithmic alert were 40.3% vs. 2.5% and 99.88 % vs. 99.96%, respectively. Conclusions The electronic algorithmic alert was more sensitive but less specific than physician judgment for recognition of pediatric severe sepsis and septic shock. These findings can help to guide institutions in selecting pediatric sepsis recognition methods based on institutional needs and priorities. PMID:26474032
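    The test characteristics quoted above follow directly from a 2x2 table; the short sketch below shows the arithmetic. The counts are back-calculated only approximately from the physician-judgment percentages reported in the abstract and are not the study's published table.

      def test_characteristics(tp, fp, fn, tn):
          """Sensitivity, specificity and predictive values from 2x2 counts."""
          return {
              "sensitivity": tp / (tp + fn),
              "specificity": tn / (tn + fp),
              "ppv": tp / (tp + fp),
              "npv": tn / (tn + fn),
          }

      # Approximate reconstruction for physician judgment: about 64 of 88 true cases
      # flagged, 95 false alarms among the remaining visits (illustrative only).
      print(test_characteristics(tp=64, fp=95, fn=24, tn=19341))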

  18. Bio-ALIRT biosurveillance detection algorithm evaluation.

    PubMed

    Siegrist, David; Pavlin, J

    2004-09-24

    Early detection of disease outbreaks by a medical biosurveillance system relies on two major components: 1) the contribution of early and reliable data sources and 2) the sensitivity, specificity, and timeliness of biosurveillance detection algorithms. This paper describes an effort to assess leading detection algorithms by arranging a common challenge problem and providing a common data set. The objectives of this study were to determine whether automated detection algorithms can reliably and quickly identify the onset of natural disease outbreaks that are surrogates for possible terrorist pathogen releases, and do so at acceptable false-alert rates (e.g., once every 2-6 weeks). Historic de-identified data were obtained from five metropolitan areas over 23 months; these data included International Classification of Diseases, Ninth Revision (ICD-9) codes related to respiratory and gastrointestinal illness syndromes. An outbreak detection group identified and labeled two natural disease outbreaks in these data and provided them to analysts for training of detection algorithms. All outbreaks in the remaining test data were identified but not revealed to the detection groups until after their analyses. The algorithms established a probability of outbreak for each day's counts. The probability of outbreak was assessed as an "actual" alert for different false-alert rates. The best algorithms were able to detect all of the outbreaks at false-alert rates of one every 2-6 weeks. They were often able to signal on the same day that human investigators had identified as the true start of the outbreak. Because minimal data exist for an actual biologic attack, determining how quickly an algorithm might detect such an attack is difficult. However, application of these algorithms in combination with other data-analysis methods to historic outbreak data indicates that biosurveillance techniques for analyzing syndrome counts can rapidly detect seasonal respiratory and gastrointestinal illness outbreaks. Further research is needed to assess the value of electronic data sources for predictive detection. In addition, simulations need to be developed and implemented to better characterize the size and type of biologic attack that can be detected by current methods by challenging them under different projected operational conditions.
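    The abstract does not specify the individual detection algorithms that were evaluated. Purely for orientation, the sketch below shows one common syndromic-surveillance approach: an exponentially weighted moving average of daily syndrome counts compared against a trailing baseline, with the alert threshold acting as the knob that trades sensitivity against the false-alert rate. The smoothing constant, baseline length, threshold, and simulated outbreak are all assumptions, not part of the Bio-ALIRT evaluation.

      import numpy as np

      def ewma_alerts(counts, lam=0.3, threshold=2.5, baseline=28):
          """Flag days whose EWMA-smoothed syndrome count exceeds the trailing
          baseline mean by more than `threshold` baseline standard deviations."""
          counts = np.asarray(counts, dtype=float)
          alerts = []
          ewma = counts[:baseline].mean()
          for t in range(baseline, len(counts)):
              hist = counts[t - baseline:t]
              mu, sd = hist.mean(), hist.std(ddof=1) or 1.0
              ewma = lam * counts[t] + (1 - lam) * ewma
              if (ewma - mu) / sd > threshold:
                  alerts.append(t)
          return alerts

      # Simulated counts: Poisson background plus an injected outbreak on days 60-66.
      rng = np.random.default_rng(0)
      counts = rng.poisson(20, 100)
      counts[60:67] += np.array([10, 20, 30, 40, 30, 20, 10])
      print(ewma_alerts(counts))   # expected to flag days within the injected outbreak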

  19. Comparison of Two Sepsis Recognition Methods in a Pediatric Emergency Department.

    PubMed

    Balamuth, Fran; Alpern, Elizabeth R; Grundmeier, Robert W; Chilutti, Marianne; Weiss, Scott L; Fitzgerald, Julie C; Hayes, Katie; Bilker, Warren; Lautenbach, Ebbing

    2015-11-01

    The objective was to compare the effectiveness of physician judgment and an electronic algorithmic alert to identify pediatric patients with severe sepsis/septic shock in a pediatric emergency department (ED). This was an observational cohort study of patients older than 56 days with fever or hypothermia. All patients were evaluated for potential sepsis in real time by the ED clinical team. An electronic algorithmic alert was retrospectively applied to identify patients with potential sepsis independent of physician judgment. The primary outcome was the proportion of patients correctly identified with severe sepsis/septic shock defined by consensus criteria. Test characteristics were determined and receiver operating characteristic (ROC) curves were compared. Of 19,524 eligible patient visits, 88 patients developed consensus-confirmed severe sepsis or septic shock. Physician judgment identified 159 and the algorithmic alert identified 3,301 patients with potential sepsis. Physician judgment had sensitivity of 72.7% (95% confidence interval [CI] = 72.1% to 73.4%) and specificity of 99.5% (95% CI = 99.4% to 99.6%); the algorithmic alert had sensitivity of 92.1% (95% CI = 91.7% to 92.4%) and specificity of 83.4% (95% CI = 82.9% to 83.9%) for severe sepsis/septic shock. There was no significant difference in the area under the ROC curve for physician judgment (0.86, 95% CI = 0.81 to 0.91) or the algorithm (0.88, 95% CI = 0.85 to 0.91; p = 0.54). A combination method using either positive physician judgment or an algorithmic alert improved sensitivity to 96.6% and specificity to 83.3%. A sequential approach, in which positive identification by the algorithmic alert was then confirmed by physician judgment, achieved 68.2% sensitivity and 99.6% specificity. Positive and negative predictive values for physician judgment versus algorithmic alert were 40.3% versus 2.5% and 99.88% versus 99.96%, respectively. The electronic algorithmic alert was more sensitive but less specific than physician judgment for recognition of pediatric severe sepsis and septic shock. These findings can help to guide institutions in selecting pediatric sepsis recognition methods based on institutional needs and priorities. © 2015 by the Society for Academic Emergency Medicine.

  20. Stopping Antidepressants and Anxiolytics as Major Concerns Reported in Online Health Communities: A Text Mining Approach.

    PubMed

    Abbe, Adeline; Falissard, Bruno

    2017-10-23

    The Internet is a particularly dynamic way to quickly capture the perceptions of a population in real time. Complementary to traditional face-to-face communication, online social networks help patients to improve self-esteem and self-help. The aim of this study was to use text mining on material from an online forum exploring patients' concerns about treatment (antidepressants and anxiolytics). Concerns about treatment were collected from discussion titles in an online patient community related to antidepressants and anxiolytics. To examine the content of these titles automatically, we used text mining methods, such as word frequency in a document-term matrix and co-occurrence of words using a network analysis. It was thus possible to identify topics discussed on the forum. The forum included 2415 discussions on antidepressants and anxiolytics over a period of 3 years. After a preprocessing step, the text mining algorithm identified the 99 most frequently occurring words in titles, among which were escitalopram, withdrawal, antidepressant, venlafaxine, paroxetine, and effect. Patients' concerns were related to antidepressant withdrawal, the need to share experience about symptoms and effects, and questions on weight gain with some drugs. Patients' expression on the Internet is a potential additional resource in addressing patients' concerns about treatment. Patient profiles are close to those of patients treated in psychiatry. ©Adeline Abbe, Bruno Falissard. Originally published in JMIR Mental Health (http://mental.jmir.org), 23.10.2017.
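    A minimal sketch of the two text-mining steps described, a document-term matrix over discussion titles and a word co-occurrence network, is shown below. It assumes scikit-learn is available, and the three example titles are invented stand-ins for the forum data.

      from itertools import combinations
      from collections import Counter
      from sklearn.feature_extraction.text import CountVectorizer

      titles = [
          "venlafaxine withdrawal symptoms",
          "weight gain on paroxetine",
          "escitalopram withdrawal and side effects",
      ]

      # Document-term matrix and overall word frequencies
      vec = CountVectorizer(stop_words="english")
      dtm = vec.fit_transform(titles)
      freq = dict(zip(vec.get_feature_names_out(), dtm.sum(axis=0).A1))
      print(sorted(freq.items(), key=lambda kv: -kv[1]))

      # Co-occurrence counts (edges of a word network): word pairs sharing a title
      edges = Counter()
      analyzer = vec.build_analyzer()
      for t in titles:
          words = sorted(set(analyzer(t)))
          edges.update(combinations(words, 2))
      print(edges.most_common(5))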

  1. Developing a disease outbreak event corpus.

    PubMed

    Conway, Mike; Kawazoe, Ai; Chanlekha, Hutchatai; Collier, Nigel

    2010-09-28

    In recent years, there has been a growth in work on the use of information extraction technologies for tracking disease outbreaks from online news texts, yet publicly available evaluation standards (and associated resources) for this new area of research have been noticeably lacking. This study seeks to create a "gold standard" data set against which to test how accurately disease outbreak information extraction systems can identify the semantics of disease outbreak events. Additionally, we hope that the provision of an annotation scheme (and associated corpus) to the community will encourage open evaluation in this new and growing application area. We developed an annotation scheme for identifying infectious disease outbreak events in news texts. An event--in the context of our annotation scheme--consists minimally of geographical (eg, country and province) and disease name information. However, the scheme also allows for the rich encoding of other domain salient concepts (eg, international travel, species, and food contamination). The work resulted in a 200-document corpus of event-annotated disease outbreak reports that can be used to evaluate the accuracy of event detection algorithms (in this case, for the BioCaster biosurveillance online news information extraction system). In the 200 documents, 394 distinct events were identified (mean 1.97 events per document, range 0-25 events per document). We also provide a download script and graphical user interface (GUI)-based event browsing software to facilitate corpus exploration. In summary, we present an annotation scheme and corpus that can be used in the evaluation of disease outbreak event extraction algorithms. The annotation scheme and corpus were designed both with the particular evaluation requirements of the BioCaster system in mind as well as the wider need for further evaluation resources in this growing research area.

  2. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

    PubMed

    Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

    2017-08-31

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.

  3. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks

    PubMed Central

    Li, Min; Li, Dongyan; Tang, Yu; Wang, Jianxin

    2017-01-01

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211

  4. Detecting Diseases in Medical Prescriptions Using Data Mining Tools and Combining Techniques.

    PubMed

    Teimouri, Mehdi; Farzadfar, Farshad; Soudi Alamdari, Mahsa; Hashemi-Meshkini, Amir; Adibi Alamdari, Parisa; Rezaei-Darzi, Ehsan; Varmaghani, Mehdi; Zeynalabedini, Aysan

    2016-01-01

    Data about the prevalence of communicable and non-communicable diseases, one of the most important categories of epidemiological data, are used for interpreting the health status of communities. This study aims to calculate the prevalence of outpatient diseases through the characterization of outpatient prescriptions. The data used in this study were collected from 1412 prescriptions for various types of diseases, of which we focused on the identification of ten diseases. Data mining tools are used to identify the diseases for which prescriptions are written. To evaluate the performance of these methods, we compare the results with a Naïve baseline method; combining methods are then used to improve the results. The Support Vector Machine, with an accuracy of 95.32%, showed better performance than the other methods. The Naïve method, with an accuracy of 67.71%, performed about 20% worse than the Nearest Neighbor method, which had the lowest accuracy among the classification algorithms. These results indicate that data mining algorithms perform well in characterizing outpatient diseases and can help in choosing appropriate methods for classifying prescriptions at larger scales.

  5. Coordinating Environmental Genomics and Geochemistry Reveals Metabolic Transitions in a Hot Spring Ecosystem

    PubMed Central

    Swingley, Wesley D.; Meyer-Dombard, D’Arcy R.; Shock, Everett L.; Alsop, Eric B.; Falenski, Heinz D.; Havig, Jeff R.; Raymond, Jason

    2012-01-01

    We have constructed a conceptual model of biogeochemical cycles and metabolic and microbial community shifts within a hot spring ecosystem via coordinated analysis of the “Bison Pool” (BP) Environmental Genome and a complementary contextual geochemical dataset of ∼75 geochemical parameters. 2,321 16S rRNA clones and 470 megabases of environmental sequence data were produced from biofilms at five sites along the outflow of BP, an alkaline hot spring in Sentinel Meadow (Lower Geyser Basin) of Yellowstone National Park. This channel acts as a >22 m gradient of decreasing temperature, increasing dissolved oxygen, and changing availability of biologically important chemical species, such as those containing nitrogen and sulfur. Microbial life at BP transitions from a 92°C chemotrophic streamer biofilm community in the BP source pool to a 56°C phototrophic mat community. We improved automated annotation of the BP environmental genomes using BLAST-based Markov clustering. We have also assigned environmental genome sequences to individual microbial community members by complementing traditional homology-based assignment with nucleotide word-usage algorithms, allowing more than 70% of all reads to be assigned to source organisms. This assignment yields high genome coverage in dominant community members, facilitating reconstruction of nearly complete metabolic profiles and in-depth analysis of the relation between geochemical and metabolic changes along the outflow. We show that changes in environmental conditions and energy availability are associated with dramatic shifts in microbial communities and metabolic function. We have also identified an organism constituting a novel phylum in a metabolic “transition” community, located physically between the chemotroph- and phototroph-dominated sites. The complementary analysis of biogeochemical and environmental genomic data from BP has allowed us to build ecosystem-based conceptual models for this hot spring, reconstructing whole metabolic networks in order to illuminate community roles in shaping and responding to geochemical variability. PMID:22675512

  6. Two algorithms for neural-network design and training with application to channel equalization.

    PubMed

    Sweatman, C Z; Mulgrew, B; Gibson, G J

    1998-01-01

    We describe two algorithms for designing and training neural-network classifiers. The first, the linear programming slab algorithm (LPSA), is motivated by the problem of reconstructing digital signals corrupted by passage through a dispersive channel and by additive noise. It constructs a multilayer perceptron (MLP) to separate two disjoint sets by using linear programming methods to identify network parameters. The second, the perceptron learning slab algorithm (PLSA), avoids the computational costs of linear programming by using an error-correction approach to identify parameters. Both algorithms operate in highly constrained parameter spaces and are able to exploit symmetry in the classification problem. Using these algorithms, we develop a number of procedures for the adaptive equalization of a complex linear 4-quadrature amplitude modulation (QAM) channel, and compare their performance in a simulation study. Results are given for both stationary and time-varying channels, the latter based on the COST 207 GSM propagation model.

  7. Prevalence and clinical correlates of sarcopenia in community-dwelling older people: application of the EWGSOP definition and diagnostic algorithm.

    PubMed

    Volpato, Stefano; Bianchi, Lara; Cherubini, Antonio; Landi, Francesco; Maggio, Marcello; Savino, Elisabetta; Bandinelli, Stefania; Ceda, Gian Paolo; Guralnik, Jack M; Zuliani, Giovanni; Ferrucci, Luigi

    2014-04-01

    Muscle impairment is a common condition in older people and a powerful risk factor for disability and mortality. The aim of this study was to apply the European Working Group on Sarcopenia in Older People criteria to estimate the prevalence and investigate the clinical correlates of sarcopenia, in a sample of Italian community-dwelling older people. Cross-sectional analysis of 730 participants (74% aged 65 years and older) enrolled in the InCHIANTI study. Sarcopenia was defined according to the European Working Group on Sarcopenia in Older People criteria using bioimpedance analysis for muscle mass assessment. Logistic regression analysis was used to identify the factors independently associated with sarcopenia. Sarcopenia defined by the European Working Group on Sarcopenia in Older People criteria increased steeply with age (p < .001), with 31.6% of women and 17.4% of men aged 80 years or older being affected by this condition. Higher education (odds ratio: 0.85; 95% CI: 0.74-0.98), lower insulin-like growth factor I (lowest vs highest tertile, odds ratio: 3.89; 95% CI: 1.03-14.1), and low bioavailable testosterone (odds ratio: 2.67; 95% CI: 1.31-5.44) were independently associated with the likelihood of being sarcopenic. Nutritional intake, physical activity, and level of comorbidity were not associated with sarcopenia. Sarcopenia identified by the European Working Group on Sarcopenia in Older People criteria is a relatively common condition in Italian octogenarians, and its prevalence increases with aging. Correlates of sarcopenia identified in this study might suggest new approaches for prevention and treatment of sarcopenia.

  8. Fast image matching algorithm based on projection characteristics

    NASA Astrophysics Data System (ADS)

    Zhou, Lijuan; Yue, Xiaobo; Zhou, Lijun

    2011-06-01

    Starting from an analysis of the traditional template matching algorithm, this paper identifies the key factors restricting matching speed and puts forward a new fast matching algorithm based on projection. By projecting the grayscale image, the algorithm converts the two-dimensional image information into one-dimensional profiles and then matches through one-dimensional correlation. Because the projections are normalized, correct matching is still obtained when the image brightness or signal amplitude increases proportionally. Experimental results show that the proposed projection-based image registration method greatly improves matching speed while preserving matching accuracy.
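    A minimal sketch of the projection idea follows: collapse the template and each candidate window to row and column sums, normalize them to zero mean and unit norm so proportional brightness changes cancel, and score candidates by one-dimensional correlation. The exhaustive window scan and the normalization details are assumptions for illustration, not the paper's exact procedure.

      import numpy as np

      def projections(img):
          """Row- and column-wise sums, normalized to zero mean and unit norm so
          proportional brightness changes do not affect the correlation score."""
          def norm(v):
              v = v - v.mean()
              return v / (np.linalg.norm(v) or 1.0)
          return norm(img.sum(axis=1)), norm(img.sum(axis=0))

      def match(image, template):
          th, tw = template.shape
          t_row, t_col = projections(template)
          best, best_score = None, -np.inf
          for y in range(image.shape[0] - th + 1):
              for x in range(image.shape[1] - tw + 1):
                  r, c = projections(image[y:y + th, x:x + tw])
                  score = r @ t_row + c @ t_col          # two 1-D correlations
                  if score > best_score:
                      best, best_score = (y, x), score
          return best, best_score

      img = np.random.rand(64, 64)
      tmpl = 1.7 * img[20:36, 30:46]                     # brightness-scaled patch
      print(match(img, tmpl))                            # expected location: (20, 30)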

  9. An almost-parameter-free harmony search algorithm for groundwater pollution source identification.

    PubMed

    Jiang, Simin; Zhang, Yali; Wang, Pei; Zheng, Maohui

    2013-01-01

    The spatiotemporal characterization of unknown sources of groundwater pollution is frequently encountered in environmental problems. This study adopts a simulation-optimization approach that combines a contaminant transport simulation model with a heuristic harmony search algorithm to identify unknown pollution sources. In the proposed methodology, an almost-parameter-free harmony search algorithm is developed. The performance of this methodology is evaluated on an illustrative groundwater pollution source identification problem, and the identified results indicate that the proposed almost-parameter-free harmony search algorithm-based optimization model can give satisfactory estimations, even when the irregular geometry, erroneous monitoring data, and prior information shortage of potential locations are considered.
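    For readers unfamiliar with harmony search, the sketch below shows the standard memory-consideration / pitch-adjustment loop minimizing a stand-in misfit function. It is not the almost-parameter-free variant proposed in the paper; the HMCR, PAR, and bandwidth values are ordinary textbook settings, and the objective is a placeholder for a transport-model misfit.

      import random

      def harmony_search(objective, bounds, hms=20, hmcr=0.9, par=0.3,
                         bw=0.05, n_iter=2000):
          """Standard harmony search: build each new harmony from memory (prob. hmcr),
          optionally pitch-adjust it (prob. par), and replace the worst memory entry."""
          dim = len(bounds)
          rand_x = lambda: [random.uniform(lo, hi) for lo, hi in bounds]
          memory = [rand_x() for _ in range(hms)]
          scores = [objective(x) for x in memory]
          for _ in range(n_iter):
              new = []
              for d, (lo, hi) in enumerate(bounds):
                  if random.random() < hmcr:                  # memory consideration
                      v = random.choice(memory)[d]
                      if random.random() < par:               # pitch adjustment
                          v += random.uniform(-bw, bw) * (hi - lo)
                  else:                                       # random playing
                      v = random.uniform(lo, hi)
                  new.append(min(max(v, lo), hi))
              worst = max(range(hms), key=scores.__getitem__)
              s = objective(new)
              if s < scores[worst]:
                  memory[worst], scores[worst] = new, s
          best = min(range(hms), key=scores.__getitem__)
          return memory[best], scores[best]

      # Stand-in objective: squared misfit to a hypothetical source (x, y, release rate).
      true = (250.0, 80.0, 12.0)
      misfit = lambda p: sum((a - b) ** 2 for a, b in zip(p, true))
      print(harmony_search(misfit, bounds=[(0, 500), (0, 200), (0, 50)]))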

  10. Reducing false-positive detections by combining two stage-1 computer-aided mass detection algorithms

    NASA Astrophysics Data System (ADS)

    Bedard, Noah D.; Sampat, Mehul P.; Stokes, Patrick A.; Markey, Mia K.

    2006-03-01

    In this paper we present a strategy for reducing the number of false-positives in computer-aided mass detection. Our approach is to only mark "consensus" detections from among the suspicious sites identified by different "stage-1" detection algorithms. By "stage-1" we mean that each of the Computer-aided Detection (CADe) algorithms is designed to operate with high sensitivity, allowing for a large number of false positives. In this study, two mass detection methods were used: (1) Heath and Bowyer's algorithm based on the average fraction under the minimum filter (AFUM) and (2) a low-threshold bi-lateral subtraction algorithm. The two methods were applied separately to a set of images from the Digital Database for Screening Mammography (DDSM) to obtain paired sets of mass candidates. The consensus mass candidates for each image were identified by a logical "and" operation of the two CADe algorithms so as to eliminate regions of suspicion that were not independently identified by both techniques. It was shown that by combining the evidence from the AFUM filter method with that obtained from bi-lateral subtraction, the same sensitivity could be reached with fewer false-positives per image relative to using the AFUM filter alone.
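    The consensus step itself is simple; a minimal sketch is shown below, in which a candidate from one detector survives only if the other detector reports a candidate within a distance tolerance. The coordinates and the tolerance are illustrative, and the actual systems presumably match suspicious regions rather than single points.

      def consensus(detections_a, detections_b, tol=25.0):
          """Keep only candidates found by both detectors: a candidate from A survives
          if some candidate from B lies within `tol` pixels of it."""
          close = lambda p, q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= tol ** 2
          return [p for p in detections_a if any(close(p, q) for q in detections_b)]

      afum_hits      = [(120, 340), (410, 95), (700, 512)]   # e.g. AFUM-filter candidates
      bilateral_hits = [(118, 338), (660, 500)]              # e.g. bilateral-subtraction candidates
      print(consensus(afum_hits, bilateral_hits))            # [(120, 340)]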

  11. Identify High-Quality Protein Structural Models by Enhanced K-Means.

    PubMed

    Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang

    2017-01-01

    Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means.
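    The K-means++ seeding step mentioned above has a compact form: each new centroid is drawn with probability proportional to the squared distance from the nearest centroid chosen so far. A short numpy sketch follows; the decoy feature matrix and the value of K are placeholders, not the study's actual descriptors.

      import numpy as np

      def kmeans_pp_init(X, k, rng=np.random.default_rng(0)):
          """Choose k initial centroids with probability proportional to the squared
          distance from the nearest centroid already chosen (K-means++ seeding)."""
          centroids = [X[rng.integers(len(X))]]
          for _ in range(k - 1):
              d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
              centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
          return np.array(centroids)

      # Placeholder decoy features: 500 models x 10 structural descriptors
      X = np.random.rand(500, 10)
      print(kmeans_pp_init(X, k=5).shape)     # (5, 10)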

  12. Identify High-Quality Protein Structural Models by Enhanced K-Means

    PubMed Central

    Li, Haiou; Chen, Cheng; Lv, Qiang; Wu, Chuang

    2017-01-01

    Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means. PMID:28421198

  13. A systematic review of validated methods to capture stillbirth and spontaneous abortion using administrative or claims data.

    PubMed

    Likis, Frances E; Sathe, Nila A; Carnahan, Ryan; McPheeters, Melissa L

    2013-12-30

    To identify and assess diagnosis, procedure and pharmacy dispensing codes used to identify stillbirths and spontaneous abortion in administrative and claims databases from the United States or Canada. We searched the MEDLINE database from 1991 to September 2012 using controlled vocabulary and key terms related to stillbirth or spontaneous abortion. We also searched the reference lists of included studies. Two investigators independently assessed the full text of studies against pre-determined inclusion criteria. Two reviewers independently extracted data regarding participant and algorithm characteristics and assessed each study's methodological rigor using a pre-defined approach. Ten publications addressing stillbirth and four addressing spontaneous abortion met our inclusion criteria. The International Classification of Diseases, Ninth Revision (ICD-9) codes most commonly used in algorithms for stillbirth were those for intrauterine death (656.4) and stillborn outcomes of delivery (V27.1, V27.3-V27.4, and V27.6-V27.7). Papers identifying spontaneous abortion used codes for missed abortion and spontaneous abortion: 632, 634.x, as well as V27.0-V27.7. Only two studies identifying stillbirth reported validation of algorithms. The overall positive predictive value of the algorithms was high (99%-100%), and one study reported an algorithm with 86% sensitivity. However, the predictive value of individual codes was not assessed and study populations were limited to specific geographic areas. Additional validation studies with a nationally representative sample are needed to confirm the optimal algorithm to identify stillbirths or spontaneous abortion in administrative and claims databases. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Challenges and Opportunities for Harmonizing Research Methodology: Raw Accelerometry.

    PubMed

    van Hees, Vincent T; Thaler-Kall, Kathrin; Wolf, Klaus-Hendrik; Brønd, Jan C; Bonomi, Alberto; Schulze, Mareike; Vigl, Matthäus; Morseth, Bente; Hopstock, Laila Arnesdatter; Gorzelniak, Lukas; Schulz, Holger; Brage, Søren; Horsch, Alexander

    2016-12-07

    Raw accelerometry is increasingly being used in physical activity research, but diversity in sensor design, attachment and signal processing challenges the comparability of research results. Therefore, efforts are needed to harmonize the methodology. In this article we reflect on how increased methodological harmonization may be achieved. The authors of this work convened for a two-day workshop (March 2014) themed on methodological harmonization of raw accelerometry. The discussions at the workshop were used as a basis for this review. Key stakeholders were identified as manufacturers, method developers, method users (application), publishers, and funders. To facilitate methodological harmonization in raw accelerometry the following action points were proposed: i) Manufacturers are encouraged to provide a detailed specification of their sensors, ii) Each fundamental step of algorithms for processing raw accelerometer data should be documented, and ideally also motivated, to facilitate interpretation and discussion, iii) Algorithm developers and method users should be open about uncertainties in the description of data and the uncertainty of the inference itself, iv) All new algorithms which are pitched as "ready for implementation" should be shared with the community to facilitate replication and ongoing evaluation by independent groups, and v) A dynamic interaction between method stakeholders should be encouraged to facilitate a well-informed harmonization process. The workshop led to the identification of a number of opportunities for harmonizing methodological practice. The discussion as well as the practical checklists proposed in this review should provide guidance for stakeholders on how to contribute to increased harmonization.

  15. Two New Tools for Glycopeptide Analysis Researchers: A Glycopeptide Decoy Generator and a Large Data Set of Assigned CID Spectra of Glycopeptides.

    PubMed

    Lakbub, Jude C; Su, Xiaomeng; Zhu, Zhikai; Patabandige, Milani W; Hua, David; Go, Eden P; Desaire, Heather

    2017-08-04

    The glycopeptide analysis field is tightly constrained by a lack of effective tools that translate mass spectrometry data into meaningful chemical information, and perhaps the most challenging aspect of building effective glycopeptide analysis software is designing an accurate scoring algorithm for MS/MS data. We provide the glycoproteomics community with two tools to address this challenge. The first tool, a curated set of 100 expert-assigned CID spectra of glycopeptides, contains a diverse set of spectra from a variety of glycan types; the second tool, Glycopeptide Decoy Generator, is a new software application that generates glycopeptide decoys de novo. We developed these tools so that emerging methods of assigning glycopeptides' CID spectra could be rigorously tested. Software developers or those interested in developing skills in expert (manual) analysis can use these tools to facilitate their work. We demonstrate the tools' utility in assessing the quality of one particular glycopeptide software package, GlycoPep Grader, which assigns glycopeptides to CID spectra. We first acquired the set of 100 expert assigned CID spectra; then, we used the Decoy Generator (described herein) to generate 20 decoys per target glycopeptide. The assigned spectra and decoys were used to test the accuracy of GlycoPep Grader's scoring algorithm; new strengths and weaknesses were identified in the algorithm using this approach. Both newly developed tools are freely available. The software can be downloaded at http://glycopro.chem.ku.edu/GPJ.jar.

  16. The McGill Interactive Pediatric OncoGenetic Guidelines: An approach to identifying pediatric oncology patients most likely to benefit from a genetic evaluation.

    PubMed

    Goudie, Catherine; Coltin, Hallie; Witkowski, Leora; Mourad, Stephanie; Malkin, David; Foulkes, William D

    2017-08-01

    Identifying cancer predisposition syndromes in children with tumors is crucial, yet few clinical guidelines exist to identify children at high risk of having germline mutations. The McGill Interactive Pediatric OncoGenetic Guidelines project aims to create a validated pediatric guideline in the form of a smartphone/tablet application using algorithms to process clinical data and help determine whether to refer a child for genetic assessment. This paper discusses the initial stages of the project, focusing on its overall structure, the methodology underpinning the algorithms, and the upcoming algorithm validation process. © 2017 Wiley Periodicals, Inc.

  17. Syndromic Algorithms for Detection of Gambiense Human African Trypanosomiasis in South Sudan

    PubMed Central

    Palmer, Jennifer J.; Surur, Elizeous I.; Goch, Garang W.; Mayen, Mangar A.; Lindner, Andreas K.; Pittet, Anne; Kasparian, Serena; Checchi, Francesco; Whitty, Christopher J. M.

    2013-01-01

    Background Active screening by mobile teams is considered the best method for detecting human African trypanosomiasis (HAT) caused by Trypanosoma brucei gambiense but the current funding context in many post-conflict countries limits this approach. As an alternative, non-specialist health care workers (HCWs) in peripheral health facilities could be trained to identify potential cases who need testing based on their symptoms. We explored the predictive value of syndromic referral algorithms to identify symptomatic cases of HAT among a treatment-seeking population in Nimule, South Sudan. Methodology/Principal Findings Symptom data from 462 patients (27 cases) presenting for a HAT test via passive screening over a 7 month period were collected to construct and evaluate over 14,000 four item syndromic algorithms considered simple enough to be used by peripheral HCWs. For comparison, algorithms developed in other settings were also tested on our data, and a panel of expert HAT clinicians were asked to make referral decisions based on the symptom dataset. The best performing algorithms consisted of three core symptoms (sleep problems, neurological problems and weight loss), with or without a history of oedema, cervical adenopathy or proximity to livestock. They had a sensitivity of 88.9–92.6%, a negative predictive value of up to 98.8% and a positive predictive value in this context of 8.4–8.7%. In terms of sensitivity, these out-performed more complex algorithms identified in other studies, as well as the expert panel. The best-performing algorithm is predicted to identify about 9/10 treatment-seeking HAT cases, though only 1/10 patients referred would test positive. Conclusions/Significance In the absence of regular active screening, improving referrals of HAT patients through other means is essential. Systematic use of syndromic algorithms by peripheral HCWs has the potential to increase case detection and would increase their participation in HAT programmes. The algorithms proposed here, though promising, should be validated elsewhere. PMID:23350005

  18. The Effect of Shadow Area on SGM Algorithm and Disparity Map Refinement from High Resolution Satellite Stereo Images

    NASA Astrophysics Data System (ADS)

    Tatar, N.; Saadatseresht, M.; Arefi, H.

    2017-09-01

    The Semi-Global Matching (SGM) algorithm is known in the photogrammetry community as a high-performance and reliable stereo matching algorithm. However, there are challenges in using this algorithm, especially for high-resolution satellite stereo images over urban areas and images with shadow areas. The SGM algorithm computes highly noisy disparity values in shadow areas around tall buildings because of mismatching in these low-entropy regions. In this paper, a new method is developed to refine the disparity map in shadow areas. The method integrates panchromatic and multispectral image data to detect shadow areas at the object level. In addition, RANSAC plane fitting and morphological filtering are employed to refine the disparity map. The results on a GeoEye-1 stereo pair captured over the city of Qom, Iran, show a significant increase in the rate of matched pixels compared to the standard SGM algorithm.

  19. Software for project-based learning of robot motion planning

    NASA Astrophysics Data System (ADS)

    Moll, Mark; Bordeaux, Janice; Kavraki, Lydia E.

    2013-12-01

    Motion planning is a core problem in robotics concerned with finding feasible paths for a given robot. Motion planning algorithms perform a search in the high-dimensional continuous space of robot configurations and exemplify many of the core algorithmic concepts of search algorithms and associated data structures. Motion planning algorithms can be explained in a simplified two-dimensional setting, but this masks many of the subtleties and complexities of the underlying problem. We have developed software for project-based learning of motion planning that enables deep learning. The projects that we have developed allow advanced undergraduate students and graduate students to reflect on the performance of existing textbook algorithms and their own variations on such algorithms. Formative assessment has been conducted at three institutions. The core of the software used for this teaching module is also used within the Robot Operating System, a platform widely adopted by the robotics research community. This allows for transfer of knowledge and skills to robotics research projects involving a large variety of robot hardware platforms.

  20. Linear antenna array optimization using flower pollination algorithm.

    PubMed

    Saxena, Prerna; Kothari, Ashwin

    2016-01-01

    Flower pollination algorithm (FPA) is a new nature-inspired evolutionary algorithm used to solve multi-objective optimization problems. The aim of this paper is to introduce FPA to the electromagnetics and antenna community for the optimization of linear antenna arrays. FPA is applied for the first time to linear array so as to obtain optimized antenna positions in order to achieve an array pattern with minimum side lobe level along with placement of deep nulls in desired directions. Various design examples are presented that illustrate the use of FPA for linear antenna array optimization, and subsequently the results are validated by benchmarking along with results obtained using other state-of-the-art, nature-inspired evolutionary algorithms such as particle swarm optimization, ant colony optimization and cat swarm optimization. The results suggest that in most cases, FPA outperforms the other evolutionary algorithms and at times it yields a similar performance.
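    The optimization target in such array-synthesis problems is usually the peak side-lobe level of the array factor. The sketch below computes that quantity for a uniformly excited broadside linear array of isotropic elements, a standard textbook formulation rather than the authors' FPA code, so it could serve as the fitness function inside any of the evolutionary algorithms mentioned; the element positions are an arbitrary example.

      import numpy as np

      def sidelobe_level_db(positions_wl, n_theta=2000):
          """Peak side-lobe level (dB) of a broadside linear array of isotropic,
          uniformly excited elements at positions given in wavelengths."""
          theta = np.linspace(0, np.pi, n_theta)
          k = 2 * np.pi                                   # wavenumber times wavelength
          af = np.abs(np.exp(1j * k * np.outer(np.cos(theta), positions_wl)).sum(axis=1))
          af_db = 20 * np.log10(af / af.max() + 1e-12)
          main = np.argmax(af)                            # main-beam sample
          left = main                                     # walk outward to the first nulls
          while left > 0 and af[left - 1] < af[left]:
              left -= 1
          right = main
          while right < n_theta - 1 and af[right + 1] < af[right]:
              right += 1
          return max(af_db[:left].max(initial=-np.inf),
                     af_db[right + 1:].max(initial=-np.inf))

      # 10-element uniform half-wavelength array: classic first side lobe near -13 dB
      positions = np.arange(10) * 0.5
      print(round(sidelobe_level_db(positions), 1))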

  1. Status Report on the First Round of the Development of the Advanced Encryption Standard

    PubMed Central

    Nechvatal, James; Barker, Elaine; Dodson, Donna; Dworkin, Morris; Foti, James; Roback, Edward

    1999-01-01

    In 1997, the National Institute of Standards and Technology (NIST) initiated a process to select a symmetric-key encryption algorithm to be used to protect sensitive (unclassified) Federal information in furtherance of NIST’s statutory responsibilities. In 1998, NIST announced the acceptance of 15 candidate algorithms and requested the assistance of the cryptographic research community in analyzing the candidates. This analysis included an initial examination of the security and efficiency characteristics for each algorithm. NIST has reviewed the results of this research and selected five algorithms (MARS, RC6™, Rijndael, Serpent and Twofish) as finalists. The research results and rationale for the selection of the finalists are documented in this report. The five finalists will be the subject of further study before the selection of one or more of these algorithms for inclusion in the Advanced Encryption Standard.

  2. Novel trace chemical detection algorithms: a comparative study

    NASA Astrophysics Data System (ADS)

    Raz, Gil; Murphy, Cara; Georgan, Chelsea; Greenwood, Ross; Prasanth, R. K.; Myers, Travis; Goyal, Anish; Kelley, David; Wood, Derek; Kotidis, Petros

    2017-05-01

    Algorithms for standoff detection and estimation of trace chemicals in hyperspectral images in the IR band are a key component for a variety of applications relevant to law-enforcement and the intelligence communities. Performance of these methods is impacted by the spectral signature variability due to presence of contaminants, surface roughness, nonlinear dependence on abundances as well as operational limitations on the compute platforms. In this work we provide a comparative performance and complexity analysis of several classes of algorithms as a function of noise levels, error distribution, scene complexity, and spatial degrees of freedom. The algorithm classes we analyze and test include adaptive cosine estimator (ACE and modifications to it), compressive/sparse methods, Bayesian estimation, and machine learning. We explicitly call out the conditions under which each algorithm class is optimal or near optimal as well as their built-in limitations and failure modes.
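    The adaptive cosine estimator named above has a standard closed form: the squared cosine of the angle between the whitened pixel and the whitened target signature, with the background mean and covariance estimated from the scene. The sketch below implements that textbook form on a synthetic stand-in scene; the mean-removed target convention and all scene parameters are assumptions for illustration.

      import numpy as np

      def ace_scores(pixels, target, mean, cov):
          """Adaptive cosine estimator per pixel; values lie in [0, 1]."""
          ci = np.linalg.inv(cov)
          s = target - mean                                # mean-removed target signature
          xs = pixels - mean
          num = (xs @ ci @ s) ** 2
          den = (s @ ci @ s) * np.einsum("ij,jk,ik->i", xs, ci, xs)
          return num / den

      # Synthetic stand-in scene: 2000 background pixels in 30 bands plus a weak target
      rng = np.random.default_rng(1)
      bg = rng.normal(size=(2000, 30))
      target = rng.normal(size=30)
      scene = np.vstack([bg, bg[:5] + 0.8 * target])       # last 5 pixels contain the target
      mean, cov = scene.mean(axis=0), np.cov(scene, rowvar=False)
      scores = ace_scores(scene, target, mean, cov)
      print(scores[-5:].mean(), scores[:-5].mean())        # target pixels score higher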

  3. Twitter K-H networks in action: Advancing biomedical literature for drug search.

    PubMed

    Hamed, Ahmed Abdeen; Wu, Xindong; Erickson, Robert; Fandy, Tamer

    2015-08-01

    The importance of searching the biomedical literature for drug interactions and side effects is apparent. Current digital libraries (e.g., PubMed) suffer from infrequent tagging and metadata annotation updates. Such limitations prevent the literature from being linked to new scientific evidence and pose considerable challenges for scientists searching biomedical repositories. In this paper, we present a network mining approach that provides a bridge for linking and searching drug-related literature. Our contributions are twofold: (1) an efficient algorithm called HashPairMiner that addresses the run-time complexity issues demonstrated by its predecessor algorithm, HashnetMiner, and (2) a database of discoveries hosted on the web to facilitate literature search using the results produced by HashPairMiner. Though the K-H network model and the HashPairMiner algorithm are fairly young, their outcome is evidence of the considerable promise they offer to the biomedical science community in general and the drug research community in particular. Copyright © 2015 Elsevier Inc. All rights reserved.

  4. Designing highly flexible and usable cyberinfrastructures for convergence.

    PubMed

    Herr, Bruce W; Huang, Weixia; Penumarthy, Shashikant; Börner, Katy

    2006-12-01

    This article presents the results of a 7-year-long quest into the development of a "dream tool" for our research in information science and scientometrics and more recently, network science. The results are two cyberinfrastructures (CI): The Cyberinfrastructure for Information Visualization and the Network Workbench that enjoy a growing national and interdisciplinary user community. Both CIs use the cyberinfrastructure shell (CIShell) software specification, which defines interfaces between data sets and algorithms/services and provides a means to bundle them into powerful tools and (Web) services. In fact, CIShell might be our major contribution to progress in convergence. Just as Wikipedia is an "empty shell" that empowers lay persons to share text, a CIShell implementation is an "empty shell" that empowers user communities to plug-and-play, share, compare and combine data sets, algorithms, and compute resources across national and disciplinary boundaries. It is argued here that CIs will not only transform the way science is conducted but also will play a major role in the diffusion of expertise, data sets, algorithms, and technologies across multiple disciplines and business sectors leading to a more integrative science.

  5. Research Data Alliance: Understanding Big Data Analytics Applications in Earth Science

    NASA Astrophysics Data System (ADS)

    Riedel, Morris; Ramachandran, Rahul; Baumann, Peter

    2014-05-01

    The Research Data Alliance (RDA) enables data to be shared across barriers through focused working groups and interest groups, formed of experts from around the world - from academia, industry and government. Its Big Data Analytics (BDA) interest group seeks to develop community-based recommendations on feasible data analytics approaches to address scientific community needs for utilizing large quantities of data. BDA seeks to analyze different scientific domain applications (e.g. earth science use cases) and their potential use of various big data analytics techniques. These techniques range from hardware deployment models to a variety of algorithms (e.g. machine learning algorithms such as support vector machines for classification). A systematic classification of feasible combinations of analysis algorithms, analytical tools, data and resource characteristics and scientific queries will be covered in these recommendations. This contribution outlines initial parts of such a classification and recommendations in the specific context of the Earth sciences. The lessons learned and experiences presented are based on a survey of use cases, with a few use cases discussed in detail.

  6. Research Data Alliance: Understanding Big Data Analytics Applications in Earth Science

    NASA Technical Reports Server (NTRS)

    Riedel, Morris; Ramachandran, Rahul; Baumann, Peter

    2014-01-01

    The Research Data Alliance (RDA) enables data to be shared across barriers through focused working groups and interest groups, formed of experts from around the world - from academia, industry and government. Its Big Data Analytics (BDA) interest group seeks to develop community-based recommendations on feasible data analytics approaches to address scientific community needs for utilizing large quantities of data. BDA seeks to analyze different scientific domain applications (e.g. earth science use cases) and their potential use of various big data analytics techniques. These techniques range from hardware deployment models to a variety of algorithms (e.g. machine learning algorithms such as support vector machines for classification). A systematic classification of feasible combinations of analysis algorithms, analytical tools, data and resource characteristics and scientific queries will be covered in these recommendations. This contribution outlines initial parts of such a classification and recommendations in the specific context of the Earth sciences. The lessons learned and experiences presented are based on a survey of use cases, with a few use cases discussed in detail.

  7. RayPlus: a Web-Based Platform for Medical Image Processing.

    PubMed

    Yuan, Rong; Luo, Ming; Sun, Zhi; Shi, Shuyue; Xiao, Peng; Xie, Qingguo

    2017-04-01

    Medical images can provide valuable information for preclinical research, clinical diagnosis, and treatment. With the widespread use of digital medical imaging, many researchers are currently developing medical image processing algorithms and systems in order to deliver better results to the clinical community, including accurate clinical parameters or processed images derived from the original images. In this paper, we propose a web-based platform to present and process medical images. By using Internet and novel database technologies, authorized users can easily access medical images and run their processing workflows on powerful server-side computing resources without any installation. We implement a series of image processing and visualization algorithms in the initial version of RayPlus. Integration of our system allows much flexibility and convenience for both the research and clinical communities.

  8. Large eddy simulations in 2030 and beyond

    PubMed Central

    Piomelli, U

    2014-01-01

    Since its introduction in the early 1970s, large eddy simulation (LES) has advanced considerably, and its application is transitioning from the academic environment to industry. Several landmark developments can be identified over the past 40 years, such as the wall-resolved simulations of wall-bounded flows, the development of advanced models for the unresolved scales that adapt to the local flow conditions, and the hybridization of LES with the solution of the Reynolds-averaged Navier–Stokes equations. Thanks to these advancements, LES is now in widespread use in the academic community and is an option available in most commercial flow-solvers. This paper will try to predict what algorithmic and modelling advancements are needed to make it even more robust and inexpensive, and which areas show the most promise. PMID:25024415

  9. An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China

    PubMed Central

    Zou, Hui; Zou, Zhihong; Wang, Xiaojing

    2015-01-01

    The growing volume and complexity of data caused by uncertain environments is today's reality. In order to identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm adopts a varying-weights K-means clustering scheme to analyze water monitoring data. The weighting scheme uses the best weighting indicators, selected by a modified indicator weight self-adjustment algorithm based on K-means, named MIWAS-K-means. The new clustering algorithm also avoids cases in which the iteration margin is left uncomputed. With the fast clustering analysis, we can identify the quality of water samples. The algorithm is applied to water quality analysis of Haihe River (China) data obtained by the monitoring network over a period of eight years (2006–2013) with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex data matrices with high dimensionality. PMID:26569283
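
    The abstract does not spell out the MIWAS-K-means weighting rule; the sketch below only illustrates the underlying idea of K-means with per-indicator weights in the distance computation, with the weights assumed to be supplied by some external indicator-selection step.

```python
import numpy as np

def weighted_kmeans(X, k, weights, n_iter=100, seed=0):
    """K-means in which each indicator (column) has a weight in the distance."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    w = np.asarray(weights, dtype=float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # weighted squared Euclidean distance of every sample to every centre
        d = (((X[:, None, :] - centers[None, :, :]) ** 2) * w).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# toy example: 200 samples, four water-quality indicators, three classes
X = np.random.default_rng(1).normal(size=(200, 4))
labels, centers = weighted_kmeans(X, k=3, weights=[0.4, 0.3, 0.2, 0.1])
print(np.bincount(labels))
```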

  10. Predicting missing links and identifying spurious links via likelihood analysis

    NASA Astrophysics Data System (ADS)

    Pan, Liming; Zhou, Tao; Lü, Linyuan; Hu, Chin-Kun

    2016-03-01

    Real network data are often incomplete and noisy, and link prediction algorithms and spurious link identification algorithms can be applied to them. Thus far, a general method for transforming network organizing mechanisms into link prediction algorithms has been lacking. Here we use an algorithmic framework in which a network’s probability is calculated according to a predefined structural Hamiltonian that takes into account the network organizing principles, and a non-observed link is scored by the conditional probability of adding the link to the observed network. Extensive numerical simulations show that the proposed algorithm has remarkably higher accuracy than state-of-the-art methods in uncovering missing links and identifying spurious links in many complex biological and social networks. Such a method also finds applications in exploring the underlying network evolutionary mechanisms.
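
    The structural Hamiltonian used in the paper is not given in the abstract; the sketch below illustrates the general recipe with an invented stand-in Hamiltonian that favours triangle closure, scoring each non-observed link by the (unnormalised) probability gain from adding it.

```python
import itertools
import numpy as np
import networkx as nx

def hamiltonian(G, J=1.0):
    """Invented stand-in Hamiltonian: lower energy when edges close triangles."""
    return -J * sum(len(list(nx.common_neighbors(G, u, v))) for u, v in G.edges())

def score_missing_links(G):
    """Score each non-observed link by exp(-(H(G + e) - H(G)))."""
    base = hamiltonian(G)
    scores = {}
    for u, v in itertools.combinations(G.nodes(), 2):
        if G.has_edge(u, v):
            continue
        G.add_edge(u, v)
        scores[(u, v)] = np.exp(-(hamiltonian(G) - base))
        G.remove_edge(u, v)
    return scores

G = nx.karate_club_graph()
top5 = sorted(score_missing_links(G).items(), key=lambda kv: -kv[1])[:5]
print(top5)
```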

  11. Predicting missing links and identifying spurious links via likelihood analysis

    PubMed Central

    Pan, Liming; Zhou, Tao; Lü, Linyuan; Hu, Chin-Kun

    2016-01-01

    Real network data are often incomplete and noisy, and link prediction algorithms and spurious link identification algorithms can be applied to them. Thus far, a general method for transforming network organizing mechanisms into link prediction algorithms has been lacking. Here we use an algorithmic framework in which a network’s probability is calculated according to a predefined structural Hamiltonian that takes into account the network organizing principles, and a non-observed link is scored by the conditional probability of adding the link to the observed network. Extensive numerical simulations show that the proposed algorithm has remarkably higher accuracy than state-of-the-art methods in uncovering missing links and identifying spurious links in many complex biological and social networks. Such a method also finds applications in exploring the underlying network evolutionary mechanisms. PMID:26961965

  12. Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression

    PubMed Central

    Poole, William; Leinonen, Kalle; Shmulevich, Ilya

    2017-01-01

    Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C. PMID:28170390
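
    The authors' M2C implementation is available at the GitHub link above; the sketch below is only a generic illustration of multiscale clustering of 1D mutation positions, using single-linkage hierarchical clustering cut at several length scales (not the published algorithm).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def multiscale_clusters(positions, scales=(5, 25, 100), min_size=3):
    """Cluster 1D mutation positions at several length scales.
    Returns {scale: [(start, end, n_mutations), ...]}."""
    pos = np.sort(np.asarray(positions, dtype=float))
    Z = linkage(pos.reshape(-1, 1), method="single")
    result = {}
    for scale in scales:
        labels = fcluster(Z, t=scale, criterion="distance")
        clusters = []
        for lab in np.unique(labels):
            members = pos[labels == lab]
            if len(members) >= min_size:
                clusters.append((members.min(), members.max(), len(members)))
        result[scale] = clusters
    return result

# toy example: two mutation hotspots of different widths plus background noise
rng = np.random.default_rng(0)
positions = np.concatenate([rng.normal(120, 3, 40),
                            rng.normal(480, 12, 25),
                            rng.integers(0, 700, 30)])
print(multiscale_clusters(positions))
```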

  13. Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression.

    PubMed

    Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady

    2017-02-01

    Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.

  14. De-identifying an EHR database - anonymity, correctness and readability of the medical record.

    PubMed

    Pantazos, Kostas; Lauesen, Soren; Lippert, Soren

    2011-01-01

    Electronic health records (EHR) contain a large amount of structured data and free text. Exploring and sharing clinical data can improve healthcare and facilitate the development of medical software. However, revealing confidential information is against ethical principles and laws. We de-identified a Danish EHR database with 437,164 patients. The goal was to generate a version with real medical records, but related to artificial persons. We developed a de-identification algorithm that uses lists of named entities, simple language analysis, and special rules. Our algorithm consists of 3 steps: collect lists of identifiers from the database and external resources, define a replacement for each identifier, and replace identifiers in structured data and free text. Some patient records could not be safely de-identified, so the de-identified database has 323,122 patient records with an acceptable degree of anonymity, readability and correctness (F-measure of 95%). The algorithm has to be adjusted for each culture, language and database.
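
    A minimal sketch of the three-step replacement idea described above (collect identifiers, define a replacement for each, substitute in the text); the published algorithm additionally uses language analysis and special rules, and all names and identifiers below are invented.

```python
import re

# Step 1: identifiers collected from the database and external name lists
# Step 2: a fixed replacement for each identifier (all values invented)
replacements = {
    "Jens Hansen": "Karl Larsen",
    "Hansen": "Larsen",
    "Odense": "Aarhus",
    "010203-1234": "020304-5678",
}

def deidentify(text, replacements):
    """Step 3: replace identifiers in free text, longest match first so that
    'Jens Hansen' is substituted before the bare surname 'Hansen'."""
    for ident in sorted(replacements, key=len, reverse=True):
        text = re.sub(r"\b%s\b" % re.escape(ident), replacements[ident], text)
    return text

note = ("Jens Hansen (010203-1234) from Odense was admitted; "
        "Hansen complains of chest pain.")
print(deidentify(note, replacements))
```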

  15. Setting research priorities to improve global newborn health and prevent stillbirths by 2025

    PubMed Central

    Yoshida, Sachiyo; Martines, José; Lawn, Joy E; Wall, Stephen; Souza, Joăo Paulo; Rudan, Igor; Cousens, Simon; Aaby, Peter; Adam, Ishag; Adhikari, Ramesh Kant; Ambalavanan, Namasivayam; Arifeen, Shams EI; Aryal, Dhana Raj; Asiruddin, Sk; Baqui, Abdullah; Barros, Aluisio JD; Benn, Christine S; Bhandari, Vineet; Bhatnagar, Shinjini; Bhattacharya, Sohinee; Bhutta, Zulfiqar A; Black, Robert E; Blencowe, Hannah; Bose, Carl; Brown, Justin; Bührer, Christoph; Carlo, Wally; Cecatti, Jose Guilherme; Cheung, Po–Yin; Clark, Robert; Colbourn, Tim; Conde–Agudelo, Agustin; Corbett, Erica; Czeizel, Andrew E; Das, Abhik; Day, Louise Tina; Deal, Carolyn; Deorari, Ashok; Dilmen, Uğur; English, Mike; Engmann, Cyril; Esamai, Fabian; Fall, Caroline; Ferriero, Donna M; Gisore, Peter; Hazir, Tabish; Higgins, Rosemary D; Homer, Caroline SE; Hoque, DE; Irgens, Lorentz; Islam, MT; de Graft–Johnson, Joseph; Joshua, Martias Alice; Keenan, William; Khatoon, Soofia; Kieler, Helle; Kramer, Michael S; Lackritz, Eve M; Lavender, Tina; Lawintono, Laurensia; Luhanga, Richard; Marsh, David; McMillan, Douglas; McNamara, Patrick J; Mol, Ben Willem J; Molyneux, Elizabeth; Mukasa, G. K; Mutabazi, Miriam; Nacul, Luis Carlos; Nakakeeto, Margaret; Narayanan, Indira; Olusanya, Bolajoko; Osrin, David; Paul, Vinod; Poets, Christian; Reddy, Uma M; Santosham, Mathuram; Sayed, Rubayet; Schlabritz–Loutsevitch, Natalia E; Singhal, Nalini; Smith, Mary Alice; Smith, Peter G; Soofi, Sajid; Spong, Catherine Y; Sultana, Shahin; Tshefu, Antoinette; van Bel, Frank; Gray, Lauren Vestewig; Waiswa, Peter; Wang, Wei; Williams, Sarah LA; Wright, Linda; Zaidi, Anita; Zhang, Yanfeng; Zhong, Nanbert; Zuniga, Isabel; Bahl, Rajiv

    2016-01-01

    Background In 2013, an estimated 2.8 million newborns died and 2.7 million were stillborn. A much greater number suffer from long term impairment associated with preterm birth, intrauterine growth restriction, congenital anomalies, and perinatal or infectious causes. With the approaching deadline for the achievement of the Millennium Development Goals (MDGs) in 2015, there was a need to set the new research priorities on newborns and stillbirth with a focus not only on survival but also on health, growth and development. We therefore carried out a systematic exercise to set newborn health research priorities for 2013–2025. Methods We used adapted Child Health and Nutrition Research Initiative (CHNRI) methods for this prioritization exercise. We identified and approached the 200 most productive researchers and 400 program experts, and 132 of them submitted research questions online. These were collated into a set of 205 research questions, sent for scoring to the 600 identified experts, and were assessed and scored by 91 experts. Results Nine out of top ten identified priorities were in the domain of research on improving delivery of known interventions, with simplified neonatal resuscitation program and clinical algorithms and improved skills of community health workers leading the list. The top 10 priorities in the domain of development were led by ideas on improved Kangaroo Mother Care at community level, how to improve the accuracy of diagnosis by community health workers, and perinatal audits. The 10 leading priorities for discovery research focused on stable surfactant with novel modes of administration for preterm babies, ability to diagnose fetal distress and novel tocolytic agents to delay or stop preterm labour. Conclusion These findings will assist both donors and researchers in supporting and conducting research to close the knowledge gaps for reducing neonatal mortality, morbidity and long term impairment. WHO, SNL and other partners will work to generate interest among key national stakeholders, governments, NGOs, and research institutes in these priorities, while encouraging research funders to support them. We will track research funding, relevant requests for proposals and trial registers to monitor if the priorities identified by this exercise are being addressed. PMID:26401272

  16. Setting research priorities to improve global newborn health and prevent stillbirths by 2025.

    PubMed

    Yoshida, Sachiyo; Martines, José; Lawn, Joy E; Wall, Stephen; Souza, Joăo Paulo; Rudan, Igor; Cousens, Simon; Aaby, Peter; Adam, Ishag; Adhikari, Ramesh Kant; Ambalavanan, Namasivayam; Arifeen, Shams Ei; Aryal, Dhana Raj; Asiruddin, Sk; Baqui, Abdullah; Barros, Aluisio Jd; Benn, Christine S; Bhandari, Vineet; Bhatnagar, Shinjini; Bhattacharya, Sohinee; Bhutta, Zulfiqar A; Black, Robert E; Blencowe, Hannah; Bose, Carl; Brown, Justin; Bührer, Christoph; Carlo, Wally; Cecatti, Jose Guilherme; Cheung, Po-Yin; Clark, Robert; Colbourn, Tim; Conde-Agudelo, Agustin; Corbett, Erica; Czeizel, Andrew E; Das, Abhik; Day, Louise Tina; Deal, Carolyn; Deorari, Ashok; Dilmen, Uğur; English, Mike; Engmann, Cyril; Esamai, Fabian; Fall, Caroline; Ferriero, Donna M; Gisore, Peter; Hazir, Tabish; Higgins, Rosemary D; Homer, Caroline Se; Hoque, D E; Irgens, Lorentz; Islam, M T; de Graft-Johnson, Joseph; Joshua, Martias Alice; Keenan, William; Khatoon, Soofia; Kieler, Helle; Kramer, Michael S; Lackritz, Eve M; Lavender, Tina; Lawintono, Laurensia; Luhanga, Richard; Marsh, David; McMillan, Douglas; McNamara, Patrick J; Mol, Ben Willem J; Molyneux, Elizabeth; Mukasa, G K; Mutabazi, Miriam; Nacul, Luis Carlos; Nakakeeto, Margaret; Narayanan, Indira; Olusanya, Bolajoko; Osrin, David; Paul, Vinod; Poets, Christian; Reddy, Uma M; Santosham, Mathuram; Sayed, Rubayet; Schlabritz-Loutsevitch, Natalia E; Singhal, Nalini; Smith, Mary Alice; Smith, Peter G; Soofi, Sajid; Spong, Catherine Y; Sultana, Shahin; Tshefu, Antoinette; van Bel, Frank; Gray, Lauren Vestewig; Waiswa, Peter; Wang, Wei; Williams, Sarah LA; Wright, Linda; Zaidi, Anita; Zhang, Yanfeng; Zhong, Nanbert; Zuniga, Isabel; Bahl, Rajiv

    2016-06-01

    In 2013, an estimated 2.8 million newborns died and 2.7 million were stillborn. A much greater number suffer from long term impairment associated with preterm birth, intrauterine growth restriction, congenital anomalies, and perinatal or infectious causes. With the approaching deadline for the achievement of the Millennium Development Goals (MDGs) in 2015, there was a need to set the new research priorities on newborns and stillbirth with a focus not only on survival but also on health, growth and development. We therefore carried out a systematic exercise to set newborn health research priorities for 2013-2025. We used adapted Child Health and Nutrition Research Initiative (CHNRI) methods for this prioritization exercise. We identified and approached the 200 most productive researchers and 400 program experts, and 132 of them submitted research questions online. These were collated into a set of 205 research questions, sent for scoring to the 600 identified experts, and were assessed and scored by 91 experts. Nine out of top ten identified priorities were in the domain of research on improving delivery of known interventions, with simplified neonatal resuscitation program and clinical algorithms and improved skills of community health workers leading the list. The top 10 priorities in the domain of development were led by ideas on improved Kangaroo Mother Care at community level, how to improve the accuracy of diagnosis by community health workers, and perinatal audits. The 10 leading priorities for discovery research focused on stable surfactant with novel modes of administration for preterm babies, ability to diagnose fetal distress and novel tocolytic agents to delay or stop preterm labour. These findings will assist both donors and researchers in supporting and conducting research to close the knowledge gaps for reducing neonatal mortality, morbidity and long term impairment. WHO, SNL and other partners will work to generate interest among key national stakeholders, governments, NGOs, and research institutes in these priorities, while encouraging research funders to support them. We will track research funding, relevant requests for proposals and trial registers to monitor if the priorities identified by this exercise are being addressed.

  17. Characterizing Twitter Discussions About HPV Vaccines Using Topic Modeling and Community Detection.

    PubMed

    Surian, Didi; Nguyen, Dat Quoc; Kennedy, Georgina; Johnson, Mark; Coiera, Enrico; Dunn, Adam G

    2016-08-29

    In public health surveillance, measuring how information enters and spreads through online communities may help us understand geographical variation in decision making associated with poor health outcomes. Our aim was to evaluate the use of community structure and topic modeling methods as a process for characterizing the clustering of opinions about human papillomavirus (HPV) vaccines on Twitter. The study examined Twitter posts (tweets) collected between October 2013 and October 2015 about HPV vaccines. We tested Latent Dirichlet Allocation and Dirichlet Multinomial Mixture (DMM) models for inferring topics associated with tweets, and community agglomeration (Louvain) and the encoding of random walks (Infomap) methods to detect community structure of the users from their social connections. We examined the alignment between community structure and topics using several common clustering alignment measures and introduced a statistical measure of alignment based on the concentration of specific topics within a small number of communities. Visualizations of the topics and the alignment between topics and communities are presented to support the interpretation of the results in context of public health communication and identification of communities at risk of rejecting the safety and efficacy of HPV vaccines. We analyzed 285,417 Twitter posts (tweets) about HPV vaccines from 101,519 users connected by 4,387,524 social connections. Examining the alignment between the community structure and the topics of tweets, the results indicated that the Louvain community detection algorithm together with DMM produced consistently higher alignment values and that alignments were generally higher when the number of topics was lower. After applying the Louvain method and DMM with 30 topics and grouping semantically similar topics in a hierarchy, we characterized 163,148 (57.16%) tweets as evidence and advocacy, and 6244 (2.19%) tweets describing personal experiences. Among the 4548 users who posted experiential tweets, 3449 users (75.84%) were found in communities where the majority of tweets were about evidence and advocacy. The use of community detection in concert with topic modeling appears to be a useful way to characterize Twitter communities for the purpose of opinion surveillance in public health applications. Our approach may help identify online communities at risk of being influenced by negative opinions about public health interventions such as HPV vaccines.
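
    A minimal sketch of the pipeline described above, substituting scikit-learn's LDA for the DMM topic model and using NetworkX's Louvain implementation (available in recent NetworkX releases); the tweets, users, and social connections are placeholders.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# placeholder data: (user, tweet text) pairs and follower/retweet edges
tweets = [("u1", "hpv vaccine is safe and effective"),
          ("u2", "worried about hpv vaccine side effects"),
          ("u3", "new evidence supports hpv vaccination programs"),
          ("u4", "my daughter had a bad reaction to the vaccine")]
edges = [("u1", "u3"), ("u2", "u4"), ("u1", "u2")]

# topic model over tweet text (LDA here; the study also tested DMM)
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(text for _, text in tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
tweet_topics = lda.transform(X).argmax(axis=1)

# community structure over the social connections (Louvain)
G = nx.Graph(edges)
communities = louvain_communities(G, seed=0)

# crude alignment check: which topics dominate each community
for ci, comm in enumerate(communities):
    topics = [int(tweet_topics[i]) for i, (user, _) in enumerate(tweets) if user in comm]
    print(f"community {ci}: users={sorted(comm)} topics={topics}")
```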

  18. Characterizing Twitter Discussions About HPV Vaccines Using Topic Modeling and Community Detection

    PubMed Central

    Nguyen, Dat Quoc; Kennedy, Georgina; Johnson, Mark; Coiera, Enrico; Dunn, Adam G

    2016-01-01

    Background In public health surveillance, measuring how information enters and spreads through online communities may help us understand geographical variation in decision making associated with poor health outcomes. Objective Our aim was to evaluate the use of community structure and topic modeling methods as a process for characterizing the clustering of opinions about human papillomavirus (HPV) vaccines on Twitter. Methods The study examined Twitter posts (tweets) collected between October 2013 and October 2015 about HPV vaccines. We tested Latent Dirichlet Allocation and Dirichlet Multinomial Mixture (DMM) models for inferring topics associated with tweets, and community agglomeration (Louvain) and the encoding of random walks (Infomap) methods to detect community structure of the users from their social connections. We examined the alignment between community structure and topics using several common clustering alignment measures and introduced a statistical measure of alignment based on the concentration of specific topics within a small number of communities. Visualizations of the topics and the alignment between topics and communities are presented to support the interpretation of the results in context of public health communication and identification of communities at risk of rejecting the safety and efficacy of HPV vaccines. Results We analyzed 285,417 Twitter posts (tweets) about HPV vaccines from 101,519 users connected by 4,387,524 social connections. Examining the alignment between the community structure and the topics of tweets, the results indicated that the Louvain community detection algorithm together with DMM produced consistently higher alignment values and that alignments were generally higher when the number of topics was lower. After applying the Louvain method and DMM with 30 topics and grouping semantically similar topics in a hierarchy, we characterized 163,148 (57.16%) tweets as evidence and advocacy, and 6244 (2.19%) tweets describing personal experiences. Among the 4548 users who posted experiential tweets, 3449 users (75.84%) were found in communities where the majority of tweets were about evidence and advocacy. Conclusions The use of community detection in concert with topic modeling appears to be a useful way to characterize Twitter communities for the purpose of opinion surveillance in public health applications. Our approach may help identify online communities at risk of being influenced by negative opinions about public health interventions such as HPV vaccines. PMID:27573910

  19. Enhancing navigation in biomedical databases by community voting and database-driven text classification

    PubMed Central

    Duchrow, Timo; Shtatland, Timur; Guettler, Daniel; Pivovarov, Misha; Kramer, Stefan; Weissleder, Ralph

    2009-01-01

    Background The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. Results Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. Conclusion Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at . PMID:19799796
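
    A minimal sketch of abstract classification with bagged decision trees and class-probability estimates of the kind that could drive the confidence heat map described above; the texts, labels, and category names are invented, and BaggingClassifier's default base estimator (a decision tree) is used.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# invented training abstracts and category labels
abstracts = [
    "peptide probe for tumor angiogenesis imaging",
    "novel peptide inhibits cancer cell growth",
    "fluorescent peptide for molecular imaging of plaques",
    "peptide hormone regulates appetite in mice",
    "antimicrobial peptide active against gram negative bacteria",
    "peptide vaccine elicits antitumor immune response",
]
labels = ["cancer", "cancer", "not_cancer", "not_cancer", "not_cancer", "cancer"]

# bag of decision trees over TF-IDF features
clf = make_pipeline(TfidfVectorizer(),
                    BaggingClassifier(n_estimators=50, random_state=0))
clf.fit(abstracts, labels)

# class-probability estimates could be rendered as a heat map next to results
proba = clf.predict_proba(["peptide imaging agent for tumors"])[0]
print(dict(zip(clf.classes_, proba.round(2))))
```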

  20. A new electrocardiogram algorithm for diagnosing loss of ventricular capture during cardiac resynchronisation therapy.

    PubMed

    Ganière, Vincent; Domenichini, Giulia; Niculescu, Viviana; Cassagneau, Romain; Defaye, Pascal; Burri, Haran

    2013-03-01

    The prerequisite for cardiac resynchronization therapy (CRT) is ventricular capture, which may be verified by analysis of the surface electrocardiogram (ECG). Few algorithms exist to diagnose loss of ventricular capture. Electrocardiograms from 126 CRT patients were analysed during biventricular (BV), right ventricular (RV), and left ventricular (LV) pacing. An algorithm evaluating QRS narrowing in the limb leads and increasing negativity in lead I to diagnose changes in ventricular capture was devised, prospectively validated, and compared with two existing algorithms. Performance of the algorithm according to ventricular lead position was also assessed. Our algorithm had an accuracy of 88% to correctly identify the changes in ventricular capture (either loss or gain of RV or LV capture). The algorithm had a sensitivity of 94% and a specificity of 96% with an accuracy of 96% for identifying loss of LV capture (the most clinically relevant change), and compared favourably with the existing algorithms. Performance of the algorithms was not significantly affected by RV or LV lead position. A simple two-step algorithm evaluating QRS width in the limb leads and changes in negativity in lead I can accurately diagnose the lead responsible for intermittent loss of ventricular capture in CRT. This simple tool may be of particular use outside the setting of specialized device clinics.

  1. Applications of network analysis for adaptive management of artificial drainage systems in landscapes vulnerable to sea level rise

    NASA Astrophysics Data System (ADS)

    Poulter, Benjamin; Goodall, Jonathan L.; Halpin, Patrick N.

    2008-08-01

    The vulnerability of coastal landscapes to sea level rise is compounded by the existence of extensive artificial drainage networks initially built to lower water tables for agriculture, forestry, and human settlements. These drainage networks are found in landscapes with little topographic relief where channel flow is characterized by bi-directional movement across multiple time-scales and related to precipitation, wind, and tidal patterns. The current configuration of many artificial drainage networks exacerbates impacts associated with sea level rise such as salt intrusion and increased flooding. This suggests that in the short term, drainage networks might be managed to mitigate impacts related to sea level rise. The challenge, however, is that hydrologic processes in regions where channel flow direction is weakly related to slope and topography require extensive parameterization for numerical models, which is limited where network size is on the order of a hundred or more kilometers in total length. Here we present an application of graph theoretic algorithms to efficiently investigate network properties relevant to the management of a large artificial drainage system in coastal North Carolina, USA. We created a digital network model representing the observation network topology and four types of drainage features (canal, collector and field ditches, and streams). We applied betweenness-centrality concepts (using Dijkstra's shortest path algorithm) to determine major hydrologic flowpaths based on hydraulic resistance. Following this, we identified sub-networks that could be managed independently using a community structure and modularity approach. Lastly, a betweenness-centrality algorithm was applied to identify major shoreline entry points to the network that disproportionately control water movement in and out of the network. We demonstrate that graph theory can be applied to solving management and monitoring problems associated with sea level rise for poorly understood drainage networks in advance of numerical methods.
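
    A small sketch of the graph-theoretic workflow described above using NetworkX: resistance-weighted Dijkstra flowpaths, betweenness centrality, and a modularity-based partition into sub-networks (greedy modularity here as a stand-in for the paper's community-structure step); the toy network and resistance values are invented.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# toy drainage network: nodes are junctions, edge weight = hydraulic resistance
G = nx.Graph()
G.add_weighted_edges_from([
    ("outlet", "c1", 1.0), ("c1", "c2", 2.0), ("c2", "f1", 4.0),
    ("c2", "f2", 5.0), ("c1", "s1", 3.0), ("s1", "f3", 4.0),
    ("outlet", "s2", 2.0), ("s2", "f4", 6.0),
])

# major flowpath from a field ditch to the shoreline outlet (resistance-weighted Dijkstra)
path = nx.dijkstra_path(G, "f1", "outlet", weight="weight")

# betweenness centrality highlights channels that control most of the routing
bc = nx.betweenness_centrality(G, weight="weight")

# modularity-based sub-networks that could be managed independently (unweighted here)
subnets = greedy_modularity_communities(G)

print(path)
print(sorted(bc, key=bc.get, reverse=True)[:3])
print([sorted(c) for c in subnets])
```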

  2. TU-F-CAMPUS-T-05: A Cloud-Based Monte Carlo Dose Calculation for Electron Cutout Factors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mitchell, T; Bush, K

    Purpose: For electron cutouts of smaller sizes, it is necessary to verify electron cutout factors due to perturbations in electron scattering. Often, this requires a physical measurement using a small ion chamber, diode, or film. The purpose of this study is to develop a fast Monte Carlo based dose calculation framework that requires only a smart phone photograph of the cutout and specification of the SSD and energy to determine the electron cutout factor, with the ultimate goal of making this cloud-based calculation widely available to the medical physics community. Methods: The algorithm uses a pattern recognition technique to identify the corners of the cutout in the photograph as shown in Figure 1. It then corrects for variations in perspective, scaling, and translation of the photograph introduced by the user’s positioning of the camera. Blob detection is used to identify the portions of the cutout which comprise the aperture and the portions which are cutout material. This information is then used to define physical densities of the voxels used in the Monte Carlo dose calculation algorithm as shown in Figure 2, and to select a particle source from a pre-computed library of phase-spaces scored above the cutout. The electron cutout factor is obtained by taking a ratio of the maximum dose delivered with the cutout in place to the dose delivered under calibration/reference conditions. Results: The algorithm has been shown to successfully identify all necessary features of the electron cutout to perform the calculation. Subsequent testing will be performed to compare the Monte Carlo results with a physical measurement. Conclusion: A simple, cloud-based method of calculating electron cutout factors could eliminate the need for physical measurements and substantially reduce the time required to properly assure accurate dose delivery.

  3. Form Subdivisions: Their Identification and Use in LCSH.

    ERIC Educational Resources Information Center

    O'Neill, Edward T.; Chan, Lois Mai; Childress, Eric; Dean, Rebecca; El-Hoshy, Lynn M.; Vizine-Goetz, Diane

    2001-01-01

    Discusses form subdivisions as part of Library of Congress Subject Headings (LCSH) and the MARC format, which did not have a separate subfield code to identify form subdivisions. Describes the development of an algorithm to identify form subdivisions and reports results of an evaluation of the algorithm. (LRW)

  4. Riding the Hype Wave: Evaluating new AI Techniques for their Applicability in Earth Science

    NASA Astrophysics Data System (ADS)

    Ramachandran, R.; Zhang, J.; Maskey, M.; Lee, T. J.

    2016-12-01

    Every few years a new technology rides the hype wave generated by the computer science community. Converts to this new technology who surface from both the science community and the informatics community promulgate that it can radically improve or even change the existing scientific process. Recent examples of new technology following in the footsteps of "big data" now include deep learning algorithms and knowledge graphs. Deep learning algorithms mimic the human brain and process information through multiple stages of transformation and representation. These algorithms are able to learn complex functions that map pixels directly to outputs without relying on human-crafted features and solve some of the complex classification problems that exist in science. Similarly, knowledge graphs aggregate information around defined topics that enable users to resolve their query without having to navigate and assemble information manually. Knowledge graphs could potentially be used in scientific research to assist in hypothesis formulation, testing, and review. The challenge for the Earth science research community is to evaluate these new technologies by asking the right questions and considering what-if scenarios. What is this new technology enabling/providing that is innovative and different? Can one justify the adoption costs with respect to the research returns? Since nothing comes for free, utilizing a new technology entails adoption costs that may outweigh the benefits. Furthermore, these technologies may require significant computing infrastructure in order to be utilized effectively. Results from two different projects will be presented along with lessons learned from testing these technologies. The first project primarily evaluates deep learning techniques for different applications of image retrieval within Earth science while the second project builds a prototype knowledge graph constructed for Hurricane science.

  5. Prioritizing the Components of Vulnerability: A Genetic Algorithm Minimization of Flood Risk

    NASA Astrophysics Data System (ADS)

    Bongolan, Vena Pearl; Ballesteros, Florencio; Baritua, Karessa Alexandra; Junne Santos, Marie

    2013-04-01

    We define a flood-resistant city as an optimal arrangement of communities according to their traits, with the goal of minimizing the flooding vulnerability via a genetic algorithm. We prioritize the different components of flooding vulnerability, giving each component a weight, thus expressing vulnerability as a weighted sum. This serves as the fitness function for the genetic algorithm. We also allowed non-linear interactions among related but independent components, viz., poverty and mortality rate, and literacy and radio/TV penetration. The designs produced reflect the relative importance of the components, and we observed a synchronicity between the interacting components, giving us a more consistent design.
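
    A minimal genetic-algorithm sketch in which vulnerability is a weighted sum of component scores and the search is over arrangements of communities across sites with differing flood exposure; the weights, component values, and exposure profile are invented, and only mutation (no crossover) is used.

```python
import random

random.seed(0)

# vulnerability components per community (e.g. poverty, mortality, literacy deficit)
communities = {"A": [0.8, 0.6, 0.3], "B": [0.2, 0.3, 0.7], "C": [0.5, 0.9, 0.4],
               "D": [0.1, 0.2, 0.2], "E": [0.7, 0.4, 0.9]}
weights = [0.5, 0.3, 0.2]             # prioritisation of the components
exposure = [1.0, 0.8, 0.6, 0.4, 0.2]  # flood exposure of each site (site 0 = riverfront)

def fitness(arrangement):
    """Total weighted vulnerability of an arrangement (lower is better)."""
    return sum(e * sum(w * c for w, c in zip(weights, communities[name]))
               for e, name in zip(exposure, arrangement))

def mutate(arrangement):
    """Swap two sites, keeping the arrangement a valid permutation."""
    a = list(arrangement)
    i, j = random.sample(range(len(a)), 2)
    a[i], a[j] = a[j], a[i]
    return a

population = [random.sample(list(communities), len(communities)) for _ in range(30)]
for _ in range(200):
    population.sort(key=fitness)
    parents = population[:10]
    population = parents + [mutate(random.choice(parents)) for _ in range(20)]

best = min(population, key=fitness)
print(best, round(fitness(best), 3))
```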

  6. Observations on Student Misconceptions--A Case Study of the Build-Heap Algorithm

    ERIC Educational Resources Information Center

    Seppala, Otto; Malmi, Lauri; Korhonen, Ari

    2006-01-01

    Data structures and algorithms are core issues in computer programming. However, learning them is challenging for most students and many of them have various types of misconceptions on how algorithms work. In this study, we discuss the problem of identifying misconceptions on the principles of how algorithms work. Our context is algorithm…

  7. Lossless compression of image data products on the FIFE CD-ROM series

    NASA Technical Reports Server (NTRS)

    Newcomer, Jeffrey A.; Strebel, Donald E.

    1993-01-01

    How do you store enough of the key data sets, from a total of 120 gigabytes of data collected for a scientific experiment, on a collection of CD-ROMs small enough to distribute to a broad scientific community? In such an application, where information loss is unacceptable, lossless compression algorithms are the only choice. Although lossy compression algorithms can provide an order of magnitude improvement in compression ratios over lossless algorithms, the information that is lost is often part of the key scientific precision of the data. Therefore, lossless compression algorithms are and will continue to be extremely important in minimizing archiving storage requirements and distribution of large earth and space (ESS) data sets while preserving the essential scientific precision of the data.

  8. Automatic extraction of building boundaries using aerial LiDAR data

    NASA Astrophysics Data System (ADS)

    Wang, Ruisheng; Hu, Yong; Wu, Huayi; Wang, Jian

    2016-01-01

    Building extraction is one of the main research topics of the photogrammetry community. This paper presents automatic algorithms for building boundary extraction from aerial LiDAR data. First, by segmenting height information generated from the LiDAR data, the outer boundaries of aboveground objects are expressed as closed chains of oriented edge pixels. Then, building boundaries are distinguished from nonbuilding ones by evaluating their shapes. The candidate building boundaries are reconstructed as rectangles or regular polygons by applying new algorithms, following the hypothesis verification paradigm. These algorithms include constrained searching in Hough space, enhanced Hough transformation, and the sequential linking technique. The experimental results show that the proposed algorithms successfully extract building boundaries at rates of 97%, 85%, and 92% for three LiDAR datasets with varying scene complexities.

  9. Exhaustive identification of steady state cycles in large stoichiometric networks

    PubMed Central

    Wright, Jeremiah; Wagner, Andreas

    2008-01-01

    Background Identifying cyclic pathways in chemical reaction networks is important, because such cycles may indicate in silico violation of energy conservation, or the existence of feedback in vivo. Unfortunately, our ability to identify cycles in stoichiometric networks, such as signal transduction and genome-scale metabolic networks, has been hampered by the computational complexity of the methods currently used. Results We describe a new algorithm for the identification of cycles in stoichiometric networks, and we compare its performance to two others by exhaustively identifying the cycles contained in the genome-scale metabolic networks of H. pylori, M. barkeri, E. coli, and S. cerevisiae. Our algorithm can substantially decrease both the execution time and maximum memory usage in comparison to the two previous algorithms. Conclusion The algorithm we describe improves our ability to study large, real-world, biochemical reaction networks, although additional methodological improvements are desirable. PMID:18616835
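
    Not the authors' algorithm; just a small illustration of enumerating directed cycles in a substrate-reaction-product graph built from a toy stoichiometric description, using NetworkX's elementary-cycle enumeration (which does not scale to genome-scale networks).

```python
import networkx as nx

# toy reaction list: reaction -> (substrates, products)
reactions = {
    "R1": (["A"], ["B"]),
    "R2": (["B"], ["C"]),
    "R3": (["C"], ["A"]),   # closes the cycle A -> B -> C -> A
    "R4": (["C"], ["D"]),
}

# bipartite-style directed graph: substrate -> reaction -> product
G = nx.DiGraph()
for rxn, (substrates, products) in reactions.items():
    for s in substrates:
        G.add_edge(s, rxn)
    for p in products:
        G.add_edge(rxn, p)

# enumerate elementary directed cycles (exhaustive only on small networks)
for cycle in nx.simple_cycles(G):
    print(cycle)
```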

  10. Optimal stabilization of Boolean networks through collective influence

    NASA Astrophysics Data System (ADS)

    Wang, Jiannan; Pei, Sen; Wei, Wei; Feng, Xiangnan; Zheng, Zhiming

    2018-03-01

    Boolean networks have attracted much attention due to their wide applications in describing dynamics of biological systems. During past decades, much effort has been invested in unveiling how network structure and update rules affect the stability of Boolean networks. In this paper, we aim to identify and control a minimal set of influential nodes that is capable of stabilizing an unstable Boolean network. For locally treelike Boolean networks with biased truth tables, we propose a greedy algorithm to identify influential nodes in Boolean networks by minimizing the largest eigenvalue of a modified nonbacktracking matrix. We test the performance of the proposed collective influence algorithm on four different networks. Results show that the collective influence algorithm can stabilize each network with a smaller set of nodes compared with other heuristic algorithms. Our work provides a new insight into the mechanism that determines the stability of Boolean networks, which may find applications in identifying virulence genes that lead to serious diseases.
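
    A brute-force sketch of the underlying idea for small graphs: build the nonbacktracking matrix and greedily remove the node whose removal most reduces its leading eigenvalue; the published method uses a modified matrix, biased truth-table weights, and a far more scalable collective-influence scoring.

```python
import numpy as np
import networkx as nx

def nonbacktracking_matrix(G):
    """B indexed by directed edges: B[(i,j),(k,l)] = 1 iff j == k and l != i."""
    edges = [(u, v) for u, v in G.edges()] + [(v, u) for u, v in G.edges()]
    index = {e: n for n, e in enumerate(edges)}
    B = np.zeros((len(edges), len(edges)))
    for i, j in edges:
        for l in G.neighbors(j):
            if l != i:
                B[index[(i, j)], index[(j, l)]] = 1.0
    return B

def leading_eigenvalue(G):
    if G.number_of_edges() == 0:
        return 0.0
    return max(abs(np.linalg.eigvals(nonbacktracking_matrix(G))))

def greedy_stabilizing_set(G, target=1.0):
    """Greedily remove the node whose removal most reduces the leading eigenvalue."""
    H = G.copy()
    chosen = []
    while H.number_of_nodes() > 0 and leading_eigenvalue(H) > target:
        best = min(H.nodes(),
                   key=lambda n: leading_eigenvalue(nx.restricted_view(H, [n], [])))
        chosen.append(best)
        H.remove_node(best)
    return chosen

G = nx.erdos_renyi_graph(20, 0.2, seed=1)
print(greedy_stabilizing_set(G))
```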

  11. New calibration algorithms for dielectric-based microwave moisture sensors

    USDA-ARS?s Scientific Manuscript database

    New calibration algorithms for determining moisture content in granular and particulate materials from measurement of the dielectric properties at a single microwave frequency are proposed. The algorithms are based on identifying empirically correlations between the dielectric properties and the par...

  12. Automatic Image Registration of Multimodal Remotely Sensed Data with Global Shearlet Features

    NASA Technical Reports Server (NTRS)

    Murphy, James M.; Le Moigne, Jacqueline; Harding, David J.

    2015-01-01

    Automatic image registration is the process of aligning two or more images of approximately the same scene with minimal human assistance. Wavelet-based automatic registration methods are standard, but sometimes are not robust to the choice of initial conditions. That is, if the images to be registered are too far apart relative to the initial guess of the algorithm, the registration algorithm does not converge or has poor accuracy, and is thus not robust. These problems occur because wavelet techniques primarily identify isotropic textural features and are less effective at identifying linear and curvilinear edge features. We integrate the recently developed mathematical construction of shearlets, which is more effective at identifying sparse anisotropic edges, with an existing automatic wavelet-based registration algorithm. Our shearlet features algorithm produces more distinct features than wavelet features algorithms; the separation of edges from textures is even stronger than with wavelets. Our algorithm computes shearlet and wavelet features for the images to be registered, then performs least squares minimization on these features to compute a registration transformation. Our algorithm is two-staged and multiresolution in nature. First, a cascade of shearlet features is used to provide a robust, though approximate, registration. This is then refined by registering with a cascade of wavelet features. Experiments across a variety of image classes show an improved robustness to initial conditions, when compared to wavelet features alone.

  13. Automatic Image Registration of Multi-Modal Remotely Sensed Data with Global Shearlet Features

    PubMed Central

    Murphy, James M.; Le Moigne, Jacqueline; Harding, David J.

    2017-01-01

    Automatic image registration is the process of aligning two or more images of approximately the same scene with minimal human assistance. Wavelet-based automatic registration methods are standard, but sometimes are not robust to the choice of initial conditions. That is, if the images to be registered are too far apart relative to the initial guess of the algorithm, the registration algorithm does not converge or has poor accuracy, and is thus not robust. These problems occur because wavelet techniques primarily identify isotropic textural features and are less effective at identifying linear and curvilinear edge features. We integrate the recently developed mathematical construction of shearlets, which is more effective at identifying sparse anisotropic edges, with an existing automatic wavelet-based registration algorithm. Our shearlet features algorithm produces more distinct features than wavelet features algorithms; the separation of edges from textures is even stronger than with wavelets. Our algorithm computes shearlet and wavelet features for the images to be registered, then performs least squares minimization on these features to compute a registration transformation. Our algorithm is two-staged and multiresolution in nature. First, a cascade of shearlet features is used to provide a robust, though approximate, registration. This is then refined by registering with a cascade of wavelet features. Experiments across a variety of image classes show an improved robustness to initial conditions, when compared to wavelet features alone. PMID:29123329

  14. Algorithm for parametric community detection in networks.

    PubMed

    Bettinelli, Andrea; Hansen, Pierre; Liberti, Leo

    2012-07-01

    Modularity maximization is extensively used to detect communities in complex networks. It has been shown, however, that this method suffers from a resolution limit: Small communities may be undetectable in the presence of larger ones even if they are very dense. To alleviate this defect, various modifications of the modularity function have been proposed as well as multiresolution methods. In this paper we systematically study a simple model (proposed by Pons and Latapy [Theor. Comput. Sci. 412, 892 (2011)] and similar to the parametric model of Reichardt and Bornholdt [Phys. Rev. E 74, 016110 (2006)]) with a single parameter α that balances the fraction of within community edges and the expected fraction of edges according to the configuration model. An exact algorithm is proposed to find optimal solutions for all values of α as well as the corresponding successive intervals of α values for which they are optimal. This algorithm relies upon a routine for exact modularity maximization and is limited to moderate size instances. An agglomerative hierarchical heuristic is therefore proposed to address parametric modularity detection in large networks. At each iteration the smallest value of α for which it is worthwhile to merge two communities of the current partition is found. Then merging is performed and the data are updated accordingly. An implementation is proposed with the same time and space complexity as the well-known Clauset-Newman-Moore (CNM) heuristic [Phys. Rev. E 70, 066111 (2004)]. Experimental results on artificial and real world problems show that (i) communities are detected by both exact and heuristic methods for all values of the parameter α; (ii) the dendrogram summarizing the results of the heuristic method provides a useful tool for substantive analysis, as illustrated particularly on a Les Misérables data set; (iii) the difference between the parametric modularity values given by the exact method and those given by the heuristic is moderate; (iv) the heuristic version of the proposed parametric method, viewed as a modularity maximization tool, gives better results than the CNM heuristic for large instances.
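
    A direct implementation of the single-parameter quality function described above, Q_alpha = (fraction of within-community edges) minus alpha times the expected fraction under the configuration model, evaluated for a given partition; the exact and agglomerative optimisers from the paper are not reproduced.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def parametric_modularity(G, partition, alpha=1.0):
    """Q_alpha = sum over communities of [ e_c/m - alpha * (d_c/(2m))^2 ],
    where e_c is the number of within-community edges, d_c the total degree
    of the community's nodes, and m the number of edges in the network."""
    m = G.number_of_edges()
    q = 0.0
    for community in partition:
        e_c = G.subgraph(community).number_of_edges()
        d_c = sum(d for _, d in G.degree(community))
        q += e_c / m - alpha * (d_c / (2.0 * m)) ** 2
    return q

G = nx.karate_club_graph()
partition = list(greedy_modularity_communities(G))
for alpha in (0.5, 1.0, 2.0):   # alpha = 1 recovers standard modularity
    print(alpha, round(parametric_modularity(G, partition, alpha), 3))
```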

  15. On Establishing Big Data Wave Breakwaters with Analytics (Invited)

    NASA Astrophysics Data System (ADS)

    Riedel, M.

    2013-12-01

    The Research Data Alliance Big Data Analytics (RDA-BDA) Interest Group seeks to develop community-based recommendations on feasible data analytics approaches to address scientific community needs for utilizing large quantities of data. RDA-BDA seeks to analyze different scientific domain applications and their potential use of various big data analytics techniques. A systematic classification of feasible combinations of analysis algorithms, analytical tools, data and resource characteristics and scientific queries will be covered in these recommendations. These combinations are complex, since a wide variety of different data analysis algorithms exist (e.g. specific algorithms using GPUs for analyzing brain images) that need to work together with multiple analytical tools, ranging from simple (iterative) map-reduce methods (e.g. with Apache Hadoop or Twister) to sophisticated higher-level frameworks that leverage machine learning algorithms (e.g. Apache Mahout). These computational analysis techniques are often augmented with visual analytics techniques (e.g. computational steering on large-scale high performance computing platforms) to put human judgement into the analysis loop, or with new database approaches designed to support unstructured or semi-structured data as opposed to the more traditional structured databases (e.g. relational databases). More recently, data analysis and the underpinning analytics frameworks also have to consider the energy footprints of underlying resources. To sum up, the aim of this talk is to provide pieces of information to understand big data analytics in the context of science and engineering, using the aforementioned classification as the lighthouse and as the frame of reference for a systematic approach. This talk will provide insights about big data analytics methods in the context of science within various communities and offer different views of how correlation-based and causality-based approaches provide complementary methods to advance science and engineering today. The RDA Big Data Analytics Group seeks to understand which approaches are not only technically feasible, but also scientifically feasible. The lighthouse goal of the RDA Big Data Analytics Group is a classification of clever combinations of various technologies and scientific applications in order to provide clear recommendations to the scientific community on which approaches are technically and scientifically feasible.

  16. Geographically Modified PageRank Algorithms: Identifying the Spatial Concentration of Human Movement in a Geospatial Network.

    PubMed

    Chin, Wei-Chien-Benny; Wen, Tzai-Hung

    2015-01-01

    A network approach, which simplifies geographic settings as a form of nodes and links, emphasizes the connectivity and relationships of spatial features. Topological networks of spatial features are used to explore geographical connectivity and structures. The PageRank algorithm, a network metric, is often used to help identify important locations where people or automobiles concentrate in the geographical literature. However, geographic considerations, including proximity and location attractiveness, are ignored in most network metrics. The objective of the present study is to propose two geographically modified PageRank algorithms-Distance-Decay PageRank (DDPR) and Geographical PageRank (GPR)-that incorporate geographic considerations into PageRank algorithms to identify the spatial concentration of human movement in a geospatial network. Our findings indicate that in both intercity and within-city settings the proposed algorithms more effectively capture the spatial locations where people reside than traditional commonly-used network metrics. In comparing location attractiveness and distance decay, we conclude that the concentration of human movement is largely determined by the distance decay. This implies that geographic proximity remains a key factor in human mobility.
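
    The exact DDPR and GPR formulations are given in the paper; the sketch below only shows the generic idea of biasing PageRank transition weights with a distance-decay factor on an invented toy spatial network, via NetworkX's weighted PageRank.

```python
import math
import networkx as nx

# toy spatial network: node -> (x, y) location, plus observed movement links
coords = {"A": (0, 0), "B": (1, 0), "C": (5, 0), "D": (5, 1), "E": (2, 3)}
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A"), ("B", "E")]

def distance(u, v):
    (x1, y1), (x2, y2) = coords[u], coords[v]
    return math.hypot(x1 - x2, y1 - y2)

beta = 1.0                             # assumed distance-decay exponent
G = nx.DiGraph()
for u, v in links:
    w = 1.0 / distance(u, v) ** beta   # nearer places are easier to reach
    G.add_edge(u, v, weight=w)
    G.add_edge(v, u, weight=w)

# weighted PageRank: transition probabilities proportional to decayed weights
ddpr = nx.pagerank(G, alpha=0.85, weight="weight")
print(sorted(ddpr.items(), key=lambda kv: -kv[1]))
```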

  17. Evaluating the utility of syndromic surveillance algorithms for screening to detect potentially clonal hospital infection outbreaks

    PubMed Central

    Talbot, Thomas R; Schaffner, William; Bloch, Karen C; Daniels, Titus L; Miller, Randolph A

    2011-01-01

    Objective The authors evaluated algorithms commonly used in syndromic surveillance for use as screening tools to detect potentially clonal outbreaks for review by infection control practitioners. Design Study phase 1 applied four aberrancy detection algorithms (CUSUM, EWMA, space-time scan statistic, and WSARE) to retrospective microbiologic culture data, producing a list of past candidate outbreak clusters. In phase 2, four infectious disease physicians categorized the phase 1 algorithm-identified clusters to ascertain algorithm performance. In phase 3, project members combined the algorithms to create a unified screening system and conducted a retrospective pilot evaluation. Measurements The study calculated recall and precision for each algorithm, and created precision-recall curves for various methods of combining the algorithms into a unified screening tool. Results Individual algorithm recall and precision ranged from 0.21 to 0.31 and from 0.053 to 0.29, respectively. Few candidate outbreak clusters were identified by more than one algorithm. The best method of combining the algorithms yielded an area under the precision-recall curve of 0.553. The phase 3 combined system detected all infection control-confirmed outbreaks during the retrospective evaluation period. Limitations Lack of phase 2 reviewers' agreement indicates that subjective expert review was an imperfect gold standard. Less conservative filtering of culture results and alternate parameter selection for each algorithm might have improved algorithm performance. Conclusion Hospital outbreak detection presents different challenges than traditional syndromic surveillance. Nevertheless, algorithms developed for syndromic surveillance have potential to form the basis of a combined system that might perform clinically useful hospital outbreak screening. PMID:21606134
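
    A minimal one-sided CUSUM sketch of the kind of aberrancy screening evaluated above, applied to a daily count series of positive cultures for a single organism; the baseline, allowance, and threshold are placeholders that infection-control staff would need to tune.

```python
import numpy as np

def cusum_alerts(counts, k=0.5, h=4.0):
    """One-sided CUSUM on standardized daily counts.
    k: allowance (reference value) and h: decision threshold, in SD units."""
    counts = np.asarray(counts, dtype=float)
    mu, sd = counts.mean(), counts.std(ddof=1) or 1.0
    z = (counts - mu) / sd
    s, alerts = 0.0, []
    for day, zi in enumerate(z):
        s = max(0.0, s + zi - k)
        if s > h:
            alerts.append(day)
            s = 0.0            # reset so a single cluster is flagged once
    return alerts

# toy series: stable background with a cluster of isolates around days 23-26
daily_positive_cultures = [1, 0, 2, 1, 1, 0, 1, 2, 1, 0, 1, 1, 0, 2, 1,
                           1, 0, 1, 2, 1, 0, 1, 1, 5, 6, 7, 6, 2, 1, 0]
print(cusum_alerts(daily_positive_cultures))
```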

  18. ChIP-PaM: an algorithm to identify protein-DNA interaction using ChIP-Seq data.

    PubMed

    Wu, Song; Wang, Jianmin; Zhao, Wei; Pounds, Stanley; Cheng, Cheng

    2010-06-03

    ChIP-Seq is a powerful tool for identifying the interaction between genomic regulators and their bound DNAs, especially for locating transcription factor binding sites. However, high cost and high rate of false discovery of transcription factor binding sites identified from ChIP-Seq data significantly limit its application. Here we report a new algorithm, ChIP-PaM, for identifying transcription factor target regions in ChIP-Seq datasets. This algorithm makes full use of a protein-DNA binding pattern by capitalizing on three lines of evidence: 1) the tag count modelling at the peak position, 2) pattern matching of a specific tag count distribution, and 3) motif searching along the genome. A novel data-based two-step eFDR procedure is proposed to integrate the three lines of evidence to determine significantly enriched regions. Our algorithm requires no technical controls and efficiently discriminates falsely enriched regions from regions enriched by true transcription factor (TF) binding on the basis of ChIP-Seq data only. An analysis of real genomic data is presented to demonstrate our method. In a comparison with other existing methods, we found that our algorithm provides more accurate binding site discovery while maintaining comparable statistical power.
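
    A toy sketch of the three lines of evidence (illustrative only; the paper's tag-count model, pattern-matching statistic, motif model, and two-step eFDR procedure are not reproduced): each candidate region receives a peak tag count, a correlation against an expected tag-count template, and a motif-hit flag. The motif string and function names are hypothetical.

    ```python
    import re
    import numpy as np

    def score_region(tag_profile, template, sequence, motif="TGACTCA"):
        """tag_profile: per-base tag counts in a candidate region;
        template: expected tag-count shape (same length) around a true binding site;
        motif: an illustrative consensus pattern, not one from the paper."""
        peak_height = tag_profile.max()                          # evidence 1: tag count at the peak
        shape_match = np.corrcoef(tag_profile, template)[0, 1]   # evidence 2: pattern matching
        has_motif = re.search(motif, sequence) is not None       # evidence 3: motif occurrence
        return peak_height, shape_match, has_motif
    ```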

  19. A systematic review of validated methods for identifying erythema multiforme major/minor/not otherwise specified, Stevens-Johnson Syndrome, or toxic epidermal necrolysis using administrative and claims data.

    PubMed

    Schneider, Gary; Kachroo, Sumesh; Jones, Natalie; Crean, Sheila; Rotella, Philip; Avetisyan, Ruzan; Reynolds, Matthew W

    2012-01-01

    The Food and Drug Administration's (FDA) Mini-Sentinel pilot program aims to conduct active surveillance to refine safety signals that emerge for marketed medical products. A key facet of this surveillance is to develop and understand the validity of algorithms for identifying health outcomes of interest (HOIs) from administrative and claims data. This paper summarizes the process and findings of the algorithm review of erythema multiforme and related conditions. PubMed and Iowa Drug Information Service searches were conducted to identify citations applicable to the erythema multiforme HOI. Level 1 abstract reviews and Level 2 full-text reviews were conducted to find articles that used administrative and claims data to identify erythema multiforme, Stevens-Johnson syndrome, or toxic epidermal necrolysis and that included validation estimates of the coding algorithms. Our search revealed limited literature focusing on erythema multiforme and related conditions that provided administrative and claims data-based algorithms and validation estimates. Only four studies provided validated algorithms and all studies used the same International Classification of Diseases code, 695.1. Approximately half of cases subjected to expert review were consistent with erythema multiforme and related conditions. Updated research needs to be conducted on designing validation studies that test algorithms for erythema multiforme and related conditions and that take into account recent changes in the diagnostic coding of these diseases. Copyright © 2012 John Wiley & Sons, Ltd.

  20. A brief dementia screener suitable for use by non-specialists in resource poor settings—the cross-cultural derivation and validation of the brief Community Screening Instrument for Dementia

    PubMed Central

    Prince, M; Acosta, D; Ferri, C P; Guerra, M; Huang, Y; Jacob, K S; Llibre Rodriguez, J J; Salas, A; Sosa, A L; Williams, J D; Hall, K S

    2011-01-01

    Objective Brief screening tools for dementia for use by non-specialists in primary care have yet to be validated in non-western settings where cultural factors and limited education may complicate the task. We aimed to derive a brief version of cognitive and informant scales from the Community Screening Instrument for Dementia (CSI-D) and to carry out initial assessments of their likely validity. Methods We applied Mokken analysis to CSI-D cognitive and informant scale data from 15 022 participants in representative population-based surveys in Latin America, India and China, to identify a subset of items from each that conformed optimally to item response theory scaling principles. The validity coefficients of the resulting brief scales (area under ROC curve, optimal cutpoint, sensitivity, specificity and Youden's index) were estimated from data collected in a previous cross-cultural validation of the full CSI-D. Results Seven cognitive items (Loevinger H coefficient 0.64) and six informant items (Loevinger H coefficient 0.69) were selected with excellent hierarchical scaling properties. For the brief cognitive scale, AUROC varied between 0.88 and 0.97, for the brief informant scale between 0.92 and 1.00, and for the combined algorithm between 0.94 and 1.00. Optimal cutpoints did not vary between regions. Youden's index for the combined algorithm varied between 0.78 and 1.00 by region. Conclusion A brief version of the full CSI-D appears to share the favourable culture- and education-fair screening properties of the full assessment, despite considerable abbreviation. The feasibility and validity of the brief version still needs to be established in routine primary care. Copyright © 2010 John Wiley & Sons, Ltd. PMID:21845592
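
    For reference, Youden's index used to summarize the cutpoints above is simply sensitivity plus specificity minus one; the values below are illustrative, not taken from the paper.

    ```python
    def youden_index(sensitivity, specificity):
        """J = sensitivity + specificity - 1."""
        return sensitivity + specificity - 1

    print(youden_index(0.92, 0.95))  # 0.87
    ```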

  1. Endmember identification from EO-1 Hyperion L1_R hyperspectral data to build saltmarsh spectral library in Hunter Wetland, NSW, Australia

    NASA Astrophysics Data System (ADS)

    Rasel, Sikdar M. M.; Chang, Hsing-Chung; Ralph, Tim; Saintilan, Neil

    2015-10-01

    Saltmarsh is one of the important wetland communities; however, due to a range of pressures, it has been declared an EEC (Ecological Endangered Community) in Australia. In order to correctly identify different saltmarsh species, the development of spectral libraries of saltmarsh species is essential for monitoring this EEC. Hyperspectral remote sensing can support wetland monitoring and mapping. The benefits of Hyperion data for wetland monitoring were studied at Hunter Wetland Park, NSW, Australia. After exclusion of bad bands from the original data, an atmospheric correction model was applied to minimize atmospheric effects and to retrieve apparent surface reflectance for different land covers. The large data dimensionality was reduced by the Forward Minimum Noise Fraction (MNF) algorithm; the first 32 MNF bands were found to contain more than 80% of the information in the image. The Pixel Purity Index (PPI) algorithm successfully extracted pure pixels for water, built-up area and three vegetation types: Casuarina sp., Phragmitis sp. and green grass. The results showed it was challenging to extract extremely pure pixels for Sporobolus and Sarcocornia from the data due to the coarse resolution (30 m) and small patch size (<3 m) of this vegetation on the ground. The Spectral Angle Mapper classified the image into five classes (Casuarina, Saltmarsh (Phragmitis), Green grass, Water and Built-up area) with 43.55% accuracy, and likewise failed to separate Sporobolus as a distinct group for the same reason. High spatial resolution airborne hyperspectral data and a new study site with bigger patches of Sporobolus and Sarcocornia are proposed to overcome these issues.
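
    The final classification step uses the Spectral Angle Mapper rule; the sketch below shows the basic angle computation and per-pixel labelling. The angle threshold, reference spectra, and function names are illustrative assumptions, not the parameters used in the study.

    ```python
    import numpy as np

    def spectral_angle(pixel, reference):
        """Angle between a pixel spectrum and a reference spectrum (radians)."""
        cos_theta = pixel @ reference / (np.linalg.norm(pixel) * np.linalg.norm(reference))
        return np.arccos(np.clip(cos_theta, -1.0, 1.0))

    def sam_classify(image, references, max_angle=0.1):
        """image: (rows, cols, bands); references: dict of class name -> (bands,) spectrum."""
        rows, cols, _ = image.shape
        labels = np.full((rows, cols), "unclassified", dtype=object)
        for r in range(rows):
            for c in range(cols):
                angles = {name: spectral_angle(image[r, c], ref) for name, ref in references.items()}
                best = min(angles, key=angles.get)
                if angles[best] <= max_angle:       # assign only if the best match is close enough
                    labels[r, c] = best
        return labels
    ```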

  2. A brief dementia screener suitable for use by non-specialists in resource poor settings--the cross-cultural derivation and validation of the brief Community Screening Instrument for Dementia.

    PubMed

    Prince, M; Acosta, D; Ferri, C P; Guerra, M; Huang, Y; Jacob, K S; Llibre Rodriguez, J J; Salas, A; Sosa, A L; Williams, J D; Hall, K S

    2011-09-01

    Brief screening tools for dementia for use by non-specialists in primary care have yet to be validated in non-western settings where cultural factors and limited education may complicate the task. We aimed to derive a brief version of cognitive and informant scales from the Community Screening Instrument for Dementia (CSI-D) and to carry out initial assessments of their likely validity. We applied Mokken analysis to CSI-D cognitive and informant scale data from 15 022 participants in representative population-based surveys in Latin America, India and China, to identify a subset of items from each that conformed optimally to item response theory scaling principles. The validity coefficients of the resulting brief scales (area under ROC curve, optimal cutpoint, sensitivity, specificity and Youden's index) were estimated from data collected in a previous cross-cultural validation of the full CSI-D. Seven cognitive items (Loevinger H coefficient 0.64) and six informant items (Loevinger H coefficient 0.69) were selected with excellent hierarchical scaling properties. For the brief cognitive scale, AUROC varied between 0.88 and 0.97, for the brief informant scale between 0.92 and 1.00, and for the combined algorithm between 0.94 and 1.00. Optimal cutpoints did not vary between regions. Youden's index for the combined algorithm varied between 0.78 and 1.00 by region. A brief version of the full CSI-D appears to share the favourable culture- and education-fair screening properties of the full assessment, despite considerable abbreviation. The feasibility and validity of the brief version still needs to be established in routine primary care. Copyright © 2010 John Wiley & Sons, Ltd.

  3. Bounded-Degree Approximations of Stochastic Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quinn, Christopher J.; Pinar, Ali; Kiyavash, Negar

    2017-06-01

    We propose algorithms to approximate directed information graphs. Directed information graphs are probabilistic graphical models that depict causal dependencies between stochastic processes in a network. The proposed algorithms identify optimal and near-optimal approximations in terms of Kullback-Leibler divergence. The user-chosen sparsity trades off the quality of the approximation against visual conciseness and computational tractability. One class of approximations contains graphs with specified in-degrees. Another class additionally requires that the graph is connected. For both classes, we propose algorithms to identify the optimal approximations and also near-optimal approximations, using a novel relaxation of submodularity. We also propose algorithms to identify the r-best approximations among these classes, enabling robust decision making.

  4. Simulation-Based Evaluation of the Performances of an Algorithm for Detecting Abnormal Disease-Related Features in Cattle Mortality Records.

    PubMed

    Perrin, Jean-Baptiste; Durand, Benoît; Gay, Emilie; Ducrot, Christian; Hendrikx, Pascal; Calavas, Didier; Hénaux, Viviane

    2015-01-01

    We performed a simulation study to evaluate the performance of an anomaly detection algorithm considered within the framework of an automated surveillance system of cattle mortality. The method consisted of a combination of temporal regression and spatial cluster detection which allows identifying, for a given week, clusters of spatial units showing an excess of deaths in comparison with their own historical fluctuations. First, we simulated 1,000 outbreaks of a disease causing extra deaths in the French cattle population (about 200,000 herds and 20 million cattle) according to a model mimicking the spreading patterns of an infectious disease and injected these disease-related extra deaths into an authentic mortality dataset spanning from January 2005 to January 2010. Second, we applied our algorithm to each of the 1,000 semi-synthetic datasets to identify clusters of spatial units showing an excess of deaths considering their own historical fluctuations. Third, we verified whether the clusters identified by the algorithm contained simulated extra deaths in order to evaluate the ability of the algorithm to identify unusual mortality clusters caused by an outbreak. Among the 1,000 simulations, the median duration of simulated outbreaks was 8 weeks, with a median number of 5,627 simulated deaths and 441 infected herds. Within the 12-week trial period, 73% of the simulated outbreaks were detected, with a median timeliness of 1 week and a mean of 1.4 weeks. The proportion of outbreak weeks flagged by an alarm was 61% (i.e. sensitivity), whereas one in three alarms was a true alarm (i.e. positive predictive value). The performance of the detection algorithm was also evaluated for alternative combinations of epidemiologic parameters. The results of our study confirmed that in certain conditions automated algorithms could help identify abnormal cattle mortality increases possibly related to unidentified health events.
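
    A minimal sketch of the core idea of flagging spatial units whose weekly deaths exceed their own historical fluctuations; a plain z-score threshold is used here for illustration, whereas the study combines temporal regression with space-time cluster detection. The threshold and function names are assumptions.

    ```python
    import numpy as np

    def flag_excess_deaths(history, current, z_threshold=3.0):
        """history: (n_units, n_weeks) past weekly death counts per spatial unit;
        current: (n_units,) deaths in the week under surveillance."""
        mean = history.mean(axis=1)
        sd = history.std(axis=1, ddof=1)
        z = (current - mean) / np.where(sd > 0, sd, 1.0)   # each unit judged against its own history
        return np.where(z > z_threshold)[0]                # indices of spatial units in excess
    ```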

  5. Simulation-Based Evaluation of the Performances of an Algorithm for Detecting Abnormal Disease-Related Features in Cattle Mortality Records

    PubMed Central

    Perrin, Jean-Baptiste; Durand, Benoît; Gay, Emilie; Ducrot, Christian; Hendrikx, Pascal; Calavas, Didier; Hénaux, Viviane

    2015-01-01

    We performed a simulation study to evaluate the performance of an anomaly detection algorithm considered within the framework of an automated surveillance system of cattle mortality. The method consisted of a combination of temporal regression and spatial cluster detection which allows identifying, for a given week, clusters of spatial units showing an excess of deaths in comparison with their own historical fluctuations. First, we simulated 1,000 outbreaks of a disease causing extra deaths in the French cattle population (about 200,000 herds and 20 million cattle) according to a model mimicking the spreading patterns of an infectious disease and injected these disease-related extra deaths into an authentic mortality dataset spanning from January 2005 to January 2010. Second, we applied our algorithm to each of the 1,000 semi-synthetic datasets to identify clusters of spatial units showing an excess of deaths considering their own historical fluctuations. Third, we verified whether the clusters identified by the algorithm contained simulated extra deaths in order to evaluate the ability of the algorithm to identify unusual mortality clusters caused by an outbreak. Among the 1,000 simulations, the median duration of simulated outbreaks was 8 weeks, with a median number of 5,627 simulated deaths and 441 infected herds. Within the 12-week trial period, 73% of the simulated outbreaks were detected, with a median timeliness of 1 week and a mean of 1.4 weeks. The proportion of outbreak weeks flagged by an alarm was 61% (i.e. sensitivity), whereas one in three alarms was a true alarm (i.e. positive predictive value). The performance of the detection algorithm was also evaluated for alternative combinations of epidemiologic parameters. The results of our study confirmed that in certain conditions automated algorithms could help identify abnormal cattle mortality increases possibly related to unidentified health events. PMID:26536596

  6. Administrative Algorithms to identify Avascular necrosis of bone among patients undergoing upper or lower extremity magnetic resonance imaging: a validation study.

    PubMed

    Barbhaiya, Medha; Dong, Yan; Sparks, Jeffrey A; Losina, Elena; Costenbader, Karen H; Katz, Jeffrey N

    2017-06-19

    Studies of the epidemiology and outcomes of avascular necrosis (AVN) require accurate case-finding methods. The aim of this study was to evaluate performance characteristics of a claims-based algorithm designed to identify AVN cases in administrative data. Using a centralized patient registry from a US academic medical center, we identified all adults aged ≥18 years who underwent magnetic resonance imaging (MRI) of an upper/lower extremity joint during the 1.5-year study period. A radiologist report confirming AVN on MRI served as the gold standard. We examined the sensitivity, specificity, positive predictive value (PPV) and positive likelihood ratio (LR+) of four algorithms (A-D) using International Classification of Diseases, 9th edition (ICD-9) codes for AVN. The algorithms ranged from least stringent (Algorithm A, requiring ≥1 ICD-9 code for AVN [733.4X]) to most stringent (Algorithm D, requiring ≥3 ICD-9 codes, each at least 30 days apart). Among 8200 patients who underwent MRI, 83 (1.0% [95% CI 0.78-1.22]) had AVN by gold standard. Algorithm A yielded the highest sensitivity (81.9%, 95% CI 72.0-89.5), with PPV of 66.0% (95% CI 56.0-75.1). The PPV of algorithm D increased to 82.2% (95% CI 67.9-92.0), although sensitivity decreased to 44.6% (95% CI 33.7-55.9). All four algorithms had specificities >99%. An algorithm that uses a single billing code to screen for AVN among those who had MRI has the highest sensitivity and is best suited for studies in which further medical record review confirming AVN is feasible. Algorithms using multiple billing codes are recommended for use in administrative databases when further AVN validation is not feasible.
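
    A minimal sketch of the most stringent rule (Algorithm D: at least three ICD-9 733.4x codes, each at least 30 days apart); the function and argument names are hypothetical and the real algorithm runs against claims tables rather than in-memory date lists.

    ```python
    from datetime import date

    def meets_algorithm_d(claim_dates, min_codes=3, min_gap_days=30):
        """claim_dates: dates on which an ICD-9 733.4x code was recorded for a patient."""
        kept = []
        for d in sorted(claim_dates):
            if not kept or (d - kept[-1]).days >= min_gap_days:   # keep codes spaced >= 30 days apart
                kept.append(d)
        return len(kept) >= min_codes

    print(meets_algorithm_d([date(2016, 1, 5), date(2016, 2, 20), date(2016, 4, 1)]))  # True
    ```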

  7. Detecting Lung and Colorectal Cancer Recurrence Using Structured Clinical/Administrative Data to Enable Outcomes Research and Population Health Management.

    PubMed

    Hassett, Michael J; Uno, Hajime; Cronin, Angel M; Carroll, Nikki M; Hornbrook, Mark C; Ritzwoller, Debra

    2017-12-01

    Recurrent cancer is common, costly, and lethal, yet we know little about it in community-based populations. Electronic health records and tumor registries contain vast amounts of data regarding community-based patients, but usually lack recurrence status. Existing algorithms that use structured data to detect recurrence have limitations. We developed algorithms to detect the presence and timing of recurrence after definitive therapy for stages I-III lung and colorectal cancer using 2 data sources that contain a widely available type of structured data (claims or electronic health record encounters) linked to gold-standard recurrence status: Medicare claims linked to the Cancer Care Outcomes Research and Surveillance study, and the Cancer Research Network Virtual Data Warehouse linked to registry data. Twelve potential indicators of recurrence were used to develop separate models for each cancer in each data source. Detection models maximized area under the ROC curve (AUC); timing models minimized average absolute error. Algorithms were compared by cancer type/data source, and contrasted with an existing binary detection rule. Detection model AUCs (>0.92) exceeded existing prediction rules. Timing models yielded absolute prediction errors that were small relative to follow-up time (<15%). Similar covariates were included in all detection and timing algorithms, though differences by cancer type and dataset challenged efforts to create 1 common algorithm for all scenarios. Valid and reliable detection of recurrence using big data is feasible. These tools will enable extensive, novel research on quality, effectiveness, and outcomes for lung and colorectal cancer patients and those who develop recurrence.

  8. TRMM Version 7 Near-Realtime Data Products

    NASA Technical Reports Server (NTRS)

    Tocker, Erich Franz; Kelley, Owen

    2012-01-01

    The TRMM data system has been providing near-realtime data products to the community since late 1999. While the TRMM project never had near-realtime production requirements, the science and applications communities had a great interest in receiving TRMM data as quickly as possible. As a result, these NRT data are provided under a best-effort scenario, with the objective of having the swath data products available within three hours of data collection 90% of the time. In July of 2011 the Joint Precipitation Measurement Missions Science Team (JPST) authorized the reprocessing of TRMM mission data using the new version 7 algorithms. The reprocessing of the 14+ years of the mission was concluded within 30 days. The version 7 algorithms had substantial changes in the data product file formats for both data and metadata. In addition, the algorithms themselves had major modifications and improvements. The general approach to versioning up the NRT is to wait for the regular production algorithms to have run for a while and shake out any issues that might arise from the new version before updating the NRT products. Because of the substantial changes in data/metadata formats as well as the algorithm improvements themselves, the update of NRT to V7 followed an even more conservative path than usual. This was done to ensure that applications agencies and other users of the TRMM NRT would not be faced with short timeframes for conversion to the new format. This paper describes the process by which the TRMM NRT was updated to V7 and the V7 data products themselves.

  9. Electronic Detection of Delayed Test Result Follow-Up in Patients with Hypothyroidism.

    PubMed

    Meyer, Ashley N D; Murphy, Daniel R; Al-Mutairi, Aymer; Sittig, Dean F; Wei, Li; Russo, Elise; Singh, Hardeep

    2017-07-01

    Delays in following up abnormal test results are a common problem in outpatient settings. Surveillance systems that use trigger tools to identify delayed follow-up can help reduce missed opportunities in care. To develop and test an electronic health record (EHR)-based trigger algorithm to identify instances of delayed follow-up of abnormal thyroid-stimulating hormone (TSH) results in patients being treated for hypothyroidism. We developed an algorithm using structured EHR data to identify patients with hypothyroidism who had delayed follow-up (>60 days) after an abnormal TSH. We then retrospectively applied the algorithm to a large EHR data warehouse within the Department of Veterans Affairs (VA), on patient records from two large VA networks for the period from January 1, 2011, to December 31, 2011. Identified records were reviewed to confirm the presence of delays in follow-up. During the study period, 645,555 patients were seen in the outpatient setting within the two networks. Of 293,554 patients with at least one TSH test result, the trigger identified 1250 patients on treatment for hypothyroidism with elevated TSH. Of these patients, 271 were flagged as potentially having delayed follow-up of their test result. Chart reviews confirmed delays in 163 of the 271 flagged patients (PPV = 60.1%). An automated trigger algorithm applied to records in a large EHR data warehouse identified patients with hypothyroidism with potential delays in thyroid function test results follow-up. Future prospective application of the TSH trigger algorithm can be used by clinical teams as a surveillance and quality improvement technique to monitor and improve follow-up.
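
    A minimal sketch of the trigger logic: flag a treated patient when no repeat TSH (or other qualifying follow-up action) occurs within 60 days of an abnormal result. The field names and the notion of a qualifying visit are assumptions; the actual algorithm was run as structured queries against the VA data warehouse.

    ```python
    from datetime import date, timedelta

    def flag_delayed_followup(abnormal_tsh_date, later_tsh_dates, visit_dates, window_days=60):
        """Flag a treated hypothyroid patient if no repeat TSH or relevant visit
        occurs within `window_days` of an abnormal TSH result."""
        deadline = abnormal_tsh_date + timedelta(days=window_days)
        followed_up = any(abnormal_tsh_date < d <= deadline for d in later_tsh_dates + visit_dates)
        return not followed_up

    print(flag_delayed_followup(date(2011, 3, 1), [date(2011, 7, 15)], []))  # True: no action within 60 days
    ```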

  10. Applicability of an established management algorithm for destructive colon injuries after abbreviated laparotomy: a 17-year experience.

    PubMed

    Sharpe, John P; Magnotti, Louis J; Weinberg, Jordan A; Shahan, Charles P; Cullinan, Darren R; Marino, Katy A; Fabian, Timothy C; Croce, Martin A

    2014-04-01

    For more than a decade, operative decisions (resection plus anastomosis vs diversion) for colon injuries, at our institution, have followed a defined management algorithm based on established risk factors (pre- or intraoperative transfusion requirements of more than 6 units packed RBCs and/or presence of significant comorbid diseases). However, this management algorithm was originally developed for patients managed with a single laparotomy. The purpose of this study was to evaluate the applicability of this algorithm to destructive colon injuries after abbreviated laparotomy (AL) and to determine whether additional risk factors should be considered. Consecutive patients over a 17-year period with colon injuries after AL were identified. Nondestructive injuries were managed with primary repair. Destructive wounds were resected at the initial laparotomy followed by either a staged diversion (SD) or a delayed anastomosis (DA) at the subsequent exploration. Outcomes were evaluated to identify additional risk factors in the setting of AL. We identified 149 patients: 33 (22%) patients underwent primary repair at initial exploration, 42 (28%) underwent DA, and 72 (49%) had SD. Two (1%) patients died before re-exploration. Of those undergoing DA, 23 (55%) patients were managed according to the algorithm and 19 (45%) were not. Adherence to the algorithm resulted in lower rates of suture line failure (4% vs 32%, p = 0.03) and colon-related morbidity (22% vs 58%, p = 0.03) for patients undergoing DA. No additional specific risk factors for suture line failure after DA were identified. Adherence to an established algorithm, originally defined for destructive colon injuries after single laparotomy, is likewise efficacious for the management of these injuries in the setting of AL. Copyright © 2014 American College of Surgeons. Published by Elsevier Inc. All rights reserved.

  11. Development and validation of a novel algorithm based on the ECG magnet response for rapid identification of any unknown pacemaker.

    PubMed

    Squara, Fabien; Chik, William W; Benhayon, Daniel; Maeda, Shingo; Latcu, Decebal Gabriel; Lacaze-Gadonneix, Jonathan; Tibi, Thierry; Thomas, Olivier; Cooper, Joshua M; Duthoit, Guillaume

    2014-08-01

    Pacemaker (PM) interrogation requires correct manufacturer identification. However, an unidentified PM is a frequent occurrence, requiring time-consuming steps to identify the device. The purpose of this study was to develop and validate a novel algorithm for PM manufacturer identification, using the ECG response to magnet application. Data on the magnet responses of all recent PM models (≤15 years) from the 5 major manufacturers were collected. An algorithm based on the ECG response to magnet application to identify the PM manufacturer was subsequently developed. Patients undergoing ECG during magnet application in various clinical situations were prospectively recruited in 7 centers. The algorithm was applied in the analysis of every ECG by a cardiologist blinded to PM information. A second blinded cardiologist analyzed a sample of randomly selected ECGs in order to assess the reproducibility of the results. A total of 250 ECGs were analyzed during magnet application. The algorithm led to the correct single manufacturer choice in 242 ECGs (96.8%), whereas 7 (2.8%) could only be narrowed to either 1 of 2 manufacturer possibilities. Only 2 (0.4%) incorrect manufacturer identifications occurred. The algorithm identified Medtronic and Sorin Group PMs with 100% sensitivity and specificity, Biotronik PMs with 100% sensitivity and 99.5% specificity, and St. Jude and Boston Scientific PMs with 92% sensitivity and 100% specificity. The results were reproducible between the 2 blinded cardiologists with 92% concordant findings. Unknown PM manufacturers can be accurately identified by analyzing the ECG magnet response using this newly developed algorithm. Copyright © 2014 Heart Rhythm Society. Published by Elsevier Inc. All rights reserved.

  12. Validation of an automated electronic algorithm and "dashboard" to identify and characterize decompensated heart failure admissions across a medical center.

    PubMed

    Cox, Zachary L; Lewis, Connie M; Lai, Pikki; Lenihan, Daniel J

    2017-01-01

    We aim to validate the diagnostic performance of the first fully automatic, electronic heart failure (HF) identification algorithm and evaluate the implementation of an HF Dashboard system with 2 components: real-time identification of decompensated HF admissions and accurate characterization of disease characteristics and medical therapy. We constructed an HF identification algorithm requiring 3 of 4 identifiers: B-type natriuretic peptide >400 pg/mL; admitting HF diagnosis; history of HF International Classification of Disease, Ninth Revision, diagnosis codes; and intravenous diuretic administration. We validated the diagnostic accuracy of the components individually (n = 366) and combined in the HF algorithm (n = 150) compared with a blinded provider panel in 2 separate cohorts. We built an HF Dashboard within the electronic medical record characterizing the disease and medical therapies of HF admissions identified by the HF algorithm. We evaluated the HF Dashboard's performance over 26 months of clinical use. Individually, the algorithm components displayed variable sensitivity and specificity, respectively: B-type natriuretic peptide >400 pg/mL (89% and 87%); diuretic (80% and 92%); and International Classification of Disease, Ninth Revision, code (56% and 95%). The HF algorithm achieved a high specificity (95%), positive predictive value (82%), and negative predictive value (85%) but achieved limited sensitivity (56%) secondary to missing provider-generated identification data. The HF Dashboard identified and characterized 3147 HF admissions over 26 months. Automated identification and characterization systems can be developed and used with a substantial degree of specificity for the diagnosis of decompensated HF, although sensitivity is limited by clinical data input. Copyright © 2016 Elsevier Inc. All rights reserved.
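
    A minimal sketch of the 3-of-4 identification rule described above; the argument names are hypothetical, and the production algorithm evaluates these criteria from structured EHR fields in real time.

    ```python
    def is_decompensated_hf_admission(bnp_pg_ml, admitting_hf_dx, hf_icd9_history, iv_diuretic_given):
        criteria = [
            bnp_pg_ml is not None and bnp_pg_ml > 400,  # B-type natriuretic peptide > 400 pg/mL
            admitting_hf_dx,                             # admitting diagnosis of heart failure
            hf_icd9_history,                             # prior HF ICD-9 diagnosis codes
            iv_diuretic_given,                           # intravenous diuretic administered
        ]
        return sum(criteria) >= 3                        # any 3 of the 4 identifiers

    print(is_decompensated_hf_admission(820, True, False, True))  # True
    ```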

  13. Constructing financial network based on PMFG and threshold method

    NASA Astrophysics Data System (ADS)

    Nie, Chun-Xiao; Song, Fu-Tie

    2018-04-01

    Based on planar maximally filtered graph (PMFG) and threshold method, we introduced a correlation-based network named PMFG-based threshold network (PTN). We studied the community structure of PTN and applied ISOMAP algorithm to represent PTN in low-dimensional Euclidean space. The results show that the community corresponds well to the cluster in the Euclidean space. Further, we studied the dynamics of the community structure and constructed the normalized mutual information (NMI) matrix. Based on the real data in the market, we found that the volatility of the market can lead to dramatic changes in the community structure, and the structure is more stable during the financial crisis.
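
    A minimal sketch of the NMI matrix used to track how the PTN community structure changes across time windows; the inputs are assumed to be community-label arrays per stock for each window, and this is not the authors' exact implementation.

    ```python
    import numpy as np
    from sklearn.metrics import normalized_mutual_info_score

    def nmi_matrix(partitions):
        """partitions: list of community-label arrays, one per time window,
        each giving the community assignment of every stock."""
        t = len(partitions)
        m = np.ones((t, t))                      # NMI of a partition with itself is 1
        for i in range(t):
            for j in range(i + 1, t):
                m[i, j] = m[j, i] = normalized_mutual_info_score(partitions[i], partitions[j])
        return m
    ```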

  14. Microbial Biogeography of Public Restroom Surfaces

    PubMed Central

    Flores, Gilberto E.; Bates, Scott T.; Knights, Dan; Lauber, Christian L.; Stombaugh, Jesse; Knight, Rob; Fierer, Noah

    2011-01-01

    We spend the majority of our lives indoors where we are constantly exposed to bacteria residing on surfaces. However, the diversity of these surface-associated communities is largely unknown. We explored the biogeographical patterns exhibited by bacteria across ten surfaces within each of twelve public restrooms. Using high-throughput barcoded pyrosequencing of the 16S rRNA gene, we identified 19 bacterial phyla across all surfaces. Most sequences belonged to four phyla: Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. The communities clustered into three general categories: those found on surfaces associated with toilets, those on the restroom floor, and those found on surfaces routinely touched with hands. On toilet surfaces, gut-associated taxa were more prevalent, suggesting fecal contamination of these surfaces. Floor surfaces were the most diverse of all communities and contained several taxa commonly found in soils. Skin-associated bacteria, especially the Propionibacteriaceae, dominated surfaces routinely touched with our hands. Certain taxa were more common in female than in male restrooms as vagina-associated Lactobacillaceae were widely distributed in female restrooms, likely from urine contamination. Use of the SourceTracker algorithm confirmed many of our taxonomic observations as human skin was the primary source of bacteria on restroom surfaces. Overall, these results demonstrate that restroom surfaces host relatively diverse microbial communities dominated by human-associated bacteria with clear linkages between communities on or in different body sites and those communities found on restroom surfaces. More generally, this work is relevant to the public health field as we show that human-associated microbes are commonly found on restroom surfaces suggesting that bacterial pathogens could readily be transmitted between individuals by the touching of surfaces. Furthermore, we demonstrate that we can use high-throughput analyses of bacterial communities to determine sources of bacteria on indoor surfaces, an approach which could be used to track pathogen transmission and test the efficacy of hygiene practices. PMID:22132229

  15. Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks

    DTIC Science & Technology

    2015-03-16

    moderately-sized networks. As a consequence, throughout this effort, a simulated annealing (SA) algorithm will be employed to effectively search the...then increment k by 1 and repeat the search to find z∗3. One can continue to increment k until W < zδ, at which point the algorithm will stop and...

  16. Implementation of the ground level enhancement alert software at NMDB database

    NASA Astrophysics Data System (ADS)

    Mavromichalaki, Helen; Souvatzoglou, George; Sarlanis, Christos; Mariatos, George; Papaioannou, Athanasios; Belov, Anatoly; Eroshenko, Eugenia; Yanke, Victor; NMDB Team

    2010-11-01

    The European Commission is supporting the real-time database for high-resolution neutron monitor measurements (NMDB) as an e-Infrastructures project in the Seventh Framework Programme in the Capacities section. The realization of the NMDB will provide the opportunity for several applications, most of which will be implemented in real time. An important application will be the establishment of an Alert signal when dangerous solar particle events are heading towards the Earth, resulting in a ground level enhancement (GLE) registered by neutron monitors (NMs). The cosmic ray community has been occupied with the question of establishing such an Alert for many years, and recently several groups succeeded in creating a proper algorithm capable of detecting space weather threats in an off-line mode. A lot of original work has been done in this direction, and every group working in this field performed routine runs for all GLE cases, resulting in statistical analyses of GLE events. The next step was to make this algorithm as accurate as possible and, most importantly, working in real time. This was achieved when, during the last GLE observed so far, a real-time GLE Alert signal was produced. In this work, the steps of this procedure as well as the functionality of this algorithm for both the scientific community and users are discussed. The transition of the Alert algorithm to the NMDB is also discussed.

  17. Automatic image analysis and spot classification for detection of fruit fly infestation in hyperspectral images of mangoes

    USDA-ARS?s Scientific Manuscript database

    An algorithm has been developed to identify spots generated in hyperspectral images of mangoes infested with fruit fly larvae. The algorithm incorporates background removal, application of a Gaussian blur, thresholding, and particle count analysis to identify locations of infestations. Each of the f...
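
    A rough sketch of the described pipeline (background removal, Gaussian blur, thresholding, particle counting) applied to a single hyperspectral band with OpenCV; the band choice, kernel size, Otsu thresholding, and minimum spot area are assumptions, not the published settings.

    ```python
    import cv2
    import numpy as np

    def count_infestation_spots(band, background_mask, min_area=5):
        """band: 2-D reflectance image; background_mask: True where pixels are not fruit."""
        img = band.astype(np.float32).copy()
        img[background_mask] = 0.0                                   # background removal
        img = cv2.GaussianBlur(img, (5, 5), 0)                       # Gaussian blur
        norm = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
        _, binary = cv2.threshold(norm, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # thresholding
        n_labels, _, stats, _ = cv2.connectedComponentsWithStats(binary)              # particle analysis
        spots = [i for i in range(1, n_labels) if stats[i, cv2.CC_STAT_AREA] >= min_area]
        return len(spots)
    ```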

  18. A parallel algorithm for the eigenvalues and eigenvectors for a general complex matrix

    NASA Technical Reports Server (NTRS)

    Shroff, Gautam

    1989-01-01

    A new parallel Jacobi-like algorithm is developed for computing the eigenvalues of a general complex matrix. Most parallel methods for this problem typically display only linear convergence. Sequential norm-reducing algorithms also exist, and they display quadratic convergence in most cases. The new algorithm is a parallel form of the norm-reducing algorithm due to Eberlein. It is proven that the asymptotic convergence rate of this algorithm is quadratic. Numerical experiments are presented which demonstrate the quadratic convergence of the algorithm, and certain situations where the convergence is slow are also identified. The algorithm promises to be very competitive on a variety of parallel architectures.
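
    For orientation only, the sketch below shows a classical cyclic Jacobi sweep for a real symmetric matrix, the simplest member of the "Jacobi-like" family; it is not the parallel norm-reducing Eberlein algorithm for general complex matrices that the paper develops, and the sweep count and tolerance are assumptions.

    ```python
    import numpy as np

    def jacobi_eigen(A, sweeps=20, tol=1e-12):
        """A: real symmetric matrix. Returns (eigenvalues, eigenvector matrix)."""
        A = A.copy().astype(float)
        n = A.shape[0]
        V = np.eye(n)
        for _ in range(sweeps):
            off = np.sqrt(np.sum(A ** 2) - np.sum(np.diag(A) ** 2))   # off-diagonal norm
            if off < tol:
                break
            for p in range(n - 1):
                for q in range(p + 1, n):
                    if abs(A[p, q]) < tol:
                        continue
                    # rotation angle that zeroes A[p, q] for a symmetric matrix
                    theta = 0.5 * np.arctan2(2 * A[p, q], A[q, q] - A[p, p])
                    c, s = np.cos(theta), np.sin(theta)
                    J = np.eye(n)
                    J[p, p] = J[q, q] = c
                    J[p, q], J[q, p] = s, -s
                    A = J.T @ A @ J
                    V = V @ J
        return np.diag(A), V
    ```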

  19. Algorithm for AEEG data selection leading to wireless and long term epilepsy monitoring.

    PubMed

    Casson, Alexander J; Yates, David C; Patel, Shyam; Rodriguez-Villegas, Esther

    2007-01-01

    High quality, wireless ambulatory EEG (AEEG) systems that can operate over extended periods of time are not currently feasible due to the high power consumption of wireless transmitters. Previous work has thus proposed data reduction by only transmitting sections of data that contain candidate epileptic activity. This paper investigates algorithms by which this data selection can be carried out. It is essential that the algorithm is low power and that all possible features are identified, even at the expense of more false detections. Given this, a brief review of spike detection algorithms is carried out with a view to using them to drive the data reduction process. A CWT-based algorithm is deemed most suitable; it is described in detail and its performance tested. It is found that over 90% of expert-marked spikes are identified whilst giving a 40% reduction in the amount of data to be transmitted and analysed. The performance varies with the recording duration in response to each detection, and this effect is also investigated. The proposed algorithm will form the basis of a new AEEG system that allows wireless and longer-term epilepsy monitoring.
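
    A minimal sketch of gating transmission with a CWT-based detector, implemented here as convolution with Ricker (Mexican hat) wavelets at a few scales; the wavelet, scale widths, z-score threshold, and segment length are assumptions and do not reproduce the paper's detector or its tuning.

    ```python
    import numpy as np
    from scipy.signal import find_peaks

    def ricker(points, a):
        """Ricker (Mexican hat) wavelet of width `a` sampled over `points` samples."""
        t = np.arange(points) - (points - 1) / 2.0
        return (1 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

    def select_candidate_segments(eeg, fs, widths=(4, 8, 16), threshold=5.0, segment_s=2.0):
        """Return (start, stop) sample indices of EEG segments worth transmitting."""
        responses = [np.convolve(eeg, ricker(10 * a, a), mode="same") for a in widths]  # CWT at a few scales
        energy = np.abs(np.vstack(responses)).max(axis=0)       # strongest response across scales
        score = (energy - energy.mean()) / energy.std()         # normalise against background activity
        peaks, _ = find_peaks(score, height=threshold)          # candidate spike locations
        half = int(segment_s * fs / 2)
        return [(max(p - half, 0), min(p + half, len(eeg))) for p in peaks]
    ```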

  20. A semi-supervised classification algorithm using the TAD-derived background as training data

    NASA Astrophysics Data System (ADS)

    Fan, Lei; Ambeau, Brittany; Messinger, David W.

    2013-05-01

    In general, spectral image classification algorithms fall into one of two categories: supervised and unsupervised. In unsupervised approaches, the algorithm automatically identifies clusters in the data without a priori information about those clusters (except perhaps the expected number of them). Supervised approaches require an analyst to identify training data to learn the characteristics of the clusters such that they can then classify all other pixels into one of the pre-defined groups. The classification algorithm presented here is a semi-supervised approach based on the Topological Anomaly Detection (TAD) algorithm. The TAD algorithm defines background components based on a mutual k-Nearest Neighbor graph model of the data, along with a spectral connected components analysis. Here, the largest components produced by TAD are used as regions of interest (ROIs), or training data, for a supervised classification scheme. By combining those ROIs with a Gaussian Maximum Likelihood (GML) or a Minimum Distance to the Mean (MDM) algorithm, we are able to achieve a semi-supervised classification method. We test this classification algorithm against data collected by the HyMAP sensor over the Cooke City, MT area and the University of Pavia scene.
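
    A minimal sketch of the semi-supervised idea: build a mutual k-nearest-neighbour graph over pixel spectra, take its largest connected components as training regions, and then label every pixel by minimum distance to the class means (MDM). The value of k, the number of components kept, and the omission of the TAD spectral connected-components refinement are all simplifying assumptions.

    ```python
    import numpy as np
    from sklearn.neighbors import kneighbors_graph
    from scipy.sparse.csgraph import connected_components

    def tad_like_mdm(pixels, k=10, n_classes=4):
        """pixels: (n_pixels, n_bands) spectra, flattened from the image."""
        knn = kneighbors_graph(pixels, n_neighbors=k, mode="connectivity")
        mutual = knn.multiply(knn.T)                       # keep only mutual k-NN edges
        _, labels = connected_components(mutual, directed=False)
        sizes = np.bincount(labels)
        big = np.argsort(sizes)[::-1][:n_classes]          # largest components serve as training ROIs
        means = np.vstack([pixels[labels == c].mean(axis=0) for c in big])
        dists = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
        return dists.argmin(axis=1)                        # minimum distance to the class mean
    ```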
