Sample records for network inference algorithm

  1. Inferring Gene Regulatory Networks by Singular Value Decomposition and Gravitation Field Algorithm

    PubMed Central

    Zheng, Ming; Wu, Jia-nan; Huang, Yan-xin; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang

    2012-01-01

    Reconstruction of gene regulatory networks (GRNs) is of utmost interest and has become a challenge computational problem in system biology. However, every existing inference algorithm from gene expression profiles has its own advantages and disadvantages. In particular, the effectiveness and efficiency of every previous algorithm is not high enough. In this work, we proposed a novel inference algorithm from gene expression data based on differential equation model. In this algorithm, two methods were included for inferring GRNs. Before reconstructing GRNs, singular value decomposition method was used to decompose gene expression data, determine the algorithm solution space, and get all candidate solutions of GRNs. In these generated family of candidate solutions, gravitation field algorithm was modified to infer GRNs, used to optimize the criteria of differential equation model, and search the best network structure result. The proposed algorithm is validated on both the simulated scale-free network and real benchmark gene regulatory network in networks database. Both the Bayesian method and the traditional differential equation model were also used to infer GRNs, and the results were used to compare with the proposed algorithm in our work. And genetic algorithm and simulated annealing were also used to evaluate gravitation field algorithm. The cross-validation results confirmed the effectiveness of our algorithm, which outperforms significantly other previous algorithms. PMID:23226565

  2. An algebra-based method for inferring gene regulatory networks.

    PubMed

    Vera-Licona, Paola; Jarrah, Abdul; Garcia-Puente, Luis David; McGee, John; Laubenbacher, Reinhard

    2014-03-26

    The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the dynamic patterns present in the network. Boolean polynomial dynamical systems provide a powerful modeling framework for the reverse engineering of gene regulatory networks, that enables a rich mathematical structure on the model search space. A C++ implementation of the method, distributed under LPGL license, is available, together with the source code, at http://www.paola-vera-licona.net/Software/EARevEng/REACT.html.

  3. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

    PubMed

    Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

    2014-01-16

    To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.

  4. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

    PubMed Central

    2014-01-01

    Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926

  5. An algebra-based method for inferring gene regulatory networks

    PubMed Central

    2014-01-01

    Background The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. Results This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the dynamic patterns present in the network. Conclusions Boolean polynomial dynamical systems provide a powerful modeling framework for the reverse engineering of gene regulatory networks, that enables a rich mathematical structure on the model search space. A C++ implementation of the method, distributed under LPGL license, is available, together with the source code, at http://www.paola-vera-licona.net/Software/EARevEng/REACT.html. PMID:24669835

  6. NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

    PubMed

    Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

    2014-01-01

    One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.

  7. Harnessing Diversity towards the Reconstructing of Large Scale Gene Regulatory Networks

    PubMed Central

    Yamanaka, Ryota; Kitano, Hiroaki

    2013-01-01

    Elucidating gene regulatory network (GRN) from large scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus driven approaches combining different algorithms, have become a potentially promising strategy to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet that can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) a simple strategy to combine many algorithms does not always lead to performance improvement compared to the cost of consensus and (ii) TopkNet integrating only high-performance algorithms provide significant performance improvement compared to the best individual algorithms and community prediction. These results suggest that a priori determination of high-performance algorithms is a key to reconstruct an unknown regulatory network. Similarity among gene-expression datasets can be useful to determine potential optimal algorithms for reconstruction of unknown regulatory networks, i.e., if expression-data associated with known regulatory network is similar to that with unknown regulatory network, optimal algorithms determined for the known regulatory network can be repurposed to infer the unknown regulatory network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if similarity between the two expression datasets is high, TopkNet integrating algorithms that are optimal for known dataset perform well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study provides a powerful strategy towards harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks. PMID:24278007

  8. A prior-based integrative framework for functional transcriptional regulatory network inference

    PubMed Central

    Siahpirani, Alireza F.

    2017-01-01

    Abstract Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks, however, networks inferred from such methods have low overlap with experimentally derived (e.g. ChIP-chip and transcription factor (TF) knockouts) networks. Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference, and, identifies core TFs whose targets are predictable from expression. Multiple reasons make the identification of targets of other TFs difficult, including network architecture and insufficient variation of TF mRNA level. Finally, we demonstrate the utility of our inference algorithm to infer stress-specific regulatory networks and for regulator prioritization. PMID:27794550

  9. NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms

    PubMed Central

    Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

    2014-01-01

    One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available. PMID:24667482

  10. Inference of scale-free networks from gene expression time series.

    PubMed

    Daisuke, Tominaga; Horton, Paul

    2006-04-01

    Quantitative time-series observation of gene expression is becoming possible, for example by cell array technology. However, there are no practical methods with which to infer network structures using only observed time-series data. As most computational models of biological networks for continuous time-series data have a high degree of freedom, it is almost impossible to infer the correct structures. On the other hand, it has been reported that some kinds of biological networks, such as gene networks and metabolic pathways, may have scale-free properties. We hypothesize that the architecture of inferred biological network models can be restricted to scale-free networks. We developed an inference algorithm for biological networks using only time-series data by introducing such a restriction. We adopt the S-system as the network model, and a distributed genetic algorithm to optimize models to fit its simulated results to observed time series data. We have tested our algorithm on a case study (simulated data). We compared optimization under no restriction, which allows for a fully connected network, and under the restriction that the total number of links must equal that expected from a scale free network. The restriction reduced both false positive and false negative estimation of the links and also the differences between model simulation and the given time-series data.

  11. Inferring nonlinear gene regulatory networks from gene expression data based on distance correlation.

    PubMed

    Guo, Xiaobo; Zhang, Ye; Hu, Wenhao; Tan, Haizhu; Wang, Xueqin

    2014-01-01

    Nonlinear dependence is general in regulation mechanism of gene regulatory networks (GRNs). It is vital to properly measure or test nonlinear dependence from real data for reconstructing GRNs and understanding the complex regulatory mechanisms within the cellular system. A recently developed measurement called the distance correlation (DC) has been shown powerful and computationally effective in nonlinear dependence for many situations. In this work, we incorporate the DC into inferring GRNs from the gene expression data without any underling distribution assumptions. We propose three DC-based GRNs inference algorithms: CLR-DC, MRNET-DC and REL-DC, and then compare them with the mutual information (MI)-based algorithms by analyzing two simulated data: benchmark GRNs from the DREAM challenge and GRNs generated by SynTReN network generator, and an experimentally determined SOS DNA repair network in Escherichia coli. According to both the receiver operator characteristic (ROC) curve and the precision-recall (PR) curve, our proposed algorithms significantly outperform the MI-based algorithms in GRNs inference.

  12. Inferring Nonlinear Gene Regulatory Networks from Gene Expression Data Based on Distance Correlation

    PubMed Central

    Guo, Xiaobo; Zhang, Ye; Hu, Wenhao; Tan, Haizhu; Wang, Xueqin

    2014-01-01

    Nonlinear dependence is general in regulation mechanism of gene regulatory networks (GRNs). It is vital to properly measure or test nonlinear dependence from real data for reconstructing GRNs and understanding the complex regulatory mechanisms within the cellular system. A recently developed measurement called the distance correlation (DC) has been shown powerful and computationally effective in nonlinear dependence for many situations. In this work, we incorporate the DC into inferring GRNs from the gene expression data without any underling distribution assumptions. We propose three DC-based GRNs inference algorithms: CLR-DC, MRNET-DC and REL-DC, and then compare them with the mutual information (MI)-based algorithms by analyzing two simulated data: benchmark GRNs from the DREAM challenge and GRNs generated by SynTReN network generator, and an experimentally determined SOS DNA repair network in Escherichia coli. According to both the receiver operator characteristic (ROC) curve and the precision-recall (PR) curve, our proposed algorithms significantly outperform the MI-based algorithms in GRNs inference. PMID:24551058

  13. Recurrent neural network-based modeling of gene regulatory network using elephant swarm water search algorithm.

    PubMed

    Mandal, Sudip; Saha, Goutam; Pal, Rajat Kumar

    2017-08-01

    Correct inference of genetic regulations inside a cell from the biological database like time series microarray data is one of the greatest challenges in post genomic era for biologists and researchers. Recurrent Neural Network (RNN) is one of the most popular and simple approach to model the dynamics as well as to infer correct dependencies among genes. Inspired by the behavior of social elephants, we propose a new metaheuristic namely Elephant Swarm Water Search Algorithm (ESWSA) to infer Gene Regulatory Network (GRN). This algorithm is mainly based on the water search strategy of intelligent and social elephants during drought, utilizing the different types of communication techniques. Initially, the algorithm is tested against benchmark small and medium scale artificial genetic networks without and with presence of different noise levels and the efficiency was observed in term of parametric error, minimum fitness value, execution time, accuracy of prediction of true regulation, etc. Next, the proposed algorithm is tested against the real time gene expression data of Escherichia Coli SOS Network and results were also compared with others state of the art optimization methods. The experimental results suggest that ESWSA is very efficient for GRN inference problem and performs better than other methods in many ways.

  14. A novel gene network inference algorithm using predictive minimum description length approach.

    PubMed

    Chaitankar, Vijender; Ghosh, Preetam; Perkins, Edward J; Gong, Ping; Deng, Youping; Zhang, Chaoyang

    2010-05-28

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold which defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we proposed a new inference algorithm which incorporated mutual information (MI), conditional mutual information (CMI) and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm was evaluated using both synthetic time series data sets and a biological time series data set for the yeast Saccharomyces cerevisiae. The benchmark quantities precision and recall were used as performance measures. The results show that the proposed algorithm produced less false edges and significantly improved the precision, as compared to the existing algorithm. For further analysis the performance of the algorithms was observed over different sizes of data. We have proposed a new algorithm that implements the PMDL principle for inferring gene regulatory networks from time series DNA microarray data that eliminates the need of a fine tuning parameter. The evaluation results obtained from both synthetic and actual biological data sets show that the PMDL principle is effective in determining the MI threshold and the developed algorithm improves precision of gene regulatory network inference. Based on the sensitivity analysis of all tested cases, an optimal CMI threshold value has been identified. Finally it was observed that the performance of the algorithms saturates at a certain threshold of data size.

  15. Spiking neuron network Helmholtz machine.

    PubMed

    Sountsov, Pavel; Miller, Paul

    2015-01-01

    An increasing amount of behavioral and neurophysiological data suggests that the brain performs optimal (or near-optimal) probabilistic inference and learning during perception and other tasks. Although many machine learning algorithms exist that perform inference and learning in an optimal way, the complete description of how one of those algorithms (or a novel algorithm) can be implemented in the brain is currently incomplete. There have been many proposed solutions that address how neurons can perform optimal inference but the question of how synaptic plasticity can implement optimal learning is rarely addressed. This paper aims to unify the two fields of probabilistic inference and synaptic plasticity by using a neuronal network of realistic model spiking neurons to implement a well-studied computational model called the Helmholtz Machine. The Helmholtz Machine is amenable to neural implementation as the algorithm it uses to learn its parameters, called the wake-sleep algorithm, uses a local delta learning rule. Our spiking-neuron network implements both the delta rule and a small example of a Helmholtz machine. This neuronal network can learn an internal model of continuous-valued training data sets without supervision. The network can also perform inference on the learned internal models. We show how various biophysical features of the neural implementation constrain the parameters of the wake-sleep algorithm, such as the duration of the wake and sleep phases of learning and the minimal sample duration. We examine the deviations from optimal performance and tie them to the properties of the synaptic plasticity rule.

  16. Spiking neuron network Helmholtz machine

    PubMed Central

    Sountsov, Pavel; Miller, Paul

    2015-01-01

    An increasing amount of behavioral and neurophysiological data suggests that the brain performs optimal (or near-optimal) probabilistic inference and learning during perception and other tasks. Although many machine learning algorithms exist that perform inference and learning in an optimal way, the complete description of how one of those algorithms (or a novel algorithm) can be implemented in the brain is currently incomplete. There have been many proposed solutions that address how neurons can perform optimal inference but the question of how synaptic plasticity can implement optimal learning is rarely addressed. This paper aims to unify the two fields of probabilistic inference and synaptic plasticity by using a neuronal network of realistic model spiking neurons to implement a well-studied computational model called the Helmholtz Machine. The Helmholtz Machine is amenable to neural implementation as the algorithm it uses to learn its parameters, called the wake-sleep algorithm, uses a local delta learning rule. Our spiking-neuron network implements both the delta rule and a small example of a Helmholtz machine. This neuronal network can learn an internal model of continuous-valued training data sets without supervision. The network can also perform inference on the learned internal models. We show how various biophysical features of the neural implementation constrain the parameters of the wake-sleep algorithm, such as the duration of the wake and sleep phases of learning and the minimal sample duration. We examine the deviations from optimal performance and tie them to the properties of the synaptic plasticity rule. PMID:25954191

  17. A Prize-Collecting Steiner Tree Approach for Transduction Network Inference

    NASA Astrophysics Data System (ADS)

    Bailly-Bechet, Marc; Braunstein, Alfredo; Zecchina, Riccardo

    Into the cell, information from the environment is mainly propagated via signaling pathways which form a transduction network. Here we propose a new algorithm to infer transduction networks from heterogeneous data, using both the protein interaction network and expression datasets. We formulate the inference problem as an optimization task, and develop a message-passing, probabilistic and distributed formalism to solve it. We apply our algorithm to the pheromone response in the baker’s yeast S. cerevisiae. We are able to find the backbone of the known structure of the MAPK cascade of pheromone response, validating our algorithm. More importantly, we make biological predictions about some proteins whose role could be at the interface between pheromone response and other cellular functions.

  18. Personalized recommendation via unbalance full-connectivity inference

    NASA Astrophysics Data System (ADS)

    Ma, Wenping; Ren, Chen; Wu, Yue; Wang, Shanfeng; Feng, Xiang

    2017-10-01

    Recommender systems play an important role to help us to find useful information. They are widely used by most e-commerce web sites to push the potential items to individual user according to purchase history. Network-based recommendation algorithms are popular and effective in recommendation, which use two types of elements to represent users and items respectively. In this paper, based on consistence-based inference (CBI) algorithm, we propose a novel network-based algorithm, in which users and items are recognized with no difference. The proposed algorithm also uses information diffusion to find the relationship between users and items. Different from traditional network-based recommendation algorithms, information diffusion initializes from users and items, respectively. Experiments show that the proposed algorithm is effective compared with traditional network-based recommendation algorithms.

  19. The architecture of adaptive neural network based on a fuzzy inference system for implementing intelligent control in photovoltaic systems

    NASA Astrophysics Data System (ADS)

    Gimazov, R.; Shidlovskiy, S.

    2018-05-01

    In this paper, we consider the architecture of the algorithm for extreme regulation in the photovoltaic system. An algorithm based on an adaptive neural network with fuzzy inference is proposed. The implementation of such an algorithm not only allows solving a number of problems in existing algorithms for extreme power regulation of photovoltaic systems, but also creates a reserve for the creation of a universal control system for a photovoltaic system.

  20. Cytoprophet: a Cytoscape plug-in for protein and domain interaction networks inference.

    PubMed

    Morcos, Faruck; Lamanna, Charles; Sikora, Marcin; Izaguirre, Jesús

    2008-10-01

    Cytoprophet is a software tool that allows prediction and visualization of protein and domain interaction networks. It is implemented as a plug-in of Cytoscape, an open source software framework for analysis and visualization of molecular networks. Cytoprophet implements three algorithms that predict new potential physical interactions using the domain composition of proteins and experimental assays. The algorithms for protein and domain interaction inference include maximum likelihood estimation (MLE) using expectation maximization (EM); the set cover approach maximum specificity set cover (MSSC) and the sum-product algorithm (SPA). After accepting an input set of proteins with Uniprot ID/Accession numbers and a selected prediction algorithm, Cytoprophet draws a network of potential interactions with probability scores and GO distances as edge attributes. A network of domain interactions between the domains of the initial protein list can also be generated. Cytoprophet was designed to take advantage of the visual capabilities of Cytoscape and be simple to use. An example of inference in a signaling network of myxobacterium Myxococcus xanthus is presented and available at Cytoprophet's website. http://cytoprophet.cse.nd.edu.

  1. Transcriptional network inference from functional similarity and expression data: a global supervised approach.

    PubMed

    Ambroise, Jérôme; Robert, Annie; Macq, Benoit; Gala, Jean-Luc

    2012-01-06

    An important challenge in system biology is the inference of biological networks from postgenomic data. Among these biological networks, a gene transcriptional regulatory network focuses on interactions existing between transcription factors (TFs) and and their corresponding target genes. A large number of reverse engineering algorithms were proposed to infer such networks from gene expression profiles, but most current methods have relatively low predictive performances. In this paper, we introduce the novel TNIFSED method (Transcriptional Network Inference from Functional Similarity and Expression Data), that infers a transcriptional network from the integration of correlations and partial correlations of gene expression profiles and gene functional similarities through a supervised classifier. In the current work, TNIFSED was applied to predict the transcriptional network in Escherichia coli and in Saccharomyces cerevisiae, using datasets of 445 and 170 affymetrix arrays, respectively. Using the area under the curve of the receiver operating characteristics and the F-measure as indicators, we showed the predictive performance of TNIFSED to be better than unsupervised state-of-the-art methods. TNIFSED performed slightly worse than the supervised SIRENE algorithm for the target genes identification of the TF having a wide range of yet identified target genes but better for TF having only few identified target genes. Our results indicate that TNIFSED is complementary to the SIRENE algorithm, and particularly suitable to discover target genes of "orphan" TFs.

  2. Reinforce: An Ensemble Approach for Inferring PPI Network from AP-MS Data.

    PubMed

    Tian, Bo; Duan, Qiong; Zhao, Can; Teng, Ben; He, Zengyou

    2017-05-17

    Affinity Purification-Mass Spectrometry (AP-MS) is one of the most important technologies for constructing protein-protein interaction (PPI) networks. In this paper, we propose an ensemble method, Reinforce, for inferring PPI network from AP-MS data set. The new algorithm named Reinforce is based on rank aggregation and false discovery rate control. Under the null hypothesis that the interaction scores from different scoring methods are randomly generated, Reinforce follows three steps to integrate multiple ranking results from different algorithms or different data sets. The experimental results show that Reinforce can get more stable and accurate inference results than existing algorithms. The source codes of Reinforce and data sets used in the experiments are available at: https://sourceforge.net/projects/reinforce/.

  3. Causal Inference and Explaining Away in a Spiking Network

    PubMed Central

    Moreno-Bote, Rubén; Drugowitsch, Jan

    2015-01-01

    While the brain uses spiking neurons for communication, theoretical research on brain computations has mostly focused on non-spiking networks. The nature of spike-based algorithms that achieve complex computations, such as object probabilistic inference, is largely unknown. Here we demonstrate that a family of high-dimensional quadratic optimization problems with non-negativity constraints can be solved exactly and efficiently by a network of spiking neurons. The network naturally imposes the non-negativity of causal contributions that is fundamental to causal inference, and uses simple operations, such as linear synapses with realistic time constants, and neural spike generation and reset non-linearities. The network infers the set of most likely causes from an observation using explaining away, which is dynamically implemented by spike-based, tuned inhibition. The algorithm performs remarkably well even when the network intrinsically generates variable spike trains, the timing of spikes is scrambled by external sources of noise, or the network is mistuned. This type of network might underlie tasks such as odor identification and classification. PMID:26621426

  4. Causal Inference and Explaining Away in a Spiking Network.

    PubMed

    Moreno-Bote, Rubén; Drugowitsch, Jan

    2015-12-01

    While the brain uses spiking neurons for communication, theoretical research on brain computations has mostly focused on non-spiking networks. The nature of spike-based algorithms that achieve complex computations, such as object probabilistic inference, is largely unknown. Here we demonstrate that a family of high-dimensional quadratic optimization problems with non-negativity constraints can be solved exactly and efficiently by a network of spiking neurons. The network naturally imposes the non-negativity of causal contributions that is fundamental to causal inference, and uses simple operations, such as linear synapses with realistic time constants, and neural spike generation and reset non-linearities. The network infers the set of most likely causes from an observation using explaining away, which is dynamically implemented by spike-based, tuned inhibition. The algorithm performs remarkably well even when the network intrinsically generates variable spike trains, the timing of spikes is scrambled by external sources of noise, or the network is mistuned. This type of network might underlie tasks such as odor identification and classification.

  5. Inferring Boolean network states from partial information

    PubMed Central

    2013-01-01

    Networks of molecular interactions regulate key processes in living cells. Therefore, understanding their functionality is a high priority in advancing biological knowledge. Boolean networks are often used to describe cellular networks mathematically and are fitted to experimental datasets. The fitting often results in ambiguities since the interpretation of the measurements is not straightforward and since the data contain noise. In order to facilitate a more reliable mapping between datasets and Boolean networks, we develop an algorithm that infers network trajectories from a dataset distorted by noise. We analyze our algorithm theoretically and demonstrate its accuracy using simulation and microarray expression data. PMID:24006954

  6. MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach.

    PubMed

    Abduallah, Yasser; Turki, Turki; Byron, Kevin; Du, Zongxuan; Cervantes-Cervantes, Miguel; Wang, Jason T L

    2017-01-01

    Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, prodding through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy than the existing tool.

  7. Construction of regulatory networks using expression time-series data of a genotyped population.

    PubMed

    Yeung, Ka Yee; Dombek, Kenneth M; Lo, Kenneth; Mittler, John E; Zhu, Jun; Schadt, Eric E; Bumgarner, Roger E; Raftery, Adrian E

    2011-11-29

    The inference of regulatory and biochemical networks from large-scale genomics data is a basic problem in molecular biology. The goal is to generate testable hypotheses of gene-to-gene influences and subsequently to design bench experiments to confirm these network predictions. Coexpression of genes in large-scale gene-expression data implies coregulation and potential gene-gene interactions, but provide little information about the direction of influences. Here, we use both time-series data and genetics data to infer directionality of edges in regulatory networks: time-series data contain information about the chronological order of regulatory events and genetics data allow us to map DNA variations to variations at the RNA level. We generate microarray data measuring time-dependent gene-expression levels in 95 genotyped yeast segregants subjected to a drug perturbation. We develop a Bayesian model averaging regression algorithm that incorporates external information from diverse data types to infer regulatory networks from the time-series and genetics data. Our algorithm is capable of generating feedback loops. We show that our inferred network recovers existing and novel regulatory relationships. Following network construction, we generate independent microarray data on selected deletion mutants to prospectively test network predictions. We demonstrate the potential of our network to discover de novo transcription-factor binding sites. Applying our construction method to previously published data demonstrates that our method is competitive with leading network construction algorithms in the literature.

  8. Gene Regulatory Network Inferences Using a Maximum-Relevance and Maximum-Significance Strategy

    PubMed Central

    Liu, Wei; Zhu, Wen; Liao, Bo; Chen, Xiangtao

    2016-01-01

    Recovering gene regulatory networks from expression data is a challenging problem in systems biology that provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the “large p, small n” problem. In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering networks into a problem of how to select the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing the relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Eventually, a strict constraint is adopted to adjust all of the regulatory relationships according to the obtained regulator genes and thus obtain the complete network structure. We performed our method on five different datasets and compared our method to five state-of-the-art methods for network inference based on information theory. The results confirm the effectiveness of our method. PMID:27829000

  9. Successful Reconstruction of a Physiological Circuit with Known Connectivity from Spiking Activity Alone

    PubMed Central

    Gerhard, Felipe; Kispersky, Tilman; Gutierrez, Gabrielle J.; Marder, Eve; Kramer, Mark; Eden, Uri

    2013-01-01

    Identifying the structure and dynamics of synaptic interactions between neurons is the first step to understanding neural network dynamics. The presence of synaptic connections is traditionally inferred through the use of targeted stimulation and paired recordings or by post-hoc histology. More recently, causal network inference algorithms have been proposed to deduce connectivity directly from electrophysiological signals, such as extracellularly recorded spiking activity. Usually, these algorithms have not been validated on a neurophysiological data set for which the actual circuitry is known. Recent work has shown that traditional network inference algorithms based on linear models typically fail to identify the correct coupling of a small central pattern generating circuit in the stomatogastric ganglion of the crab Cancer borealis. In this work, we show that point process models of observed spike trains can guide inference of relative connectivity estimates that match the known physiological connectivity of the central pattern generator up to a choice of threshold. We elucidate the necessary steps to derive faithful connectivity estimates from a model that incorporates the spike train nature of the data. We then apply the model to measure changes in the effective connectivity pattern in response to two pharmacological interventions, which affect both intrinsic neural dynamics and synaptic transmission. Our results provide the first successful application of a network inference algorithm to a circuit for which the actual physiological synapses between neurons are known. The point process methodology presented here generalizes well to larger networks and can describe the statistics of neural populations. In general we show that advanced statistical models allow for the characterization of effective network structure, deciphering underlying network dynamics and estimating information-processing capabilities. PMID:23874181

  10. Predictive minimum description length principle approach to inferring gene regulatory networks.

    PubMed

    Chaitankar, Vijender; Zhang, Chaoyang; Ghosh, Preetam; Gong, Ping; Perkins, Edward J; Deng, Youping

    2011-01-01

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold that defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we propose a new inference algorithm that incorporates mutual information (MI), conditional mutual information (CMI), and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm is evaluated using both synthetic time series data sets and a biological time series data set (Saccharomyces cerevisiae). The results show that the proposed algorithm produced fewer false edges and significantly improved the precision when compared to existing MDL algorithm.

  11. Inverse Ising problem in continuous time: A latent variable approach

    NASA Astrophysics Data System (ADS)

    Donner, Christian; Opper, Manfred

    2017-12-01

    We consider the inverse Ising problem: the inference of network couplings from observed spin trajectories for a model with continuous time Glauber dynamics. By introducing two sets of auxiliary latent random variables we render the likelihood into a form which allows for simple iterative inference algorithms with analytical updates. The variables are (1) Poisson variables to linearize an exponential term which is typical for point process likelihoods and (2) Pólya-Gamma variables, which make the likelihood quadratic in the coupling parameters. Using the augmented likelihood, we derive an expectation-maximization (EM) algorithm to obtain the maximum likelihood estimate of network parameters. Using a third set of latent variables we extend the EM algorithm to sparse couplings via L1 regularization. Finally, we develop an efficient approximate Bayesian inference algorithm using a variational approach. We demonstrate the performance of our algorithms on data simulated from an Ising model. For data which are simulated from a more biologically plausible network with spiking neurons, we show that the Ising model captures well the low order statistics of the data and how the Ising couplings are related to the underlying synaptic structure of the simulated network.

  12. Probabilistic inference using linear Gaussian importance sampling for hybrid Bayesian networks

    NASA Astrophysics Data System (ADS)

    Sun, Wei; Chang, K. C.

    2005-05-01

    Probabilistic inference for Bayesian networks is in general NP-hard using either exact algorithms or approximate methods. However, for very complex networks, only the approximate methods such as stochastic sampling could be used to provide a solution given any time constraint. There are several simulation methods currently available. They include logic sampling (the first proposed stochastic method for Bayesian networks, the likelihood weighting algorithm) the most commonly used simulation method because of its simplicity and efficiency, the Markov blanket scoring method, and the importance sampling algorithm. In this paper, we first briefly review and compare these available simulation methods, then we propose an improved importance sampling algorithm called linear Gaussian importance sampling algorithm for general hybrid model (LGIS). LGIS is aimed for hybrid Bayesian networks consisting of both discrete and continuous random variables with arbitrary distributions. It uses linear function and Gaussian additive noise to approximate the true conditional probability distribution for continuous variable given both its parents and evidence in a Bayesian network. One of the most important features of the newly developed method is that it can adaptively learn the optimal important function from the previous samples. We test the inference performance of LGIS using a 16-node linear Gaussian model and a 6-node general hybrid model. The performance comparison with other well-known methods such as Junction tree (JT) and likelihood weighting (LW) shows that LGIS-GHM is very promising.

  13. F-MAP: A Bayesian approach to infer the gene regulatory network using external hints

    PubMed Central

    Shahdoust, Maryam; Mahjub, Hossein; Sadeghi, Mehdi

    2017-01-01

    The Common topological features of related species gene regulatory networks suggest reconstruction of the network of one species by using the further information from gene expressions profile of related species. We present an algorithm to reconstruct the gene regulatory network named; F-MAP, which applies the knowledge about gene interactions from related species. Our algorithm sets a Bayesian framework to estimate the precision matrix of one species microarray gene expressions dataset to infer the Gaussian Graphical model of the network. The conjugate Wishart prior is used and the information from related species is applied to estimate the hyperparameters of the prior distribution by using the factor analysis. Applying the proposed algorithm on six related species of drosophila shows that the precision of reconstructed networks is improved considerably compared to the precision of networks constructed by other Bayesian approaches. PMID:28938012

  14. Statistical Inference and Reverse Engineering of Gene Regulatory Networks from Observational Expression Data

    PubMed Central

    Emmert-Streib, Frank; Glazko, Galina V.; Altay, Gökmen; de Matos Simoes, Ricardo

    2012-01-01

    In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms. PMID:22408642

  15. Learning control of inverted pendulum system by neural network driven fuzzy reasoning: The learning function of NN-driven fuzzy reasoning under changes of reasoning environment

    NASA Technical Reports Server (NTRS)

    Hayashi, Isao; Nomura, Hiroyoshi; Wakami, Noboru

    1991-01-01

    Whereas conventional fuzzy reasonings are associated with tuning problems, which are lack of membership functions and inference rule designs, a neural network driven fuzzy reasoning (NDF) capable of determining membership functions by neural network is formulated. In the antecedent parts of the neural network driven fuzzy reasoning, the optimum membership function is determined by a neural network, while in the consequent parts, an amount of control for each rule is determined by other plural neural networks. By introducing an algorithm of neural network driven fuzzy reasoning, inference rules for making a pendulum stand up from its lowest suspended point are determined for verifying the usefulness of the algorithm.

  16. Ensemble stacking mitigates biases in inference of synaptic connectivity.

    PubMed

    Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N

    2018-01-01

    A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.

  17. Semi-Supervised Multi-View Learning for Gene Network Reconstruction

    PubMed Central

    Ceci, Michelangelo; Pio, Gianvito; Kuzmanovski, Vladimir; Džeroski, Sašo

    2015-01-01

    The task of gene regulatory network reconstruction from high-throughput data is receiving increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has been recently observed, however, that no single inference method performs optimally across all datasets. It has also been shown that the integration of predictions from multiple inference methods is more robust and shows high performance across diverse datasets. Inspired by this research, in this paper, we propose a machine learning solution which learns to combine predictions from multiple inference methods. While this approach adds additional complexity to the inference process, we expect it would also carry substantial benefits. These would come from the automatic adaptation to patterns on the outputs of individual inference methods, so that it is possible to identify regulatory interactions more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in the multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and on a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over the state of the art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827. PMID:26641091

  18. Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data.

    PubMed

    Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger

    2017-01-01

    Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.

  19. CaSPIAN: A Causal Compressive Sensing Algorithm for Discovering Directed Interactions in Gene Networks

    PubMed Central

    Emad, Amin; Milenkovic, Olgica

    2014-01-01

    We introduce a novel algorithm for inference of causal gene interactions, termed CaSPIAN (Causal Subspace Pursuit for Inference and Analysis of Networks), which is based on coupling compressive sensing and Granger causality techniques. The core of the approach is to discover sparse linear dependencies between shifted time series of gene expressions using a sequential list-version of the subspace pursuit reconstruction algorithm and to estimate the direction of gene interactions via Granger-type elimination. The method is conceptually simple and computationally efficient, and it allows for dealing with noisy measurements. Its performance as a stand-alone platform without biological side-information was tested on simulated networks, on the synthetic IRMA network in Saccharomyces cerevisiae, and on data pertaining to the human HeLa cell network and the SOS network in E. coli. The results produced by CaSPIAN are compared to the results of several related algorithms, demonstrating significant improvements in inference accuracy of documented interactions. These findings highlight the importance of Granger causality techniques for reducing the number of false-positives, as well as the influence of noise and sampling period on the accuracy of the estimates. In addition, the performance of the method was tested in conjunction with biological side information of the form of sparse “scaffold networks”, to which new edges were added using available RNA-seq or microarray data. These biological priors aid in increasing the sensitivity and precision of the algorithm in the small sample regime. PMID:24622336

  20. Integration of Steady-State and Temporal Gene Expression Data for the Inference of Gene Regulatory Networks

    PubMed Central

    Wang, Yi Kan; Hurley, Daniel G.; Schnell, Santiago; Print, Cristin G.; Crampin, Edmund J.

    2013-01-01

    We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data. PMID:23967277

  1. Personalized recommendation via an improved NBI algorithm and user influence model in a Microblog network

    NASA Astrophysics Data System (ADS)

    Lian, Jie; Liu, Yun; Zhang, Zhen-jiang; Gui, Chang-ni

    2013-10-01

    Bipartite network based recommendations have attracted extensive attentions in recent years. Differing from traditional object-oriented recommendations, the recommendation in a Microblog network has two crucial differences. One is high authority users or one’s special friends usually play a very active role in tweet-oriented recommendation. The other is that the object in a Microblog network corresponds to a set of tweets on same topic instead of an actual and single entity, e.g. goods or movies in traditional networks. Thus repeat recommendations of the tweets in one’s collected topics are indispensable. Therefore, this paper improves network based inference (NBI) algorithm by original link matrix and link weight on resource allocation processes. This paper finally proposes the Microblog recommendation model based on the factors of improved network based inference and user influence model. Adjusting the weights of these two factors could generate the best recommendation results in algorithm accuracy and recommendation personalization.

  2. Network Inference via the Time-Varying Graphical Lasso

    PubMed Central

    Hallac, David; Park, Youngsuk; Boyd, Stephen; Leskovec, Jure

    2018-01-01

    Many important problems can be modeled as a system of interconnected entities, where each entity is recording time-dependent observations or measurements. In order to spot trends, detect anomalies, and interpret the temporal dynamics of such data, it is essential to understand the relationships between the different entities and how these relationships evolve over time. In this paper, we introduce the time-varying graphical lasso (TVGL), a method of inferring time-varying networks from raw time series data. We cast the problem in terms of estimating a sparse time-varying inverse covariance matrix, which reveals a dynamic network of interdependencies between the entities. Since dynamic network inference is a computationally expensive task, we derive a scalable message-passing algorithm based on the Alternating Direction Method of Multipliers (ADMM) to solve this problem in an efficient way. We also discuss several extensions, including a streaming algorithm to update the model and incorporate new observations in real time. Finally, we evaluate our TVGL algorithm on both real and synthetic datasets, obtaining interpretable results and outperforming state-of-the-art baselines in terms of both accuracy and scalability. PMID:29770256

  3. Gene-network inference by message passing

    NASA Astrophysics Data System (ADS)

    Braunstein, A.; Pagnani, A.; Weigt, M.; Zecchina, R.

    2008-01-01

    The inference of gene-regulatory processes from gene-expression data belongs to the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm which is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.

  4. Reveal, A General Reverse Engineering Algorithm for Inference of Genetic Network Architectures

    NASA Technical Reports Server (NTRS)

    Liang, Shoudan; Fuhrman, Stefanie; Somogyi, Roland

    1998-01-01

    Given the immanent gene expression mapping covering whole genomes during development, health and disease, we seek computational methods to maximize functional inference from such large data sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from input/output patterns of its variables? We investigated this possibility using binary models of genetic networks. Trajectories, or state transition tables of Boolean nets, resemble time series of gene expression. By systematically analyzing the mutual information between input states and output states, one is able to infer the sets of input elements controlling each element or gene in the network. This process is unequivocal and exact for complete state transition tables. We implemented this REVerse Engineering ALgorithm (REVEAL) in a C program, and found the problem to be tractable within the conditions tested so far. For n = 50 (elements) and k = 3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible 10(exp 15)) reliably produced the original rule and wiring sets. While this study is limited to synchronous Boolean networks, the algorithm is generalizable to include multi-state models, essentially allowing direct application to realistic biological data sets. The ability to adequately solve the inverse problem may enable in-depth analysis of complex dynamic systems in biology and other fields.

  5. Wisdom of crowds for robust gene network inference

    PubMed Central

    Marbach, Daniel; Costello, James C.; Küffner, Robert; Vega, Nicci; Prill, Robert J.; Camacho, Diogo M.; Allison, Kyle R.; Kellis, Manolis; Collins, James J.; Stolovitzky, Gustavo

    2012-01-01

    Reconstructing gene regulatory networks from high-throughput data is a long-standing problem. Through the DREAM project (Dialogue on Reverse Engineering Assessment and Methods), we performed a comprehensive blind assessment of over thirty network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae, and in silico microarray data. We characterize performance, data requirements, and inherent biases of different inference approaches offering guidelines for both algorithm application and development. We observe that no single inference method performs optimally across all datasets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse datasets. Thereby, we construct high-confidence networks for E. coli and S. aureus, each comprising ~1700 transcriptional interactions at an estimated precision of 50%. We experimentally test 53 novel interactions in E. coli, of which 23 were supported (43%). Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks. PMID:22796662

  6. A reconsideration of negative ratings for network-based recommendation

    NASA Astrophysics Data System (ADS)

    Hu, Liang; Ren, Liang; Lin, Wenbin

    2018-01-01

    Recommendation algorithms based on bipartite networks have become increasingly popular, thanks to their accuracy and flexibility. Currently, many of these methods ignore users' negative ratings. In this work, we propose a method to exploit negative ratings for the network-based inference algorithm. We find that negative ratings play a positive role regardless of sparsity of data sets. Furthermore, we improve the efficiency of our method and compare it with the state-of-the-art algorithms. Experimental results show that the present method outperforms the existing algorithms.

  7. Inference of neuronal network spike dynamics and topology from calcium imaging data

    PubMed Central

    Lütcke, Henry; Gerhard, Felipe; Zenke, Friedemann; Gerstner, Wulfram; Helmchen, Fritjof

    2013-01-01

    Two-photon calcium imaging enables functional analysis of neuronal circuits by inferring action potential (AP) occurrence (“spike trains”) from cellular fluorescence signals. It remains unclear how experimental parameters such as signal-to-noise ratio (SNR) and acquisition rate affect spike inference and whether additional information about network structure can be extracted. Here we present a simulation framework for quantitatively assessing how well spike dynamics and network topology can be inferred from noisy calcium imaging data. For simulated AP-evoked calcium transients in neocortical pyramidal cells, we analyzed the quality of spike inference as a function of SNR and data acquisition rate using a recently introduced peeling algorithm. Given experimentally attainable values of SNR and acquisition rate, neural spike trains could be reconstructed accurately and with up to millisecond precision. We then applied statistical neuronal network models to explore how remaining uncertainties in spike inference affect estimates of network connectivity and topological features of network organization. We define the experimental conditions suitable for inferring whether the network has a scale-free structure and determine how well hub neurons can be identified. Our findings provide a benchmark for future calcium imaging studies that aim to reliably infer neuronal network properties. PMID:24399936

  8. Automatic inference of multicellular regulatory networks using informative priors.

    PubMed

    Sun, Xiaoyun; Hong, Pengyu

    2009-01-01

    To fully understand the mechanisms governing animal development, computational models and algorithms are needed to enable quantitative studies of the underlying regulatory networks. We developed a mathematical model based on dynamic Bayesian networks to model multicellular regulatory networks that govern cell differentiation processes. A machine-learning method was developed to automatically infer such a model from heterogeneous data. We show that the model inference procedure can be greatly improved by incorporating interaction data across species. The proposed approach was applied to C. elegans vulval induction to reconstruct a model capable of simulating C. elegans vulval induction under 73 different genetic conditions.

  9. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data.

    PubMed

    Chen, Shuonan; Mar, Jessica C

    2018-06-19

    A fundamental fact in biology states that genes do not operate in isolation, and yet, methods that infer regulatory networks for single cell gene expression data have been slow to emerge. With single cell sequencing methods now becoming accessible, general network inference algorithms that were initially developed for data collected from bulk samples may not be suitable for single cells. Meanwhile, although methods that are specific for single cell data are now emerging, whether they have improved performance over general methods is unknown. In this study, we evaluate the applicability of five general methods and three single cell methods for inferring gene regulatory networks from both experimental single cell gene expression data and in silico simulated data. Standard evaluation metrics using ROC curves and Precision-Recall curves against reference sets sourced from the literature demonstrated that most of the methods performed poorly when they were applied to either experimental single cell data, or simulated single cell data, which demonstrates their lack of performance for this task. Using default settings, network methods were applied to the same datasets. Comparisons of the learned networks highlighted the uniqueness of some predicted edges for each method. The fact that different methods infer networks that vary substantially reflects the underlying mathematical rationale and assumptions that distinguish network methods from each other. This study provides a comprehensive evaluation of network modeling algorithms applied to experimental single cell gene expression data and in silico simulated datasets where the network structure is known. Comparisons demonstrate that most of these assessed network methods are not able to predict network structures from single cell expression data accurately, even if they are specifically developed for single cell methods. Also, single cell methods, which usually depend on more elaborative algorithms, in general have less similarity to each other in the sets of edges detected. The results from this study emphasize the importance for developing more accurate optimized network modeling methods that are compatible for single cell data. Newly-developed single cell methods may uniquely capture particular features of potential gene-gene relationships, and caution should be taken when we interpret these results.

  10. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method

    PubMed Central

    Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui

    2017-01-01

    Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. DBNCS algorithm first uses CMI2NI (conditional mutual inclusive information-based network inference) algorithm for network structure profiles learning, namely the construction of search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reduce the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of DBNCS algorithm is evaluated by the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs. PMID:29113310

  11. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method.

    PubMed

    Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui

    2017-10-06

    Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. DBNCS algorithm first uses CMI2NI (conditional mutual inclusive information-based network inference) algorithm for network structure profiles learning, namely the construction of search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reduce the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of DBNCS algorithm is evaluated by the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in Escherichia coli , and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.

  12. Discovering time-lagged rules from microarray data using gene profile classifiers

    PubMed Central

    2011-01-01

    Background Gene regulatory networks have an essential role in every process of life. In this regard, the amount of genome-wide time series data is becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes. Results This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. In this sense, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes have shown that GRNCOP2 outperforms the contrasted methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experimentation has exhibited the soundness and scalability of the new method which inferred highly-related statistically-significant gene associations. Conclusions A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results have demonstrated that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation. PMID:21524308

  13. Boolean network inference from time series data incorporating prior biological knowledge.

    PubMed

    Haider, Saad; Pal, Ranadip

    2012-01-01

    Numerous approaches exist for modeling of genetic regulatory networks (GRNs) but the low sampling rates often employed in biological studies prevents the inference of detailed models from experimental data. In this paper, we analyze the issues involved in estimating a model of a GRN from single cell line time series data with limited time points. We present an inference approach for a Boolean Network (BN) model of a GRN from limited transcriptomic or proteomic time series data based on prior biological knowledge of connectivity, constraints on attractor structure and robust design. We applied our inference approach to 6 time point transcriptomic data on Human Mammary Epithelial Cell line (HMEC) after application of Epidermal Growth Factor (EGF) and generated a BN with a plausible biological structure satisfying the data. We further defined and applied a similarity measure to compare synthetic BNs and BNs generated through the proposed approach constructed from transitions of various paths of the synthetic BNs. We have also compared the performance of our algorithm with two existing BN inference algorithms. Through theoretical analysis and simulations, we showed the rarity of arriving at a BN from limited time series data with plausible biological structure using random connectivity and absence of structure in data. The framework when applied to experimental data and data generated from synthetic BNs were able to estimate BNs with high similarity scores. Comparison with existing BN inference algorithms showed the better performance of our proposed algorithm for limited time series data. The proposed framework can also be applied to optimize the connectivity of a GRN from experimental data when the prior biological knowledge on regulators is limited or not unique.

  14. An inference method from multi-layered structure of biomedical data.

    PubMed

    Kim, Myungjun; Nam, Yonghyun; Shin, Hyunjung

    2017-05-18

    Biological system is a multi-layered structure of omics with genome, epigenome, transcriptome, metabolome, proteome, etc., and can be further stretched to clinical/medical layers such as diseasome, drugs, and symptoms. One advantage of omics is that we can figure out an unknown component or its trait by inferring from known omics components. The component can be inferred by the ones in the same level of omics or the ones in different levels. To implement the inference process, an algorithm that can be applied to the multi-layered complex system is required. In this study, we develop a semi-supervised learning algorithm that can be applied to the multi-layered complex system. In order to verify the validity of the inference, it was applied to the prediction problem of disease co-occurrence with a two-layered network composed of symptom-layer and disease-layer. The symptom-disease layered network obtained a fairly high value of AUC, 0.74, which is regarded as noticeable improvement when comparing 0.59 AUC of single-layered disease network. If further stretched to whole layered structure of omics, the proposed method is expected to produce more promising results. This research has novelty in that it is a new integrative algorithm that incorporates the vertical structure of omics data, on contrary to other existing methods that integrate the data in parallel fashion. The results can provide enhanced guideline for disease co-occurrence prediction, thereby serve as a valuable tool for inference process of multi-layered biological system.

  15. Inferring phenomenological models of Markov processes from data

    NASA Astrophysics Data System (ADS)

    Rivera, Catalina; Nemenman, Ilya

    Microscopically accurate modeling of stochastic dynamics of biochemical networks is hard due to the extremely high dimensionality of the state space of such networks. Here we propose an algorithm for inference of phenomenological, coarse-grained models of Markov processes describing the network dynamics directly from data, without the intermediate step of microscopically accurate modeling. The approach relies on the linear nature of the Chemical Master Equation and uses Bayesian Model Selection for identification of parsimonious models that fit the data. When applied to synthetic data from the Kinetic Proofreading process (KPR), a common mechanism used by cells for increasing specificity of molecular assembly, the algorithm successfully uncovers the known coarse-grained description of the process. This phenomenological description has been notice previously, but this time it is derived in an automated manner by the algorithm. James S. McDonnell Foundation Grant No. 220020321.

  16. A group LASSO-based method for robustly inferring gene regulatory networks from multiple time-course datasets.

    PubMed

    Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, Wen-Jun

    2014-01-01

    As an abstract mapping of the gene regulations in the cell, gene regulatory network is important to both biological research study and practical applications. The reverse engineering of gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets might be collected for a specific gene network under different circumstances. The inference of a gene regulatory network can be improved by integrating these multiple datasets. It is also known that gene expression data may be contaminated with large errors or outliers, which may affect the inference results. A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets as well as to take the robustness to large error or outliers into account. To solve the optimization problem involved in the proposed method, an efficient algorithm which combines the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to our method to find a network topology consisting of edges with scores. The proposed method is applied to both simulation datasets and real experimental datasets. It shows that Huber group LASSO outperforms the group LASSO in terms of both areas under receiver operating characteristic curves and areas under the precision-recall curves. The convergence analysis of the algorithm theoretically shows that the sequence generated from the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of the Huber group LASSO in integrating multiple time-course gene expression datasets and improving the resistance to large errors or outliers.

  17. A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks

    PubMed Central

    Yin, Junming; Ho, Qirong; Xing, Eric P.

    2014-01-01

    We propose a scalable approach for making inference about latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. When compared to the state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction. PMID:25400487

  18. An Optimal Algorithm towards Successive Location Privacy in Sensor Networks with Dynamic Programming

    NASA Astrophysics Data System (ADS)

    Zhao, Baokang; Wang, Dan; Shao, Zili; Cao, Jiannong; Chan, Keith C. C.; Su, Jinshu

    In wireless sensor networks, preserving location privacy under successive inference attacks is extremely critical. Although this problem is NP-complete in general cases, we propose a dynamic programming based algorithm and prove it is optimal in special cases where the correlation only exists between p immediate adjacent observations.

  19. Inference of gene regulatory networks from time series by Tsallis entropy

    PubMed Central

    2011-01-01

    Background The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual information), a new criterion function is here proposed. Results In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transference function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size and its topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions A remarkable improvement of accuracy was observed in the experimental results by reducing the number of false connections in the inferred topology by the non-Shannon entropy. The obtained best free parameter of the Tsallis entropy was on average in the range 2.5 ≤ q ≤ 3.5 (hence, subextensive entropy), which opens new perspectives for GRNs inference methods based on information theory and for investigation of the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/. PMID:21545720

  20. Mass Conservation and Inference of Metabolic Networks from High-Throughput Mass Spectrometry Data

    PubMed Central

    Bandaru, Pradeep; Bansal, Mukesh

    2011-01-01

    Abstract We present a step towards the metabolome-wide computational inference of cellular metabolic reaction networks from metabolic profiling data, such as mass spectrometry. The reconstruction is based on identification of irreducible statistical interactions among the metabolite activities using the ARACNE reverse-engineering algorithm and on constraining possible metabolic transformations to satisfy the conservation of mass. The resulting algorithms are validated on synthetic data from an abridged computational model of Escherichia coli metabolism. Precision rates upwards of 50% are routinely observed for identification of full metabolic reactions, and recalls upwards of 20% are also seen. PMID:21314454

  1. Variable neighborhood search for reverse engineering of gene regulatory networks.

    PubMed

    Nicholson, Charles; Goodwin, Leslie; Clark, Corey

    2017-01-01

    A new search heuristic, Divided Neighborhood Exploration Search, designed to be used with inference algorithms such as Bayesian networks to improve on the reverse engineering of gene regulatory networks is presented. The approach systematically moves through the search space to find topologies representative of gene regulatory networks that are more likely to explain microarray data. In empirical testing it is demonstrated that the novel method is superior to the widely employed greedy search techniques in both the quality of the inferred networks and computational time. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. Order priors for Bayesian network discovery with an application to malware phylogeny

    DOE PAGES

    Oyen, Diane; Anderson, Blake; Sentz, Kari; ...

    2017-09-15

    Here, Bayesian networks have been used extensively to model and discover dependency relationships among sets of random variables. We learn Bayesian network structure with a combination of human knowledge about the partial ordering of variables and statistical inference of conditional dependencies from observed data. Our approach leverages complementary information from human knowledge and inference from observed data to produce networks that reflect human beliefs about the system as well as to fit the observed data. Applying prior beliefs about partial orderings of variables is an approach distinctly different from existing methods that incorporate prior beliefs about direct dependencies (or edges)more » in a Bayesian network. We provide an efficient implementation of the partial-order prior in a Bayesian structure discovery learning algorithm, as well as an edge prior, showing that both priors meet the local modularity requirement necessary for an efficient Bayesian discovery algorithm. In benchmark studies, the partial-order prior improves the accuracy of Bayesian network structure learning as well as the edge prior, even though order priors are more general. Our primary motivation is in characterizing the evolution of families of malware to aid cyber security analysts. For the problem of malware phylogeny discovery, we find that our algorithm, compared to existing malware phylogeny algorithms, more accurately discovers true dependencies that are missed by other algorithms.« less

  3. Order priors for Bayesian network discovery with an application to malware phylogeny

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oyen, Diane; Anderson, Blake; Sentz, Kari

    Here, Bayesian networks have been used extensively to model and discover dependency relationships among sets of random variables. We learn Bayesian network structure with a combination of human knowledge about the partial ordering of variables and statistical inference of conditional dependencies from observed data. Our approach leverages complementary information from human knowledge and inference from observed data to produce networks that reflect human beliefs about the system as well as to fit the observed data. Applying prior beliefs about partial orderings of variables is an approach distinctly different from existing methods that incorporate prior beliefs about direct dependencies (or edges)more » in a Bayesian network. We provide an efficient implementation of the partial-order prior in a Bayesian structure discovery learning algorithm, as well as an edge prior, showing that both priors meet the local modularity requirement necessary for an efficient Bayesian discovery algorithm. In benchmark studies, the partial-order prior improves the accuracy of Bayesian network structure learning as well as the edge prior, even though order priors are more general. Our primary motivation is in characterizing the evolution of families of malware to aid cyber security analysts. For the problem of malware phylogeny discovery, we find that our algorithm, compared to existing malware phylogeny algorithms, more accurately discovers true dependencies that are missed by other algorithms.« less

  4. NetBenchmark: a bioconductor package for reproducible benchmarks of gene regulatory network inference.

    PubMed

    Bellot, Pau; Olsen, Catharina; Salembier, Philippe; Oliveras-Vergés, Albert; Meyer, Patrick E

    2015-09-29

    In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow to evaluate accurately and reproducibly those methods. Hence, we propose here a new tool, able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. The benchmarking framework that uses various datasets highlights the specialization of some methods toward network types and data. As a result, it is possible to identify the techniques that have broad overall performances.

  5. Inferring genetic interactions via a nonlinear model and an optimization algorithm.

    PubMed

    Chen, Chung-Ming; Lee, Chih; Chuang, Cheng-Long; Wang, Chia-Chang; Shieh, Grace S

    2010-02-26

    Biochemical pathways are gradually becoming recognized as central to complex human diseases and recently genetic/transcriptional interactions have been shown to be able to predict partial pathways. With the abundant information made available by microarray gene expression data (MGED), nonlinear modeling of these interactions is now feasible. Two of the latest advances in nonlinear modeling used sigmoid models to depict transcriptional interaction of a transcription factor (TF) for a target gene, but do not model cooperative or competitive interactions of several TFs for a target. An S-shape model and an optimization algorithm (GASA) were developed to infer genetic interactions/transcriptional regulation of several genes simultaneously using MGED. GASA consists of a genetic algorithm (GA) and a simulated annealing (SA) algorithm, which is enhanced by a steepest gradient descent algorithm to avoid being trapped in local minimum. Using simulated data with various degrees of noise, we studied how GASA with two model selection criteria and two search spaces performed. Furthermore, GASA was shown to outperform network component analysis, the time series network inference algorithm (TSNI), GA with regular GA (GAGA) and GA with regular SA. Two applications are demonstrated. First, GASA is applied to infer a subnetwork of human T-cell apoptosis. Several of the predicted interactions are supported by the literature. Second, GASA was applied to infer the transcriptional factors of 34 cell cycle regulated targets in S. cerevisiae, and GASA performed better than one of the latest advances in nonlinear modeling, GAGA and TSNI. Moreover, GASA is able to predict multiple transcription factors for certain targets, and these results coincide with experiments confirmed data in YEASTRACT. GASA is shown to infer both genetic interactions and transcriptional regulatory interactions well. In particular, GASA seems able to characterize the nonlinear mechanism of transcriptional regulatory interactions (TIs) in yeast, and may be applied to infer TIs in other organisms. The predicted genetic interactions of a subnetwork of human T-cell apoptosis coincide with existing partial pathways, suggesting the potential of GASA on inferring biochemical pathways.

  6. Statistical inference approach to structural reconstruction of complex networks from binary time series

    NASA Astrophysics Data System (ADS)

    Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng

    2018-02-01

    Complex networks hosting binary-state dynamics arise in a variety of contexts. In spite of previous works, to fully reconstruct the network structure from observed binary data remains challenging. We articulate a statistical inference based approach to this problem. In particular, exploiting the expectation-maximization (EM) algorithm, we develop a method to ascertain the neighbors of any node in the network based solely on binary data, thereby recovering the full topology of the network. A key ingredient of our method is the maximum-likelihood estimation of the probabilities associated with actual or nonexistent links, and we show that the EM algorithm can distinguish the two kinds of probability values without any ambiguity, insofar as the length of the available binary time series is reasonably long. Our method does not require any a priori knowledge of the detailed dynamical processes, is parameter-free, and is capable of accurate reconstruction even in the presence of noise. We demonstrate the method using combinations of distinct types of binary dynamical processes and network topologies, and provide a physical understanding of the underlying reconstruction mechanism. Our statistical inference based reconstruction method contributes an additional piece to the rapidly expanding "toolbox" of data based reverse engineering of complex networked systems.

  7. Statistical inference approach to structural reconstruction of complex networks from binary time series.

    PubMed

    Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng

    2018-02-01

    Complex networks hosting binary-state dynamics arise in a variety of contexts. In spite of previous works, to fully reconstruct the network structure from observed binary data remains challenging. We articulate a statistical inference based approach to this problem. In particular, exploiting the expectation-maximization (EM) algorithm, we develop a method to ascertain the neighbors of any node in the network based solely on binary data, thereby recovering the full topology of the network. A key ingredient of our method is the maximum-likelihood estimation of the probabilities associated with actual or nonexistent links, and we show that the EM algorithm can distinguish the two kinds of probability values without any ambiguity, insofar as the length of the available binary time series is reasonably long. Our method does not require any a priori knowledge of the detailed dynamical processes, is parameter-free, and is capable of accurate reconstruction even in the presence of noise. We demonstrate the method using combinations of distinct types of binary dynamical processes and network topologies, and provide a physical understanding of the underlying reconstruction mechanism. Our statistical inference based reconstruction method contributes an additional piece to the rapidly expanding "toolbox" of data based reverse engineering of complex networked systems.

  8. Prediction and Control of Network Cascade: Example of Power Grid or Networking Adaptability from WMD Disruption and Cascading Failures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chertkov, Michael

    2012-07-24

    The goal of the DTRA project is to develop a mathematical framework that will provide the fundamental understanding of network survivability, algorithms for detecting/inferring pre-cursors of abnormal network behaviors, and methods for network adaptability and self-healing from cascading failures.

  9. PREMER: a Tool to Infer Biological Networks.

    PubMed

    Villaverde, Alejandro F; Becker, Kolja; Banga, Julio R

    2017-10-04

    Inferring the structure of unknown cellular networks is a main challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features - such as distinguishing between direct and indirect interactions or determining the direction of a causal link - requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux and OSX (https://sites.google.com/site/premertoolbox/).

  10. A Coalitional Game for Distributed Inference in Sensor Networks With Dependent Observations

    NASA Astrophysics Data System (ADS)

    He, Hao; Varshney, Pramod K.

    2016-04-01

    We consider the problem of collaborative inference in a sensor network with heterogeneous and statistically dependent sensor observations. Each sensor aims to maximize its inference performance by forming a coalition with other sensors and sharing information within the coalition. It is proved that the inference performance is a nondecreasing function of the coalition size. However, in an energy constrained network, the energy consumption of inter-sensor communication also increases with increasing coalition size, which discourages the formation of the grand coalition (the set of all sensors). In this paper, the formation of non-overlapping coalitions with statistically dependent sensors is investigated under a specific communication constraint. We apply a game theoretical approach to fully explore and utilize the information contained in the spatial dependence among sensors to maximize individual sensor performance. Before formulating the distributed inference problem as a coalition formation game, we first quantify the gain and loss in forming a coalition by introducing the concepts of diversity gain and redundancy loss for both estimation and detection problems. These definitions, enabled by the statistical theory of copulas, allow us to characterize the influence of statistical dependence among sensor observations on inference performance. An iterative algorithm based on merge-and-split operations is proposed for the solution and the stability of the proposed algorithm is analyzed. Numerical results are provided to demonstrate the superiority of our proposed game theoretical approach.

  11. Inference of gene regulatory networks from genome-wide knockout fitness data

    PubMed Central

    Wang, Liming; Wang, Xiaodong; Arkin, Adam P.; Samoilov, Michael S.

    2013-01-01

    Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only structural, but also dynamical and non-linear nature of biomolecular systems involved. A state–space model with non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. Unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data as well as empirical measurements of GAL network in yeast Saccharomyces cerevisiae and TyrR–LiuR network in bacteria Shewanella oneidensis. Availability: MATLAB code and datasets are available to download at http://www.duke.edu/∼lw174/Fitness.zip and http://genomics.lbl.gov/supplemental/fitness-bioinf/ Contact: wangx@ee.columbia.edu or mssamoilov@lbl.gov Supplementary information: Supplementary data are available at Bioinformatics online PMID:23271269

  12. GRN2SBML: automated encoding and annotation of inferred gene regulatory networks complying with SBML.

    PubMed

    Vlaic, Sebastian; Hoffmann, Bianca; Kupfer, Peter; Weber, Michael; Dräger, Andreas

    2013-09-01

    GRN2SBML automatically encodes gene regulatory networks derived from several inference tools in systems biology markup language. Providing a graphical user interface, the networks can be annotated via the simple object access protocol (SOAP)-based application programming interface of BioMart Central Portal and minimum information required in the annotation of models registry. Additionally, we provide an R-package, which processes the output of supported inference algorithms and automatically passes all required parameters to GRN2SBML. Therefore, GRN2SBML closes a gap in the processing pipeline between the inference of gene regulatory networks and their subsequent analysis, visualization and storage. GRN2SBML is freely available under the GNU Public License version 3 and can be downloaded from http://www.hki-jena.de/index.php/0/2/490. General information on GRN2SBML, examples and tutorials are available at the tool's web page.

  13. Inferring anatomical therapeutic chemical (ATC) class of drugs using shortest path and random walk with restart algorithms.

    PubMed

    Chen, Lei; Liu, Tao; Zhao, Xian

    2018-06-01

    The anatomical therapeutic chemical (ATC) classification system is a widely accepted drug classification scheme. This system comprises five levels and includes several classes in each level. Drugs are classified into classes according to their therapeutic effects and characteristics. The first level includes 14 main classes. In this study, we proposed two network-based models to infer novel potential chemicals deemed to belong in the first level of ATC classification. To build these models, two large chemical networks were constructed using the chemical-chemical interaction information retrieved from the Search Tool for Interactions of Chemicals (STITCH). Two classic network algorithms, shortest path (SP) and random walk with restart (RWR) algorithms, were executed on the corresponding network to mine novel chemicals for each ATC class using the validated drugs in a class as seed nodes. Then, the obtained chemicals yielded by these two algorithms were further evaluated by a permutation test and an association test. The former can exclude chemicals produced by the structure of the network, i.e., false positive discoveries. By contrast, the latter identifies the most important chemicals that have strong associations with the ATC class. Comparisons indicated that the two models can provide quite dissimilar results, suggesting that the results yielded by one model can be essential supplements for those obtained by the other model. In addition, several representative inferred chemicals were analyzed to confirm the reliability of the results generated by the two models. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Applying dynamic Bayesian networks to perturbed gene expression data.

    PubMed

    Dojer, Norbert; Gambin, Anna; Mizera, Andrzej; Wilczyński, Bartek; Tiuryn, Jerzy

    2006-05-08

    A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, allowing to deal with the stochastic aspects of gene expressions and noisy measurements in a natural way, Bayesian networks appear attractive in the field of inferring gene interactions structure from microarray experiments data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to correctly handle them, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge the DBN technique has not been applied in the context of perturbation experiments. We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulations data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using exact learning algorithm instead of heuristic methods are analyzed. We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used when it is possible, i.e. when considered set of genes is small enough.

  15. Generating probabilistic Boolean networks from a prescribed transition probability matrix.

    PubMed

    Ching, W-K; Chen, X; Tsing, N-K

    2009-11-01

    Probabilistic Boolean networks (PBNs) have received much attention in modeling genetic regulatory networks. A PBN can be regarded as a Markov chain process and is characterised by a transition probability matrix. In this study, the authors propose efficient algorithms for constructing a PBN when its transition probability matrix is given. The complexities of the algorithms are also analysed. This is an interesting inverse problem in network inference using steady-state data. The problem is important as most microarray data sets are assumed to be obtained from sampling the steady-state.

  16. Causal inference in biology networks with integrated belief propagation.

    PubMed

    Chang, Rui; Karr, Jonathan R; Schadt, Eric E

    2015-01-01

    Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the form of the functional interactions that are possible are assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in real metabolic network with v-structure and feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop only from steady-state data which conventional method cannot.

  17. Inference of Gene Regulatory Networks Using Bayesian Nonparametric Regression and Topology Information.

    PubMed

    Fan, Yue; Wang, Xiao; Peng, Qinke

    2017-01-01

    Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.

  18. Linear time-varying models can reveal non-linear interactions of biomolecular regulatory networks using multiple time-series data.

    PubMed

    Kim, Jongrae; Bates, Declan G; Postlethwaite, Ian; Heslop-Harrison, Pat; Cho, Kwang-Hyun

    2008-05-15

    Inherent non-linearities in biomolecular interactions make the identification of network interactions difficult. One of the principal problems is that all methods based on the use of linear time-invariant models will have fundamental limitations in their capability to infer certain non-linear network interactions. Another difficulty is the multiplicity of possible solutions, since, for a given dataset, there may be many different possible networks which generate the same time-series expression profiles. A novel algorithm for the inference of biomolecular interaction networks from temporal expression data is presented. Linear time-varying models, which can represent a much wider class of time-series data than linear time-invariant models, are employed in the algorithm. From time-series expression profiles, the model parameters are identified by solving a non-linear optimization problem. In order to systematically reduce the set of possible solutions for the optimization problem, a filtering process is performed using a phase-portrait analysis with random numerical perturbations. The proposed approach has the advantages of not requiring the system to be in a stable steady state, of using time-series profiles which have been generated by a single experiment, and of allowing non-linear network interactions to be identified. The ability of the proposed algorithm to correctly infer network interactions is illustrated by its application to three examples: a non-linear model for cAMP oscillations in Dictyostelium discoideum, the cell-cycle data for Saccharomyces cerevisiae and a large-scale non-linear model of a group of synchronized Dictyostelium cells. The software used in this article is available from http://sbie.kaist.ac.kr/software

  19. A Novel Adjustment Method for Shearer Traction Speed through Integration of T-S Cloud Inference Network and Improved PSO

    PubMed Central

    Si, Lei; Wang, Zhongbin; Yang, Yinwei

    2014-01-01

    In order to efficiently and accurately adjust the shearer traction speed, a novel approach based on Takagi-Sugeno (T-S) cloud inference network (CIN) and improved particle swarm optimization (IPSO) is proposed. The T-S CIN is built through the combination of cloud model and T-S fuzzy neural network. Moreover, the IPSO algorithm employs parameter automation adjustment strategy and velocity resetting to significantly improve the performance of basic PSO algorithm in global search and fine-tuning of the solutions, and the flowchart of proposed approach is designed. Furthermore, some simulation examples are carried out and comparison results indicate that the proposed method is feasible, efficient, and is outperforming others. Finally, an industrial application example of coal mining face is demonstrated to specify the effect of proposed system. PMID:25506358

  20. Fuzzy-Logic Based Distributed Energy-Efficient Clustering Algorithm for Wireless Sensor Networks.

    PubMed

    Zhang, Ying; Wang, Jun; Han, Dezhi; Wu, Huafeng; Zhou, Rundong

    2017-07-03

    Due to the high-energy efficiency and scalability, the clustering routing algorithm has been widely used in wireless sensor networks (WSNs). In order to gather information more efficiently, each sensor node transmits data to its Cluster Head (CH) to which it belongs, by multi-hop communication. However, the multi-hop communication in the cluster brings the problem of excessive energy consumption of the relay nodes which are closer to the CH. These nodes' energy will be consumed more quickly than the farther nodes, which brings the negative influence on load balance for the whole networks. Therefore, we propose an energy-efficient distributed clustering algorithm based on fuzzy approach with non-uniform distribution (EEDCF). During CHs' election, we take nodes' energies, nodes' degree and neighbor nodes' residual energies into consideration as the input parameters. In addition, we take advantage of Takagi, Sugeno and Kang (TSK) fuzzy model instead of traditional method as our inference system to guarantee the quantitative analysis more reasonable. In our scheme, each sensor node calculates the probability of being as CH with the help of fuzzy inference system in a distributed way. The experimental results indicate EEDCF algorithm is better than some current representative methods in aspects of data transmission, energy consumption and lifetime of networks.

  1. Enhancing gene regulatory network inference through data integration with markov random fields

    DOE PAGES

    Banf, Michael; Rhee, Seung Y.

    2017-02-01

    Here, a gene regulatory network links transcription factors to their target genes and represents a map of transcriptional regulation. Much progress has been made in deciphering gene regulatory networks computationally. However, gene regulatory network inference for most eukaryotic organisms remain challenging. To improve the accuracy of gene regulatory network inference and facilitate candidate selection for experimentation, we developed an algorithm called GRACE (Gene Regulatory network inference ACcuracy Enhancement). GRACE exploits biological a priori and heterogeneous data integration to generate high- confidence network predictions for eukaryotic organisms using Markov Random Fields in a semi-supervised fashion. GRACE uses a novel optimization schememore » to integrate regulatory evidence and biological relevance. It is particularly suited for model learning with sparse regulatory gold standard data. We show GRACE’s potential to produce high confidence regulatory networks compared to state of the art approaches using Drosophila melanogaster and Arabidopsis thaliana data. In an A. thaliana developmental gene regulatory network, GRACE recovers cell cycle related regulatory mechanisms and further hypothesizes several novel regulatory links, including a putative control mechanism of vascular structure formation due to modifications in cell proliferation.« less

  2. Enhancing gene regulatory network inference through data integration with markov random fields

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Banf, Michael; Rhee, Seung Y.

    Here, a gene regulatory network links transcription factors to their target genes and represents a map of transcriptional regulation. Much progress has been made in deciphering gene regulatory networks computationally. However, gene regulatory network inference for most eukaryotic organisms remain challenging. To improve the accuracy of gene regulatory network inference and facilitate candidate selection for experimentation, we developed an algorithm called GRACE (Gene Regulatory network inference ACcuracy Enhancement). GRACE exploits biological a priori and heterogeneous data integration to generate high- confidence network predictions for eukaryotic organisms using Markov Random Fields in a semi-supervised fashion. GRACE uses a novel optimization schememore » to integrate regulatory evidence and biological relevance. It is particularly suited for model learning with sparse regulatory gold standard data. We show GRACE’s potential to produce high confidence regulatory networks compared to state of the art approaches using Drosophila melanogaster and Arabidopsis thaliana data. In an A. thaliana developmental gene regulatory network, GRACE recovers cell cycle related regulatory mechanisms and further hypothesizes several novel regulatory links, including a putative control mechanism of vascular structure formation due to modifications in cell proliferation.« less

  3. Inferring microbial interaction networks from metagenomic data using SgLV-EKF algorithm.

    PubMed

    Alshawaqfeh, Mustafa; Serpedin, Erchin; Younes, Ahmad Bani

    2017-03-27

    Inferring the microbial interaction networks (MINs) and modeling their dynamics are critical in understanding the mechanisms of the bacterial ecosystem and designing antibiotic and/or probiotic therapies. Recently, several approaches were proposed to infer MINs using the generalized Lotka-Volterra (gLV) model. Main drawbacks of these models include the fact that these models only consider the measurement noise without taking into consideration the uncertainties in the underlying dynamics. Furthermore, inferring the MIN is characterized by the limited number of observations and nonlinearity in the regulatory mechanisms. Therefore, novel estimation techniques are needed to address these challenges. This work proposes SgLV-EKF: a stochastic gLV model that adopts the extended Kalman filter (EKF) algorithm to model the MIN dynamics. In particular, SgLV-EKF employs a stochastic modeling of the MIN by adding a noise term to the dynamical model to compensate for modeling uncertainties. This stochastic modeling is more realistic than the conventional gLV model which assumes that the MIN dynamics are perfectly governed by the gLV equations. After specifying the stochastic model structure, we propose the EKF to estimate the MIN. SgLV-EKF was compared with two similarity-based algorithms, one algorithm from the integral-based family and two regression-based algorithms, in terms of the achieved performance on two synthetic data-sets and two real data-sets. The first data-set models the randomness in measurement data, whereas, the second data-set incorporates uncertainties in the underlying dynamics. The real data-sets are provided by a recent study pertaining to an antibiotic-mediated Clostridium difficile infection. The experimental results demonstrate that SgLV-EKF outperforms the alternative methods in terms of robustness to measurement noise, modeling errors, and tracking the dynamics of the MIN. Performance analysis demonstrates that the proposed SgLV-EKF algorithm represents a powerful and reliable tool to infer MINs and track their dynamics.

  4. A junction-tree based learning algorithm to optimize network wide traffic control: A coordinated multi-agent framework

    DOE PAGES

    Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; ...

    2015-01-31

    Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plansmore » in terms of average delay, number of stops, and vehicular emissions at the network level.« less

  5. Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles

    PubMed Central

    Thaden, Joshua T; Mogno, Ilaria; Wierzbowski, Jamey; Cottarel, Guillaume; Kasif, Simon; Collins, James J; Gardner, Timothy S

    2007-01-01

    Machine learning approaches offer the potential to systematically identify transcriptional regulatory interactions from a compendium of microarray expression profiles. However, experimental validation of the performance of these methods at the genome scale has remained elusive. Here we assess the global performance of four existing classes of inference algorithms using 445 Escherichia coli Affymetrix arrays and 3,216 known E. coli regulatory interactions from RegulonDB. We also developed and applied the context likelihood of relatedness (CLR) algorithm, a novel extension of the relevance networks class of algorithms. CLR demonstrates an average precision gain of 36% relative to the next-best performing algorithm. At a 60% true positive rate, CLR identifies 1,079 regulatory interactions, of which 338 were in the previously known network and 741 were novel predictions. We tested the predicted interactions for three transcription factors with chromatin immunoprecipitation, confirming 21 novel interactions and verifying our RegulonDB-based performance estimates. CLR also identified a regulatory link providing central metabolic control of iron transport, which we confirmed with real-time quantitative PCR. The compendium of expression data compiled in this study, coupled with RegulonDB, provides a valuable model system for further improvement of network inference algorithms using experimental data. PMID:17214507

  6. INfORM: Inference of NetwOrk Response Modules.

    PubMed

    Marwah, Veer Singh; Kinaret, Pia Anneli Sofia; Serra, Angela; Scala, Giovanni; Lauerma, Antti; Fortino, Vittorio; Greco, Dario

    2018-06-15

    Detecting and interpreting responsive modules from gene expression data by using network-based approaches is a common but laborious task. It often requires the application of several computational methods implemented in different software packages, forcing biologists to compile complex analytical pipelines. Here we introduce INfORM (Inference of NetwOrk Response Modules), an R shiny application that enables non-expert users to detect, evaluate and select gene modules with high statistical and biological significance. INfORM is a comprehensive tool for the identification of biologically meaningful response modules from consensus gene networks inferred by using multiple algorithms. It is accessible through an intuitive graphical user interface allowing for a level of abstraction from the computational steps. INfORM is freely available for academic use at https://github.com/Greco-Lab/INfORM. Supplementary data are available at Bioinformatics online.

  7. Modeling gene regulatory networks: A network simplification algorithm

    NASA Astrophysics Data System (ADS)

    Ferreira, Luiz Henrique O.; de Castro, Maria Clicia S.; da Silva, Fabricio A. B.

    2016-12-01

    Boolean networks have been used for some time to model Gene Regulatory Networks (GRNs), which describe cell functions. Those models can help biologists to make predictions, prognosis and even specialized treatment when some disturb on the GRN lead to a sick condition. However, the amount of information related to a GRN can be huge, making the task of inferring its boolean network representation quite a challenge. The method shown here takes into account information about the interactome to build a network, where each node represents a protein, and uses the entropy of each node as a key to reduce the size of the network, allowing the further inferring process to focus only on the main protein hubs, the ones with most potential to interfere in overall network behavior.

  8. Algorithm Optimally Orders Forward-Chaining Inference Rules

    NASA Technical Reports Server (NTRS)

    James, Mark

    2008-01-01

    People typically develop knowledge bases in a somewhat ad hoc manner by incrementally adding rules with no specific organization. This often results in a very inefficient execution of those rules since they are so often order sensitive. This is relevant to tasks like Deep Space Network in that it allows the knowledge base to be incrementally developed and have it automatically ordered for efficiency. Although data flow analysis was first developed for use in compilers for producing optimal code sequences, its usefulness is now recognized in many software systems including knowledge-based systems. However, this approach for exhaustively computing data-flow information cannot directly be applied to inference systems because of the ubiquitous execution of the rules. An algorithm is presented that efficiently performs a complete producer/consumer analysis for each antecedent and consequence clause in a knowledge base to optimally order the rules to minimize inference cycles. An algorithm was developed that optimally orders a knowledge base composed of forwarding chaining inference rules such that independent inference cycle executions are minimized, thus, resulting in significantly faster execution. This algorithm was integrated into the JPL tool Spacecraft Health Inference Engine (SHINE) for verification and it resulted in a significant reduction in inference cycles for what was previously considered an ordered knowledge base. For a knowledge base that is completely unordered, then the improvement is much greater.

  9. Constructing an integrated gene similarity network for the identification of disease genes.

    PubMed

    Tian, Zhen; Guo, Maozu; Wang, Chunyu; Xing, LinLin; Wang, Lei; Zhang, Yin

    2017-09-20

    Discovering novel genes that are involved human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are mainly based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and only cover less half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. We proposed a novel method, named RWRB, to infer causal genes of interested diseases. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed based on similarity network fusion (SNF) method. Finally, we employee the random walk with restart algorithm on the phenotype-gene bilayer network, which combines phenotype similarity network, IGSN as well as phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB through leave-one-out cross-validation methods in inferring phenotype-gene relationships. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB is benefited from IGSN which has a wider coverage and higher reliability comparing with current PPI networks. Moreover, we conduct a comprehensive case study for Alzheimer's disease and predict some novel disease genes that supported by literature. RWRB is an effective and reliable algorithm in prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/ .

  10. Scalable Probabilistic Inference for Global Seismic Monitoring

    NASA Astrophysics Data System (ADS)

    Arora, N. S.; Dear, T.; Russell, S.

    2011-12-01

    We describe a probabilistic generative model for seismic events, their transmission through the earth, and their detection (or mis-detection) at seismic stations. We also describe an inference algorithm that constructs the most probable event bulletin explaining the observed set of detections. The model and inference are called NET-VISA (network processing vertically integrated seismic analysis) and is designed to replace the current automated network processing at the IDC, the SEL3 bulletin. Our results (attached table) demonstrate that NET-VISA significantly outperforms SEL3 by reducing the missed events from 30.3% down to 12.5%. The difference is even more dramatic for smaller magnitude events. NET-VISA has no difficulty in locating nuclear explosions as well. The attached figure demonstrates the location predicted by NET-VISA versus other bulletins for the second DPRK event. Further evaluation on dense regional networks demonstrates that NET-VISA finds many events missed in the LEB bulletin, which is produced by the human analysts. Large aftershock sequences, as produced by the 2004 December Sumatra earthquake and the 2011 March Tohoku earthquake, can pose a significant load for automated processing, often delaying the IDC bulletins by weeks or months. Indeed these sequences can overload the serial NET-VISA inference as well. We describe an enhancement to NET-VISA to make it multi-threaded, and hence take full advantage of the processing power of multi-core and -cpu machines. Our experiments show that the new inference algorithm is able to achieve 80% efficiency in parallel speedup.

  11. CGBayesNets: Conditional Gaussian Bayesian Network Learning and Inference with Mixed Discrete and Continuous Data

    PubMed Central

    Weiss, Scott T.

    2014-01-01

    Bayesian Networks (BN) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused around prediction of a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBNs) formalism, one appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analysis on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and available as MATLAB source code, under an Open Source license and anonymous download at http://www.cgbayesnets.com. PMID:24922310

  12. CGBayesNets: conditional Gaussian Bayesian network learning and inference with mixed discrete and continuous data.

    PubMed

    McGeachie, Michael J; Chang, Hsun-Hsien; Weiss, Scott T

    2014-06-01

    Bayesian Networks (BN) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused around prediction of a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBNs) formalism, one appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analysis on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and available as MATLAB source code, under an Open Source license and anonymous download at http://www.cgbayesnets.com.

  13. Visual recognition and inference using dynamic overcomplete sparse learning.

    PubMed

    Murray, Joseph F; Kreutz-Delgado, Kenneth

    2007-09-01

    We present a hierarchical architecture and learning algorithm for visual recognition and other visual inference tasks such as imagination, reconstruction of occluded images, and expectation-driven segmentation. Using properties of biological vision for guidance, we posit a stochastic generative world model and from it develop a simplified world model (SWM) based on a tractable variational approximation that is designed to enforce sparse coding. Recent developments in computational methods for learning overcomplete representations (Lewicki & Sejnowski, 2000; Teh, Welling, Osindero, & Hinton, 2003) suggest that overcompleteness can be useful for visual tasks, and we use an overcomplete dictionary learning algorithm (Kreutz-Delgado, et al., 2003) as a preprocessing stage to produce accurate, sparse codings of images. Inference is performed by constructing a dynamic multilayer network with feedforward, feedback, and lateral connections, which is trained to approximate the SWM. Learning is done with a variant of the back-propagation-through-time algorithm, which encourages convergence to desired states within a fixed number of iterations. Vision tasks require large networks, and to make learning efficient, we take advantage of the sparsity of each layer to update only a small subset of elements in a large weight matrix at each iteration. Experiments on a set of rotated objects demonstrate various types of visual inference and show that increasing the degree of overcompleteness improves recognition performance in difficult scenes with occluded objects in clutter.

  14. Griffin: A Tool for Symbolic Inference of Synchronous Boolean Molecular Networks.

    PubMed

    Muñoz, Stalin; Carrillo, Miguel; Azpeitia, Eugenio; Rosenblueth, David A

    2018-01-01

    Boolean networks are important models of biochemical systems, located at the high end of the abstraction spectrum. A number of Boolean gene networks have been inferred following essentially the same method. Such a method first considers experimental data for a typically underdetermined "regulation" graph. Next, Boolean networks are inferred by using biological constraints to narrow the search space, such as a desired set of (fixed-point or cyclic) attractors. We describe Griffin , a computer tool enhancing this method. Griffin incorporates a number of well-established algorithms, such as Dubrova and Teslenko's algorithm for finding attractors in synchronous Boolean networks. In addition, a formal definition of regulation allows Griffin to employ "symbolic" techniques, able to represent both large sets of network states and Boolean constraints. We observe that when the set of attractors is required to be an exact set, prohibiting additional attractors, a naive Boolean coding of this constraint may be unfeasible. Such cases may be intractable even with symbolic methods, as the number of Boolean constraints may be astronomically large. To overcome this problem, we employ an Artificial Intelligence technique known as "clause learning" considerably increasing Griffin 's scalability. Without clause learning only toy examples prohibiting additional attractors are solvable: only one out of seven queries reported here is answered. With clause learning, by contrast, all seven queries are answered. We illustrate Griffin with three case studies drawn from the Arabidopsis thaliana literature. Griffin is available at: http://turing.iimas.unam.mx/griffin.

  15. Evaluation of a new neutron energy spectrum unfolding code based on an Adaptive Neuro-Fuzzy Inference System (ANFIS).

    PubMed

    Hosseini, Seyed Abolfazl; Esmaili Paeen Afrakoti, Iman

    2018-01-17

    The purpose of the present study was to reconstruct the energy spectrum of a poly-energetic neutron source using an algorithm developed based on an Adaptive Neuro-Fuzzy Inference System (ANFIS). ANFIS is a kind of artificial neural network based on the Takagi-Sugeno fuzzy inference system. The ANFIS algorithm uses the advantages of both fuzzy inference systems and artificial neural networks to improve the effectiveness of algorithms in various applications such as modeling, control and classification. The neutron pulse height distributions used as input data in the training procedure for the ANFIS algorithm were obtained from the simulations performed by MCNPX-ESUT computational code (MCNPX-Energy engineering of Sharif University of Technology). Taking into account the normalization condition of each energy spectrum, 4300 neutron energy spectra were generated randomly. (The value in each bin was generated randomly, and finally a normalization of each generated energy spectrum was performed). The randomly generated neutron energy spectra were considered as output data of the developed ANFIS computational code in the training step. To calculate the neutron energy spectrum using conventional methods, an inverse problem with an approximately singular response matrix (with the determinant of the matrix close to zero) should be solved. The solution of the inverse problem using the conventional methods unfold neutron energy spectrum with low accuracy. Application of the iterative algorithms in the solution of such a problem, or utilizing the intelligent algorithms (in which there is no need to solve the problem), is usually preferred for unfolding of the energy spectrum. Therefore, the main reason for development of intelligent algorithms like ANFIS for unfolding of neutron energy spectra is to avoid solving the inverse problem. In the present study, the unfolded neutron energy spectra of 252Cf and 241Am-9Be neutron sources using the developed computational code were found to have excellent agreement with the reference data. Also, the unfolded energy spectra of the neutron sources as obtained using ANFIS were more accurate than the results reported from calculations performed using artificial neural networks in previously published papers. © The Author(s) 2018. Published by Oxford University Press on behalf of The Japan Radiation Research Society and Japanese Society for Radiation Oncology.

  16. Hybrid regulatory models: a statistically tractable approach to model regulatory network dynamics.

    PubMed

    Ocone, Andrea; Millar, Andrew J; Sanguinetti, Guido

    2013-04-01

    Computational modelling of the dynamics of gene regulatory networks is a central task of systems biology. For networks of small/medium scale, the dominant paradigm is represented by systems of coupled non-linear ordinary differential equations (ODEs). ODEs afford great mechanistic detail and flexibility, but calibrating these models to data is often an extremely difficult statistical problem. Here, we develop a general statistical inference framework for stochastic transcription-translation networks. We use a coarse-grained approach, which represents the system as a network of stochastic (binary) promoter and (continuous) protein variables. We derive an exact inference algorithm and an efficient variational approximation that allows scalable inference and learning of the model parameters. We demonstrate the power of the approach on two biological case studies, showing that the method allows a high degree of flexibility and is capable of testable novel biological predictions. http://homepages.inf.ed.ac.uk/gsanguin/software.html. Supplementary data are available at Bioinformatics online.

  17. MORE: mixed optimization for reverse engineering--an application to modeling biological networks response via sparse systems of nonlinear differential equations.

    PubMed

    Sambo, Francesco; de Oca, Marco A Montes; Di Camillo, Barbara; Toffolo, Gianna; Stützle, Thomas

    2012-01-01

    Reverse engineering is the problem of inferring the structure of a network of interactions between biological variables from a set of observations. In this paper, we propose an optimization algorithm, called MORE, for the reverse engineering of biological networks from time series data. The model inferred by MORE is a sparse system of nonlinear differential equations, complex enough to realistically describe the dynamics of a biological system. MORE tackles separately the discrete component of the problem, the determination of the biological network topology, and the continuous component of the problem, the strength of the interactions. This approach allows us both to enforce system sparsity, by globally constraining the number of edges, and to integrate a priori information about the structure of the underlying interaction network. Experimental results on simulated and real-world networks show that the mixed discrete/continuous optimization approach of MORE significantly outperforms standard continuous optimization and that MORE is competitive with the state of the art in terms of accuracy of the inferred networks.

  18. Graphical models for optimal power flow

    DOE PAGES

    Dvijotham, Krishnamurthy; Chertkov, Michael; Van Hentenryck, Pascal; ...

    2016-09-13

    Optimal power flow (OPF) is the central optimization problem in electric power grids. Although solved routinely in the course of power grid operations, it is known to be strongly NP-hard in general, and weakly NP-hard over tree networks. In this paper, we formulate the optimal power flow problem over tree networks as an inference problem over a tree-structured graphical model where the nodal variables are low-dimensional vectors. We adapt the standard dynamic programming algorithm for inference over a tree-structured graphical model to the OPF problem. Combining this with an interval discretization of the nodal variables, we develop an approximation algorithmmore » for the OPF problem. Further, we use techniques from constraint programming (CP) to perform interval computations and adaptive bound propagation to obtain practically efficient algorithms. Compared to previous algorithms that solve OPF with optimality guarantees using convex relaxations, our approach is able to work for arbitrary tree-structured distribution networks and handle mixed-integer optimization problems. Further, it can be implemented in a distributed message-passing fashion that is scalable and is suitable for “smart grid” applications like control of distributed energy resources. In conclusion, numerical evaluations on several benchmark networks show that practical OPF problems can be solved effectively using this approach.« less

  19. Functional Inference of Complex Anatomical Tendinous Networks at a Macroscopic Scale via Sparse Experimentation

    PubMed Central

    Saxena, Anupam; Lipson, Hod; Valero-Cuevas, Francisco J.

    2012-01-01

    In systems and computational biology, much effort is devoted to functional identification of systems and networks at the molecular-or cellular scale. However, similarly important networks exist at anatomical scales such as the tendon network of human fingers: the complex array of collagen fibers that transmits and distributes muscle forces to finger joints. This network is critical to the versatility of the human hand, and its function has been debated since at least the 16th century. Here, we experimentally infer the structure (both topology and parameter values) of this network through sparse interrogation with force inputs. A population of models representing this structure co-evolves in simulation with a population of informative future force inputs via the predator-prey estimation-exploration algorithm. Model fitness depends on their ability to explain experimental data, while the fitness of future force inputs depends on causing maximal functional discrepancy among current models. We validate our approach by inferring two known synthetic Latex networks, and one anatomical tendon network harvested from a cadaver's middle finger. We find that functionally similar but structurally diverse models can exist within a narrow range of the training set and cross-validation errors. For the Latex networks, models with low training set error [<4%] and resembling the known network have the smallest cross-validation errors [∼5%]. The low training set [<4%] and cross validation [<7.2%] errors for models for the cadaveric specimen demonstrate what, to our knowledge, is the first experimental inference of the functional structure of complex anatomical networks. This work expands current bioinformatics inference approaches by demonstrating that sparse, yet informative interrogation of biological specimens holds significant computational advantages in accurate and efficient inference over random testing, or assuming model topology and only inferring parameters values. These findings also hold clues to both our evolutionary history and the development of versatile machines. PMID:23144601

  20. Functional inference of complex anatomical tendinous networks at a macroscopic scale via sparse experimentation.

    PubMed

    Saxena, Anupam; Lipson, Hod; Valero-Cuevas, Francisco J

    2012-01-01

    In systems and computational biology, much effort is devoted to functional identification of systems and networks at the molecular-or cellular scale. However, similarly important networks exist at anatomical scales such as the tendon network of human fingers: the complex array of collagen fibers that transmits and distributes muscle forces to finger joints. This network is critical to the versatility of the human hand, and its function has been debated since at least the 16(th) century. Here, we experimentally infer the structure (both topology and parameter values) of this network through sparse interrogation with force inputs. A population of models representing this structure co-evolves in simulation with a population of informative future force inputs via the predator-prey estimation-exploration algorithm. Model fitness depends on their ability to explain experimental data, while the fitness of future force inputs depends on causing maximal functional discrepancy among current models. We validate our approach by inferring two known synthetic Latex networks, and one anatomical tendon network harvested from a cadaver's middle finger. We find that functionally similar but structurally diverse models can exist within a narrow range of the training set and cross-validation errors. For the Latex networks, models with low training set error [<4%] and resembling the known network have the smallest cross-validation errors [∼5%]. The low training set [<4%] and cross validation [<7.2%] errors for models for the cadaveric specimen demonstrate what, to our knowledge, is the first experimental inference of the functional structure of complex anatomical networks. This work expands current bioinformatics inference approaches by demonstrating that sparse, yet informative interrogation of biological specimens holds significant computational advantages in accurate and efficient inference over random testing, or assuming model topology and only inferring parameters values. These findings also hold clues to both our evolutionary history and the development of versatile machines.

  1. A recurrent self-organizing neural fuzzy inference network.

    PubMed

    Juang, C F; Lin, C T

    1999-01-01

    A recurrent self-organizing neural fuzzy inference network (RSONFIN) is proposed in this paper. The RSONFIN is inherently a recurrent multilayered connectionist network for realizing the basic elements and functions of dynamic fuzzy inference, and may be considered to be constructed from a series of dynamic fuzzy rules. The temporal relations embedded in the network are built by adding some feedback connections representing the memory elements to a feedforward neural fuzzy network. Each weight as well as node in the RSONFIN has its own meaning and represents a special element in a fuzzy rule. There are no hidden nodes (i.e., no membership functions and fuzzy rules) initially in the RSONFIN. They are created on-line via concurrent structure identification (the construction of dynamic fuzzy if-then rules) and parameter identification (the tuning of the free parameters of membership functions). The structure learning together with the parameter learning forms a fast learning algorithm for building a small, yet powerful, dynamic neural fuzzy network. Two major characteristics of the RSONFIN can thus be seen: 1) the recurrent property of the RSONFIN makes it suitable for dealing with temporal problems and 2) no predetermination, like the number of hidden nodes, must be given, since the RSONFIN can find its optimal structure and parameters automatically and quickly. Moreover, to reduce the number of fuzzy rules generated, a flexible input partition method, the aligned clustering-based algorithm, is proposed. Various simulations on temporal problems are done and performance comparisons with some existing recurrent networks are also made. Efficiency of the RSONFIN is verified from these results.

  2. Evidence reasoning method for constructing conditional probability tables in a Bayesian network of multimorbidity.

    PubMed

    Du, Yuanwei; Guo, Yubin

    2015-01-01

    The intrinsic mechanism of multimorbidity is difficult to recognize and prediction and diagnosis are difficult to carry out accordingly. Bayesian networks can help to diagnose multimorbidity in health care, but it is difficult to obtain the conditional probability table (CPT) because of the lack of clinically statistical data. Today, expert knowledge and experience are increasingly used in training Bayesian networks in order to help predict or diagnose diseases, but the CPT in Bayesian networks is usually irrational or ineffective for ignoring realistic constraints especially in multimorbidity. In order to solve these problems, an evidence reasoning (ER) approach is employed to extract and fuse inference data from experts using a belief distribution and recursive ER algorithm, based on which evidence reasoning method for constructing conditional probability tables in Bayesian network of multimorbidity is presented step by step. A multimorbidity numerical example is used to demonstrate the method and prove its feasibility and application. Bayesian network can be determined as long as the inference assessment is inferred by each expert according to his/her knowledge or experience. Our method is more effective than existing methods for extracting expert inference data accurately and is fused effectively for constructing CPTs in a Bayesian network of multimorbidity.

  3. Functional Alignment of Metabolic Networks.

    PubMed

    Mazza, Arnon; Wagner, Allon; Ruppin, Eytan; Sharan, Roded

    2016-05-01

    Network alignment has become a standard tool in comparative biology, allowing the inference of protein function, interaction, and orthology. However, current alignment techniques are based on topological properties of networks and do not take into account their functional implications. Here we propose, for the first time, an algorithm to align two metabolic networks by taking advantage of their coupled metabolic models. These models allow us to assess the functional implications of genes or reactions, captured by the metabolic fluxes that are altered following their deletion from the network. Such implications may spread far beyond the region of the network where the gene or reaction lies. We apply our algorithm to align metabolic networks from various organisms, ranging from bacteria to humans, showing that our alignment can reveal functional orthology relations that are missed by conventional topological alignments.

  4. On the inherent competition between valid and spurious inductive inferences in Boolean data

    NASA Astrophysics Data System (ADS)

    Andrecut, M.

    Inductive inference is the process of extracting general rules from specific observations. This problem also arises in the analysis of biological networks, such as genetic regulatory networks, where the interactions are complex and the observations are incomplete. A typical task in these problems is to extract general interaction rules as combinations of Boolean covariates, that explain a measured response variable. The inductive inference process can be considered as an incompletely specified Boolean function synthesis problem. This incompleteness of the problem will also generate spurious inferences, which are a serious threat to valid inductive inference rules. Using random Boolean data as a null model, here we attempt to measure the competition between valid and spurious inductive inference rules from a given data set. We formulate two greedy search algorithms, which synthesize a given Boolean response variable in a sparse disjunct normal form, and respectively a sparse generalized algebraic normal form of the variables from the observation data, and we evaluate numerically their performance.

  5. Gene Expression Network Reconstruction by Convex Feature Selection when Incorporating Genetic Perturbations

    PubMed Central

    Logsdon, Benjamin A.; Mezey, Jason

    2010-01-01

    Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any algorithms designed for network discovery that make use of directed probabilistic graphs require perturbations, produced by either experiments or naturally occurring genetic variation, to successfully infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression Quantitative Trait Loci (cis-eQTL), which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks with thirty genes and thirty cis-eQTL, with rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae when analyzing genome-wide gene expression data for an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1), and genes involved in endocytosis (RCY1), the spindle checkpoint (BUB2), sulfonate catabolism (JLP1), and cell-cell communication (PRM7). Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population level genetic and gene expression data. PMID:21152011

  6. Gene network biological validity based on gene-gene interaction relevance.

    PubMed

    Gómez-Vela, Francisco; Díaz-Díaz, Norberto

    2014-01-01

    In recent years, gene networks have become one of the most useful tools for modeling biological processes. Many inference gene network algorithms have been developed as techniques for extracting knowledge from gene expression data. Ensuring the reliability of the inferred gene relationships is a crucial task in any study in order to prove that the algorithms used are precise. Usually, this validation process can be carried out using prior biological knowledge. The metabolic pathways stored in KEGG are one of the most widely used knowledgeable sources for analyzing relationships between genes. This paper introduces a new methodology, GeneNetVal, to assess the biological validity of gene networks based on the relevance of the gene-gene interactions stored in KEGG metabolic pathways. Hence, a complete KEGG pathway conversion into a gene association network and a new matching distance based on gene-gene interaction relevance are proposed. The performance of GeneNetVal was established with three different experiments. Firstly, our proposal is tested in a comparative ROC analysis. Secondly, a randomness study is presented to show the behavior of GeneNetVal when the noise is increased in the input network. Finally, the ability of GeneNetVal to detect biological functionality of the network is shown.

  7. Reconstruction of stochastic temporal networks through diffusive arrival times

    NASA Astrophysics Data System (ADS)

    Li, Xun; Li, Xiang

    2017-06-01

    Temporal networks have opened a new dimension in defining and quantification of complex interacting systems. Our ability to identify and reproduce time-resolved interaction patterns is, however, limited by the restricted access to empirical individual-level data. Here we propose an inverse modelling method based on first-arrival observations of the diffusion process taking place on temporal networks. We describe an efficient coordinate-ascent implementation for inferring stochastic temporal networks that builds in particular but not exclusively on the null model assumption of mutually independent interaction sequences at the dyadic level. The results of benchmark tests applied on both synthesized and empirical network data sets confirm the validity of our algorithm, showing the feasibility of statistically accurate inference of temporal networks only from moderate-sized samples of diffusion cascades. Our approach provides an effective and flexible scheme for the temporally augmented inverse problems of network reconstruction and has potential in a broad variety of applications.

  8. Reconstruction of stochastic temporal networks through diffusive arrival times

    PubMed Central

    Li, Xun; Li, Xiang

    2017-01-01

    Temporal networks have opened a new dimension in defining and quantification of complex interacting systems. Our ability to identify and reproduce time-resolved interaction patterns is, however, limited by the restricted access to empirical individual-level data. Here we propose an inverse modelling method based on first-arrival observations of the diffusion process taking place on temporal networks. We describe an efficient coordinate-ascent implementation for inferring stochastic temporal networks that builds in particular but not exclusively on the null model assumption of mutually independent interaction sequences at the dyadic level. The results of benchmark tests applied on both synthesized and empirical network data sets confirm the validity of our algorithm, showing the feasibility of statistically accurate inference of temporal networks only from moderate-sized samples of diffusion cascades. Our approach provides an effective and flexible scheme for the temporally augmented inverse problems of network reconstruction and has potential in a broad variety of applications. PMID:28604687

  9. Personalized microbial network inference via co-regularized spectral clustering.

    PubMed

    Imangaliyev, Sultan; Keijser, Bart; Crielaard, Wim; Tsivtsivadze, Evgeni

    2015-07-15

    We use Human Microbiome Project (HMP) cohort (Peterson et al., 2009) to infer personalized oral microbial networks of healthy individuals. To determine clustering of individuals with similar microbial profiles, co-regularized spectral clustering algorithm is applied to the dataset. For each cluster we discovered, we compute co-occurrence relationships among the microbial species that determine microbial network per cluster of individuals. The results of our study suggest that there are several differences in microbial interactions on personalized network level in healthy oral samples acquired from various niches. Based on the results of co-regularized spectral clustering we discover two groups of individuals with different topology of their microbial interaction network. The results of microbial network inference suggest that niche-wise interactions are different in these two groups. Our study shows that healthy individuals have different microbial clusters according to their oral microbiota. Such personalized microbial networks open a better understanding of the microbial ecology of healthy oral cavities and new possibilities for future targeted medication. The scripts written in scientific Python and in Matlab, which were used for network visualization, are provided for download on the website http://learning-machines.com/. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. The Balanced Cross-Layer Design Routing Algorithm in Wireless Sensor Networks Using Fuzzy Logic.

    PubMed

    Li, Ning; Martínez, José-Fernán; Hernández Díaz, Vicente

    2015-08-10

    Recently, the cross-layer design for the wireless sensor network communication protocol has become more and more important and popular. Considering the disadvantages of the traditional cross-layer routing algorithms, in this paper we propose a new fuzzy logic-based routing algorithm, named the Balanced Cross-layer Fuzzy Logic (BCFL) routing algorithm. In BCFL, we use the cross-layer parameters' dispersion as the fuzzy logic inference system inputs. Moreover, we give each cross-layer parameter a dynamic weight according the value of the dispersion. For getting a balanced solution, the parameter whose dispersion is large will have small weight, and vice versa. In order to compare it with the traditional cross-layer routing algorithms, BCFL is evaluated through extensive simulations. The simulation results show that the new routing algorithm can handle the multiple constraints without increasing the complexity of the algorithm and can achieve the most balanced performance on selecting the next hop relay node. Moreover, the Balanced Cross-layer Fuzzy Logic routing algorithm can adapt to the dynamic changing of the network conditions and topology effectively.

  11. The Balanced Cross-Layer Design Routing Algorithm in Wireless Sensor Networks Using Fuzzy Logic

    PubMed Central

    Li, Ning; Martínez, José-Fernán; Díaz, Vicente Hernández

    2015-01-01

    Recently, the cross-layer design for the wireless sensor network communication protocol has become more and more important and popular. Considering the disadvantages of the traditional cross-layer routing algorithms, in this paper we propose a new fuzzy logic-based routing algorithm, named the Balanced Cross-layer Fuzzy Logic (BCFL) routing algorithm. In BCFL, we use the cross-layer parameters’ dispersion as the fuzzy logic inference system inputs. Moreover, we give each cross-layer parameter a dynamic weight according the value of the dispersion. For getting a balanced solution, the parameter whose dispersion is large will have small weight, and vice versa. In order to compare it with the traditional cross-layer routing algorithms, BCFL is evaluated through extensive simulations. The simulation results show that the new routing algorithm can handle the multiple constraints without increasing the complexity of the algorithm and can achieve the most balanced performance on selecting the next hop relay node. Moreover, the Balanced Cross-layer Fuzzy Logic routing algorithm can adapt to the dynamic changing of the network conditions and topology effectively. PMID:26266412

  12. Hierarchical Dirichlet process model for gene expression clustering

    PubMed Central

    2013-01-01

    Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as the gene express data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447

  13. Model-free inference of direct network interactions from nonlinear collective dynamics.

    PubMed

    Casadiego, Jose; Nitzan, Mor; Hallerberg, Sarah; Timme, Marc

    2017-12-19

    The topology of interactions in network dynamical systems fundamentally underlies their function. Accelerating technological progress creates massively available data about collective nonlinear dynamics in physical, biological, and technological systems. Detecting direct interaction patterns from those dynamics still constitutes a major open problem. In particular, current nonlinear dynamics approaches mostly require to know a priori a model of the (often high dimensional) system dynamics. Here we develop a model-independent framework for inferring direct interactions solely from recording the nonlinear collective dynamics generated. Introducing an explicit dependency matrix in combination with a block-orthogonal regression algorithm, the approach works reliably across many dynamical regimes, including transient dynamics toward steady states, periodic and non-periodic dynamics, and chaos. Together with its capabilities to reveal network (two point) as well as hypernetwork (e.g., three point) interactions, this framework may thus open up nonlinear dynamics options of inferring direct interaction patterns across systems where no model is known.

  14. Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering

    NASA Technical Reports Server (NTRS)

    Dhaeseleer, Patrik; Liang, Shoudan; Somogyi, Roland

    2000-01-01

    Advances in molecular biological, analytical, and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review datamining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-duster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e., who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks, to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting, and bioengineering.

  15. Improved personalized recommendation based on a similarity network

    NASA Astrophysics Data System (ADS)

    Wang, Ximeng; Liu, Yun; Xiong, Fei

    2016-08-01

    A recommender system helps individual users find the preferred items rapidly and has attracted extensive attention in recent years. Many successful recommendation algorithms are designed on bipartite networks, such as network-based inference or heat conduction. However, most of these algorithms define the resource-allocation methods for an average allocation. That is not reasonable because average allocation cannot indicate the user choice preference and the influence between users which leads to a series of non-personalized recommendation results. We propose a personalized recommendation approach that combines the similarity function and bipartite network to generate a similarity network that improves the resource-allocation process. Our model introduces user influence into the recommender system and states that the user influence can make the resource-allocation process more reasonable. We use four different metrics to evaluate our algorithms for three benchmark data sets. Experimental results show that the improved recommendation on a similarity network can obtain better accuracy and diversity than some competing approaches.

  16. Reverse engineering a gene network using an asynchronous parallel evolution strategy

    PubMed Central

    2010-01-01

    Background The use of reverse engineering methods to infer gene regulatory networks by fitting mathematical models to gene expression data is becoming increasingly popular and successful. However, increasing model complexity means that more powerful global optimisation techniques are required for model fitting. The parallel Lam Simulated Annealing (pLSA) algorithm has been used in such approaches, but recent research has shown that island Evolutionary Strategies can produce faster, more reliable results. However, no parallel island Evolutionary Strategy (piES) has yet been demonstrated to be effective for this task. Results Here, we present synchronous and asynchronous versions of the piES algorithm, and apply them to a real reverse engineering problem: inferring parameters in the gap gene network. We find that the asynchronous piES exhibits very little communication overhead, and shows significant speed-up for up to 50 nodes: the piES running on 50 nodes is nearly 10 times faster than the best serial algorithm. We compare the asynchronous piES to pLSA on the same test problem, measuring the time required to reach particular levels of residual error, and show that it shows much faster convergence than pLSA across all optimisation conditions tested. Conclusions Our results demonstrate that the piES is consistently faster and more reliable than the pLSA algorithm on this problem, and scales better with increasing numbers of nodes. In addition, the piES is especially well suited to further improvements and adaptations: Firstly, the algorithm's fast initial descent speed and high reliability make it a good candidate for being used as part of a global/local search hybrid algorithm. Secondly, it has the potential to be used as part of a hierarchical evolutionary algorithm, which takes advantage of modern multi-core computing architectures. PMID:20196855

  17. Predicting gene regulatory networks by combining spatial and temporal gene expression data in Arabidopsis root stem cells

    PubMed Central

    de Luis Balaguer, Maria Angels; Fisher, Adam P.; Clark, Natalie M.; Fernandez-Espinosa, Maria Guadalupe; Möller, Barbara K.; Weijers, Dolf; Williams, Cranos; Lorenzo, Oscar; Sozzani, Rosangela

    2017-01-01

    Identifying the transcription factors (TFs) and associated networks involved in stem cell regulation is essential for understanding the initiation and growth of plant tissues and organs. Although many TFs have been shown to have a role in the Arabidopsis root stem cells, a comprehensive view of the transcriptional signature of the stem cells is lacking. In this work, we used spatial and temporal transcriptomic data to predict interactions among the genes involved in stem cell regulation. To accomplish this, we transcriptionally profiled several stem cell populations and developed a gene regulatory network inference algorithm that combines clustering with dynamic Bayesian network inference. We leveraged the topology of our networks to infer potential major regulators. Specifically, through mathematical modeling and experimental validation, we identified PERIANTHIA (PAN) as an important molecular regulator of quiescent center function. The results presented in this work show that our combination of molecular biology, computational biology, and mathematical modeling is an efficient approach to identify candidate factors that function in the stem cells. PMID:28827319

  18. Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo

    PubMed Central

    Golightly, Andrew; Wilkinson, Darren J.

    2011-01-01

    Computational systems biology is concerned with the development of detailed mechanistic models of biological processes. Such models are often stochastic and analytically intractable, containing uncertain parameters that must be estimated from time course data. In this article, we consider the task of inferring the parameters of a stochastic kinetic model defined as a Markov (jump) process. Inference for the parameters of complex nonlinear multivariate stochastic process models is a challenging problem, but we find here that algorithms based on particle Markov chain Monte Carlo turn out to be a very effective computationally intensive approach to the problem. Approximations to the inferential model based on stochastic differential equations (SDEs) are considered, as well as improvements to the inference scheme that exploit the SDE structure. We apply the methodology to a Lotka–Volterra system and a prokaryotic auto-regulatory network. PMID:23226583

  19. Co-Inheritance Analysis within the Domains of Life Substantially Improves Network Inference by Phylogenetic Profiling

    PubMed Central

    Shin, Junha; Lee, Insuk

    2015-01-01

    Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life—Archaea, Bacteria, and Eukaryota—suggesting that pathways inherit primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. Taken together, we propose that co-inheritance analysis within the domains of life will greatly potentiate the use of the expected onslaught of sequenced genomes in the study of molecular pathways in higher eukaryotes. PMID:26394049

  20. Quantum Graphical Models and Belief Propagation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leifer, M.S.; Perimeter Institute for Theoretical Physics, 31 Caroline Street North, Waterloo Ont., N2L 2Y5; Poulin, D.

    Belief Propagation algorithms acting on Graphical Models of classical probability distributions, such as Markov Networks, Factor Graphs and Bayesian Networks, are amongst the most powerful known methods for deriving probabilistic inferences amongst large numbers of random variables. This paper presents a generalization of these concepts and methods to the quantum case, based on the idea that quantum theory can be thought of as a noncommutative, operator-valued, generalization of classical probability theory. Some novel characterizations of quantum conditional independence are derived, and definitions of Quantum n-Bifactor Networks, Markov Networks, Factor Graphs and Bayesian Networks are proposed. The structure of Quantum Markovmore » Networks is investigated and some partial characterization results are obtained, along the lines of the Hammersley-Clifford theorem. A Quantum Belief Propagation algorithm is presented and is shown to converge on 1-Bifactor Networks and Markov Networks when the underlying graph is a tree. The use of Quantum Belief Propagation as a heuristic algorithm in cases where it is not known to converge is discussed. Applications to decoding quantum error correcting codes and to the simulation of many-body quantum systems are described.« less

  1. Network portal: a database for storage, analysis and visualization of biological networks

    PubMed Central

    Turkarslan, Serdar; Wurtmann, Elisabeth J.; Wu, Wei-Ju; Jiang, Ning; Bare, J. Christopher; Foley, Karen; Reiss, David J.; Novichkov, Pavel; Baliga, Nitin S.

    2014-01-01

    The ease of generating high-throughput data has enabled investigations into organismal complexity at the systems level through the inference of networks of interactions among the various cellular components (genes, RNAs, proteins and metabolites). The wider scientific community, however, currently has limited access to tools for network inference, visualization and analysis because these tasks often require advanced computational knowledge and expensive computing resources. We have designed the network portal (http://networks.systemsbiology.net) to serve as a modular database for the integration of user uploaded and public data, with inference algorithms and tools for the storage, visualization and analysis of biological networks. The portal is fully integrated into the Gaggle framework to seamlessly exchange data with desktop and web applications and to allow the user to create, save and modify workspaces, and it includes social networking capabilities for collaborative projects. While the current release of the database contains networks for 13 prokaryotic organisms from diverse phylogenetic clades (4678 co-regulated gene modules, 3466 regulators and 9291 cis-regulatory motifs), it will be rapidly populated with prokaryotic and eukaryotic organisms as relevant data become available in public repositories and through user input. The modular architecture, simple data formats and open API support community development of the portal. PMID:24271392

  2. Reverse engineering gene regulatory networks from measurement with missing values.

    PubMed

    Ogundijo, Oyetunji E; Elmas, Abdulkadir; Wang, Xiaodong

    2016-12-01

    Gene expression time series data are usually in the form of high-dimensional arrays. Unfortunately, the data may sometimes contain missing values: for either the expression values of some genes at some time points or the entire expression values of a single time point or some sets of consecutive time points. This significantly affects the performance of many algorithms for gene expression analysis that take as an input, the complete matrix of gene expression measurement. For instance, previous works have shown that gene regulatory interactions can be estimated from the complete matrix of gene expression measurement. Yet, till date, few algorithms have been proposed for the inference of gene regulatory network from gene expression data with missing values. We describe a nonlinear dynamic stochastic model for the evolution of gene expression. The model captures the structural, dynamical, and the nonlinear natures of the underlying biomolecular systems. We present point-based Gaussian approximation (PBGA) filters for joint state and parameter estimation of the system with one-step or two-step missing measurements . The PBGA filters use Gaussian approximation and various quadrature rules, such as the unscented transform (UT), the third-degree cubature rule and the central difference rule for computing the related posteriors. The proposed algorithm is evaluated with satisfying results for synthetic networks, in silico networks released as a part of the DREAM project, and the real biological network, the in vivo reverse engineering and modeling assessment (IRMA) network of yeast Saccharomyces cerevisiae . PBGA filters are proposed to elucidate the underlying gene regulatory network (GRN) from time series gene expression data that contain missing values. In our state-space model, we proposed a measurement model that incorporates the effect of the missing data points into the sequential algorithm. This approach produces a better inference of the model parameters and hence, more accurate prediction of the underlying GRN compared to when using the conventional Gaussian approximation (GA) filters ignoring the missing data points.

  3. ARACNe-based inference, using curated microarray data, of Arabidopsis thaliana root transcriptional regulatory networks

    PubMed Central

    2014-01-01

    Background Uncovering the complex transcriptional regulatory networks (TRNs) that underlie plant and animal development remains a challenge. However, a vast amount of data from public microarray experiments is available, which can be subject to inference algorithms in order to recover reliable TRN architectures. Results In this study we present a simple bioinformatics methodology that uses public, carefully curated microarray data and the mutual information algorithm ARACNe in order to obtain a database of transcriptional interactions. We used data from Arabidopsis thaliana root samples to show that the transcriptional regulatory networks derived from this database successfully recover previously identified root transcriptional modules and to propose new transcription factors for the SHORT ROOT/SCARECROW and PLETHORA pathways. We further show that these networks are a powerful tool to integrate and analyze high-throughput expression data, as exemplified by our analysis of a SHORT ROOT induction time-course microarray dataset, and are a reliable source for the prediction of novel root gene functions. In particular, we used our database to predict novel genes involved in root secondary cell-wall synthesis and identified the MADS-box TF XAL1/AGL12 as an unexpected participant in this process. Conclusions This study demonstrates that network inference using carefully curated microarray data yields reliable TRN architectures. In contrast to previous efforts to obtain root TRNs, that have focused on particular functional modules or tissues, our root transcriptional interactions provide an overview of the transcriptional pathways present in Arabidopsis thaliana roots and will likely yield a plethora of novel hypotheses to be tested experimentally. PMID:24739361

  4. Social Circles Detection from Ego Network and Profile Information

    DTIC Science & Technology

    2014-12-19

    response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing... algorithm used to infer k-clique communities is expo- nential, which makes this technique unfeasible when treating egonets with a large number of users...atic when considering RBMs. This inconvenient was positively solved implementing a sparsity treatment with the RBM algorithm . (ii) The ground truth was

  5. Nonparametric Bayesian inference of the microcanonical stochastic block model

    NASA Astrophysics Data System (ADS)

    Peixoto, Tiago P.

    2017-01-01

    A principled approach to characterize the hidden modular structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.

  6. Inference of genetic network of Xenopus frog egg: improved genetic algorithm.

    PubMed

    Wu, Shinq-Jen; Chou, Chia-Hsien; Wu, Cheng-Tao; Lee, Tsu-Tian

    2006-01-01

    An improved genetic algorithm (IGA) is proposed to achieve S-system gene network modeling of Xenopus frog egg. Via the time-courses training datasets from Michaelis-Menten model, the optimal parameters are learned. The S-system can clearly describe activative and inhibitory interaction between genes as generating and consuming process. We concern the mitotic control in cell-cycle of Xenopus frog egg to realize cyclin-Cdc2 and Cdc25 for MPF activity. The proposed IGA can achieve global search with migration and keep the best chromosome with elitism operation. The generated gene regulatory networks can provide biological researchers for further experiments in Xenopus frog egg cell cycle control.

  7. Network-based machine learning and graph theory algorithms for precision oncology.

    PubMed

    Zhang, Wei; Chien, Jeremy; Yong, Jeongsik; Kuang, Rui

    2017-01-01

    Network-based analytics plays an increasingly important role in precision oncology. Growing evidence in recent studies suggests that cancer can be better understood through mutated or dysregulated pathways or networks rather than individual mutations and that the efficacy of repositioned drugs can be inferred from disease modules in molecular networks. This article reviews network-based machine learning and graph theory algorithms for integrative analysis of personal genomic data and biomedical knowledge bases to identify tumor-specific molecular mechanisms, candidate targets and repositioned drugs for personalized treatment. The review focuses on the algorithmic design and mathematical formulation of these methods to facilitate applications and implementations of network-based analysis in the practice of precision oncology. We review the methods applied in three scenarios to integrate genomic data and network models in different analysis pipelines, and we examine three categories of network-based approaches for repositioning drugs in drug-disease-gene networks. In addition, we perform a comprehensive subnetwork/pathway analysis of mutations in 31 cancer genome projects in the Cancer Genome Atlas and present a detailed case study on ovarian cancer. Finally, we discuss interesting observations, potential pitfalls and future directions in network-based precision oncology.

  8. Community-based benchmarking improves spike rate inference from two-photon calcium imaging data.

    PubMed

    Berens, Philipp; Freeman, Jeremy; Deneux, Thomas; Chenkov, Nikolay; McColgan, Thomas; Speiser, Artur; Macke, Jakob H; Turaga, Srinivas C; Mineault, Patrick; Rupprecht, Peter; Gerhard, Stephan; Friedrich, Rainer W; Friedrich, Johannes; Paninski, Liam; Pachitariu, Marius; Harris, Kenneth D; Bolte, Ben; Machado, Timothy A; Ringach, Dario; Stone, Jasmine; Rogerson, Luke E; Sofroniew, Nicolas J; Reimer, Jacob; Froudarakis, Emmanouil; Euler, Thomas; Román Rosón, Miroslav; Theis, Lucas; Tolias, Andreas S; Bethge, Matthias

    2018-05-01

    In recent years, two-photon calcium imaging has become a standard tool to probe the function of neural circuits and to study computations in neuronal populations. However, the acquired signal is only an indirect measurement of neural activity due to the comparatively slow dynamics of fluorescent calcium indicators. Different algorithms for estimating spike rates from noisy calcium measurements have been proposed in the past, but it is an open question how far performance can be improved. Here, we report the results of the spikefinder challenge, launched to catalyze the development of new spike rate inference algorithms through crowd-sourcing. We present ten of the submitted algorithms which show improved performance compared to previously evaluated methods. Interestingly, the top-performing algorithms are based on a wide range of principles from deep neural networks to generative models, yet provide highly correlated estimates of the neural activity. The competition shows that benchmark challenges can drive algorithmic developments in neuroscience.

  9. Gene regulatory network inference from multifactorial perturbation data using both regression and correlation analyses.

    PubMed

    Xiong, Jie; Zhou, Tong

    2012-01-01

    An important problem in systems biology is to reconstruct gene regulatory networks (GRNs) from experimental data and other a priori information. The DREAM project offers some types of experimental data, such as knockout data, knockdown data, time series data, etc. Among them, multifactorial perturbation data are easier and less expensive to obtain than other types of experimental data and are thus more common in practice. In this article, a new algorithm is presented for the inference of GRNs using the DREAM4 multifactorial perturbation data. The GRN inference problem among [Formula: see text] genes is decomposed into [Formula: see text] different regression problems. In each of the regression problems, the expression level of a target gene is predicted solely from the expression level of a potential regulation gene. For different potential regulation genes, different weights for a specific target gene are constructed by using the sum of squared residuals and the Pearson correlation coefficient. Then these weights are normalized to reflect effort differences of regulating distinct genes. By appropriately choosing the parameters of the power law, we constructe a 0-1 integer programming problem. By solving this problem, direct regulation genes for an arbitrary gene can be estimated. And, the normalized weight of a gene is modified, on the basis of the estimation results about the existence of direct regulations to it. These normalized and modified weights are used in queuing the possibility of the existence of a corresponding direct regulation. Computation results with the DREAM4 In Silico Size 100 Multifactorial subchallenge show that estimation performances of the suggested algorithm can even outperform the best team. Using the real data provided by the DREAM5 Network Inference Challenge, estimation performances can be ranked third. Furthermore, the high precision of the obtained most reliable predictions shows the suggested algorithm may be helpful in guiding biological experiment designs.

  10. Data Imputation in Epistatic MAPs by Network-Guided Matrix Completion

    PubMed Central

    Žitnik, Marinka; Zupan, Blaž

    2015-01-01

    Abstract Epistatic miniarray profile (E-MAP) is a popular large-scale genetic interaction discovery platform. E-MAPs benefit from quantitative output, which makes it possible to detect subtle interactions with greater precision. However, due to the limits of biotechnology, E-MAP studies fail to measure genetic interactions for up to 40% of gene pairs in an assay. Missing measurements can be recovered by computational techniques for data imputation, in this way completing the interaction profiles and enabling downstream analysis algorithms that could otherwise be sensitive to missing data values. We introduce a new interaction data imputation method called network-guided matrix completion (NG-MC). The core part of NG-MC is low-rank probabilistic matrix completion that incorporates prior knowledge presented as a collection of gene networks. NG-MC assumes that interactions are transitive, such that latent gene interaction profiles inferred by NG-MC depend on the profiles of their direct neighbors in gene networks. As the NG-MC inference algorithm progresses, it propagates latent interaction profiles through each of the networks and updates gene network weights toward improved prediction. In a study with four different E-MAP data assays and considered protein–protein interaction and gene ontology similarity networks, NG-MC significantly surpassed existing alternative techniques. Inclusion of information from gene networks also allowed NG-MC to predict interactions for genes that were not included in original E-MAP assays, a task that could not be considered by current imputation approaches. PMID:25658751

  11. ARACNe-AP: Gene Network Reverse Engineering through Adaptive Partitioning inference of Mutual Information. | Office of Cancer Genomics

    Cancer.gov

    The accurate reconstruction of gene regulatory networks from large scale molecular profile datasets represents one of the grand challenges of Systems Biology. The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) represents one of the most effective tools to accomplish this goal. However, the initial Fixed Bandwidth (FB) implementation is both inefficient and unable to deal with sample sets providing largely uneven coverage of the probability density space.

  12. Effective network inference through multivariate information transfer estimation

    NASA Astrophysics Data System (ADS)

    Dahlqvist, Carl-Henrik; Gnabo, Jean-Yves

    2018-06-01

    Network representation has steadily gained in popularity over the past decades. In many disciplines such as finance, genetics, neuroscience or human travel to cite a few, the network may not directly be observable and needs to be inferred from time-series data, leading to the issue of separating direct interactions between two entities forming the network from indirect interactions coming through its remaining part. Drawing on recent contributions proposing strategies to deal with this problem such as the so-called "global silencing" approach of Barzel and Barabasi or "network deconvolution" of Feizi et al. (2013), we propose a novel methodology to infer an effective network structure from multivariate conditional information transfers. Its core principal is to test the information transfer between two nodes through a step-wise approach by conditioning the transfer for each pair on a specific set of relevant nodes as identified by our algorithm from the rest of the network. The methodology is model free and can be applied to high-dimensional networks with both inter-lag and intra-lag relationships. It outperforms state-of-the-art approaches for eliminating the redundancies and more generally retrieving simulated artificial networks in our Monte-Carlo experiments. We apply the method to stock market data at different frequencies (15 min, 1 h, 1 day) to retrieve the network of US largest financial institutions and then document how bank's centrality measurements relate to bank's systemic vulnerability.

  13. A new learning algorithm for a fully connected neuro-fuzzy inference system.

    PubMed

    Chen, C L Philip; Wang, Jing; Wang, Chi-Hsu; Chen, Long

    2014-10-01

    A traditional neuro-fuzzy system is transformed into an equivalent fully connected three layer neural network (NN), namely, the fully connected neuro-fuzzy inference systems (F-CONFIS). The F-CONFIS differs from traditional NNs by its dependent and repeated weights between input and hidden layers and can be considered as the variation of a kind of multilayer NN. Therefore, an efficient learning algorithm for the F-CONFIS to cope these repeated weights is derived. Furthermore, a dynamic learning rate is proposed for neuro-fuzzy systems via F-CONFIS where both premise (hidden) and consequent portions are considered. Several simulation results indicate that the proposed approach achieves much better accuracy and fast convergence.

  14. Bayesian Networks for Modeling Dredging Decisions

    DTIC Science & Technology

    2011-10-01

    change scenarios. Arctic Expert elicitation Netica Bacon et al . 2002 Identify factors that might lead to a change in land use from farming to...tree) algorithms developed by Lauritzen and Spiegelhalter (1988) and Jensen et al . (1990). Statistical inference is simply the process of...causality when constructing a Bayesian network (Kjaerulff and Madsen 2008, Darwiche 2009, Marcot et al . 2006). A knowledge representation approach is the

  15. Comparative Analysis of Soft Computing Models in Prediction of Bending Rigidity of Cotton Woven Fabrics

    NASA Astrophysics Data System (ADS)

    Guruprasad, R.; Behera, B. K.

    2015-10-01

    Quantitative prediction of fabric mechanical properties is an essential requirement for design engineering of textile and apparel products. In this work, the possibility of prediction of bending rigidity of cotton woven fabrics has been explored with the application of Artificial Neural Network (ANN) and two hybrid methodologies, namely Neuro-genetic modeling and Adaptive Neuro-Fuzzy Inference System (ANFIS) modeling. For this purpose, a set of cotton woven grey fabrics was desized, scoured and relaxed. The fabrics were then conditioned and tested for bending properties. With the database thus created, a neural network model was first developed using back propagation as the learning algorithm. The second model was developed by applying a hybrid learning strategy, in which genetic algorithm was first used as a learning algorithm to optimize the number of neurons and connection weights of the neural network. The Genetic algorithm optimized network structure was further allowed to learn using back propagation algorithm. In the third model, an ANFIS modeling approach was attempted to map the input-output data. The prediction performances of the models were compared and a sensitivity analysis was reported. The results show that the prediction by neuro-genetic and ANFIS models were better in comparison with that of back propagation neural network model.

  16. Enriching regulatory networks by bootstrap learning using optimised GO-based gene similarity and gene links mined from PubMed abstracts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Taylor, Ronald C.; Sanfilippo, Antonio P.; McDermott, Jason E.

    2011-02-18

    Transcriptional regulatory networks are being determined using “reverse engineering” methods that infer connections based on correlations in gene state. Corroboration of such networks through independent means such as evidence from the biomedical literature is desirable. Here, we explore a novel approach, a bootstrapping version of our previous Cross-Ontological Analytic method (XOA) that can be used for semi-automated annotation and verification of inferred regulatory connections, as well as for discovery of additional functional relationships between the genes. First, we use our annotation and network expansion method on a biological network learned entirely from the literature. We show how new relevant linksmore » between genes can be iteratively derived using a gene similarity measure based on the Gene Ontology that is optimized on the input network at each iteration. Second, we apply our method to annotation, verification, and expansion of a set of regulatory connections found by the Context Likelihood of Relatedness algorithm.« less

  17. Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles

    PubMed Central

    Michailidis, George

    2014-01-01

    Reconstructing transcriptional regulatory networks is an important task in functional genomics. Data obtained from experiments that perturb genes by knockouts or RNA interference contain useful information for addressing this reconstruction problem. However, such data can be limited in size and/or are expensive to acquire. On the other hand, observational data of the organism in steady state (e.g., wild-type) are more readily available, but their informational content is inadequate for the task at hand. We develop a computational approach to appropriately utilize both data sources for estimating a regulatory network. The proposed approach is based on a three-step algorithm to estimate the underlying directed but cyclic network, that uses as input both perturbation screens and steady state gene expression data. In the first step, the algorithm determines causal orderings of the genes that are consistent with the perturbation data, by combining an exhaustive search method with a fast heuristic that in turn couples a Monte Carlo technique with a fast search algorithm. In the second step, for each obtained causal ordering, a regulatory network is estimated using a penalized likelihood based method, while in the third step a consensus network is constructed from the highest scored ones. Extensive computational experiments show that the algorithm performs well in reconstructing the underlying network and clearly outperforms competing approaches that rely only on a single data source. Further, it is established that the algorithm produces a consistent estimate of the regulatory network. PMID:24586224

  18. Neural model of gene regulatory network: a survey on supportive meta-heuristics.

    PubMed

    Biswas, Surama; Acharyya, Sriyankar

    2016-06-01

    Gene regulatory network (GRN) is produced as a result of regulatory interactions between different genes through their coded proteins in cellular context. Having immense importance in disease detection and drug finding, GRN has been modelled through various mathematical and computational schemes and reported in survey articles. Neural and neuro-fuzzy models have been the focus of attraction in bioinformatics. Predominant use of meta-heuristic algorithms in training neural models has proved its excellence. Considering these facts, this paper is organized to survey neural modelling schemes of GRN and the efficacy of meta-heuristic algorithms towards parameter learning (i.e. weighting connections) within the model. This survey paper renders two different structure-related approaches to infer GRN which are global structure approach and substructure approach. It also describes two neural modelling schemes, such as artificial neural network/recurrent neural network based modelling and neuro-fuzzy modelling. The meta-heuristic algorithms applied so far to learn the structure and parameters of neutrally modelled GRN have been reviewed here.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lucas, D. D.; Yver Kwok, C.; Cameron-Smith, P.

    Emission rates of greenhouse gases (GHGs) entering into the atmosphere can be inferred using mathematical inverse approaches that combine observations from a network of stations with forward atmospheric transport models. Some locations for collecting observations are better than others for constraining GHG emissions through the inversion, but the best locations for the inversion may be inaccessible or limited by economic and other non-scientific factors. We present a method to design an optimal GHG observing network in the presence of multiple objectives that may be in conflict with each other. As a demonstration, we use our method to design a prototypemore » network of six stations to monitor summertime emissions in California of the potent GHG 1,1,1,2-tetrafluoroethane (CH 2FCF 3, HFC-134a). We use a multiobjective genetic algorithm to evolve network configurations that seek to jointly maximize the scientific accuracy of the inferred HFC-134a emissions and minimize the associated costs of making the measurements. The genetic algorithm effectively determines a set of "optimal" observing networks for HFC-134a that satisfy both objectives (i.e., the Pareto frontier). The Pareto frontier is convex, and clearly shows the tradeoffs between performance and cost, and the diminishing returns in trading one for the other. Without difficulty, our method can be extended to design optimal networks to monitor two or more GHGs with different emissions patterns, or to incorporate other objectives and constraints that are important in the practical design of atmospheric monitoring networks.« less

  20. Specificity of Structural Assessment of Knowledge

    ERIC Educational Resources Information Center

    Trumpower, David L.; Sharara, Harold; Goldsmith, Timothy E.

    2010-01-01

    This study examines the specificity of information provided by structural assessment of knowledge (SAK). SAK is a technique which uses the Pathfinder scaling algorithm to transform ratings of concept relatedness into network representations (PFnets) of individuals' knowledge. Inferences about individuals' overall domain knowledge based on the…

  1. Inference in the brain: Statistics flowing in redundant population codes

    PubMed Central

    Pitkow, Xaq; Angelaki, Dora E

    2017-01-01

    It is widely believed that the brain performs approximate probabilistic inference to estimate causal variables in the world from ambiguous sensory data. To understand these computations, we need to analyze how information is represented and transformed by the actions of nonlinear recurrent neural networks. We propose that these probabilistic computations function by a message-passing algorithm operating at the level of redundant neural populations. To explain this framework, we review its underlying concepts, including graphical models, sufficient statistics, and message-passing, and then describe how these concepts could be implemented by recurrently connected probabilistic population codes. The relevant information flow in these networks will be most interpretable at the population level, particularly for redundant neural codes. We therefore outline a general approach to identify the essential features of a neural message-passing algorithm. Finally, we argue that to reveal the most important aspects of these neural computations, we must study large-scale activity patterns during moderately complex, naturalistic behaviors. PMID:28595050

  2. Network immunization under limited budget using graph spectra

    NASA Astrophysics Data System (ADS)

    Zahedi, R.; Khansari, M.

    2016-03-01

    In this paper, we propose a new algorithm that minimizes the worst expected growth of an epidemic by reducing the size of the largest connected component (LCC) of the underlying contact network. The proposed algorithm is applicable to any level of available resources and, despite the greedy approaches of most immunization strategies, selects nodes simultaneously. In each iteration, the proposed method partitions the LCC into two groups. These are the best candidates for communities in that component, and the available resources are sufficient to separate them. Using Laplacian spectral partitioning, the proposed method performs community detection inference with a time complexity that rivals that of the best previous methods. Experiments show that our method outperforms targeted immunization approaches in both real and synthetic networks.

  3. Inferring gene ontologies from pairwise similarity data

    PubMed Central

    Kramer, Michael; Dutkowski, Janusz; Yu, Michael; Bafna, Vineet; Ideker, Trey

    2014-01-01

    Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Recognizing that ontologies are a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: analyze a full matrix of gene–gene pairwise similarities from -omics data;infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; andrespect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher level terms. Methods addressing these requirements are just beginning to emerge—none has been evaluated for GO inference. Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements, compared with methods including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method’s ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast. Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods using two -omics datasets and achieves ∼30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20–25% precision, recall). Conclusion: This study provides algorithmic foundation for building gene ontologies by capturing hierarchical and pleiotropic structure embedded in biomolecular data. Contact: tideker@ucsd.edu PMID:24932003

  4. Optimal design of gene knockout experiments for gene regulatory network inference

    PubMed Central

    Ud-Dean, S. M. Minhaz; Gunawan, Rudiyanto

    2016-01-01

    Motivation: We addressed the problem of inferring gene regulatory network (GRN) from gene expression data of knockout (KO) experiments. This inference is known to be underdetermined and the GRN is not identifiable from data. Past studies have shown that suboptimal design of experiments (DOE) contributes significantly to the identifiability issue of biological networks, including GRNs. However, optimizing DOE has received much less attention than developing methods for GRN inference. Results: We developed REDuction of UnCertain Edges (REDUCE) algorithm for finding the optimal gene KO experiment for inferring directed graphs (digraphs) of GRNs. REDUCE employed ensemble inference to define uncertain gene interactions that could not be verified by prior data. The optimal experiment corresponds to the maximum number of uncertain interactions that could be verified by the resulting data. For this purpose, we introduced the concept of edge separatoid which gave a list of nodes (genes) that upon their removal would allow the verification of a particular gene interaction. Finally, we proposed a procedure that iterates over performing KO experiments, ensemble update and optimal DOE. The case studies including the inference of Escherichia coli GRN and DREAM 4 100-gene GRNs, demonstrated the efficacy of the iterative GRN inference. In comparison to systematic KOs, REDUCE could provide much higher information return per gene KO experiment and consequently more accurate GRN estimates. Conclusions: REDUCE represents an enabling tool for tackling the underdetermined GRN inference. Along with advances in gene deletion and automation technology, the iterative procedure brings an efficient and fully automated GRN inference closer to reality. Availability and implementation: MATLAB and Python scripts of REDUCE are available on www.cabsel.ethz.ch/tools/REDUCE. Contact: rudi.gunawan@chem.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26568633

  5. Rethinking the learning of belief network probabilities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Musick, R.

    Belief networks are a powerful tool for knowledge discovery that provide concise, understandable probabilistic models of data. There are methods grounded in probability theory to incrementally update the relationships described by the belief network when new information is seen, to perform complex inferences over any set of variables in the data, to incorporate domain expertise and prior knowledge into the model, and to automatically learn the model from data. This paper concentrates on part of the belief network induction problem, that of learning the quantitative structure (the conditional probabilities), given the qualitative structure. In particular, the current practice of rotemore » learning the probabilities in belief networks can be significantly improved upon. We advance the idea of applying any learning algorithm to the task of conditional probability learning in belief networks, discuss potential benefits, and show results of applying neutral networks and other algorithms to a medium sized car insurance belief network. The results demonstrate from 10 to 100% improvements in model error rates over the current approaches.« less

  6. Reconstructing genome-wide regulatory network of E. coli using transcriptome data and predicted transcription factor activities

    PubMed Central

    2011-01-01

    Background Gene regulatory networks play essential roles in living organisms to control growth, keep internal metabolism running and respond to external environmental changes. Understanding the connections and the activity levels of regulators is important for the research of gene regulatory networks. While relevance score based algorithms that reconstruct gene regulatory networks from transcriptome data can infer genome-wide gene regulatory networks, they are unfortunately prone to false positive results. Transcription factor activities (TFAs) quantitatively reflect the ability of the transcription factor to regulate target genes. However, classic relevance score based gene regulatory network reconstruction algorithms use models do not include the TFA layer, thus missing a key regulatory element. Results This work integrates TFA prediction algorithms with relevance score based network reconstruction algorithms to reconstruct gene regulatory networks with improved accuracy over classic relevance score based algorithms. This method is called Gene expression and Transcription factor activity based Relevance Network (GTRNetwork). Different combinations of TFA prediction algorithms and relevance score functions have been applied to find the most efficient combination. When the integrated GTRNetwork method was applied to E. coli data, the reconstructed genome-wide gene regulatory network predicted 381 new regulatory links. This reconstructed gene regulatory network including the predicted new regulatory links show promising biological significances. Many of the new links are verified by known TF binding site information, and many other links can be verified from the literature and databases such as EcoCyc. The reconstructed gene regulatory network is applied to a recent transcriptome analysis of E. coli during isobutanol stress. In addition to the 16 significantly changed TFAs detected in the original paper, another 7 significantly changed TFAs have been detected by using our reconstructed network. Conclusions The GTRNetwork algorithm introduces the hidden layer TFA into classic relevance score-based gene regulatory network reconstruction processes. Integrating the TFA biological information with regulatory network reconstruction algorithms significantly improves both detection of new links and reduces that rate of false positives. The application of GTRNetwork on E. coli gene transcriptome data gives a set of potential regulatory links with promising biological significance for isobutanol stress and other conditions. PMID:21668997

  7. An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications.

    PubMed

    Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun

    2015-12-01

    Deep Learning algorithm is widely used for various pattern recognition applications such as text recognition, object recognition and action recognition because of its best-in-class recognition accuracy compared to hand-crafted algorithm and shallow learning based algorithms. Long learning time caused by its complex structure, however, limits its usage only in high-cost servers or many-core GPU platforms so far. On the other hand, the demand on customized pattern recognition within personal devices will grow gradually as more deep learning applications will be developed. This paper presents a SoC implementation to enable deep learning applications to run with low cost platforms such as mobile or portable devices. Different from conventional works which have adopted massively-parallel architecture, this work adopts task-flexible architecture and exploits multiple parallelism to cover complex functions of convolutional deep belief network which is one of popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable system. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power, and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.

  8. New Algorithm and Software (BNOmics) for Inferring and Visualizing Bayesian Networks from Heterogeneous Big Biological and Genetic Data

    PubMed Central

    Gogoshin, Grigoriy; Boerwinkle, Eric

    2017-01-01

    Abstract Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing data types, imputation and validation, and, in general, limited scalability in both reconstruction and visualization of BNs. To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology—type data exploration, including both generating new biological hypothesis and testing and validating the existing ones. Novel aspects of the algorithm center around increasing scalability and applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework. An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the ever-increasing inflow of available large-scale omics datasets. As such, the software scalability and usability on the less than exotic computer hardware are a priority, as well as the applicability of the algorithm and software to the heterogeneous datasets containing many data types—single-nucleotide polymorphisms and other genetic/epigenetic/transcriptome variables, metabolite levels, epidemiological variables, endpoints, and phenotypes, etc. PMID:27681505

  9. New Algorithm and Software (BNOmics) for Inferring and Visualizing Bayesian Networks from Heterogeneous Big Biological and Genetic Data.

    PubMed

    Gogoshin, Grigoriy; Boerwinkle, Eric; Rodin, Andrei S

    2017-04-01

    Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing data types, imputation and validation, and, in general, limited scalability in both reconstruction and visualization of BNs. To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology-type data exploration, including both generating new biological hypothesis and testing and validating the existing ones. Novel aspects of the algorithm center around increasing scalability and applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework. An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the ever-increasing inflow of available large-scale omics datasets. As such, the software scalability and usability on the less than exotic computer hardware are a priority, as well as the applicability of the algorithm and software to the heterogeneous datasets containing many data types-single-nucleotide polymorphisms and other genetic/epigenetic/transcriptome variables, metabolite levels, epidemiological variables, endpoints, and phenotypes, etc.

  10. Deep Unfolding for Topic Models.

    PubMed

    Chien, Jen-Tzung; Lee, Chao-Hsi

    2018-02-01

    Deep unfolding provides an approach to integrate the probabilistic generative models and the deterministic neural networks. Such an approach is benefited by deep representation, easy interpretation, flexible learning and stochastic modeling. This study develops the unsupervised and supervised learning of deep unfolded topic models for document representation and classification. Conventionally, the unsupervised and supervised topic models are inferred via the variational inference algorithm where the model parameters are estimated by maximizing the lower bound of logarithm of marginal likelihood using input documents without and with class labels, respectively. The representation capability or classification accuracy is constrained by the variational lower bound and the tied model parameters across inference procedure. This paper aims to relax these constraints by directly maximizing the end performance criterion and continuously untying the parameters in learning process via deep unfolding inference (DUI). The inference procedure is treated as the layer-wise learning in a deep neural network. The end performance is iteratively improved by using the estimated topic parameters according to the exponentiated updates. Deep learning of topic models is therefore implemented through a back-propagation procedure. Experimental results show the merits of DUI with increasing number of layers compared with variational inference in unsupervised as well as supervised topic models.

  11. Discriminative Relational Topic Models.

    PubMed

    Chen, Ning; Zhu, Jun; Xia, Fei; Zhang, Bo

    2015-05-01

    Relational topic models (RTMs) provide a probabilistic generative process to describe both the link structure and document contents for document networks, and they have shown promise on predicting network structures and discovering latent topic representations. However, existing RTMs have limitations in both the restricted model expressiveness and incapability of dealing with imbalanced network data. To expand the scope and improve the inference accuracy of RTMs, this paper presents three extensions: 1) unlike the common link likelihood with a diagonal weight matrix that allows the-same-topic interactions only, we generalize it to use a full weight matrix that captures all pairwise topic interactions and is applicable to asymmetric networks; 2) instead of doing standard Bayesian inference, we perform regularized Bayesian inference (RegBayes) with a regularization parameter to deal with the imbalanced link structure issue in real networks and improve the discriminative ability of learned latent representations; and 3) instead of doing variational approximation with strict mean-field assumptions, we present collapsed Gibbs sampling algorithms for the generalized relational topic models by exploring data augmentation without making restricting assumptions. Under the generic RegBayes framework, we carefully investigate two popular discriminative loss functions, namely, the logistic log-loss and the max-margin hinge loss. Experimental results on several real network datasets demonstrate the significance of these extensions on improving prediction performance.

  12. Reconstruction of extended Petri nets from time series data and its application to signal transduction and to gene regulatory networks

    PubMed Central

    2011-01-01

    Background Network inference methods reconstruct mathematical models of molecular or genetic networks directly from experimental data sets. We have previously reported a mathematical method which is exclusively data-driven, does not involve any heuristic decisions within the reconstruction process, and deliveres all possible alternative minimal networks in terms of simple place/transition Petri nets that are consistent with a given discrete time series data set. Results We fundamentally extended the previously published algorithm to consider catalysis and inhibition of the reactions that occur in the underlying network. The results of the reconstruction algorithm are encoded in the form of an extended Petri net involving control arcs. This allows the consideration of processes involving mass flow and/or regulatory interactions. As a non-trivial test case, the phosphate regulatory network of enterobacteria was reconstructed using in silico-generated time-series data sets on wild-type and in silico mutants. Conclusions The new exact algorithm reconstructs extended Petri nets from time series data sets by finding all alternative minimal networks that are consistent with the data. It suggested alternative molecular mechanisms for certain reactions in the network. The algorithm is useful to combine data from wild-type and mutant cells and may potentially integrate physiological, biochemical, pharmacological, and genetic data in the form of a single model. PMID:21762503

  13. CMIP: a software package capable of reconstructing genome-wide regulatory networks using gene expression data.

    PubMed

    Zheng, Guangyong; Xu, Yaochen; Zhang, Xiujun; Liu, Zhi-Ping; Wang, Zhuo; Chen, Luonan; Zhu, Xin-Guang

    2016-12-23

    A gene regulatory network (GRN) represents interactions of genes inside a cell or tissue, in which vertexes and edges stand for genes and their regulatory interactions respectively. Reconstruction of gene regulatory networks, in particular, genome-scale networks, is essential for comparative exploration of different species and mechanistic investigation of biological processes. Currently, most of network inference methods are computationally intensive, which are usually effective for small-scale tasks (e.g., networks with a few hundred genes), but are difficult to construct GRNs at genome-scale. Here, we present a software package for gene regulatory network reconstruction at a genomic level, in which gene interaction is measured by the conditional mutual information measurement using a parallel computing framework (so the package is named CMIP). The package is a greatly improved implementation of our previous PCA-CMI algorithm. In CMIP, we provide not only an automatic threshold determination method but also an effective parallel computing framework for network inference. Performance tests on benchmark datasets show that the accuracy of CMIP is comparable to most current network inference methods. Moreover, running tests on synthetic datasets demonstrate that CMIP can handle large datasets especially genome-wide datasets within an acceptable time period. In addition, successful application on a real genomic dataset confirms its practical applicability of the package. This new software package provides a powerful tool for genomic network reconstruction to biological community. The software can be accessed at http://www.picb.ac.cn/CMIP/ .

  14. Designing optimal greenhouse gas observing networks that consider performance and cost

    DOE PAGES

    Lucas, D. D.; Yver Kwok, C.; Cameron-Smith, P.; ...

    2015-06-16

    Emission rates of greenhouse gases (GHGs) entering into the atmosphere can be inferred using mathematical inverse approaches that combine observations from a network of stations with forward atmospheric transport models. Some locations for collecting observations are better than others for constraining GHG emissions through the inversion, but the best locations for the inversion may be inaccessible or limited by economic and other non-scientific factors. We present a method to design an optimal GHG observing network in the presence of multiple objectives that may be in conflict with each other. As a demonstration, we use our method to design a prototypemore » network of six stations to monitor summertime emissions in California of the potent GHG 1,1,1,2-tetrafluoroethane (CH 2FCF 3, HFC-134a). We use a multiobjective genetic algorithm to evolve network configurations that seek to jointly maximize the scientific accuracy of the inferred HFC-134a emissions and minimize the associated costs of making the measurements. The genetic algorithm effectively determines a set of "optimal" observing networks for HFC-134a that satisfy both objectives (i.e., the Pareto frontier). The Pareto frontier is convex, and clearly shows the tradeoffs between performance and cost, and the diminishing returns in trading one for the other. Without difficulty, our method can be extended to design optimal networks to monitor two or more GHGs with different emissions patterns, or to incorporate other objectives and constraints that are important in the practical design of atmospheric monitoring networks.« less

  15. Gene function prediction with gene interaction networks: a context graph kernel approach.

    PubMed

    Li, Xin; Chen, Hsinchun; Li, Jiexun; Zhang, Zhu

    2010-01-01

    Predicting gene functions is a challenge for biologists in the postgenomic era. Interactions among genes and their products compose networks that can be used to infer gene functions. Most previous studies adopt a linkage assumption, i.e., they assume that gene interactions indicate functional similarities between connected genes. In this study, we propose to use a gene's context graph, i.e., the gene interaction network associated with the focal gene, to infer its functions. In a kernel-based machine-learning framework, we design a context graph kernel to capture the information in context graphs. Our experimental study on a testbed of p53-related genes demonstrates the advantage of using indirect gene interactions and shows the empirical superiority of the proposed approach over linkage-assumption-based methods, such as the algorithm to minimize inconsistent connected genes and diffusion kernels.

  16. Diffany: an ontology-driven framework to infer, visualise and analyse differential molecular networks.

    PubMed

    Van Landeghem, Sofie; Van Parys, Thomas; Dubois, Marieke; Inzé, Dirk; Van de Peer, Yves

    2016-01-05

    Differential networks have recently been introduced as a powerful way to study the dynamic rewiring capabilities of an interactome in response to changing environmental conditions or stimuli. Currently, such differential networks are generated and visualised using ad hoc methods, and are often limited to the analysis of only one condition-specific response or one interaction type at a time. In this work, we present a generic, ontology-driven framework to infer, visualise and analyse an arbitrary set of condition-specific responses against one reference network. To this end, we have implemented novel ontology-based algorithms that can process highly heterogeneous networks, accounting for both physical interactions and regulatory associations, symmetric and directed edges, edge weights and negation. We propose this integrative framework as a standardised methodology that allows a unified view on differential networks and promotes comparability between differential network studies. As an illustrative application, we demonstrate its usefulness on a plant abiotic stress study and we experimentally confirmed a predicted regulator. Diffany is freely available as open-source java library and Cytoscape plugin from http://bioinformatics.psb.ugent.be/supplementary_data/solan/diffany/.

  17. Integrative approach for inference of gene regulatory networks using lasso-based random featuring and application to psychiatric disorders.

    PubMed

    Kim, Dongchul; Kang, Mingon; Biswas, Ashis; Liu, Chunyu; Gao, Jean

    2016-08-10

    Inferring gene regulatory networks is one of the most interesting research areas in the systems biology. Many inference methods have been developed by using a variety of computational models and approaches. However, there are two issues to solve. First, depending on the structural or computational model of inference method, the results tend to be inconsistent due to innately different advantages and limitations of the methods. Therefore the combination of dissimilar approaches is demanded as an alternative way in order to overcome the limitations of standalone methods through complementary integration. Second, sparse linear regression that is penalized by the regularization parameter (lasso) and bootstrapping-based sparse linear regression methods were suggested in state of the art methods for network inference but they are not effective for a small sample size data and also a true regulator could be missed if the target gene is strongly affected by an indirect regulator with high correlation or another true regulator. We present two novel network inference methods based on the integration of three different criteria, (i) z-score to measure the variation of gene expression from knockout data, (ii) mutual information for the dependency between two genes, and (iii) linear regression-based feature selection. Based on these criterion, we propose a lasso-based random feature selection algorithm (LARF) to achieve better performance overcoming the limitations of bootstrapping as mentioned above. In this work, there are three main contributions. First, our z score-based method to measure gene expression variations from knockout data is more effective than similar criteria of related works. Second, we confirmed that the true regulator selection can be effectively improved by LARF. Lastly, we verified that an integrative approach can clearly outperform a single method when two different methods are effectively jointed. In the experiments, our methods were validated by outperforming the state of the art methods on DREAM challenge data, and then LARF was applied to inferences of gene regulatory network associated with psychiatric disorders.

  18. Clustering network layers with the strata multilayer stochastic block model.

    PubMed

    Stanley, Natalie; Shai, Saray; Taylor, Dane; Mucha, Peter J

    2016-01-01

    Multilayer networks are a useful data structure for simultaneously capturing multiple types of relationships between a set of nodes. In such networks, each relational definition gives rise to a layer. While each layer provides its own set of information, community structure across layers can be collectively utilized to discover and quantify underlying relational patterns between nodes. To concisely extract information from a multilayer network, we propose to identify and combine sets of layers with meaningful similarities in community structure. In this paper, we describe the "strata multilayer stochastic block model" (sMLSBM), a probabilistic model for multilayer community structure. The central extension of the model is that there exist groups of layers, called "strata", which are defined such that all layers in a given stratum have community structure described by a common stochastic block model (SBM). That is, layers in a stratum exhibit similar node-to-community assignments and SBM probability parameters. Fitting the sMLSBM to a multilayer network provides a joint clustering that yields node-to-community and layer-to-stratum assignments, which cooperatively aid one another during inference. We describe an algorithm for separating layers into their appropriate strata and an inference technique for estimating the SBM parameters for each stratum. We demonstrate our method using synthetic networks and a multilayer network inferred from data collected in the Human Microbiome Project.

  19. Clustering network layers with the strata multilayer stochastic block model

    PubMed Central

    Stanley, Natalie; Shai, Saray; Taylor, Dane; Mucha, Peter J.

    2016-01-01

    Multilayer networks are a useful data structure for simultaneously capturing multiple types of relationships between a set of nodes. In such networks, each relational definition gives rise to a layer. While each layer provides its own set of information, community structure across layers can be collectively utilized to discover and quantify underlying relational patterns between nodes. To concisely extract information from a multilayer network, we propose to identify and combine sets of layers with meaningful similarities in community structure. In this paper, we describe the “strata multilayer stochastic block model” (sMLSBM), a probabilistic model for multilayer community structure. The central extension of the model is that there exist groups of layers, called “strata”, which are defined such that all layers in a given stratum have community structure described by a common stochastic block model (SBM). That is, layers in a stratum exhibit similar node-to-community assignments and SBM probability parameters. Fitting the sMLSBM to a multilayer network provides a joint clustering that yields node-to-community and layer-to-stratum assignments, which cooperatively aid one another during inference. We describe an algorithm for separating layers into their appropriate strata and an inference technique for estimating the SBM parameters for each stratum. We demonstrate our method using synthetic networks and a multilayer network inferred from data collected in the Human Microbiome Project. PMID:28435844

  20. Information Extraction from Large-Multi-Layer Social Networks

    DTIC Science & Technology

    2015-08-06

    mization [4]. Methods that fall into this category include spec- tral algorithms, modularity methods, and methods that rely on statistical inference...Snijders and Chris Baerveldt, “A multilevel network study of the effects of delinquent behavior on friendship evolution,” Journal of mathematical sociol- ogy...1970. [10] Ulrike Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, Dec. 2007. [11] R. A. Fisher, “On

  1. Cognitive-Developmental Learning for a Humanoid Robot: A Caregiver’s Gift

    DTIC Science & Technology

    2004-05-01

    system . We propose a real- time algorithm to infer depth and build 3-dimensional coarse maps for objects through the analysis of cues provided by an... system is well defined at the boundary of these regions (although the derivatives are not). A time domain analysis is presented for a piece-linear... Analysis of Multivariable Systems ......................... 266 D.3.1 Networks of Multiple Neural Oscillators ................. 266 D.3.2 Networks of

  2. A Parallel and Incremental Approach for Data-Intensive Learning of Bayesian Networks.

    PubMed

    Yue, Kun; Fang, Qiyu; Wang, Xiaoling; Li, Jin; Liu, Weiyi

    2015-12-01

    Bayesian network (BN) has been adopted as the underlying model for representing and inferring uncertain knowledge. As the basis of realistic applications centered on probabilistic inferences, learning a BN from data is a critical subject of machine learning, artificial intelligence, and big data paradigms. Currently, it is necessary to extend the classical methods for learning BNs with respect to data-intensive computing or in cloud environments. In this paper, we propose a parallel and incremental approach for data-intensive learning of BNs from massive, distributed, and dynamically changing data by extending the classical scoring and search algorithm and using MapReduce. First, we adopt the minimum description length as the scoring metric and give the two-pass MapReduce-based algorithms for computing the required marginal probabilities and scoring the candidate graphical model from sample data. Then, we give the corresponding strategy for extending the classical hill-climbing algorithm to obtain the optimal structure, as well as that for storing a BN by pairs. Further, in view of the dynamic characteristics of the changing data, we give the concept of influence degree to measure the coincidence of the current BN with new data, and then propose the corresponding two-pass MapReduce-based algorithms for BNs incremental learning. Experimental results show the efficiency, scalability, and effectiveness of our methods.

  3. Cross-Dependency Inference in Multi-Layered Networks: A Collaborative Filtering Perspective.

    PubMed

    Chen, Chen; Tong, Hanghang; Xie, Lei; Ying, Lei; He, Qing

    2017-08-01

    The increasingly connected world has catalyzed the fusion of networks from different domains, which facilitates the emergence of a new network model-multi-layered networks. Examples of such kind of network systems include critical infrastructure networks, biological systems, organization-level collaborations, cross-platform e-commerce, and so forth. One crucial structure that distances multi-layered network from other network models is its cross-layer dependency, which describes the associations between the nodes from different layers. Needless to say, the cross-layer dependency in the network plays an essential role in many data mining applications like system robustness analysis and complex network control. However, it remains a daunting task to know the exact dependency relationships due to noise, limited accessibility, and so forth. In this article, we tackle the cross-layer dependency inference problem by modeling it as a collective collaborative filtering problem. Based on this idea, we propose an effective algorithm Fascinate that can reveal unobserved dependencies with linear complexity. Moreover, we derive Fascinate-ZERO, an online variant of Fascinate that can respond to a newly added node timely by checking its neighborhood dependencies. We perform extensive evaluations on real datasets to substantiate the superiority of our proposed approaches.

  4. Recursive regularization for inferring gene networks from time-course gene expression profiles

    PubMed Central

    Shimamura, Teppei; Imoto, Seiya; Yamaguchi, Rui; Fujita, André; Nagasaki, Masao; Miyano, Satoru

    2009-01-01

    Background Inferring gene networks from time-course microarray experiments with vector autoregressive (VAR) model is the process of identifying functional associations between genes through multivariate time series. This problem can be cast as a variable selection problem in Statistics. One of the promising methods for variable selection is the elastic net proposed by Zou and Hastie (2005). However, VAR modeling with the elastic net succeeds in increasing the number of true positives while it also results in increasing the number of false positives. Results By incorporating relative importance of the VAR coefficients into the elastic net, we propose a new class of regularization, called recursive elastic net, to increase the capability of the elastic net and estimate gene networks based on the VAR model. The recursive elastic net can reduce the number of false positives gradually by updating the importance. Numerical simulations and comparisons demonstrate that the proposed method succeeds in reducing the number of false positives drastically while keeping the high number of true positives in the network inference and achieves two or more times higher true discovery rate (the proportion of true positives among the selected edges) than the competing methods even when the number of time points is small. We also compared our method with various reverse-engineering algorithms on experimental data of MCF-7 breast cancer cells stimulated with two ErbB ligands, EGF and HRG. Conclusion The recursive elastic net is a powerful tool for inferring gene networks from time-course gene expression profiles. PMID:19386091

  5. Reverse engineering and analysis of large genome-scale gene networks

    PubMed Central

    Aluru, Maneesha; Zola, Jaroslaw; Nettleton, Dan; Aluru, Srinivas

    2013-01-01

    Reverse engineering the whole-genome networks of complex multicellular organisms continues to remain a challenge. While simpler models easily scale to large number of genes and gene expression datasets, more accurate models are compute intensive limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software Gene Network Analyzer (GeNA) for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web. PMID:23042249

  6. Empirical Bayes conditional independence graphs for regulatory network recovery.

    PubMed

    Mahdi, Rami; Madduri, Abishek S; Wang, Guoqing; Strulovici-Barel, Yael; Salit, Jacqueline; Hackett, Neil R; Crystal, Ronald G; Mezey, Jason G

    2012-08-01

    Computational inference methods that make use of graphical models to extract regulatory networks from gene expression data can have difficulty reconstructing dense regions of a network, a consequence of both computational complexity and unreliable parameter estimation when sample size is small. As a result, identification of hub genes is of special difficulty for these methods. We present a new algorithm, Empirical Light Mutual Min (ELMM), for large network reconstruction that has properties well suited for recovery of graphs with high-degree nodes. ELMM reconstructs the undirected graph of a regulatory network using empirical Bayes conditional independence testing with a heuristic relaxation of independence constraints in dense areas of the graph. This relaxation allows only one gene of a pair with a putative relation to be aware of the network connection, an approach that is aimed at easing multiple testing problems associated with recovering densely connected structures. Using in silico data, we show that ELMM has better performance than commonly used network inference algorithms including GeneNet, ARACNE, FOCI, GENIE3 and GLASSO. We also apply ELMM to reconstruct a network among 5492 genes expressed in human lung airway epithelium of healthy non-smokers, healthy smokers and individuals with chronic obstructive pulmonary disease assayed using microarrays. The analysis identifies dense sub-networks that are consistent with known regulatory relationships in the lung airway and also suggests novel hub regulatory relationships among a number of genes that play roles in oxidative stress and secretion. Software for running ELMM is made available at http://mezeylab.cb.bscb.cornell.edu/Software.aspx. ramimahdi@yahoo.com or jgm45@cornell.edu Supplementary data are available at Bioinformatics online.

  7. Data mining in forecasting PVT correlations of crude oil systems based on Type1 fuzzy logic inference systems

    NASA Astrophysics Data System (ADS)

    El-Sebakhy, Emad A.

    2009-09-01

    Pressure-volume-temperature properties are very important in the reservoir engineering computations. There are many empirical approaches for predicting various PVT properties based on empirical correlations and statistical regression models. Last decade, researchers utilized neural networks to develop more accurate PVT correlations. These achievements of neural networks open the door to data mining techniques to play a major role in oil and gas industry. Unfortunately, the developed neural networks correlations are often limited, and global correlations are usually less accurate compared to local correlations. Recently, adaptive neuro-fuzzy inference systems have been proposed as a new intelligence framework for both prediction and classification based on fuzzy clustering optimization criterion and ranking. This paper proposes neuro-fuzzy inference systems for estimating PVT properties of crude oil systems. This new framework is an efficient hybrid intelligence machine learning scheme for modeling the kind of uncertainty associated with vagueness and imprecision. We briefly describe the learning steps and the use of the Takagi Sugeno and Kang model and Gustafson-Kessel clustering algorithm with K-detected clusters from the given database. It has featured in a wide range of medical, power control system, and business journals, often with promising results. A comparative study will be carried out to compare their performance of this new framework with the most popular modeling techniques, such as neural networks, nonlinear regression, and the empirical correlations algorithms. The results show that the performance of neuro-fuzzy systems is accurate, reliable, and outperform most of the existing forecasting techniques. Future work can be achieved by using neuro-fuzzy systems for clustering the 3D seismic data, identification of lithofacies types, and other reservoir characterization.

  8. Multi-agent based control of large-scale complex systems employing distributed dynamic inference engine

    NASA Astrophysics Data System (ADS)

    Zhang, Daili

    Increasing societal demand for automation has led to considerable efforts to control large-scale complex systems, especially in the area of autonomous intelligent control methods. The control system of a large-scale complex system needs to satisfy four system level requirements: robustness, flexibility, reusability, and scalability. Corresponding to the four system level requirements, there arise four major challenges. First, it is difficult to get accurate and complete information. Second, the system may be physically highly distributed. Third, the system evolves very quickly. Fourth, emergent global behaviors of the system can be caused by small disturbances at the component level. The Multi-Agent Based Control (MABC) method as an implementation of distributed intelligent control has been the focus of research since the 1970s, in an effort to solve the above-mentioned problems in controlling large-scale complex systems. However, to the author's best knowledge, all MABC systems for large-scale complex systems with significant uncertainties are problem-specific and thus difficult to extend to other domains or larger systems. This situation is partly due to the control architecture of multiple agents being determined by agent to agent coupling and interaction mechanisms. Therefore, the research objective of this dissertation is to develop a comprehensive, generalized framework for the control system design of general large-scale complex systems with significant uncertainties, with the focus on distributed control architecture design and distributed inference engine design. A Hybrid Multi-Agent Based Control (HyMABC) architecture is proposed by combining hierarchical control architecture and module control architecture with logical replication rings. First, it decomposes a complex system hierarchically; second, it combines the components in the same level as a module, and then designs common interfaces for all of the components in the same module; third, replications are made for critical agents and are organized into logical rings. This architecture maintains clear guidelines for complexity decomposition and also increases the robustness of the whole system. Multiple Sectioned Dynamic Bayesian Networks (MSDBNs) as a distributed dynamic probabilistic inference engine, can be embedded into the control architecture to handle uncertainties of general large-scale complex systems. MSDBNs decomposes a large knowledge-based system into many agents. Each agent holds its partial perspective of a large problem domain by representing its knowledge as a Dynamic Bayesian Network (DBN). Each agent accesses local evidence from its corresponding local sensors and communicates with other agents through finite message passing. If the distributed agents can be organized into a tree structure, satisfying the running intersection property and d-sep set requirements, globally consistent inferences are achievable in a distributed way. By using different frequencies for local DBN agent belief updating and global system belief updating, it balances the communication cost with the global consistency of inferences. In this dissertation, a fully factorized Boyen-Koller (BK) approximation algorithm is used for local DBN agent belief updating, and the static Junction Forest Linkage Tree (JFLT) algorithm is used for global system belief updating. MSDBNs assume a static structure and a stable communication network for the whole system. However, for a real system, sub-Bayesian networks as nodes could be lost, and the communication network could be shut down due to partial damage in the system. Therefore, on-line and automatic MSDBNs structure formation is necessary for making robust state estimations and increasing survivability of the whole system. A Distributed Spanning Tree Optimization (DSTO) algorithm, a Distributed D-Sep Set Satisfaction (DDSSS) algorithm, and a Distributed Running Intersection Satisfaction (DRIS) algorithm are proposed in this dissertation. Combining these three distributed algorithms and a Distributed Belief Propagation (DBP) algorithm in MSDBNs makes state estimations robust to partial damage in the whole system. Combining the distributed control architecture design and the distributed inference engine design leads to a process of control system design for a general large-scale complex system. As applications of the proposed methodology, the control system design of a simplified ship chilled water system and a notional ship chilled water system have been demonstrated step by step. Simulation results not only show that the proposed methodology gives a clear guideline for control system design for general large-scale complex systems with dynamic and uncertain environment, but also indicate that the combination of MSDBNs and HyMABC can provide excellent performance for controlling general large-scale complex systems.

  9. Random graph models for dynamic networks

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao; Moore, Cristopher; Newman, Mark E. J.

    2017-10-01

    Recent theoretical work on the modeling of network structure has focused primarily on networks that are static and unchanging, but many real-world networks change their structure over time. There exist natural generalizations to the dynamic case of many static network models, including the classic random graph, the configuration model, and the stochastic block model, where one assumes that the appearance and disappearance of edges are governed by continuous-time Markov processes with rate parameters that can depend on properties of the nodes. Here we give an introduction to this class of models, showing for instance how one can compute their equilibrium properties. We also demonstrate their use in data analysis and statistical inference, giving efficient algorithms for fitting them to observed network data using the method of maximum likelihood. This allows us, for example, to estimate the time constants of network evolution or infer community structure from temporal network data using cues embedded both in the probabilities over time that node pairs are connected by edges and in the characteristic dynamics of edge appearance and disappearance. We illustrate these methods with a selection of applications, both to computer-generated test networks and real-world examples.

  10. Integration of heterogeneous molecular networks to unravel gene-regulation in Mycobacterium tuberculosis.

    PubMed

    van Dam, Jesse C J; Schaap, Peter J; Martins dos Santos, Vitor A P; Suárez-Diez, María

    2014-09-26

    Different methods have been developed to infer regulatory networks from heterogeneous omics datasets and to construct co-expression networks. Each algorithm produces different networks and efforts have been devoted to automatically integrate them into consensus sets. However each separate set has an intrinsic value that is diluted and partly lost when building a consensus network. Here we present a methodology to generate co-expression networks and, instead of a consensus network, we propose an integration framework where the different networks are kept and analysed with additional tools to efficiently combine the information extracted from each network. We developed a workflow to efficiently analyse information generated by different inference and prediction methods. Our methodology relies on providing the user the means to simultaneously visualise and analyse the coexisting networks generated by different algorithms, heterogeneous datasets, and a suite of analysis tools. As a show case, we have analysed the gene co-expression networks of Mycobacterium tuberculosis generated using over 600 expression experiments. Regarding DNA damage repair, we identified SigC as a key control element, 12 new targets for LexA, an updated LexA binding motif, and a potential mismatch repair system. We expanded the DevR regulon with 27 genes while identifying 9 targets wrongly assigned to this regulon. We discovered 10 new genes linked to zinc uptake and a new regulatory mechanism for ZuR. The use of co-expression networks to perform system level analysis allows the development of custom made methodologies. As show cases we implemented a pipeline to integrate ChIP-seq data and another method to uncover multiple regulatory layers. Our workflow is based on representing the multiple types of information as network representations and presenting these networks in a synchronous framework that allows their simultaneous visualization while keeping specific associations from the different networks. By simultaneously exploring these networks and metadata, we gained insights into regulatory mechanisms in M. tuberculosis that could not be obtained through the separate analysis of each data type.

  11. Integrating Genetic and Functional Genomic Data to Elucidate Common Disease Tra

    NASA Astrophysics Data System (ADS)

    Schadt, Eric

    2005-03-01

    The reconstruction of genetic networks in mammalian systems is one of the primary goals in biological research, especially as such reconstructions relate to elucidating not only common, polygenic human diseases, but living systems more generally. Here I present a statistical procedure for inferring causal relationships between gene expression traits and more classic clinical traits, including complex disease traits. This procedure has been generalized to the gene network reconstruction problem, where naturally occurring genetic variations in segregating mouse populations are used as a source of perturbations to elucidate tissue-specific gene networks. Differences in the extent of genetic control between genders and among four different tissues are highlighted. I also demonstrate that the networks derived from expression data in segregating mouse populations using the novel network reconstruction algorithm are able to capture causal associations between genes that result in increased predictive power, compared to more classically reconstructed networks derived from the same data. This approach to causal inference in large segregating mouse populations over multiple tissues not only elucidates fundamental aspects of transcriptional control, it also allows for the objective identification of key drivers of common human diseases.

  12. Automated sleep stage detection with a classical and a neural learning algorithm--methodological aspects.

    PubMed

    Schwaibold, M; Schöchlin, J; Bolz, A

    2002-01-01

    For classification tasks in biosignal processing, several strategies and algorithms can be used. Knowledge-based systems allow prior knowledge about the decision process to be integrated, both by the developer and by self-learning capabilities. For the classification stages in a sleep stage detection framework, three inference strategies were compared regarding their specific strengths: a classical signal processing approach, artificial neural networks and neuro-fuzzy systems. Methodological aspects were assessed to attain optimum performance and maximum transparency for the user. Due to their effective and robust learning behavior, artificial neural networks could be recommended for pattern recognition, while neuro-fuzzy systems performed best for the processing of contextual information.

  13. A decentralized training algorithm for Echo State Networks in distributed big data applications.

    PubMed

    Scardapane, Simone; Wang, Dianhui; Panella, Massimo

    2016-06-01

    The current big data deluge requires innovative solutions for performing efficient inference on large, heterogeneous amounts of information. Apart from the known challenges deriving from high volume and velocity, real-world big data applications may impose additional technological constraints, including the need for a fully decentralized training architecture. While several alternatives exist for training feed-forward neural networks in such a distributed setting, less attention has been devoted to the case of decentralized training of recurrent neural networks (RNNs). In this paper, we propose such an algorithm for a class of RNNs known as Echo State Networks. The algorithm is based on the well-known Alternating Direction Method of Multipliers optimization procedure. It is formulated only in terms of local exchanges between neighboring agents, without reliance on a coordinating node. Additionally, it does not require the communication of training patterns, which is a crucial component in realistic big data implementations. Experimental results on large scale artificial datasets show that it compares favorably with a fully centralized implementation, in terms of speed, efficiency and generalization accuracy. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. A Wireless Sensor Network with Soft Computing Localization Techniques for Track Cycling Applications.

    PubMed

    Gharghan, Sadik Kamel; Nordin, Rosdiadee; Ismail, Mahamod

    2016-08-06

    In this paper, we propose two soft computing localization techniques for wireless sensor networks (WSNs). The two techniques, Neural Fuzzy Inference System (ANFIS) and Artificial Neural Network (ANN), focus on a range-based localization method which relies on the measurement of the received signal strength indicator (RSSI) from the three ZigBee anchor nodes distributed throughout the track cycling field. The soft computing techniques aim to estimate the distance between bicycles moving on the cycle track for outdoor and indoor velodromes. In the first approach the ANFIS was considered, whereas in the second approach the ANN was hybridized individually with three optimization algorithms, namely Particle Swarm Optimization (PSO), Gravitational Search Algorithm (GSA), and Backtracking Search Algorithm (BSA). The results revealed that the hybrid GSA-ANN outperforms the other methods adopted in this paper in terms of accuracy localization and distance estimation accuracy. The hybrid GSA-ANN achieves a mean absolute distance estimation error of 0.02 m and 0.2 m for outdoor and indoor velodromes, respectively.

  15. A Wireless Sensor Network with Soft Computing Localization Techniques for Track Cycling Applications

    PubMed Central

    Gharghan, Sadik Kamel; Nordin, Rosdiadee; Ismail, Mahamod

    2016-01-01

    In this paper, we propose two soft computing localization techniques for wireless sensor networks (WSNs). The two techniques, Neural Fuzzy Inference System (ANFIS) and Artificial Neural Network (ANN), focus on a range-based localization method which relies on the measurement of the received signal strength indicator (RSSI) from the three ZigBee anchor nodes distributed throughout the track cycling field. The soft computing techniques aim to estimate the distance between bicycles moving on the cycle track for outdoor and indoor velodromes. In the first approach the ANFIS was considered, whereas in the second approach the ANN was hybridized individually with three optimization algorithms, namely Particle Swarm Optimization (PSO), Gravitational Search Algorithm (GSA), and Backtracking Search Algorithm (BSA). The results revealed that the hybrid GSA-ANN outperforms the other methods adopted in this paper in terms of accuracy localization and distance estimation accuracy. The hybrid GSA-ANN achieves a mean absolute distance estimation error of 0.02 m and 0.2 m for outdoor and indoor velodromes, respectively. PMID:27509495

  16. Sparse Measurement Systems: Applications, Analysis, Algorithms and Design

    ERIC Educational Resources Information Center

    Narayanaswamy, Balakrishnan

    2011-01-01

    This thesis deals with "large-scale" detection problems that arise in many real world applications such as sensor networks, mapping with mobile robots and group testing for biological screening and drug discovery. These are problems where the values of a large number of inputs need to be inferred from noisy observations and where the…

  17. New machine-learning algorithms for prediction of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Mandal, Indrajit; Sairam, N.

    2014-03-01

    This article presents an enhanced prediction accuracy of diagnosis of Parkinson's disease (PD) to prevent the delay and misdiagnosis of patients using the proposed robust inference system. New machine-learning methods are proposed and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods of treating Parkinson's disease (PD) includes sparse multinomial logistic regression, rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, boosting methods. A new ensemble method comprising of the Bayesian network optimised by Tabu search algorithm as classifier and Haar wavelets as projection filter is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100% and sensitivity, specificity of 0.983 and 0.996, respectively. All the experiments are conducted over 95% and 99% confidence levels and establish the results with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.

  18. Modification of Hazen's equation in coarse grained soils by soft computing techniques

    NASA Astrophysics Data System (ADS)

    Kaynar, Oguz; Yilmaz, Isik; Marschalko, Marian; Bednarik, Martin; Fojtova, Lucie

    2013-04-01

    Hazen first proposed a Relationship between coefficient of permeability (k) and effective grain size (d10) was first proposed by Hazen, and it was then extended by some other researchers. However many attempts were done for estimation of k, correlation coefficients (R2) of the models were generally lower than ~0.80 and whole grain size distribution curves were not included in the assessments. Soft computing techniques such as; artificial neural networks, fuzzy inference systems, genetic algorithms, etc. and their hybrids are now being successfully used as an alternative tool. In this study, use of some soft computing techniques such as Artificial Neural Networks (ANNs) (MLP, RBF, etc.) and Adaptive Neuro-Fuzzy Inference System (ANFIS) for prediction of permeability of coarse grained soils was described, and Hazen's equation was then modificated. It was found that the soft computing models exhibited high performance in prediction of permeability coefficient. However four different kinds of ANN algorithms showed similar prediction performance, results of MLP was found to be relatively more accurate than RBF models. The most reliable prediction was obtained from ANFIS model.

  19. Evolving RBF neural networks for adaptive soft-sensor design.

    PubMed

    Alexandridis, Alex

    2013-12-01

    This work presents an adaptive framework for building soft-sensors based on radial basis function (RBF) neural network models. The adaptive fuzzy means algorithm is utilized in order to evolve an RBF network, which approximates the unknown system based on input-output data from it. The methodology gradually builds the RBF network model, based on two separate levels of adaptation: On the first level, the structure of the hidden layer is modified by adding or deleting RBF centers, while on the second level, the synaptic weights are adjusted with the recursive least squares with exponential forgetting algorithm. The proposed approach is tested on two different systems, namely a simulated nonlinear DC Motor and a real industrial reactor. The results show that the produced soft-sensors can be successfully applied to model the two nonlinear systems. A comparison with two different adaptive modeling techniques, namely a dynamic evolving neural-fuzzy inference system (DENFIS) and neural networks trained with online backpropagation, highlights the advantages of the proposed methodology.

  20. A Survey of Statistical Models for Reverse Engineering Gene Regulatory Networks

    PubMed Central

    Huang, Yufei; Tienda-Luna, Isabel M.; Wang, Yufeng

    2009-01-01

    Statistical models for reverse engineering gene regulatory networks are surveyed in this article. To provide readers with a system-level view of the modeling issues in this research, a graphical modeling framework is proposed. This framework serves as the scaffolding on which the review of different models can be systematically assembled. Based on the framework, we review many existing models for many aspects of gene regulation; the pros and cons of each model are discussed. In addition, network inference algorithms are also surveyed under the graphical modeling framework by the categories of point solutions and probabilistic solutions and the connections and differences among the algorithms are provided. This survey has the potential to elucidate the development and future of reverse engineering GRNs and bring statistical signal processing closer to the core of this research. PMID:20046885

  1. Inferring Weighted Directed Association Network from Multivariate Time Series with a Synthetic Method of Partial Symbolic Transfer Entropy Spectrum and Granger Causality

    PubMed Central

    Hu, Yanzhu; Ai, Xinbo

    2016-01-01

    Complex network methodology is very useful for complex system explorer. However, the relationships among variables in complex system are usually not clear. Therefore, inferring association networks among variables from their observed data has been a popular research topic. We propose a synthetic method, named small-shuffle partial symbolic transfer entropy spectrum (SSPSTES), for inferring association network from multivariate time series. The method synthesizes surrogate data, partial symbolic transfer entropy (PSTE) and Granger causality. A proper threshold selection is crucial for common correlation identification methods and it is not easy for users. The proposed method can not only identify the strong correlation without selecting a threshold but also has the ability of correlation quantification, direction identification and temporal relation identification. The method can be divided into three layers, i.e. data layer, model layer and network layer. In the model layer, the method identifies all the possible pair-wise correlation. In the network layer, we introduce a filter algorithm to remove the indirect weak correlation and retain strong correlation. Finally, we build a weighted adjacency matrix, the value of each entry representing the correlation level between pair-wise variables, and then get the weighted directed association network. Two numerical simulated data from linear system and nonlinear system are illustrated to show the steps and performance of the proposed approach. The ability of the proposed method is approved by an application finally. PMID:27832153

  2. Enabling Comprehension of Patient Subgroups and Characteristics in Large Bipartite Networks: Implications for Precision Medicine

    PubMed Central

    Bhavnani, Suresh K.; Chen, Tianlong; Ayyaswamy, Archana; Visweswaran, Shyam; Bellala, Gowtham; Rohit, Divekar; Kevin E., Bassler

    2017-01-01

    A primary goal of precision medicine is to identify patient subgroups based on their characteristics (e.g., comorbidities or genes) with the goal of designing more targeted interventions. While network visualization methods such as Fruchterman-Reingold have been used to successfully identify such patient subgroups in small to medium sized data sets, they often fail to reveal comprehensible visual patterns in large and dense networks despite having significant clustering. We therefore developed an algorithm called ExplodeLayout, which exploits the existence of significant clusters in bipartite networks to automatically “explode” a traditional network layout with the goal of separating overlapping clusters, while at the same time preserving key network topological properties that are critical for the comprehension of patient subgroups. We demonstrate the utility of ExplodeLayout by visualizing a large dataset extracted from Medicare consisting of readmitted hip-fracture patients and their comorbidities, demonstrate its statistically significant improvement over a traditional layout algorithm, and discuss how the resulting network visualization enabled clinicians to infer mechanisms precipitating hospital readmission in specific patient subgroups. PMID:28815099

  3. Qualitative reasoning for biological network inference from systematic perturbation experiments.

    PubMed

    Badaloni, Silvana; Di Camillo, Barbara; Sambo, Francesco

    2012-01-01

    The systematic perturbation of the components of a biological system has been proven among the most informative experimental setups for the identification of causal relations between the components. In this paper, we present Systematic Perturbation-Qualitative Reasoning (SPQR), a novel Qualitative Reasoning approach to automate the interpretation of the results of systematic perturbation experiments. Our method is based on a qualitative abstraction of the experimental data: for each perturbation experiment, measured values of the observed variables are modeled as lower, equal or higher than the measurements in the wild type condition, when no perturbation is applied. The algorithm exploits a set of IF-THEN rules to infer causal relations between the variables, analyzing the patterns of propagation of the perturbation signals through the biological network, and is specifically designed to minimize the rate of false positives among the inferred relations. Tested on both simulated and real perturbation data, SPQR indeed exhibits a significantly higher precision than the state of the art.

  4. An argument for mechanism-based statistical inference in cancer

    PubMed Central

    Ochs, Michael; Price, Nathan D.; Tomasetti, Cristian; Younes, Laurent

    2015-01-01

    Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning bio-markers, metabolism, cell signaling, network inference and tumorigenesis. PMID:25381197

  5. U.S. stock market interaction network as learned by the Boltzmann machine

    DOE PAGES

    Borysov, Stanislav S.; Roudi, Yasser; Balatsky, Alexander V.

    2015-12-07

    Here, we study historical dynamics of joint equilibrium distribution of stock returns in the U.S. stock market using the Boltzmann distribution model being parametrized by external fields and pairwise couplings. Within Boltzmann learning framework for statistical inference, we analyze historical behavior of the parameters inferred using exact and approximate learning algorithms. Since the model and inference methods require use of binary variables, effect of this mapping of continuous returns to the discrete domain is studied. The presented results show that binarization preserves the correlation structure of the market. Properties of distributions of external fields and couplings as well as themore » market interaction network and industry sector clustering structure are studied for different historical dates and moving window sizes. We demonstrate that the observed positive heavy tail in distribution of couplings is related to the sparse clustering structure of the market. We also show that discrepancies between the model’s parameters might be used as a precursor of financial instabilities.« less

  6. Model-Free Reconstruction of Excitatory Neuronal Connectivity from Calcium Imaging Signals

    PubMed Central

    Stetter, Olav; Battaglia, Demian; Soriano, Jordi; Geisel, Theo

    2012-01-01

    A systematic assessment of global neural network connectivity through direct electrophysiological assays has remained technically infeasible, even in simpler systems like dissociated neuronal cultures. We introduce an improved algorithmic approach based on Transfer Entropy to reconstruct structural connectivity from network activity monitored through calcium imaging. We focus in this study on the inference of excitatory synaptic links. Based on information theory, our method requires no prior assumptions on the statistics of neuronal firing and neuronal connections. The performance of our algorithm is benchmarked on surrogate time series of calcium fluorescence generated by the simulated dynamics of a network with known ground-truth topology. We find that the functional network topology revealed by Transfer Entropy depends qualitatively on the time-dependent dynamic state of the network (bursting or non-bursting). Thus by conditioning with respect to the global mean activity, we improve the performance of our method. This allows us to focus the analysis to specific dynamical regimes of the network in which the inferred functional connectivity is shaped by monosynaptic excitatory connections, rather than by collective synchrony. Our method can discriminate between actual causal influences between neurons and spurious non-causal correlations due to light scattering artifacts, which inherently affect the quality of fluorescence imaging. Compared to other reconstruction strategies such as cross-correlation or Granger Causality methods, our method based on improved Transfer Entropy is remarkably more accurate. In particular, it provides a good estimation of the excitatory network clustering coefficient, allowing for discrimination between weakly and strongly clustered topologies. Finally, we demonstrate the applicability of our method to analyses of real recordings of in vitro disinhibited cortical cultures where we suggest that excitatory connections are characterized by an elevated level of clustering compared to a random graph (although not extreme) and can be markedly non-local. PMID:22927808

  7. Knowledge-guided fuzzy logic modeling to infer cellular signaling networks from proteomic data

    PubMed Central

    Liu, Hui; Zhang, Fan; Mishra, Shital Kumar; Zhou, Shuigeng; Zheng, Jie

    2016-01-01

    Modeling of signaling pathways is crucial for understanding and predicting cellular responses to drug treatments. However, canonical signaling pathways curated from literature are seldom context-specific and thus can hardly predict cell type-specific response to external perturbations; purely data-driven methods also have drawbacks such as limited biological interpretability. Therefore, hybrid methods that can integrate prior knowledge and real data for network inference are highly desirable. In this paper, we propose a knowledge-guided fuzzy logic network model to infer signaling pathways by exploiting both prior knowledge and time-series data. In particular, the dynamic time warping algorithm is employed to measure the goodness of fit between experimental and predicted data, so that our method can model temporally-ordered experimental observations. We evaluated the proposed method on a synthetic dataset and two real phosphoproteomic datasets. The experimental results demonstrate that our model can uncover drug-induced alterations in signaling pathways in cancer cells. Compared with existing hybrid models, our method can model feedback loops so that the dynamical mechanisms of signaling networks can be uncovered from time-series data. By calibrating generic models of signaling pathways against real data, our method supports precise predictions of context-specific anticancer drug effects, which is an important step towards precision medicine. PMID:27774993

  8. Functional equivalency inferred from "authoritative sources" in networks of homologous proteins.

    PubMed

    Natarajan, Shreedhar; Jakobsson, Eric

    2009-06-12

    A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is utilization of the user's knowledge to assign high confidence to selected functional identification. We show use of the algorithm to retrieve functional equivalents for 7 membrane proteins, from an exploration of almost 40 genomes form multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mapping between proteins from different species. Based on that comparison, we believe that incorporation of user's knowledge as a key aspect of the technique adds value to purely statistical formal methods.

  9. Functional Equivalency Inferred from “Authoritative Sources” in Networks of Homologous Proteins

    PubMed Central

    Natarajan, Shreedhar; Jakobsson, Eric

    2009-01-01

    A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is utilization of the user's knowledge to assign high confidence to selected functional identification. We show use of the algorithm to retrieve functional equivalents for 7 membrane proteins, from an exploration of almost 40 genomes form multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mapping between proteins from different species. Based on that comparison, we believe that incorporation of user's knowledge as a key aspect of the technique adds value to purely statistical formal methods. PMID:19521530

  10. Study on Data Clustering and Intelligent Decision Algorithm of Indoor Localization

    NASA Astrophysics Data System (ADS)

    Liu, Zexi

    2018-01-01

    Indoor positioning technology enables the human beings to have the ability of positional perception in architectural space, and there is a shortage of single network coverage and the problem of location data redundancy. So this article puts forward the indoor positioning data clustering algorithm and intelligent decision-making research, design the basic ideas of multi-source indoor positioning technology, analyzes the fingerprint localization algorithm based on distance measurement, position and orientation of inertial device integration. By optimizing the clustering processing of massive indoor location data, the data normalization pretreatment, multi-dimensional controllable clustering center and multi-factor clustering are realized, and the redundancy of locating data is reduced. In addition, the path is proposed based on neural network inference and decision, design the sparse data input layer, the dynamic feedback hidden layer and output layer, low dimensional results improve the intelligent navigation path planning.

  11. Quantum Inference on Bayesian Networks

    NASA Astrophysics Data System (ADS)

    Yoder, Theodore; Low, Guang Hao; Chuang, Isaac

    2014-03-01

    Because quantum physics is naturally probabilistic, it seems reasonable to expect physical systems to describe probabilities and their evolution in a natural fashion. Here, we use quantum computation to speedup sampling from a graphical probability model, the Bayesian network. A specialization of this sampling problem is approximate Bayesian inference, where the distribution on query variables is sampled given the values e of evidence variables. Inference is a key part of modern machine learning and artificial intelligence tasks, but is known to be NP-hard. Classically, a single unbiased sample is obtained from a Bayesian network on n variables with at most m parents per node in time (nmP(e) - 1 / 2) , depending critically on P(e) , the probability the evidence might occur in the first place. However, by implementing a quantum version of rejection sampling, we obtain a square-root speedup, taking (n2m P(e) -1/2) time per sample. The speedup is the result of amplitude amplification, which is proving to be broadly applicable in sampling and machine learning tasks. In particular, we provide an explicit and efficient circuit construction that implements the algorithm without the need for oracle access.

  12. ARNetMiT R Package: association rules based gene co-expression networks of miRNA targets.

    PubMed

    Özgür Cingiz, M; Biricik, G; Diri, B

    2017-03-31

    miRNAs are key regulators that bind to target genes to suppress their gene expression level. The relations between miRNA-target genes enable users to derive co-expressed genes that may be involved in similar biological processes and functions in cells. We hypothesize that target genes of miRNAs are co-expressed, when they are regulated by multiple miRNAs. With the usage of these co-expressed genes, we can theoretically construct co-expression networks (GCNs) related to 152 diseases. In this study, we introduce ARNetMiT that utilize a hash based association rule algorithm in a novel way to infer the GCNs on miRNA-target genes data. We also present R package of ARNetMiT, which infers and visualizes GCNs of diseases that are selected by users. Our approach assumes miRNAs as transactions and target genes as their items. Support and confidence values are used to prune association rules on miRNA-target genes data to construct support based GCNs (sGCNs) along with support and confidence based GCNs (scGCNs). We use overlap analysis and the topological features for the performance analysis of GCNs. We also infer GCNs with popular GNI algorithms for comparison with the GCNs of ARNetMiT. Overlap analysis results show that ARNetMiT outperforms the compared GNI algorithms. We see that using high confidence values in scGCNs increase the ratio of the overlapped gene-gene interactions between the compared methods. According to the evaluation of the topological features of ARNetMiT based GCNs, the degrees of nodes have power-law distribution. The hub genes discovered by ARNetMiT based GCNs are consistent with the literature.

  13. On the Latent Variable Interpretation in Sum-Product Networks.

    PubMed

    Peharz, Robert; Gens, Robert; Pernkopf, Franz; Domingos, Pedro

    2017-10-01

    One of the central themes in Sum-Product networks (SPNs) is the interpretation of sum nodes as marginalized latent variables (LVs). This interpretation yields an increased syntactic or semantic structure, allows the application of the EM algorithm and to efficiently perform MPE inference. In literature, the LV interpretation was justified by explicitly introducing the indicator variables corresponding to the LVs' states. However, as pointed out in this paper, this approach is in conflict with the completeness condition in SPNs and does not fully specify the probabilistic model. We propose a remedy for this problem by modifying the original approach for introducing the LVs, which we call SPN augmentation. We discuss conditional independencies in augmented SPNs, formally establish the probabilistic interpretation of the sum-weights and give an interpretation of augmented SPNs as Bayesian networks. Based on these results, we find a sound derivation of the EM algorithm for SPNs. Furthermore, the Viterbi-style algorithm for MPE proposed in literature was never proven to be correct. We show that this is indeed a correct algorithm, when applied to selective SPNs, and in particular when applied to augmented SPNs. Our theoretical results are confirmed in experiments on synthetic data and 103 real-world datasets.

  14. Probability, statistics, and computational science.

    PubMed

    Beerenwinkel, Niko; Siebourg, Juliane

    2012-01-01

    In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.

  15. Learning-based computing techniques in geoid modeling for precise height transformation

    NASA Astrophysics Data System (ADS)

    Erol, B.; Erol, S.

    2013-03-01

    Precise determination of local geoid is of particular importance for establishing height control in geodetic GNSS applications, since the classical leveling technique is too laborious. A geoid model can be accurately obtained employing properly distributed benchmarks having GNSS and leveling observations using an appropriate computing algorithm. Besides the classical multivariable polynomial regression equations (MPRE), this study attempts an evaluation of learning based computing algorithms: artificial neural networks (ANNs), adaptive network-based fuzzy inference system (ANFIS) and especially the wavelet neural networks (WNNs) approach in geoid surface approximation. These algorithms were developed parallel to advances in computer technologies and recently have been used for solving complex nonlinear problems of many applications. However, they are rather new in dealing with precise modeling problem of the Earth gravity field. In the scope of the study, these methods were applied to Istanbul GPS Triangulation Network data. The performances of the methods were assessed considering the validation results of the geoid models at the observation points. In conclusion the ANFIS and WNN revealed higher prediction accuracies compared to ANN and MPRE methods. Beside the prediction capabilities, these methods were also compared and discussed from the practical point of view in conclusions.

  16. Performance evaluation of the machine learning algorithms used in inference mechanism of a medical decision support system.

    PubMed

    Bal, Mert; Amasyali, M Fatih; Sever, Hayri; Kose, Guven; Demirhan, Ayse

    2014-01-01

    The importance of the decision support systems is increasingly supporting the decision making process in cases of uncertainty and the lack of information and they are widely used in various fields like engineering, finance, medicine, and so forth, Medical decision support systems help the healthcare personnel to select optimal method during the treatment of the patients. Decision support systems are intelligent software systems that support decision makers on their decisions. The design of decision support systems consists of four main subjects called inference mechanism, knowledge-base, explanation module, and active memory. Inference mechanism constitutes the basis of decision support systems. There are various methods that can be used in these mechanisms approaches. Some of these methods are decision trees, artificial neural networks, statistical methods, rule-based methods, and so forth. In decision support systems, those methods can be used separately or a hybrid system, and also combination of those methods. In this study, synthetic data with 10, 100, 1000, and 2000 records have been produced to reflect the probabilities on the ALARM network. The accuracy of 11 machine learning methods for the inference mechanism of medical decision support system is compared on various data sets.

  17. Reverse engineering highlights potential principles of large gene regulatory network design and learning.

    PubMed

    Carré, Clément; Mas, André; Krouk, Gabriel

    2017-01-01

    Inferring transcriptional gene regulatory networks from transcriptomic datasets is a key challenge of systems biology, with potential impacts ranging from medicine to agronomy. There are several techniques used presently to experimentally assay transcription factors to target relationships, defining important information about real gene regulatory networks connections. These techniques include classical ChIP-seq, yeast one-hybrid, or more recently, DAP-seq or target technologies. These techniques are usually used to validate algorithm predictions. Here, we developed a reverse engineering approach based on mathematical and computer simulation to evaluate the impact that this prior knowledge on gene regulatory networks may have on training machine learning algorithms. First, we developed a gene regulatory networks-simulating engine called FRANK (Fast Randomizing Algorithm for Network Knowledge) that is able to simulate large gene regulatory networks (containing 10 4 genes) with characteristics of gene regulatory networks observed in vivo. FRANK also generates stable or oscillatory gene expression directly produced by the simulated gene regulatory networks. The development of FRANK leads to important general conclusions concerning the design of large and stable gene regulatory networks harboring scale free properties (built ex nihilo). In combination with supervised (accepting prior knowledge) support vector machine algorithm we (i) address biologically oriented questions concerning our capacity to accurately reconstruct gene regulatory networks and in particular we demonstrate that prior-knowledge structure is crucial for accurate learning, and (ii) draw conclusions to inform experimental design to performed learning able to solve gene regulatory networks in the future. By demonstrating that our predictions concerning the influence of the prior-knowledge structure on support vector machine learning capacity holds true on real data ( Escherichia coli K14 network reconstruction using network and transcriptomic data), we show that the formalism used to build FRANK can to some extent be a reasonable model for gene regulatory networks in real cells.

  18. Accurate and diverse recommendations via eliminating redundant correlations

    NASA Astrophysics Data System (ADS)

    Zhou, Tao; Su, Ri-Qi; Liu, Run-Ran; Jiang, Luo-Luo; Wang, Bing-Hong; Zhang, Yi-Cheng

    2009-12-01

    In this paper, based on a weighted projection of a bipartite user-object network, we introduce a personalized recommendation algorithm, called network-based inference (NBI), which has higher accuracy than the classical algorithm, namely collaborative filtering. In NBI, the correlation resulting from a specific attribute may be repeatedly counted in the cumulative recommendations from different objects. By considering the higher order correlations, we design an improved algorithm that can, to some extent, eliminate the redundant correlations. We test our algorithm on two benchmark data sets, MovieLens and Netflix. Compared with NBI, the algorithmic accuracy, measured by the ranking score, can be further improved by 23 per cent for MovieLens and 22 per cent for Netflix. The present algorithm can even outperform the Latent Dirichlet Allocation algorithm, which requires much longer computational time. Furthermore, most previous studies considered the algorithmic accuracy only; in this paper, we argue that the diversity and popularity, as two significant criteria of algorithmic performance, should also be taken into account. With more or less the same accuracy, an algorithm giving higher diversity and lower popularity is more favorable. Numerical results show that the present algorithm can outperform the standard one simultaneously in all five adopted metrics: lower ranking score and higher precision for accuracy, larger Hamming distance and lower intra-similarity for diversity, as well as smaller average degree for popularity.

  19. Machine-learning in astronomy

    NASA Astrophysics Data System (ADS)

    Hobson, Michael; Graff, Philip; Feroz, Farhan; Lasenby, Anthony

    2014-05-01

    Machine-learning methods may be used to perform many tasks required in the analysis of astronomical data, including: data description and interpretation, pattern recognition, prediction, classification, compression, inference and many more. An intuitive and well-established approach to machine learning is the use of artificial neural networks (NNs), which consist of a group of interconnected nodes, each of which processes information that it receives and then passes this product on to other nodes via weighted connections. In particular, I discuss the first public release of the generic neural network training algorithm, called SkyNet, and demonstrate its application to astronomical problems focusing on its use in the BAMBI package for accelerated Bayesian inference in cosmology, and the identification of gamma-ray bursters. The SkyNet and BAMBI packages, which are fully parallelised using MPI, are available at http://www.mrao.cam.ac.uk/software/.

  20. Extracting information from multiplex networks

    NASA Astrophysics Data System (ADS)

    Iacovacci, Jacopo; Bianconi, Ginestra

    2016-06-01

    Multiplex networks are generalized network structures that are able to describe networks in which the same set of nodes are connected by links that have different connotations. Multiplex networks are ubiquitous since they describe social, financial, engineering, and biological networks as well. Extending our ability to analyze complex networks to multiplex network structures increases greatly the level of information that is possible to extract from big data. For these reasons, characterizing the centrality of nodes in multiplex networks and finding new ways to solve challenging inference problems defined on multiplex networks are fundamental questions of network science. In this paper, we discuss the relevance of the Multiplex PageRank algorithm for measuring the centrality of nodes in multilayer networks and we characterize the utility of the recently introduced indicator function Θ ˜ S for describing their mesoscale organization and community structure. As working examples for studying these measures, we consider three multiplex network datasets coming for social science.

  1. Differential C3NET reveals disease networks of direct physical interactions

    PubMed Central

    2011-01-01

    Background Genes might have different gene interactions in different cell conditions, which might be mapped into different networks. Differential analysis of gene networks allows spotting condition-specific interactions that, for instance, form disease networks if the conditions are a disease, such as cancer, and normal. This could potentially allow developing better and subtly targeted drugs to cure cancer. Differential network analysis with direct physical gene interactions needs to be explored in this endeavour. Results C3NET is a recently introduced information theory based gene network inference algorithm that infers direct physical gene interactions from expression data, which was shown to give consistently higher inference performances over various networks than its competitors. In this paper, we present, DC3net, an approach to employ C3NET in inferring disease networks. We apply DC3net on a synthetic and real prostate cancer datasets, which show promising results. With loose cutoffs, we predicted 18583 interactions from tumor and normal samples in total. Although there are no reference interactions databases for the specific conditions of our samples in the literature, we found verifications for 54 of our predicted direct physical interactions from only four of the biological interaction databases. As an example, we predicted that RAD50 with TRF2 have prostate cancer specific interaction that turned out to be having validation from the literature. It is known that RAD50 complex associates with TRF2 in the S phase of cell cycle, which suggests that this predicted interaction may promote telomere maintenance in tumor cells in order to allow tumor cells to divide indefinitely. Our enrichment analysis suggests that the identified tumor specific gene interactions may be potentially important in driving the growth in prostate cancer. Additionally, we found that the highest connected subnetwork of our predicted tumor specific network is enriched for all proliferation genes, which further suggests that the genes in this network may serve in the process of oncogenesis. Conclusions Our approach reveals disease specific interactions. It may help to make experimental follow-up studies more cost and time efficient by prioritizing disease relevant parts of the global gene network. PMID:21777411

  2. Novel application of multi-stimuli network inference to synovial fibroblasts of rheumatoid arthritis patients

    PubMed Central

    2014-01-01

    Background Network inference of gene expression data is an important challenge in systems biology. Novel algorithms may provide more detailed gene regulatory networks (GRN) for complex, chronic inflammatory diseases such as rheumatoid arthritis (RA), in which activated synovial fibroblasts (SFBs) play a major role. Since the detailed mechanisms underlying this activation are still unclear, simultaneous investigation of multi-stimuli activation of SFBs offers the possibility to elucidate the regulatory effects of multiple mediators and to gain new insights into disease pathogenesis. Methods A GRN was therefore inferred from RA-SFBs treated with 4 different stimuli (IL-1 β, TNF- α, TGF- β, and PDGF-D). Data from time series microarray experiments (0, 1, 2, 4, 12 h; Affymetrix HG-U133 Plus 2.0) were batch-corrected applying ‘ComBat’, analyzed for differentially expressed genes over time with ‘Limma’, and used for the inference of a robust GRN with NetGenerator V2.0, a heuristic ordinary differential equation-based method with soft integration of prior knowledge. Results Using all genes differentially expressed over time in RA-SFBs for any stimulus, and selecting the genes belonging to the most significant gene ontology (GO) term, i.e., ‘cartilage development’, a dynamic, robust, moderately complex multi-stimuli GRN was generated with 24 genes and 57 edges in total, 31 of which were gene-to-gene edges. Prior literature-based knowledge derived from Pathway Studio or manual searches was reflected in the final network by 25/57 confirmed edges (44%). The model contained known network motifs crucial for dynamic cellular behavior, e.g., cross-talk among pathways, positive feed-back loops, and positive feed-forward motifs (including suppression of the transcriptional repressor OSR2 by all 4 stimuli. Conclusion A multi-stimuli GRN highly concordant with literature data was successfully generated by network inference from the gene expression of stimulated RA-SFBs. The GRN showed high reliability, since 10 predicted edges were independently validated by literature findings post network inference. The selected GO term ‘cartilage development’ contained a number of differentiation markers, growth factors, and transcription factors with potential relevance for RA. Finally, the model provided new insight into the response of RA-SFBs to multiple stimuli implicated in the pathogenesis of RA, in particular to the ‘novel’ potent growth factor PDGF-D. PMID:24989895

  3. AF-DHNN: Fuzzy Clustering and Inference-Based Node Fault Diagnosis Method for Fire Detection

    PubMed Central

    Jin, Shan; Cui, Wen; Jin, Zhigang; Wang, Ying

    2015-01-01

    Wireless Sensor Networks (WSNs) have been utilized for node fault diagnosis in the fire detection field since the 1990s. However, the traditional methods have some problems, including complicated system structures, intensive computation needs, unsteady data detection and local minimum values. In this paper, a new diagnosis mechanism for WSN nodes is proposed, which is based on fuzzy theory and an Adaptive Fuzzy Discrete Hopfield Neural Network (AF-DHNN). First, the original status of each sensor over time is obtained with two features. One is the root mean square of the filtered signal (FRMS), the other is the normalized summation of the positive amplitudes of the difference spectrum between the measured signal and the healthy one (NSDS). Secondly, distributed fuzzy inference is introduced. The evident abnormal nodes’ status is pre-alarmed to save time. Thirdly, according to the dimensions of the diagnostic data, an adaptive diagnostic status system is established with a Fuzzy C-Means Algorithm (FCMA) and Sorting and Classification Algorithm to reducing the complexity of the fault determination. Fourthly, a Discrete Hopfield Neural Network (DHNN) with iterations is improved with the optimization of the sensors’ detected status information and standard diagnostic levels, with which the associative memory is achieved, and the search efficiency is improved. The experimental results show that the AF-DHNN method can diagnose abnormal WSN node faults promptly and effectively, which improves the WSN reliability. PMID:26193280

  4. Bayesian networks in neuroscience: a survey.

    PubMed

    Bielza, Concha; Larrañaga, Pedro

    2014-01-01

    Bayesian networks are a type of probabilistic graphical models lie at the intersection between statistics and machine learning. They have been shown to be powerful tools to encode dependence relationships among the variables of a domain under uncertainty. Thanks to their generality, Bayesian networks can accommodate continuous and discrete variables, as well as temporal processes. In this paper we review Bayesian networks and how they can be learned automatically from data by means of structure learning algorithms. Also, we examine how a user can take advantage of these networks for reasoning by exact or approximate inference algorithms that propagate the given evidence through the graphical structure. Despite their applicability in many fields, they have been little used in neuroscience, where they have focused on specific problems, like functional connectivity analysis from neuroimaging data. Here we survey key research in neuroscience where Bayesian networks have been used with different aims: discover associations between variables, perform probabilistic reasoning over the model, and classify new observations with and without supervision. The networks are learned from data of any kind-morphological, electrophysiological, -omics and neuroimaging-, thereby broadening the scope-molecular, cellular, structural, functional, cognitive and medical- of the brain aspects to be studied.

  5. Reconstructing gene regulatory networks from knock-out data using Gaussian Noise Model and Pearson Correlation Coefficient.

    PubMed

    Mohamed Salleh, Faridah Hani; Arif, Shereena Mohd; Zainudin, Suhaila; Firdaus-Raih, Mohd

    2015-12-01

    A gene regulatory network (GRN) is a large and complex network consisting of interacting elements that, over time, affect each other's state. The dynamics of complex gene regulatory processes are difficult to understand using intuitive approaches alone. To overcome this problem, we propose an algorithm for inferring the regulatory interactions from knock-out data using a Gaussian model combines with Pearson Correlation Coefficient (PCC). There are several problems relating to GRN construction that have been outlined in this paper. We demonstrated the ability of our proposed method to (1) predict the presence of regulatory interactions between genes, (2) their directionality and (3) their states (activation or suppression). The algorithm was applied to network sizes of 10 and 50 genes from DREAM3 datasets and network sizes of 10 from DREAM4 datasets. The predicted networks were evaluated based on AUROC and AUPR. We discovered that high false positive values were generated by our GRN prediction methods because the indirect regulations have been wrongly predicted as true relationships. We achieved satisfactory results as the majority of sub-networks achieved AUROC values above 0.5. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Bayesian networks in neuroscience: a survey

    PubMed Central

    Bielza, Concha; Larrañaga, Pedro

    2014-01-01

    Bayesian networks are a type of probabilistic graphical models lie at the intersection between statistics and machine learning. They have been shown to be powerful tools to encode dependence relationships among the variables of a domain under uncertainty. Thanks to their generality, Bayesian networks can accommodate continuous and discrete variables, as well as temporal processes. In this paper we review Bayesian networks and how they can be learned automatically from data by means of structure learning algorithms. Also, we examine how a user can take advantage of these networks for reasoning by exact or approximate inference algorithms that propagate the given evidence through the graphical structure. Despite their applicability in many fields, they have been little used in neuroscience, where they have focused on specific problems, like functional connectivity analysis from neuroimaging data. Here we survey key research in neuroscience where Bayesian networks have been used with different aims: discover associations between variables, perform probabilistic reasoning over the model, and classify new observations with and without supervision. The networks are learned from data of any kind–morphological, electrophysiological, -omics and neuroimaging–, thereby broadening the scope–molecular, cellular, structural, functional, cognitive and medical– of the brain aspects to be studied. PMID:25360109

  7. Developing algorithm for the critical care physician scheduling

    NASA Astrophysics Data System (ADS)

    Lee, Hyojun; Pah, Adam; Amaral, Luis; Northwestern Memorial Hospital Collaboration

    Understanding the social network has enabled us to quantitatively study social phenomena such as behaviors in adoption and propagation of information. However, most work has been focusing on networks of large heterogeneous communities, and little attention has been paid to how work-relevant information spreads within networks of small and homogeneous groups of highly trained individuals, such as physicians. Within the professionals, the behavior patterns and the transmission of information relevant to the job are dependent not only on the social network between the employees but also on the schedules and teams that work together. In order to systematically investigate the dependence of the spread of ideas and adoption of innovations on a work-environment network, we sought to construct a model for the interaction network of critical care physicians at Northwestern Memorial Hospital (NMH) based on their work schedules. We inferred patterns and hidden rules from past work schedules such as turnover rates. Using the characteristics of the work schedules of the physicians and their turnover rates, we were able to create multi-year synthetic work schedules for a generic intensive care unit. The algorithm for creating shift schedules can be applied to other schedule dependent networks ARO1.

  8. JCell--a Java-based framework for inferring regulatory networks from time series data.

    PubMed

    Spieth, C; Supper, J; Streichert, F; Speer, N; Zell, A

    2006-08-15

    JCell is a Java-based application for reconstructing gene regulatory networks from experimental data. The framework provides several algorithms to identify genetic and metabolic dependencies based on experimental data conjoint with mathematical models to describe and simulate regulatory systems. Owing to the modular structure, researchers can easily implement new methods. JCell is a pure Java application with additional scripting capabilities and thus widely usable, e.g. on parallel or cluster computers. The software is freely available for download at http://www-ra.informatik.uni-tuebingen.de/software/JCell.

  9. The design and application of a Transportable Inference Engine (TIE1)

    NASA Technical Reports Server (NTRS)

    Mclean, David R.

    1986-01-01

    A Transportable Inference Engine (TIE1) system has been developed by the author as part of the Interactive Experimenter Planning System (IEPS) task which is involved with developing expert systems in support of the Spacecraft Control Programs Branch at Goddard Space Flight Center in Greenbelt, Maryland. Unlike traditional inference engines, TIE1 is written in the C programming language. In the TIE1 system, knowledge is represented by a hierarchical network of objects which have rule frames. The TIE1 search algorithm uses a set of strategies, including backward chaining, to obtain the values of goals. The application of TIE1 to a spacecraft scheduling problem is described. This application involves the development of a strategies interpreter which uses TIE1 to do constraint checking.

  10. Understanding the Scalability of Bayesian Network Inference Using Clique Tree Growth Curves

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole J.

    2010-01-01

    One of the main approaches to performing computation in Bayesian networks (BNs) is clique tree clustering and propagation. The clique tree approach consists of propagation in a clique tree compiled from a Bayesian network, and while it was introduced in the 1980s, there is still a lack of understanding of how clique tree computation time depends on variations in BN size and structure. In this article, we improve this understanding by developing an approach to characterizing clique tree growth as a function of parameters that can be computed in polynomial time from BNs, specifically: (i) the ratio of the number of a BN s non-root nodes to the number of root nodes, and (ii) the expected number of moral edges in their moral graphs. Analytically, we partition the set of cliques in a clique tree into different sets, and introduce a growth curve for the total size of each set. For the special case of bipartite BNs, there are two sets and two growth curves, a mixed clique growth curve and a root clique growth curve. In experiments, where random bipartite BNs generated using the BPART algorithm are studied, we systematically increase the out-degree of the root nodes in bipartite Bayesian networks, by increasing the number of leaf nodes. Surprisingly, root clique growth is well-approximated by Gompertz growth curves, an S-shaped family of curves that has previously been used to describe growth processes in biology, medicine, and neuroscience. We believe that this research improves the understanding of the scaling behavior of clique tree clustering for a certain class of Bayesian networks; presents an aid for trade-off studies of clique tree clustering using growth curves; and ultimately provides a foundation for benchmarking and developing improved BN inference and machine learning algorithms.

  11. Combinatorial influence of environmental parameters on transcription factor activity.

    PubMed

    Knijnenburg, T A; Wessels, L F A; Reinders, M J T

    2008-07-01

    Cells receive a wide variety of environmental signals, which are often processed combinatorially to generate specific genetic responses. Changes in transcript levels, as observed across different environmental conditions, can, to a large extent, be attributed to changes in the activity of transcription factors (TFs). However, in unraveling these transcription regulation networks, the actual environmental signals are often not incorporated into the model, simply because they have not been measured. The unquantified heterogeneity of the environmental parameters across microarray experiments frustrates regulatory network inference. We propose an inference algorithm that models the influence of environmental parameters on gene expression. The approach is based on a yeast microarray compendium of chemostat steady-state experiments. Chemostat cultivation enables the accurate control and measurement of many of the key cultivation parameters, such as nutrient concentrations, growth rate and temperature. The observed transcript levels are explained by inferring the activity of TFs in response to combinations of cultivation parameters. The interplay between activated enhancers and repressors that bind a gene promoter determine the possible up- or downregulation of the gene. The model is translated into a linear integer optimization problem. The resulting regulatory network identifies the combinatorial effects of environmental parameters on TF activity and gene expression. The Matlab code is available from the authors upon request. Supplementary data are available at Bioinformatics online.

  12. Using a genetic algorithm to optimize a water-monitoring network for accuracy and cost effectiveness

    NASA Astrophysics Data System (ADS)

    Julich, R. J.

    2004-05-01

    The purpose of this project is to determine the optimal spatial distribution of water-monitoring wells to maximize important data collection and to minimize the cost of managing the network. We have employed a genetic algorithm (GA) towards this goal. The GA uses a simple fitness measure with two parts: the first part awards a maximal score to those combinations of hydraulic head observations whose net uncertainty is closest to the value representing all observations present, thereby maximizing accuracy; the second part applies a penalty function to minimize the number of observations, thereby minimizing the overall cost of the monitoring network. We used the linear statistical inference equation to calculate standard deviations on predictions from a numerical model generated for the 501-observation Death Valley Regional Flow System as the basis for our uncertainty calculations. We have organized the results to address the following three questions: 1) what is the optimal design strategy for a genetic algorithm to optimize this problem domain; 2) what is the consistency of solutions over several optimization runs; and 3) how do these results compare to what is known about the conceptual hydrogeology? Our results indicate the genetic algorithms are a more efficient and robust method for solving this class of optimization problems than have been traditional optimization approaches.

  13. From Coexpression to Coregulation: An Approach to Inferring Transcriptional Regulation Among Gene Classes from Large-Scale Expression Data

    NASA Technical Reports Server (NTRS)

    Mjolsness, Eric; Castano, Rebecca; Mann, Tobias; Wold, Barbara

    2000-01-01

    We provide preliminary evidence that existing algorithms for inferring small-scale gene regulation networks from gene expression data can be adapted to large-scale gene expression data coming from hybridization microarrays. The essential steps are (I) clustering many genes by their expression time-course data into a minimal set of clusters of co-expressed genes, (2) theoretically modeling the various conditions under which the time-courses are measured using a continuous-time analog recurrent neural network for the cluster mean time-courses, (3) fitting such a regulatory model to the cluster mean time courses by simulated annealing with weight decay, and (4) analysing several such fits for commonalities in the circuit parameter sets including the connection matrices. This procedure can be used to assess the adequacy of existing and future gene expression time-course data sets for determining transcriptional regulatory relationships such as coregulation.

  14. Inference and Analysis of Population Structure Using Genetic Data and Network Theory

    PubMed Central

    Greenbaum, Gili; Templeton, Alan R.; Bar-David, Shirli

    2016-01-01

    Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition’s modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of prior conditions. The method is implemented in the software NetStruct (available at https://giligreenbaum.wordpress.com/software/). PMID:26888080

  15. Inference and Analysis of Population Structure Using Genetic Data and Network Theory.

    PubMed

    Greenbaum, Gili; Templeton, Alan R; Bar-David, Shirli

    2016-04-01

    Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition's modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of prior conditions. The method is implemented in the software NetStruct (available at https://giligreenbaum.wordpress.com/software/). Copyright © 2016 by the Genetics Society of America.

  16. The Incremental Multiresolution Matrix Factorization Algorithm

    PubMed Central

    Ithapu, Vamsi K.; Kondor, Risi; Johnson, Sterling C.; Singh, Vikas

    2017-01-01

    Multiresolution analysis and matrix factorization are foundational tools in computer vision. In this work, we study the interface between these two distinct topics and obtain techniques to uncover hierarchical block structure in symmetric matrices – an important aspect in the success of many vision problems. Our new algorithm, the incremental multiresolution matrix factorization, uncovers such structure one feature at a time, and hence scales well to large matrices. We describe how this multiscale analysis goes much farther than what a direct “global” factorization of the data can identify. We evaluate the efficacy of the resulting factorizations for relative leveraging within regression tasks using medical imaging data. We also use the factorization on representations learned by popular deep networks, providing evidence of their ability to infer semantic relationships even when they are not explicitly trained to do so. We show that this algorithm can be used as an exploratory tool to improve the network architecture, and within numerous other settings in vision. PMID:29416293

  17. Reconstructing Genetic Regulatory Networks Using Two-Step Algorithms with the Differential Equation Models of Neural Networks.

    PubMed

    Chen, Chi-Kan

    2017-07-26

    The identification of genetic regulatory networks (GRNs) provides insights into complex cellular processes. A class of recurrent neural networks (RNNs) captures the dynamics of GRN. Algorithms combining the RNN and machine learning schemes were proposed to reconstruct small-scale GRNs using gene expression time series. We present new GRN reconstruction methods with neural networks. The RNN is extended to a class of recurrent multilayer perceptrons (RMLPs) with latent nodes. Our methods contain two steps: the edge rank assignment step and the network construction step. The former assigns ranks to all possible edges by a recursive procedure based on the estimated weights of wires of RNN/RMLP (RE RNN /RE RMLP ), and the latter constructs a network consisting of top-ranked edges under which the optimized RNN simulates the gene expression time series. The particle swarm optimization (PSO) is applied to optimize the parameters of RNNs and RMLPs in a two-step algorithm. The proposed RE RNN -RNN and RE RMLP -RNN algorithms are tested on synthetic and experimental gene expression time series of small GRNs of about 10 genes. The experimental time series are from the studies of yeast cell cycle regulated genes and E. coli DNA repair genes. The unstable estimation of RNN using experimental time series having limited data points can lead to fairly arbitrary predicted GRNs. Our methods incorporate RNN and RMLP into a two-step structure learning procedure. Results show that the RE RMLP using the RMLP with a suitable number of latent nodes to reduce the parameter dimension often result in more accurate edge ranks than the RE RNN using the regularized RNN on short simulated time series. Combining by a weighted majority voting rule the networks derived by the RE RMLP -RNN using different numbers of latent nodes in step one to infer the GRN, the method performs consistently and outperforms published algorithms for GRN reconstruction on most benchmark time series. The framework of two-step algorithms can potentially incorporate with different nonlinear differential equation models to reconstruct the GRN.

  18. PlaNet: Combined Sequence and Expression Comparisons across Plant Networks Derived from Seven Species[W][OA

    PubMed Central

    Mutwil, Marek; Klie, Sebastian; Tohge, Takayuki; Giorgi, Federico M.; Wilkins, Olivia; Campbell, Malcolm M.; Fernie, Alisdair R.; Usadel, Björn; Nikoloski, Zoran; Persson, Staffan

    2011-01-01

    The model organism Arabidopsis thaliana is readily used in basic research due to resource availability and relative speed of data acquisition. A major goal is to transfer acquired knowledge from Arabidopsis to crop species. However, the identification of functional equivalents of well-characterized Arabidopsis genes in other plants is a nontrivial task. It is well documented that transcriptionally coordinated genes tend to be functionally related and that such relationships may be conserved across different species and even kingdoms. To exploit such relationships, we constructed whole-genome coexpression networks for Arabidopsis and six important plant crop species. The interactive networks, clustered using the HCCA algorithm, are provided under the banner PlaNet (http://aranet.mpimp-golm.mpg.de). We implemented a comparative network algorithm that estimates similarities between network structures. Thus, the platform can be used to swiftly infer similar coexpressed network vicinities within and across species and can predict the identity of functional homologs. We exemplify this using the PSA-D and chalcone synthase-related gene networks. Finally, we assessed how ontology terms are transcriptionally connected in the seven species and provide the corresponding MapMan term coexpression networks. The data support the contention that this platform will considerably improve transfer of knowledge generated in Arabidopsis to valuable crop species. PMID:21441431

  19. GENIUS: web server to predict local gene networks and key genes for biological functions.

    PubMed

    Puelma, Tomas; Araus, Viviana; Canales, Javier; Vidal, Elena A; Cabello, Juan M; Soto, Alvaro; Gutiérrez, Rodrigo A

    2017-03-01

    GENIUS is a user-friendly web server that uses a novel machine learning algorithm to infer functional gene networks focused on specific genes and experimental conditions that are relevant to biological functions of interest. These functions may have different levels of complexity, from specific biological processes to complex traits that involve several interacting processes. GENIUS also enriches the network with new genes related to the biological function of interest, with accuracies comparable to highly discriminative Support Vector Machine methods. GENIUS currently supports eight model organisms and is freely available for public use at http://networks.bio.puc.cl/genius . genius.psbl@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  20. A grammar inference approach for predicting kinase specific phosphorylation sites.

    PubMed

    Datta, Sutapa; Mukhopadhyay, Subhasis

    2015-01-01

    Kinase mediated phosphorylation site detection is the key mechanism of post translational mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, like cancer are related with the signaling defects which are associated with protein phosphorylation. Characterizing the protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling network; thereby helping us to treat such diseases. Experimental methods for predicting phosphorylation sites are labour intensive and expensive. Also, manifold increase of protein sequences in the databanks over the years necessitates the improvement of high speed and accurate computational methods for predicting phosphorylation sites in protein sequences. Till date, a number of computational methods have been proposed by various researchers in predicting phosphorylation sites, but there remains much scope of improvement. In this communication, we present a simple and novel method based on Grammatical Inference (GI) approach to automate the prediction of kinase specific phosphorylation sites. In this regard, we have used a popular GI algorithm Alergia to infer Deterministic Stochastic Finite State Automata (DSFA) which equally represents the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets generated by us reveal that, our inferred grammar successfully predicts phosphorylation sites in a kinase specific manner. It performs significantly better when compared with the other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI inference algorithms. The DSFA generated by our method performs superior which indicates that our method is robust and has a potential for predicting the phosphorylation sites in a kinase specific manner.

  1. Inferring animal social networks and leadership: applications for passive monitoring arrays.

    PubMed

    Jacoby, David M P; Papastamatiou, Yannis P; Freeman, Robin

    2016-11-01

    Analyses of animal social networks have frequently benefited from techniques derived from other disciplines. Recently, machine learning algorithms have been adopted to infer social associations from time-series data gathered using remote, telemetry systems situated at provisioning sites. We adapt and modify existing inference methods to reveal the underlying social structure of wide-ranging marine predators moving through spatial arrays of passive acoustic receivers. From six months of tracking data for grey reef sharks (Carcharhinus amblyrhynchos) at Palmyra atoll in the Pacific Ocean, we demonstrate that some individuals emerge as leaders within the population and that this behavioural coordination is predicted by both sex and the duration of co-occurrences between conspecifics. In doing so, we provide the first evidence of long-term, spatially extensive social processes in wild sharks. To achieve these results, we interrogate simulated and real tracking data with the explicit purpose of drawing attention to the key considerations in the use and interpretation of inference methods and their impact on resultant social structure. We provide a modified translation of the GMMEvents method for R, including new analyses quantifying the directionality and duration of social events with the aim of encouraging the careful use of these methods more widely in less tractable social animal systems but where passive telemetry is already widespread. © 2016 The Authors.

  2. Inferring animal social networks and leadership: applications for passive monitoring arrays

    PubMed Central

    Papastamatiou, Yannis P.; Freeman, Robin

    2016-01-01

    Analyses of animal social networks have frequently benefited from techniques derived from other disciplines. Recently, machine learning algorithms have been adopted to infer social associations from time-series data gathered using remote, telemetry systems situated at provisioning sites. We adapt and modify existing inference methods to reveal the underlying social structure of wide-ranging marine predators moving through spatial arrays of passive acoustic receivers. From six months of tracking data for grey reef sharks (Carcharhinus amblyrhynchos) at Palmyra atoll in the Pacific Ocean, we demonstrate that some individuals emerge as leaders within the population and that this behavioural coordination is predicted by both sex and the duration of co-occurrences between conspecifics. In doing so, we provide the first evidence of long-term, spatially extensive social processes in wild sharks. To achieve these results, we interrogate simulated and real tracking data with the explicit purpose of drawing attention to the key considerations in the use and interpretation of inference methods and their impact on resultant social structure. We provide a modified translation of the GMMEvents method for R, including new analyses quantifying the directionality and duration of social events with the aim of encouraging the careful use of these methods more widely in less tractable social animal systems but where passive telemetry is already widespread. PMID:27881803

  3. MCAM: multiple clustering analysis methodology for deriving hypotheses and insights from high-throughput proteomic datasets.

    PubMed

    Naegle, Kristen M; Welsch, Roy E; Yaffe, Michael B; White, Forest M; Lauffenburger, Douglas A

    2011-07-01

    Advances in proteomic technologies continue to substantially accelerate capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications, or the enzymes responsible for modifying them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data in order to overcome the typical dependence on specific a priori assumptions and expert knowledge concerning the technical aspects of clustering. Multiple clustering analysis methodology ('MCAM') employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms, in a combinatorial fashion, to create a suite of clustering sets. These sets are then evaluated based on their ability to produce biological insights through statistical enrichment of metadata relating to knowledge concerning protein functions, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERRB network to explore the relationships between algorithmic parameters and the biological meaning that could be inferred and report on interesting biological predictions. Further, we applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent and incomplete overlapping measurements of phosphorylation sites in the network. We report specific and global differences of the ERBB network stimulated with different ligands and with changes in HER2 expression. Overall, we offer MCAM as a broadly-applicable approach for analysis of proteomic data which may help increase the current understanding of molecular networks in a variety of biological problems. © 2011 Naegle et al.

  4. Challenges of CAC in Heterogeneous Wireless Cognitive Networks

    NASA Astrophysics Data System (ADS)

    Wang, Jiazheng; Fu, Xiuhua

    Call admission control (CAC) is known as an effective functionality in ensuring the QoS of wireless networks. The vision of next generation wireless networks has led to the development of new call admission control (CAC) algorithms specifically designed for heterogeneous wireless Cognitive networks. However, there will be a number of challenges created by dynamic spectrum access and scheduling techniques associated with the cognitive systems. In this paper for the first time, we recommend that the CAC policies should be distinguished between primary users and secondary users. The classification of different methods of cac policies in cognitive networks contexts is proposed. Although there have been some researches within the umbrella of Joint CAC and cross-layer optimization for wireless networks, the advent of the cognitive networks adds some additional problems. We present the conceptual models for joint CAC and cross-layer optimization respectively. Also, the benefit of Cognition can only be realized fully if application requirements and traffic flow contexts are determined or inferred in order to know what modes of operation and spectrum bands to use at each point in time. The process model of Cognition involved per-flow-based CAC is presented. Because there may be a number of parameters on different levels affecting a CAC decision and the conditions for accepting or rejecting a call must be computed quickly and frequently, simplicity and practicability are particularly important for designing a feasible CAC algorithm. In a word, a more thorough understanding of CAC in heterogeneous wireless cognitive networks may help one to design better CAC algorithms.

  5. A Semi-Supervised Learning Algorithm for Predicting Four Types MiRNA-Disease Associations by Mutual Information in a Heterogeneous Network.

    PubMed

    Zhang, Xiaotian; Yin, Jian; Zhang, Xu

    2018-03-02

    Increasing evidence suggests that dysregulation of microRNAs (miRNAs) may lead to a variety of diseases. Therefore, identifying disease-related miRNAs is a crucial problem. Currently, many computational approaches have been proposed to predict binary miRNA-disease associations. In this study, in order to predict underlying miRNA-disease association types, a semi-supervised model called the network-based label propagation algorithm is proposed to infer multiple types of miRNA-disease associations (NLPMMDA) by mutual information derived from the heterogeneous network. The NLPMMDA method integrates disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity information of miRNAs and diseases to construct a heterogeneous network. NLPMMDA is a semi-supervised model which does not require verified negative samples. Leave-one-out cross validation (LOOCV) was implemented for four known types of miRNA-disease associations and demonstrated the reliable performance of our method. Moreover, case studies of lung cancer and breast cancer confirmed effective performance of NLPMMDA to predict novel miRNA-disease associations and their association types.

  6. Machine Learning-Assisted Network Inference Approach to Identify a New Class of Genes that Coordinate the Functionality of Cancer Networks.

    PubMed

    Ghanat Bari, Mehrab; Ung, Choong Yong; Zhang, Cheng; Zhu, Shizhen; Li, Hu

    2017-08-01

    Emerging evidence indicates the existence of a new class of cancer genes that act as "signal linkers" coordinating oncogenic signals between mutated and differentially expressed genes. While frequently mutated oncogenes and differentially expressed genes, which we term Class I cancer genes, are readily detected by most analytical tools, the new class of cancer-related genes, i.e., Class II, escape detection because they are neither mutated nor differentially expressed. Given this hypothesis, we developed a Machine Learning-Assisted Network Inference (MALANI) algorithm, which assesses all genes regardless of expression or mutational status in the context of cancer etiology. We used 8807 expression arrays, corresponding to 9 cancer types, to build more than 2 × 10 8 Support Vector Machine (SVM) models for reconstructing a cancer network. We found that ~3% of ~19,000 not differentially expressed genes are Class II cancer gene candidates. Some Class II genes that we found, such as SLC19A1 and ATAD3B, have been recently reported to associate with cancer outcomes. To our knowledge, this is the first study that utilizes both machine learning and network biology approaches to uncover Class II cancer genes in coordinating functionality in cancer networks and will illuminate our understanding of how genes are modulated in a tissue-specific network contribute to tumorigenesis and therapy development.

  7. Automated general temperature correction method for dielectric soil moisture sensors

    NASA Astrophysics Data System (ADS)

    Kapilaratne, R. G. C. Jeewantinie; Lu, Minjiao

    2017-08-01

    An effective temperature correction method for dielectric sensors is important to ensure the accuracy of soil water content (SWC) measurements of local to regional-scale soil moisture monitoring networks. These networks are extensively using highly temperature sensitive dielectric sensors due to their low cost, ease of use and less power consumption. Yet there is no general temperature correction method for dielectric sensors, instead sensor or site dependent correction algorithms are employed. Such methods become ineffective at soil moisture monitoring networks with different sensor setups and those that cover diverse climatic conditions and soil types. This study attempted to develop a general temperature correction method for dielectric sensors which can be commonly used regardless of the differences in sensor type, climatic conditions and soil type without rainfall data. In this work an automated general temperature correction method was developed by adopting previously developed temperature correction algorithms using time domain reflectometry (TDR) measurements to ThetaProbe ML2X, Stevens Hydra probe II and Decagon Devices EC-TM sensor measurements. The rainy day effects removal procedure from SWC data was automated by incorporating a statistical inference technique with temperature correction algorithms. The temperature correction method was evaluated using 34 stations from the International Soil Moisture Monitoring Network and another nine stations from a local soil moisture monitoring network in Mongolia. Soil moisture monitoring networks used in this study cover four major climates and six major soil types. Results indicated that the automated temperature correction algorithms developed in this study can eliminate temperature effects from dielectric sensor measurements successfully even without on-site rainfall data. Furthermore, it has been found that actual daily average of SWC has been changed due to temperature effects of dielectric sensors with a significant error factor comparable to ±1% manufacturer's accuracy.

  8. Decoding communities in networks

    NASA Astrophysics Data System (ADS)

    Radicchi, Filippo

    2018-02-01

    According to a recent information-theoretical proposal, the problem of defining and identifying communities in networks can be interpreted as a classical communication task over a noisy channel: memberships of nodes are information bits erased by the channel, edges and nonedges in the network are parity bits introduced by the encoder but degraded through the channel, and a community identification algorithm is a decoder. The interpretation is perfectly equivalent to the one at the basis of well-known statistical inference algorithms for community detection. The only difference in the interpretation is that a noisy channel replaces a stochastic network model. However, the different perspective gives the opportunity to take advantage of the rich set of tools of coding theory to generate novel insights on the problem of community detection. In this paper, we illustrate two main applications of standard coding-theoretical methods to community detection. First, we leverage a state-of-the-art decoding technique to generate a family of quasioptimal community detection algorithms. Second and more important, we show that the Shannon's noisy-channel coding theorem can be invoked to establish a lower bound, here named as decodability bound, for the maximum amount of noise tolerable by an ideal decoder to achieve perfect detection of communities. When computed for well-established synthetic benchmarks, the decodability bound explains accurately the performance achieved by the best community detection algorithms existing on the market, telling us that only little room for their improvement is still potentially left.

  9. Decoding communities in networks.

    PubMed

    Radicchi, Filippo

    2018-02-01

    According to a recent information-theoretical proposal, the problem of defining and identifying communities in networks can be interpreted as a classical communication task over a noisy channel: memberships of nodes are information bits erased by the channel, edges and nonedges in the network are parity bits introduced by the encoder but degraded through the channel, and a community identification algorithm is a decoder. The interpretation is perfectly equivalent to the one at the basis of well-known statistical inference algorithms for community detection. The only difference in the interpretation is that a noisy channel replaces a stochastic network model. However, the different perspective gives the opportunity to take advantage of the rich set of tools of coding theory to generate novel insights on the problem of community detection. In this paper, we illustrate two main applications of standard coding-theoretical methods to community detection. First, we leverage a state-of-the-art decoding technique to generate a family of quasioptimal community detection algorithms. Second and more important, we show that the Shannon's noisy-channel coding theorem can be invoked to establish a lower bound, here named as decodability bound, for the maximum amount of noise tolerable by an ideal decoder to achieve perfect detection of communities. When computed for well-established synthetic benchmarks, the decodability bound explains accurately the performance achieved by the best community detection algorithms existing on the market, telling us that only little room for their improvement is still potentially left.

  10. Detection of algorithmic trading

    NASA Astrophysics Data System (ADS)

    Bogoev, Dimitar; Karam, Arzé

    2017-10-01

    We develop a new approach to reflect the behavior of algorithmic traders. Specifically, we provide an analytical and tractable way to infer patterns of quote volatility and price momentum consistent with different types of strategies employed by algorithmic traders, and we propose two ratios to quantify these patterns. Quote volatility ratio is based on the rate of oscillation of the best ask and best bid quotes over an extremely short period of time; whereas price momentum ratio is based on identifying patterns of rapid upward or downward movement in prices. The two ratios are evaluated across several asset classes. We further run a two-stage Artificial Neural Network experiment on the quote volatility ratio; the first stage is used to detect the quote volatility patterns resulting from algorithmic activity, while the second is used to validate the quality of signal detection provided by our measure.

  11. Unifying Inference of Meso-Scale Structures in Networks.

    PubMed

    Tunç, Birkan; Verma, Ragini

    2015-01-01

    Networks are among the most prevalent formal representations in scientific studies, employed to depict interactions between objects such as molecules, neuronal clusters, or social groups. Studies performed at meso-scale that involve grouping of objects based on their distinctive interaction patterns form one of the main lines of investigation in network science. In a social network, for instance, meso-scale structures can correspond to isolated social groupings or groups of individuals that serve as a communication core. Currently, the research on different meso-scale structures such as community and core-periphery structures has been conducted via independent approaches, which precludes the possibility of an algorithmic design that can handle multiple meso-scale structures and deciding which structure explains the observed data better. In this study, we propose a unified formulation for the algorithmic detection and analysis of different meso-scale structures. This facilitates the investigation of hybrid structures that capture the interplay between multiple meso-scale structures and statistical comparison of competing structures, all of which have been hitherto unavailable. We demonstrate the applicability of the methodology in analyzing the human brain network, by determining the dominant organizational structure (communities) of the brain, as well as its auxiliary characteristics (core-periphery).

  12. Perturbation Biology: Inferring Signaling Networks in Cellular Systems

    PubMed Central

    Miller, Martin L.; Gauthier, Nicholas P.; Jing, Xiaohong; Kaushik, Poorvi; He, Qin; Mills, Gordon; Solit, David B.; Pratilas, Christine A.; Weigt, Martin; Braunstein, Alfredo; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2013-01-01

    We present a powerful experimental-computational technology for inferring network models that predict the response of cells to perturbations, and that may be useful in the design of combinatorial therapy against cancer. The experiments are systematic series of perturbations of cancer cell lines by targeted drugs, singly or in combination. The response to perturbation is quantified in terms of relative changes in the measured levels of proteins, phospho-proteins and cellular phenotypes such as viability. Computational network models are derived de novo, i.e., without prior knowledge of signaling pathways, and are based on simple non-linear differential equations. The prohibitively large solution space of all possible network models is explored efficiently using a probabilistic algorithm, Belief Propagation (BP), which is three orders of magnitude faster than standard Monte Carlo methods. Explicit executable models are derived for a set of perturbation experiments in SKMEL-133 melanoma cell lines, which are resistant to the therapeutically important inhibitor of RAF kinase. The resulting network models reproduce and extend known pathway biology. They empower potential discoveries of new molecular interactions and predict efficacious novel drug perturbations, such as the inhibition of PLK1, which is verified experimentally. This technology is suitable for application to larger systems in diverse areas of molecular biology. PMID:24367245

  13. Image-Data Compression Using Edge-Optimizing Algorithm for WFA Inference.

    ERIC Educational Resources Information Center

    Culik, Karel II; Kari, Jarkko

    1994-01-01

    Presents an inference algorithm that produces a weighted finite automata (WFA), in particular, the grayness functions of graytone images. Image-data compression results based on the new inference algorithm produces a WFA with a relatively small number of edges. Image-data compression results alone and in combination with wavelets are discussed.…

  14. Multi-modality image fusion based on enhanced fuzzy radial basis function neural networks.

    PubMed

    Chao, Zhen; Kim, Dohyeon; Kim, Hee-Joung

    2018-04-01

    In clinical applications, single modality images do not provide sufficient diagnostic information. Therefore, it is necessary to combine the advantages or complementarities of different modalities of images. Recently, neural network technique was applied to medical image fusion by many researchers, but there are still many deficiencies. In this study, we propose a novel fusion method to combine multi-modality medical images based on the enhanced fuzzy radial basis function neural network (Fuzzy-RBFNN), which includes five layers: input, fuzzy partition, front combination, inference, and output. Moreover, we propose a hybrid of the gravitational search algorithm (GSA) and error back propagation algorithm (EBPA) to train the network to update the parameters of the network. Two different patterns of images are used as inputs of the neural network, and the output is the fused image. A comparison with the conventional fusion methods and another neural network method through subjective observation and objective evaluation indexes reveals that the proposed method effectively synthesized the information of input images and achieved better results. Meanwhile, we also trained the network by using the EBPA and GSA, individually. The results reveal that the EBPGSA not only outperformed both EBPA and GSA, but also trained the neural network more accurately by analyzing the same evaluation indexes. Copyright © 2018 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.

  15. A Self-Organizing State-Space-Model Approach for Parameter Estimation in Hodgkin-Huxley-Type Models of Single Neurons

    PubMed Central

    Vavoulis, Dimitrios V.; Straub, Volko A.; Aston, John A. D.; Feng, Jianfeng

    2012-01-01

    Traditional approaches to the problem of parameter estimation in biophysical models of neurons and neural networks usually adopt a global search algorithm (for example, an evolutionary algorithm), often in combination with a local search method (such as gradient descent) in order to minimize the value of a cost function, which measures the discrepancy between various features of the available experimental data and model output. In this study, we approach the problem of parameter estimation in conductance-based models of single neurons from a different perspective. By adopting a hidden-dynamical-systems formalism, we expressed parameter estimation as an inference problem in these systems, which can then be tackled using a range of well-established statistical inference methods. The particular method we used was Kitagawa's self-organizing state-space model, which was applied on a number of Hodgkin-Huxley-type models using simulated or actual electrophysiological data. We showed that the algorithm can be used to estimate a large number of parameters, including maximal conductances, reversal potentials, kinetics of ionic currents, measurement and intrinsic noise, based on low-dimensional experimental data and sufficiently informative priors in the form of pre-defined constraints imposed on model parameters. The algorithm remained operational even when very noisy experimental data were used. Importantly, by combining the self-organizing state-space model with an adaptive sampling algorithm akin to the Covariance Matrix Adaptation Evolution Strategy, we achieved a significant reduction in the variance of parameter estimates. The algorithm did not require the explicit formulation of a cost function and it was straightforward to apply on compartmental models and multiple data sets. Overall, the proposed methodology is particularly suitable for resolving high-dimensional inference problems based on noisy electrophysiological data and, therefore, a potentially useful tool in the construction of biophysical neuron models. PMID:22396632

  16. Finding user personal interests by tweet-mining using advanced machine learning algorithm in R

    NASA Astrophysics Data System (ADS)

    Krithika, L. B.; Roy, P.; Asha Jerlin, M.

    2017-11-01

    The social-media plays a key role in every individual’s life by anyone’s personal views about their liking-ness/disliking-ness. This methodology is a sharp departure from the traditional techniques of inferring interests of a user from the tweets that he/she posts or receives. It is showed that the topics of interest inferred by the proposed methodology are far superior than the topics extracted by state-of-the-art techniques such as using topic models (Labelled LDA) on tweets. Based upon the proposed methodology, a system has been built, “Who is interested in what”, which can infer the interests of millions of Twitter users. A novel mechanism is proposed to infer topics of interest of individual users in the twitter social network. It has been observed that in twitter, a user generally follows experts on various topics of his/her interest in order to acquire information on those topics. A methodology based on social annotations is used to first deduce the topical expertise of popular twitter users and then transitively infer the interests of the users who follow them.

  17. Disease gene prioritization by integrating tissue-specific molecular networks using a robust multi-network model.

    PubMed

    Ni, Jingchao; Koyuturk, Mehmet; Tong, Hanghang; Haines, Jonathan; Xu, Rong; Zhang, Xiang

    2016-11-10

    Accurately prioritizing candidate disease genes is an important and challenging problem. Various network-based methods have been developed to predict potential disease genes by utilizing the disease similarity network and molecular networks such as protein interaction or gene co-expression networks. Although successful, a common limitation of the existing methods is that they assume all diseases share the same molecular network and a single generic molecular network is used to predict candidate genes for all diseases. However, different diseases tend to manifest in different tissues, and the molecular networks in different tissues are usually different. An ideal method should be able to incorporate tissue-specific molecular networks for different diseases. In this paper, we develop a robust and flexible method to integrate tissue-specific molecular networks for disease gene prioritization. Our method allows each disease to have its own tissue-specific network(s). We formulate the problem of candidate gene prioritization as an optimization problem based on network propagation. When there are multiple tissue-specific networks available for a disease, our method can automatically infer the relative importance of each tissue-specific network. Thus it is robust to the noisy and incomplete network data. To solve the optimization problem, we develop fast algorithms which have linear time complexities in the number of nodes in the molecular networks. We also provide rigorous theoretical foundations for our algorithms in terms of their optimality and convergence properties. Extensive experimental results show that our method can significantly improve the accuracy of candidate gene prioritization compared with the state-of-the-art methods. In our experiments, we compare our methods with 7 popular network-based disease gene prioritization algorithms on diseases from Online Mendelian Inheritance in Man (OMIM) database. The experimental results demonstrate that our methods recover true associations more accurately than other methods in terms of AUC values, and the performance differences are significant (with paired t-test p-values less than 0.05). This validates the importance to integrate tissue-specific molecular networks for studying disease gene prioritization and show the superiority of our network models and ranking algorithms toward this purpose. The source code and datasets are available at http://nijingchao.github.io/CRstar/ .

  18. A hybrid neural networks-fuzzy logic-genetic algorithm for grade estimation

    NASA Astrophysics Data System (ADS)

    Tahmasebi, Pejman; Hezarkhani, Ardeshir

    2012-05-01

    The grade estimation is a quite important and money/time-consuming stage in a mine project, which is considered as a challenge for the geologists and mining engineers due to the structural complexities in mineral ore deposits. To overcome this problem, several artificial intelligence techniques such as Artificial Neural Networks (ANN) and Fuzzy Logic (FL) have recently been employed with various architectures and properties. However, due to the constraints of both methods, they yield the desired results only under the specific circumstances. As an example, one major problem in FL is the difficulty of constructing the membership functions (MFs).Other problems such as architecture and local minima could also be located in ANN designing. Therefore, a new methodology is presented in this paper for grade estimation. This method which is based on ANN and FL is called "Coactive Neuro-Fuzzy Inference System" (CANFIS) which combines two approaches, ANN and FL. The combination of these two artificial intelligence approaches is achieved via the verbal and numerical power of intelligent systems. To improve the performance of this system, a Genetic Algorithm (GA) - as a well-known technique to solve the complex optimization problems - is also employed to optimize the network parameters including learning rate, momentum of the network and the number of MFs for each input. A comparison of these techniques (ANN, Adaptive Neuro-Fuzzy Inference System or ANFIS) with this new method (CANFIS-GA) is also carried out through a case study in Sungun copper deposit, located in East-Azerbaijan, Iran. The results show that CANFIS-GA could be a faster and more accurate alternative to the existing time-consuming methodologies for ore grade estimation and that is, therefore, suggested to be applied for grade estimation in similar problems.

  19. A hybrid neural networks-fuzzy logic-genetic algorithm for grade estimation

    PubMed Central

    Tahmasebi, Pejman; Hezarkhani, Ardeshir

    2012-01-01

    The grade estimation is a quite important and money/time-consuming stage in a mine project, which is considered as a challenge for the geologists and mining engineers due to the structural complexities in mineral ore deposits. To overcome this problem, several artificial intelligence techniques such as Artificial Neural Networks (ANN) and Fuzzy Logic (FL) have recently been employed with various architectures and properties. However, due to the constraints of both methods, they yield the desired results only under the specific circumstances. As an example, one major problem in FL is the difficulty of constructing the membership functions (MFs).Other problems such as architecture and local minima could also be located in ANN designing. Therefore, a new methodology is presented in this paper for grade estimation. This method which is based on ANN and FL is called “Coactive Neuro-Fuzzy Inference System” (CANFIS) which combines two approaches, ANN and FL. The combination of these two artificial intelligence approaches is achieved via the verbal and numerical power of intelligent systems. To improve the performance of this system, a Genetic Algorithm (GA) – as a well-known technique to solve the complex optimization problems – is also employed to optimize the network parameters including learning rate, momentum of the network and the number of MFs for each input. A comparison of these techniques (ANN, Adaptive Neuro-Fuzzy Inference System or ANFIS) with this new method (CANFIS–GA) is also carried out through a case study in Sungun copper deposit, located in East-Azerbaijan, Iran. The results show that CANFIS–GA could be a faster and more accurate alternative to the existing time-consuming methodologies for ore grade estimation and that is, therefore, suggested to be applied for grade estimation in similar problems. PMID:25540468

  20. Fault gouge evolution during rupture and healing: Continual active-seismic observations across laboratory-scale fault zones

    NASA Astrophysics Data System (ADS)

    Krysta, M.; Kusmierczyk-Michulec, J.; Nikkinen, M.; Carter, J. A.

    2011-12-01

    In order to support its mission of monitoring compliance with the treaty banning nuclear explosions, the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO) operates four global networks of, respectively, seismic, infrasound, hydroacoustic sensors and air samplers accompanied with radionuclide detectors. The role of the International Data Centre (IDC) of CTBTO is to associate the signals detected in the monitoring networks with the physical phenomena which emitted these signals, by forming events. One of the aspects of associating detections with emitters is the problem of inferring the sources of radionuclides from the detections made at CTBTO radionuclide network stations. This task is particularly challenging because the average transport distance between a release point and detectors is large. Complex processes of turbulent diffusion are responsible for efficient mixing and consequently for decreasing the information content of detections with an increasing distance from the source. The problem is generally addressed in a two-step process. In the first step, an atmospheric transport model establishes a link between the detections and the regions of possible source location. In the second step this link is inverted to infer source information from the detections. In this presentation, we will discuss enhancements of the presently used regression-based inversion algorithm to reconstruct a source of radionuclides. To this aim, modern inversion algorithms accounting for prior information and appropriately regularizing an under-determined reconstruction problem will be briefly introduced. Emphasis will be on the CTBTO context and the choice of inversion methods. An illustration of the first tests will be provided using a framework of twin experiments, i.e. fictitious detections in the CTBTO radionuclide network generated with an atmospheric transport model.

  1. Learning Effective Connectivity Network Structure from fMRI Data Based on Artificial Immune Algorithm

    PubMed Central

    Ji, Junzhong; Liu, Jinduo; Liang, Peipeng; Zhang, Aidong

    2016-01-01

    Many approaches have been designed to extract brain effective connectivity from functional magnetic resonance imaging (fMRI) data. However, few of them can effectively identify the connectivity network structure due to different defects. In this paper, a new algorithm is developed to infer the effective connectivity between different brain regions by combining artificial immune algorithm (AIA) with the Bayes net method, named as AIAEC. In the proposed algorithm, a brain effective connectivity network is mapped onto an antibody, and four immune operators are employed to perform the optimization process of antibodies, including clonal selection operator, crossover operator, mutation operator and suppression operator, and finally gets an antibody with the highest K2 score as the solution. AIAEC is then tested on Smith’s simulated datasets, and the effect of the different factors on AIAEC is evaluated, including the node number, session length, as well as the other potential confounding factors of the blood oxygen level dependent (BOLD) signal. It was revealed that, as contrast to other existing methods, AIAEC got the best performance on the majority of the datasets. It was also found that AIAEC could attain a relative better solution under the influence of many factors, although AIAEC was differently affected by the aforementioned factors. AIAEC is thus demonstrated to be an effective method for detecting the brain effective connectivity. PMID:27045295

  2. Combinatorial influence of environmental parameters on transcription factor activity

    PubMed Central

    Knijnenburg, T.A.; Wessels, L.F.A.; Reinders, M.J.T.

    2008-01-01

    Motivation: Cells receive a wide variety of environmental signals, which are often processed combinatorially to generate specific genetic responses. Changes in transcript levels, as observed across different environmental conditions, can, to a large extent, be attributed to changes in the activity of transcription factors (TFs). However, in unraveling these transcription regulation networks, the actual environmental signals are often not incorporated into the model, simply because they have not been measured. The unquantified heterogeneity of the environmental parameters across microarray experiments frustrates regulatory network inference. Results: We propose an inference algorithm that models the influence of environmental parameters on gene expression. The approach is based on a yeast microarray compendium of chemostat steady-state experiments. Chemostat cultivation enables the accurate control and measurement of many of the key cultivation parameters, such as nutrient concentrations, growth rate and temperature. The observed transcript levels are explained by inferring the activity of TFs in response to combinations of cultivation parameters. The interplay between activated enhancers and repressors that bind a gene promoter determine the possible up- or downregulation of the gene. The model is translated into a linear integer optimization problem. The resulting regulatory network identifies the combinatorial effects of environmental parameters on TF activity and gene expression. Availability: The Matlab code is available from the authors upon request. Contact: t.a.knijnenburg@tudelft.nl Supplementary information: Supplementary data are available at Bioinformatics online. PMID:18586711

  3. Fetal ECG extraction via Type-2 adaptive neuro-fuzzy inference systems.

    PubMed

    Ahmadieh, Hajar; Asl, Babak Mohammadzadeh

    2017-04-01

    We proposed a noninvasive method for separating the fetal ECG (FECG) from maternal ECG (MECG) by using Type-2 adaptive neuro-fuzzy inference systems. The method can extract FECG components from abdominal signal by using one abdominal channel, including maternal and fetal cardiac signals and other environmental noise signals, and one chest channel. The proposed algorithm detects the nonlinear dynamics of the mother's body. So, the components of the MECG are estimated from the abdominal signal. By subtracting estimated mother cardiac signal from abdominal signal, fetal cardiac signal can be extracted. This algorithm was applied on synthetic ECG signals generated based on the models developed by McSharry et al. and Behar et al. and also on DaISy real database. In environments with high uncertainty, our method performs better than the Type-1 fuzzy method. Specifically, in evaluation of the algorithm with the synthetic data based on McSharry model, for input signals with SNR of -5dB, the SNR of the extracted FECG was improved by 38.38% in comparison with the Type-1 fuzzy method. Also, the results show that increasing the uncertainty or decreasing the input SNR leads to increasing the percentage of the improvement in SNR of the extracted FECG. For instance, when the SNR of the input signal decreases to -30dB, our proposed algorithm improves the SNR of the extracted FECG by 71.06% with respect to the Type-1 fuzzy method. The same results were obtained on synthetic data based on Behar model. Our results on real database reflect the success of the proposed method to separate the maternal and fetal heart signals even if their waves overlap in time. Moreover, the proposed algorithm was applied to the simulated fetal ECG with ectopic beats and achieved good results in separating FECG from MECG. The results show the superiority of the proposed Type-2 neuro-fuzzy inference method over the Type-1 neuro-fuzzy inference and the polynomial networks methods, which is due to its capability to capture the nonlinearities of the model better. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Enlightening discriminative network functional modules behind Principal Component Analysis separation in differential-omic science studies

    PubMed Central

    Ciucci, Sara; Ge, Yan; Durán, Claudio; Palladini, Alessandra; Jiménez-Jiménez, Víctor; Martínez-Sánchez, Luisa María; Wang, Yuting; Sales, Susanne; Shevchenko, Andrej; Poser, Steven W.; Herbig, Maik; Otto, Oliver; Androutsellis-Theotokis, Andreas; Guck, Jochen; Gerl, Mathias J.; Cannistraci, Carlo Vittorio

    2017-01-01

    Omic science is rapidly growing and one of the most employed techniques to explore differential patterns in omic datasets is principal component analysis (PCA). However, a method to enlighten the network of omic features that mostly contribute to the sample separation obtained by PCA is missing. An alternative is to build correlation networks between univariately-selected significant omic features, but this neglects the multivariate unsupervised feature compression responsible for the PCA sample segregation. Biologists and medical researchers often prefer effective methods that offer an immediate interpretation to complicated algorithms that in principle promise an improvement but in practice are difficult to be applied and interpreted. Here we present PC-corr: a simple algorithm that associates to any PCA segregation a discriminative network of features. Such network can be inspected in search of functional modules useful in the definition of combinatorial and multiscale biomarkers from multifaceted omic data in systems and precision biomedicine. We offer proofs of PC-corr efficacy on lipidomic, metagenomic, developmental genomic, population genetic, cancer promoteromic and cancer stem-cell mechanomic data. Finally, PC-corr is a general functional network inference approach that can be easily adopted for big data exploration in computer science and analysis of complex systems in physics. PMID:28287094

  5. A Robust Method to Detect Zero Velocity for Improved 3D Personal Navigation Using Inertial Sensors

    PubMed Central

    Xu, Zhengyi; Wei, Jianming; Zhang, Bo; Yang, Weijun

    2015-01-01

    This paper proposes a robust zero velocity (ZV) detector algorithm to accurately calculate stationary periods in a gait cycle. The proposed algorithm adopts an effective gait cycle segmentation method and introduces a Bayesian network (BN) model based on the measurements of inertial sensors and kinesiology knowledge to infer the ZV period. During the detected ZV period, an Extended Kalman Filter (EKF) is used to estimate the error states and calibrate the position error. The experiments reveal that the removal rate of ZV false detections by the proposed method increases 80% compared with traditional method at high walking speed. Furthermore, based on the detected ZV, the Personal Inertial Navigation System (PINS) algorithm aided by EKF performs better, especially in the altitude aspect. PMID:25831086

  6. Constructing fMRI connectivity networks: a whole brain functional parcellation method for node definition.

    PubMed

    Maggioni, Eleonora; Tana, Maria Gabriella; Arrigoni, Filippo; Zucca, Claudio; Bianchi, Anna Maria

    2014-05-15

    Functional Magnetic Resonance Imaging (fMRI) is used for exploring brain functionality, and recently it was applied for mapping the brain connection patterns. To give a meaningful neurobiological interpretation to the connectivity network, it is fundamental to properly define the network framework. In particular, the choice of the network nodes may affect the final connectivity results and the consequent interpretation. We introduce a novel method for the intra subject topological characterization of the nodes of fMRI brain networks, based on a whole brain parcellation scheme. The proposed whole brain parcellation algorithm divides the brain into clusters that are homogeneous from the anatomical and functional point of view, each of which constitutes a node. The functional parcellation described is based on the Tononi's cluster index, which measures instantaneous correlation in terms of intrinsic and extrinsic statistical dependencies. The method performance and reliability were first tested on simulated data, then on a real fMRI dataset acquired on healthy subjects during visual stimulation. Finally, the proposed algorithm was applied to epileptic patients' fMRI data recorded during seizures, to verify its usefulness as preparatory step for effective connectivity analysis. For each patient, the nodes of the network involved in ictal activity were defined according to the proposed parcellation scheme and Granger Causality Analysis (GCA) was applied to infer effective connectivity. We showed that the algorithm 1) performed well on simulated data, 2) was able to produce reliable inter subjects results and 3) led to a detailed definition of the effective connectivity pattern. Copyright © 2014 Elsevier B.V. All rights reserved.

  7. Finding undetected protein associations in cell signaling by belief propagation.

    PubMed

    Bailly-Bechet, M; Borgs, C; Braunstein, A; Chayes, J; Dagkessamanskaia, A; François, J-M; Zecchina, R

    2011-01-11

    External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing it to react to a wide spectrum of environmental changes. High-throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired from statistical physics, and apply this scheme to alpha factor and drug perturbations data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is specially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field.

  8. Mining Top K Spread Sources for a Specific Topic and a Given Node.

    PubMed

    Liu, Weiwei; Deng, Zhi-Hong; Cao, Longbing; Xu, Xiaoran; Liu, He; Gong, Xiuwen

    2015-11-01

    In social networks, nodes (or users) interested in specific topics are often influenced by others. The influence is usually associated with a set of nodes rather than a single one. An interesting but challenging task for any given topic and node is to find the set of nodes that represents the source or trigger for the topic and thus identify those nodes that have the greatest influence on the given node as the topic spreads. We find that it is an NP-hard problem. This paper proposes an effective framework to deal with this problem. First, the topic propagation is represented as the Bayesian network. We then construct the propagation model by a variant of the voter model. The probability transition matrix (PTM) algorithm is presented to conduct the probability inference with the complexity O(θ(3)log2θ), while θ is the number nodes in the given graph. To evaluate the PTM algorithm, we conduct extensive experiments on real datasets. The experimental results show that the PTM algorithm is both effective and efficient.

  9. Network inference using informative priors.

    PubMed

    Mukherjee, Sach; Speed, Terence P

    2008-09-23

    Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of "network inference" is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling.

  10. Bayesian inference for dynamic transcriptional regulation; the Hes1 system as a case study.

    PubMed

    Heron, Elizabeth A; Finkenstädt, Bärbel; Rand, David A

    2007-10-01

    In this study, we address the problem of estimating the parameters of regulatory networks and provide the first application of Markov chain Monte Carlo (MCMC) methods to experimental data. As a case study, we consider a stochastic model of the Hes1 system expressed in terms of stochastic differential equations (SDEs) to which rigorous likelihood methods of inference can be applied. When fitting continuous-time stochastic models to discretely observed time series the lengths of the sampling intervals are important, and much of our study addresses the problem when the data are sparse. We estimate the parameters of an autoregulatory network providing results both for simulated and real experimental data from the Hes1 system. We develop an estimation algorithm using MCMC techniques which are flexible enough to allow for the imputation of latent data on a finer time scale and the presence of prior information about parameters which may be informed from other experiments as well as additional measurement error.

  11. Inferring consistent functional interaction patterns from natural stimulus FMRI data

    PubMed Central

    Sun, Jiehuan; Hu, Xintao; Huang, Xiu; Liu, Yang; Li, Kaiming; Li, Xiang; Han, Junwei; Guo, Lei

    2014-01-01

    There has been increasing interest in how the human brain responds to natural stimulus such as video watching in the neuroimaging field. Along this direction, this paper presents our effort in inferring consistent and reproducible functional interaction patterns under natural stimulus of video watching among known functional brain regions identified by task-based fMRI. Then, we applied and compared four statistical approaches, including Bayesian network modeling with searching algorithms: greedy equivalence search (GES), Peter and Clark (PC) analysis, independent multiple greedy equivalence search (IMaGES), and the commonly used Granger causality analysis (GCA), to infer consistent and reproducible functional interaction patterns among these brain regions. It is interesting that a number of reliable and consistent functional interaction patterns were identified by the GES, PC and IMaGES algorithms in different participating subjects when they watched multiple video shots of the same semantic category. These interaction patterns are meaningful given current neuroscience knowledge and are reasonably reproducible across different brains and video shots. In particular, these consistent functional interaction patterns are supported by structural connections derived from diffusion tensor imaging (DTI) data, suggesting the structural underpinnings of consistent functional interactions. Our work demonstrates that specific consistent patterns of functional interactions among relevant brain regions might reflect the brain's fundamental mechanisms of online processing and comprehension of video messages. PMID:22440644

  12. Unsupervised Network Analysis of the Plastic Supraoptic Nucleus Transcriptome Predicts Caprin2 Regulatory Interactions.

    PubMed

    Loh, Su-Yi; Jahans-Price, Thomas; Greenwood, Michael P; Greenwood, Mingkwan; Hoe, See-Ziau; Konopacka, Agnieszka; Campbell, Colin; Murphy, David; Hindmarch, Charles C T

    2017-01-01

    The supraoptic nucleus (SON) is a group of neurons in the hypothalamus responsible for the synthesis and secretion of the peptide hormones vasopressin and oxytocin. Following physiological cues, such as dehydration, salt-loading and lactation, the SON undergoes a function related plasticity that we have previously described in the rat at the transcriptome level. Using the unsupervised graphical lasso (Glasso) algorithm, we reconstructed a putative network from 500 plastic SON genes in which genes are the nodes and the edges are the inferred interactions. The most active nodal gene identified within the network was Caprin2 . Caprin2 encodes an RNA-binding protein that we have previously shown to be vital for the functioning of osmoregulatory neuroendocrine neurons in the SON of the rat hypothalamus. To test the validity of the Glasso network, we either overexpressed or knocked down Caprin2 transcripts in differentiated rat pheochromocytoma PC12 cells and showed that these manipulations had significant opposite effects on the levels of putative target mRNAs. These studies suggest that the predicative power of the Glasso algorithm within an in vivo system is accurate, and identifies biological targets that may be important to the functional plasticity of the SON.

  13. State Space Model with hidden variables for reconstruction of gene regulatory networks.

    PubMed

    Wu, Xi; Li, Peng; Wang, Nan; Gong, Ping; Perkins, Edward J; Deng, Youping; Zhang, Chaoyang

    2011-01-01

    State Space Model (SSM) is a relatively new approach to inferring gene regulatory networks. It requires less computational time than Dynamic Bayesian Networks (DBN). There are two types of variables in the linear SSM, observed variables and hidden variables. SSM uses an iterative method, namely Expectation-Maximization, to infer regulatory relationships from microarray datasets. The hidden variables cannot be directly observed from experiments. How to determine the number of hidden variables has a significant impact on the accuracy of network inference. In this study, we used SSM to infer Gene regulatory networks (GRNs) from synthetic time series datasets, investigated Bayesian Information Criterion (BIC) and Principle Component Analysis (PCA) approaches to determining the number of hidden variables in SSM, and evaluated the performance of SSM in comparison with DBN. True GRNs and synthetic gene expression datasets were generated using GeneNetWeaver. Both DBN and linear SSM were used to infer GRNs from the synthetic datasets. The inferred networks were compared with the true networks. Our results show that inference precision varied with the number of hidden variables. For some regulatory networks, the inference precision of DBN was higher but SSM performed better in other cases. Although the overall performance of the two approaches is compatible, SSM is much faster and capable of inferring much larger networks than DBN. This study provides useful information in handling the hidden variables and improving the inference precision.

  14. Identifying Keystone Species in the Human Gut Microbiome from Metagenomic Timeseries Using Sparse Linear Regression

    PubMed Central

    Fisher, Charles K.; Mehta, Pankaj

    2014-01-01

    Human associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in timeseries models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called “errors-in-variables”. Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with boostrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct “keystone species”, Bacteroides fragilis and Bacteroided stercosis, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut microbiome. PMID:25054627

  15. Gene network inference and visualization tools for biologists: application to new human transcriptome datasets

    PubMed Central

    Hurley, Daniel; Araki, Hiromitsu; Tamada, Yoshinori; Dunmore, Ben; Sanders, Deborah; Humphreys, Sally; Affara, Muna; Imoto, Seiya; Yasuda, Kaori; Tomiyasu, Yuki; Tashiro, Kosuke; Savoie, Christopher; Cho, Vicky; Smith, Stephen; Kuhara, Satoru; Miyano, Satoru; Charnock-Jones, D. Stephen; Crampin, Edmund J.; Print, Cristin G.

    2012-01-01

    Gene regulatory networks inferred from RNA abundance data have generated significant interest, but despite this, gene network approaches are used infrequently and often require input from bioinformaticians. We have assembled a suite of tools for analysing regulatory networks, and we illustrate their use with microarray datasets generated in human endothelial cells. We infer a range of regulatory networks, and based on this analysis discuss the strengths and limitations of network inference from RNA abundance data. We welcome contact from researchers interested in using our inference and visualization tools to answer biological questions. PMID:22121215

  16. Inferring functional connectivity in MRI using Bayesian network structure learning with a modified PC algorithm

    PubMed Central

    Iyer, Swathi; Shafran, Izhak; Grayson, David; Gates, Kathleen; Nigg, Joel; Fair, Damien

    2013-01-01

    Resting state functional connectivity MRI (rs-fcMRI) is a popular technique used to gauge the functional relatedness between regions in the brain for typical and special populations. Most of the work to date determines this relationship by using Pearson's correlation on BOLD fMRI timeseries. However, it has been recognized that there are at least two key limitations to this method. First, it is not possible to resolve the direct and indirect connections/influences. Second, the direction of information flow between the regions cannot be differentiated. In the current paper, we follow-up on recent work by Smith et al (2011), and apply a Bayesian approach called the PC algorithm to both simulated data and empirical data to determine whether these two factors can be discerned with group average, as opposed to single subject, functional connectivity data. When applied on simulated individual subjects, the algorithm performs well determining indirect and direct connection but fails in determining directionality. However, when applied at group level, PC algorithm gives strong results for both indirect and direct connections and the direction of information flow. Applying the algorithm on empirical data, using a diffusion-weighted imaging (DWI) structural connectivity matrix as the baseline, the PC algorithm outperformed the direct correlations. We conclude that, under certain conditions, the PC algorithm leads to an improved estimate of brain network structure compared to the traditional connectivity analysis based on correlations. PMID:23501054

  17. BiomeNet: A Bayesian Model for Inference of Metabolic Divergence among Microbial Communities

    PubMed Central

    Chipman, Hugh; Gu, Hong; Bielawski, Joseph P.

    2014-01-01

    Metagenomics yields enormous numbers of microbial sequences that can be assigned a metabolic function. Using such data to infer community-level metabolic divergence is hindered by the lack of a suitable statistical framework. Here, we describe a novel hierarchical Bayesian model, called BiomeNet (Bayesian inference of metabolic networks), for inferring differential prevalence of metabolic subnetworks among microbial communities. To infer the structure of community-level metabolic interactions, BiomeNet applies a mixed-membership modelling framework to enzyme abundance information. The basic idea is that the mixture components of the model (metabolic reactions, subnetworks, and networks) are shared across all groups (microbiome samples), but the mixture proportions vary from group to group. Through this framework, the model can capture nested structures within the data. BiomeNet is unique in modeling each metagenome sample as a mixture of complex metabolic systems (metabosystems). The metabosystems are composed of mixtures of tightly connected metabolic subnetworks. BiomeNet differs from other unsupervised methods by allowing researchers to discriminate groups of samples through the metabolic patterns it discovers in the data, and by providing a framework for interpreting them. We describe a collapsed Gibbs sampler for inference of the mixture weights under BiomeNet, and we use simulation to validate the inference algorithm. Application of BiomeNet to human gut metagenomes revealed a metabosystem with greater prevalence among inflammatory bowel disease (IBD) patients. Based on the discriminatory subnetworks for this metabosystem, we inferred that the community is likely to be closely associated with the human gut epithelium, resistant to dietary interventions, and interfere with human uptake of an antioxidant connected to IBD. Because this metabosystem has a greater capacity to exploit host-associated glycans, we speculate that IBD-associated communities might arise from opportunist growth of bacteria that can circumvent the host's nutrient-based mechanism for bacterial partner selection. PMID:25412107

  18. Improved Microseismicity Detection During Newberry EGS Stimulations

    DOE Data Explorer

    Templeton, Dennise

    2013-10-01

    Effective enhanced geothermal systems (EGS) require optimal fracture networks for efficient heat transfer between hot rock and fluid. Microseismic mapping is a key tool used to infer the subsurface fracture geometry. Traditional earthquake detection and location techniques are often employed to identify microearthquakes in geothermal regions. However, most commonly used algorithms may miss events if the seismic signal of an earthquake is small relative to the background noise level or if a microearthquake occurs within the coda of a larger event. Consequently, we have developed a set of algorithms that provide improved microearthquake detection. Our objective is to investigate the microseismicity at the DOE Newberry EGS site to better image the active regions of the underground fracture network during and immediately after the EGS stimulation. Detection of more microearthquakes during EGS stimulations will allow for better seismic delineation of the active regions of the underground fracture system. This improved knowledge of the reservoir network will improve our understanding of subsurface conditions, and allow improvement of the stimulation strategy that will optimize heat extraction and maximize economic return.

  19. Improved Microseismicity Detection During Newberry EGS Stimulations

    DOE Data Explorer

    Templeton, Dennise

    2013-11-01

    Effective enhanced geothermal systems (EGS) require optimal fracture networks for efficient heat transfer between hot rock and fluid. Microseismic mapping is a key tool used to infer the subsurface fracture geometry. Traditional earthquake detection and location techniques are often employed to identify microearthquakes in geothermal regions. However, most commonly used algorithms may miss events if the seismic signal of an earthquake is small relative to the background noise level or if a microearthquake occurs within the coda of a larger event. Consequently, we have developed a set of algorithms that provide improved microearthquake detection. Our objective is to investigate the microseismicity at the DOE Newberry EGS site to better image the active regions of the underground fracture network during and immediately after the EGS stimulation. Detection of more microearthquakes during EGS stimulations will allow for better seismic delineation of the active regions of the underground fracture system. This improved knowledge of the reservoir network will improve our understanding of subsurface conditions, and allow improvement of the stimulation strategy that will optimize heat extraction and maximize economic return.

  20. Win-Stay, Lose-Sample: a simple sequential algorithm for approximating Bayesian inference.

    PubMed

    Bonawitz, Elizabeth; Denison, Stephanie; Gopnik, Alison; Griffiths, Thomas L

    2014-11-01

    People can behave in a way that is consistent with Bayesian models of cognition, despite the fact that performing exact Bayesian inference is computationally challenging. What algorithms could people be using to make this possible? We show that a simple sequential algorithm "Win-Stay, Lose-Sample", inspired by the Win-Stay, Lose-Shift (WSLS) principle, can be used to approximate Bayesian inference. We investigate the behavior of adults and preschoolers on two causal learning tasks to test whether people might use a similar algorithm. These studies use a "mini-microgenetic method", investigating how people sequentially update their beliefs as they encounter new evidence. Experiment 1 investigates a deterministic causal learning scenario and Experiments 2 and 3 examine how people make inferences in a stochastic scenario. The behavior of adults and preschoolers in these experiments is consistent with our Bayesian version of the WSLS principle. This algorithm provides both a practical method for performing Bayesian inference and a new way to understand people's judgments. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. GFD-Net: A novel semantic similarity methodology for the analysis of gene networks.

    PubMed

    Díaz-Montaña, Juan J; Díaz-Díaz, Norberto; Gómez-Vela, Francisco

    2017-04-01

    Since the popularization of biological network inference methods, it has become crucial to create methods to validate the resulting models. Here we present GFD-Net, the first methodology that applies the concept of semantic similarity to gene network analysis. GFD-Net combines the concept of semantic similarity with the use of gene network topology to analyze the functional dissimilarity of gene networks based on Gene Ontology (GO). The main innovation of GFD-Net lies in the way that semantic similarity is used to analyze gene networks taking into account the network topology. GFD-Net selects a functionality for each gene (specified by a GO term), weights each edge according to the dissimilarity between the nodes at its ends and calculates a quantitative measure of the network functional dissimilarity, i.e. a quantitative value of the degree of dissimilarity between the connected genes. The robustness of GFD-Net as a gene network validation tool was demonstrated by performing a ROC analysis on several network repositories. Furthermore, a well-known network was analyzed showing that GFD-Net can also be used to infer knowledge. The relevance of GFD-Net becomes more evident in Section "GFD-Net applied to the study of human diseases" where an example of how GFD-Net can be applied to the study of human diseases is presented. GFD-Net is available as an open-source Cytoscape app which offers a user-friendly interface to configure and execute the algorithm as well as the ability to visualize and interact with the results(http://apps.cytoscape.org/apps/gfdnet). Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Excavation of attractor modules for nasopharyngeal carcinoma via integrating systemic module inference with attract method.

    PubMed

    Jiang, T; Jiang, C-Y; Shu, J-H; Xu, Y-J

    2017-07-10

    The molecular mechanism of nasopharyngeal carcinoma (NPC) is poorly understood and effective therapeutic approaches are needed. This research aimed to excavate the attractor modules involved in the progression of NPC and provide further understanding of the underlying mechanism of NPC. Based on the gene expression data of NPC, two specific protein-protein interaction networks for NPC and control conditions were re-weighted using Pearson correlation coefficient. Then, a systematic tracking of candidate modules was conducted on the re-weighted networks via cliques algorithm, and a total of 19 and 38 modules were separately identified from NPC and control networks, respectively. Among them, 8 pairs of modules with similar gene composition were selected, and 2 attractor modules were identified via the attract method. Functional analysis indicated that these two attractor modules participate in one common bioprocess of cell division. Based on the strategy of integrating systemic module inference with the attract method, we successfully identified 2 attractor modules. These attractor modules might play important roles in the molecular pathogenesis of NPC via affecting the bioprocess of cell division in a conjunct way. Further research is needed to explore the correlations between cell division and NPC.

  3. TMA Navigator: network inference, patient stratification and survival analysis with tissue microarray data

    PubMed Central

    Lubbock, Alexander L. R.; Katz, Elad; Harrison, David J.; Overton, Ian M.

    2013-01-01

    Tissue microarrays (TMAs) allow multiplexed analysis of tissue samples and are frequently used to estimate biomarker protein expression in tumour biopsies. TMA Navigator (www.tmanavigator.org) is an open access web application for analysis of TMA data and related information, accommodating categorical, semi-continuous and continuous expression scores. Non-biological variation, or batch effects, can hinder data analysis and may be mitigated using the ComBat algorithm, which is incorporated with enhancements for automated application to TMA data. Unsupervised grouping of samples (patients) is provided according to Gaussian mixture modelling of marker scores, with cardinality selected by Bayesian information criterion regularization. Kaplan–Meier survival analysis is available, including comparison of groups identified by mixture modelling using the Mantel-Cox log-rank test. TMA Navigator also supports network inference approaches useful for TMA datasets, which often constitute comparatively few markers. Tissue and cell-type specific networks derived from TMA expression data offer insights into the molecular logic underlying pathophenotypes, towards more effective and personalized medicine. Output is interactive, and results may be exported for use with external programs. Private anonymous access is available, and user accounts may be generated for easier data management. PMID:23761446

  4. Network inference using informative priors

    PubMed Central

    Mukherjee, Sach; Speed, Terence P.

    2008-01-01

    Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of “network inference” is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling. PMID:18799736

  5. Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family.

    PubMed

    Song, Jia; Zheng, Sisi; Nguyen, Nhung; Wang, Youjun; Zhou, Yubin; Lin, Kui

    2017-10-03

    Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca 2+ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.

  6. Detection of Significant Pneumococcal Meningitis Biomarkers by Ego Network.

    PubMed

    Wang, Qian; Lou, Zhifeng; Zhai, Liansuo; Zhao, Haibin

    2017-06-01

    To identify significant biomarkers for detection of pneumococcal meningitis based on ego network. Based on the gene expression data of pneumococcal meningitis and global protein-protein interactions (PPIs) data recruited from open access databases, the authors constructed a differential co-expression network (DCN) to identify pneumococcal meningitis biomarkers in a network view. Here EgoNet algorithm was employed to screen the significant ego networks that could accurately distinguish pneumococcal meningitis from healthy controls, by sequentially seeking ego genes, searching candidate ego networks, refinement of candidate ego networks and significance analysis to identify ego networks. Finally, the functional inference of the ego networks was performed to identify significant pathways for pneumococcal meningitis. By differential co-expression analysis, the authors constructed the DCN that covered 1809 genes and 3689 interactions. From the DCN, a total of 90 ego genes were identified. Starting from these ego genes, three significant ego networks (Module 19, Module 70 and Module 71) that could predict clinical outcomes for pneumococcal meningitis were identified by EgoNet algorithm, and the corresponding ego genes were GMNN, MAD2L1 and TPX2, respectively. Pathway analysis showed that these three ego networks were related to CDT1 association with the CDC6:ORC:origin complex, inactivation of APC/C via direct inhibition of the APC/C complex pathway, and DNA strand elongation, respectively. The authors successfully screened three significant ego modules which could accurately predict the clinical outcomes for pneumococcal meningitis and might play important roles in host response to pathogen infection in pneumococcal meningitis.

  7. Weighted community detection and data clustering using message passing

    NASA Astrophysics Data System (ADS)

    Shi, Cheng; Liu, Yanchen; Zhang, Pan

    2018-03-01

    Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for an unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of spin-glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which heavily reduce the computation complexity in dense networks but give almost the same performance as belief propagation.

  8. Intelligent control for modeling of real-time reservoir operation, part II: artificial neural network with operating rule curves

    NASA Astrophysics Data System (ADS)

    Chang, Ya-Ting; Chang, Li-Chiu; Chang, Fi-John

    2005-04-01

    To bridge the gap between academic research and actual operation, we propose an intelligent control system for reservoir operation. The methodology includes two major processes, the knowledge acquired and implemented, and the inference system. In this study, a genetic algorithm (GA) and a fuzzy rule base (FRB) are used to extract knowledge based on the historical inflow data with a design objective function and on the operating rule curves respectively. The adaptive network-based fuzzy inference system (ANFIS) is then used to implement the knowledge, to create the fuzzy inference system, and then to estimate the optimal reservoir operation. To investigate its applicability and practicability, the Shihmen reservoir, Taiwan, is used as a case study. For the purpose of comparison, a simulation of the currently used M-5 operating rule curve is also performed. The results demonstrate that (1) the GA is an efficient way to search the optimal input-output patterns, (2) the FRB can extract the knowledge from the operating rule curves, and (3) the ANFIS models built on different types of knowledge can produce much better performance than the traditional M-5 curves in real-time reservoir operation. Moreover, we show that the model can be more intelligent for reservoir operation if more information (or knowledge) is involved.

  9. Parametric inference for biological sequence analysis.

    PubMed

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.

  10. Sequential estimation of intrinsic activity and synaptic input in single neurons by particle filtering with optimal importance density

    NASA Astrophysics Data System (ADS)

    Closas, Pau; Guillamon, Antoni

    2017-12-01

    This paper deals with the problem of inferring the signals and parameters that cause neural activity to occur. The ultimate challenge being to unveil brain's connectivity, here we focus on a microscopic vision of the problem, where single neurons (potentially connected to a network of peers) are at the core of our study. The sole observation available are noisy, sampled voltage traces obtained from intracellular recordings. We design algorithms and inference methods using the tools provided by stochastic filtering that allow a probabilistic interpretation and treatment of the problem. Using particle filtering, we are able to reconstruct traces of voltages and estimate the time course of auxiliary variables. By extending the algorithm, through PMCMC methodology, we are able to estimate hidden physiological parameters as well, like intrinsic conductances or reversal potentials. Last, but not least, the method is applied to estimate synaptic conductances arriving at a target cell, thus reconstructing the synaptic excitatory/inhibitory input traces. Notably, the performance of these estimations achieve the theoretical lower bounds even in spiking regimes.

  11. Integration of ANFIS, NN and GA to determine core porosity and permeability from conventional well log data

    NASA Astrophysics Data System (ADS)

    Ja'fari, Ahmad; Hamidzadeh Moghadam, Rasoul

    2012-10-01

    Routine core analysis provides useful information for petrophysical study of the hydrocarbon reservoirs. Effective porosity and fluid conductivity (permeability) could be obtained from core analysis in laboratory. Coring hydrocarbon bearing intervals and analysis of obtained cores in laboratory is expensive and time consuming. In this study an improved method to make a quantitative correlation between porosity and permeability obtained from core and conventional well log data by integration of different artificial intelligent systems is proposed. The proposed method combines the results of adaptive neuro-fuzzy inference system (ANFIS) and neural network (NN) algorithms for overall estimation of core data from conventional well log data. These methods multiply the output of each algorithm with a weight factor. Simple averaging and weighted averaging were used for determining the weight factors. In the weighted averaging method the genetic algorithm (GA) is used to determine the weight factors. The overall algorithm was applied in one of SW Iran’s oil fields with two cored wells. One-third of all data were used as the test dataset and the rest of them were used for training the networks. Results show that the output of the GA averaging method provided the best mean square error and also the best correlation coefficient with real core data.

  12. Cascading Failures in Networks: Inference, Intervention and Robustness to WMDs

    DTIC Science & Technology

    2016-08-01

    model  is  posited,  and  different  cascade  eventualities  are   investigated),  this  proposal  aimed  to  focus  on  the   inverse  problem  and...theory  and  algorithms  for  an  “ inverse  problem”  or  “data-­ driven”  study  of  cascades  –  specifically,  learning  about  how  they  start

  13. Accurate HLA type inference using a weighted similarity graph.

    PubMed

    Xie, Minzhu; Li, Jing; Jiang, Tao

    2010-12-14

    The human leukocyte antigen system (HLA) contains many highly variable genes. HLA genes play an important role in the human immune system, and HLA gene matching is crucial for the success of human organ transplantations. Numerous studies have demonstrated that variation in HLA genes is associated with many autoimmune, inflammatory and infectious diseases. However, typing HLA genes by serology or PCR is time consuming and expensive, which limits large-scale studies involving HLA genes. Since it is much easier and cheaper to obtain single nucleotide polymorphism (SNP) genotype data, accurate computational algorithms to infer HLA gene types from SNP genotype data are in need. To infer HLA types from SNP genotypes, the first step is to infer SNP haplotypes from genotypes. However, for the same SNP genotype data set, the haplotype configurations inferred by different methods are usually inconsistent, and it is often difficult to decide which one is true. In this paper, we design an accurate HLA gene type inference algorithm by utilizing SNP genotype data from pedigrees, known HLA gene types of some individuals and the relationship between inferred SNP haplotypes and HLA gene types. Given a set of haplotypes inferred from the genotypes of a population consisting of many pedigrees, the algorithm first constructs a weighted similarity graph based on a new haplotype similarity measure and derives constraint edges from known HLA gene types. Based on the principle that different HLA gene alleles should have different background haplotypes, the algorithm searches for an optimal labeling of all the haplotypes with unknown HLA gene types such that the total weight among the same HLA gene types is maximized. To deal with ambiguous haplotype solutions, we use a genetic algorithm to select haplotype configurations that tend to maximize the same optimization criterion. Our experiments on a previously typed subset of the HapMap data show that the algorithm is highly accurate, achieving an accuracy of 96% for gene HLA-A, 95% for HLA-B, 97% for HLA-C, 84% for HLA-DRB1, 98% for HLA-DQA1 and 97% for HLA-DQB1 in a leave-one-out test. Our algorithm can infer HLA gene types from neighboring SNP genotype data accurately. Compared with a recent approach on the same input data, our algorithm achieved a higher accuracy. The code of our algorithm is available to the public for free upon request to the corresponding authors.

  14. Density-based clustering: A 'landscape view' of multi-channel neural data for inference and dynamic complexity analysis.

    PubMed

    Baglietto, Gabriel; Gigante, Guido; Del Giudice, Paolo

    2017-01-01

    Two, partially interwoven, hot topics in the analysis and statistical modeling of neural data, are the development of efficient and informative representations of the time series derived from multiple neural recordings, and the extraction of information about the connectivity structure of the underlying neural network from the recorded neural activities. In the present paper we show that state-space clustering can provide an easy and effective option for reducing the dimensionality of multiple neural time series, that it can improve inference of synaptic couplings from neural activities, and that it can also allow the construction of a compact representation of the multi-dimensional dynamics, that easily lends itself to complexity measures. We apply a variant of the 'mean-shift' algorithm to perform state-space clustering, and validate it on an Hopfield network in the glassy phase, in which metastable states are largely uncorrelated from memories embedded in the synaptic matrix. In this context, we show that the neural states identified as clusters' centroids offer a parsimonious parametrization of the synaptic matrix, which allows a significant improvement in inferring the synaptic couplings from the neural activities. Moving to the more realistic case of a multi-modular spiking network, with spike-frequency adaptation inducing history-dependent effects, we propose a procedure inspired by Boltzmann learning, but extending its domain of application, to learn inter-module synaptic couplings so that the spiking network reproduces a prescribed pattern of spatial correlations; we then illustrate, in the spiking network, how clustering is effective in extracting relevant features of the network's state-space landscape. Finally, we show that the knowledge of the cluster structure allows casting the multi-dimensional neural dynamics in the form of a symbolic dynamics of transitions between clusters; as an illustration of the potential of such reduction, we define and analyze a measure of complexity of the neural time series.

  15. Mining the modular structure of protein interaction networks.

    PubMed

    Berenstein, Ariel José; Piñero, Janet; Furlong, Laura Inés; Chernomoretz, Ariel

    2015-01-01

    Cluster-based descriptions of biological networks have received much attention in recent years fostered by accumulated evidence of the existence of meaningful correlations between topological network clusters and biological functional modules. Several well-performing clustering algorithms exist to infer topological network partitions. However, due to respective technical idiosyncrasies they might produce dissimilar modular decompositions of a given network. In this contribution, we aimed to analyze how alternative modular descriptions could condition the outcome of follow-up network biology analysis. We considered a human protein interaction network and two paradigmatic cluster recognition algorithms, namely: the Clauset-Newman-Moore and the infomap procedures. We analyzed to what extent both methodologies yielded different results in terms of granularity and biological congruency. In addition, taking into account Guimera's cartographic role characterization of network nodes, we explored how the adoption of a given clustering methodology impinged on the ability to highlight relevant network meso-scale connectivity patterns. As a case study we considered a set of aging related proteins and showed that only the high-resolution modular description provided by infomap, could unveil statistically significant associations between them and inter/intra modular cartographic features. Besides reporting novel biological insights that could be gained from the discovered associations, our contribution warns against possible technical concerns that might affect the tools used to mine for interaction patterns in network biology studies. In particular our results suggested that sub-optimal partitions from the strict point of view of their modularity levels might still be worth being analyzed when meso-scale features were to be explored in connection with external source of biological knowledge.

  16. Network Analysis of Rodent Transcriptomes in Spaceflight

    NASA Technical Reports Server (NTRS)

    Ramachandran, Maya; Fogle, Homer; Costes, Sylvain

    2017-01-01

    Network analysis methods leverage prior knowledge of cellular systems and the statistical and conceptual relationships between analyte measurements to determine gene connectivity. Correlation and conditional metrics are used to infer a network topology and provide a systems-level context for cellular responses. Integration across multiple experimental conditions and omics domains can reveal the regulatory mechanisms that underlie gene expression. GeneLab has assembled rich multi-omic (transcriptomics, proteomics, epigenomics, and epitranscriptomics) datasets for multiple murine tissues from the Rodent Research 1 (RR-1) experiment. RR-1 assesses the impact of 37 days of spaceflight on gene expression across a variety of tissue types, such as adrenal glands, quadriceps, gastrocnemius, tibalius anterior, extensor digitorum longus, soleus, eye, and kidney. Network analysis is particularly useful for RR-1 -omics datasets because it reinforces subtle relationships that may be overlooked in isolated analyses and subdues confounding factors. Our objective is to use network analysis to determine potential target nodes for therapeutic intervention and identify similarities with existing disease models. Multiple network algorithms are used for a higher confidence consensus.

  17. Object-oriented feature-tracking algorithms for SAR images of the marginal ice zone

    NASA Technical Reports Server (NTRS)

    Daida, Jason; Samadani, Ramin; Vesecky, John F.

    1990-01-01

    An unsupervised method that chooses and applies the most appropriate tracking algorithm from among different sea-ice tracking algorithms is reported. In contrast to current unsupervised methods, this method chooses and applies an algorithm by partially examining a sequential image pair to draw inferences about what was examined. Based on these inferences the reported method subsequently chooses which algorithm to apply to specific areas of the image pair where that algorithm should work best.

  18. In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

    PubMed

    Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset

    2017-01-06

    In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow using four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependant on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended. Protein inference is one of the major challenges in MS-based proteomics nowadays. Currently, there are a vast number of protein inference algorithms and implementations available for the proteomics community. Protein assembly impacts in the final results of the research, the quantitation values and the final claims in the research manuscript. Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. Previously Journal of proteomics has published multiple studies about other benchmark of bioinformatics algorithms (PMID: 26585461; PMID: 22728601) in proteomics studies making clear the importance of those studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Six different algorithms - ProteinProphet, MSBayesPro, ProteinLP, Fido and PIA- were evaluated using the highly customizable workflow on four public datasets with varying complexities. Five popular database search engines Mascot, X!Tandem, MS-GF+ and combinations thereof were evaluated for every protein inference tool. In total >186 proteins lists were analyzed and carefully compare using three metrics for quality assessments of the protein inference results: 1) the numbers of reported proteins, 2) peptides per protein, and the 3) number of uniquely reported proteins per inference method, to address the quality of each inference method. We also examined how many proteins were reported by choosing each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that using 1) PIA or Fido seems to be a good choice when studying the results of the analyzed workflow, regarding not only the reported proteins and the high-quality identifications, but also the required runtime. 2) Merging the identifications of multiple search engines gives almost always more confident results and increases the number of peptides per protein group. 3) The usage of databases containing not only the canonical, but also known isoforms of proteins has a small impact on the number of reported proteins. The detection of specific isoforms could, concerning the question behind the study, compensate for slightly shorter reports using the parsimonious reports. 4) The current workflow can be easily extended to support new algorithms and search engine combinations. Copyright © 2016. Published by Elsevier B.V.

  19. Using Bayesian Networks for Candidate Generation in Consistency-based Diagnosis

    NASA Technical Reports Server (NTRS)

    Narasimhan, Sriram; Mengshoel, Ole

    2008-01-01

    Consistency-based diagnosis relies heavily on the assumption that discrepancies between model predictions and sensor observations can be detected accurately. When sources of uncertainty like sensor noise and model abstraction exist robust schemes have to be designed to make a binary decision on whether predictions are consistent with observations. This risks the occurrence of false alarms and missed alarms when an erroneous decision is made. Moreover when multiple sensors (with differing sensing properties) are available the degree of match between predictions and observations can be used to guide the search for fault candidates. In this paper we propose a novel approach to handle this problem using Bayesian networks. In the consistency- based diagnosis formulation, automatically generated Bayesian networks are used to encode a probabilistic measure of fit between predictions and observations. A Bayesian network inference algorithm is used to compute most probable fault candidates.

  20. Redrawing the map of Great Britain from a network of human interactions.

    PubMed

    Ratti, Carlo; Sobolevsky, Stanislav; Calabrese, Francesco; Andris, Clio; Reades, Jonathan; Martino, Mauro; Claxton, Rob; Strogatz, Steven H

    2010-12-08

    Do regional boundaries defined by governments respect the more natural ways that people interact across space? This paper proposes a novel, fine-grained approach to regional delineation, based on analyzing networks of billions of individual human transactions. Given a geographical area and some measure of the strength of links between its inhabitants, we show how to partition the area into smaller, non-overlapping regions while minimizing the disruption to each person's links. We tested our method on the largest non-Internet human network, inferred from a large telecommunications database in Great Britain. Our partitioning algorithm yields geographically cohesive regions that correspond remarkably well with administrative regions, while unveiling unexpected spatial structures that had previously only been hypothesized in the literature. We also quantify the effects of partitioning, showing for instance that the effects of a possible secession of Wales from Great Britain would be twice as disruptive for the human network than that of Scotland.

  1. Overarching framework for data-based modelling

    NASA Astrophysics Data System (ADS)

    Schelter, Björn; Mader, Malenka; Mader, Wolfgang; Sommerlade, Linda; Platt, Bettina; Lai, Ying-Cheng; Grebogi, Celso; Thiel, Marco

    2014-02-01

    One of the main modelling paradigms for complex physical systems are networks. When estimating the network structure from measured signals, typically several assumptions such as stationarity are made in the estimation process. Violating these assumptions renders standard analysis techniques fruitless. We here propose a framework to estimate the network structure from measurements of arbitrary non-linear, non-stationary, stochastic processes. To this end, we propose a rigorous mathematical theory that underlies this framework. Based on this theory, we present a highly efficient algorithm and the corresponding statistics that are immediately sensibly applicable to measured signals. We demonstrate its performance in a simulation study. In experiments of transitions between vigilance stages in rodents, we infer small network structures with complex, time-dependent interactions; this suggests biomarkers for such transitions, the key to understand and diagnose numerous diseases such as dementia. We argue that the suggested framework combines features that other approaches followed so far lack.

  2. Performance Optimization Control of ECH using Fuzzy Inference Application

    NASA Astrophysics Data System (ADS)

    Dubey, Abhay Kumar

    Electro-chemical honing (ECH) is a hybrid electrolytic precision micro-finishing technology that, by combining physico-chemical actions of electro-chemical machining and conventional honing processes, provides the controlled functional surfaces-generation and fast material removal capabilities in a single operation. Process multi-performance optimization has become vital for utilizing full potential of manufacturing processes to meet the challenging requirements being placed on the surface quality, size, tolerances and production rate of engineering components in this globally competitive scenario. This paper presents an strategy that integrates the Taguchi matrix experimental design, analysis of variances and fuzzy inference system (FIS) to formulate a robust practical multi-performance optimization methodology for complex manufacturing processes like ECH, which involve several control variables. Two methodologies one using a genetic algorithm tuning of FIS (GA-tuned FIS) and another using an adaptive network based fuzzy inference system (ANFIS) have been evaluated for a multi-performance optimization case study of ECH. The actual experimental results confirm their potential for a wide range of machining conditions employed in ECH.

  3. Predicting drug-target interactions by dual-network integrated logistic matrix factorization

    NASA Astrophysics Data System (ADS)

    Hao, Ming; Bryant, Stephen H.; Wang, Yanli

    2017-01-01

    In this work, we propose a dual-network integrated logistic matrix factorization (DNILMF) algorithm to predict potential drug-target interactions (DTI). The prediction procedure consists of four steps: (1) inferring new drug/target profiles and constructing profile kernel matrix; (2) diffusing drug profile kernel matrix with drug structure kernel matrix; (3) diffusing target profile kernel matrix with target sequence kernel matrix; and (4) building DNILMF model and smoothing new drug/target predictions based on their neighbors. We compare our algorithm with the state-of-the-art method based on the benchmark dataset. Results indicate that the DNILMF algorithm outperforms the previously reported approaches in terms of AUPR (area under precision-recall curve) and AUC (area under curve of receiver operating characteristic) based on the 5 trials of 10-fold cross-validation. We conclude that the performance improvement depends on not only the proposed objective function, but also the used nonlinear diffusion technique which is important but under studied in the DTI prediction field. In addition, we also compile a new DTI dataset for increasing the diversity of currently available benchmark datasets. The top prediction results for the new dataset are confirmed by experimental studies or supported by other computational research.

  4. High-Reproducibility and High-Accuracy Method for Automated Topic Classification

    NASA Astrophysics Data System (ADS)

    Lancichinetti, Andrea; Sirer, M. Irmak; Wang, Jane X.; Acuna, Daniel; Körding, Konrad; Amaral, Luís A. Nunes

    2015-01-01

    Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent searching, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state of the art in topic modeling. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results that are not accurate in inferring the most suitable model parameters. Adapting approaches from community detection in networks, we propose a new algorithm that displays high reproducibility and high accuracy and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure.

  5. Semantic modeling and structural synthesis of onboard electronics protection means as open information system

    NASA Astrophysics Data System (ADS)

    Zhevnerchuk, D. V.; Surkova, A. S.; Lomakina, L. S.; Golubev, A. S.

    2018-05-01

    The article describes the component representation approach and semantic models of on-board electronics protection from ionizing radiation of various nature. Semantic models are constructed, the feature of which is the representation of electronic elements, protection modules, sources of impact in the form of blocks with interfaces. The rules of logical inference and algorithms for synthesizing the object properties of the semantic network, imitating the interface between the components of the protection system and the sources of radiation, are developed. The results of the algorithm are considered using the example of radiation-resistant microcircuits 1645RU5U, 1645RT2U and the calculation and experimental method for estimating the durability of on-board electronics.

  6. Information Theory, Inference and Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Mackay, David J. C.

    2003-10-01

    Information theory and inference, often taught separately, are here united in one entertaining textbook. These topics lie at the heart of many exciting areas of contemporary science and engineering - communication, signal processing, data mining, machine learning, pattern recognition, computational neuroscience, bioinformatics, and cryptography. This textbook introduces theory in tandem with applications. Information theory is taught alongside practical communication systems, such as arithmetic coding for data compression and sparse-graph codes for error-correction. A toolbox of inference techniques, including message-passing algorithms, Monte Carlo methods, and variational approximations, are developed alongside applications of these tools to clustering, convolutional codes, independent component analysis, and neural networks. The final part of the book describes the state of the art in error-correcting codes, including low-density parity-check codes, turbo codes, and digital fountain codes -- the twenty-first century standards for satellite communications, disk drives, and data broadcast. Richly illustrated, filled with worked examples and over 400 exercises, some with detailed solutions, David MacKay's groundbreaking book is ideal for self-learning and for undergraduate or graduate courses. Interludes on crosswords, evolution, and sex provide entertainment along the way. In sum, this is a textbook on information, communication, and coding for a new generation of students, and an unparalleled entry point into these subjects for professionals in areas as diverse as computational biology, financial engineering, and machine learning.

  7. Inferring Phylogenetic Networks Using PhyloNet.

    PubMed

    Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay

    2018-07-01

    PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyzes and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

  8. Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood.

    PubMed

    Wu, Yufeng

    2012-03-01

    Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets. © 2011 The Author. Evolution© 2011 The Society for the Study of Evolution.

  9. Genetic network inference as a series of discrimination tasks.

    PubMed

    Kimura, Shuhei; Nakayama, Satoshi; Hatakeyama, Mariko

    2009-04-01

    Genetic network inference methods based on sets of differential equations generally require a great deal of time, as the equations must be solved many times. To reduce the computational cost, researchers have proposed other methods for inferring genetic networks by solving sets of differential equations only a few times, or even without solving them at all. When we try to obtain reasonable network models using these methods, however, we must estimate the time derivatives of the gene expression levels with great precision. In this study, we propose a new method to overcome the drawbacks of inference methods based on sets of differential equations. Our method infers genetic networks by obtaining classifiers capable of predicting the signs of the derivatives of the gene expression levels. For this purpose, we defined a genetic network inference problem as a series of discrimination tasks, then solved the defined series of discrimination tasks with a linear programming machine. Our experimental results demonstrated that the proposed method is capable of correctly inferring genetic networks, and doing so more than 500 times faster than the other inference methods based on sets of differential equations. Next, we applied our method to actual expression data of the bacterial SOS DNA repair system. And finally, we demonstrated that our approach relates to the inference method based on the S-system model. Though our method provides no estimation of the kinetic parameters, it should be useful for researchers interested only in the network structure of a target system. Supplementary data are available at Bioinformatics online.

  10. Unsupervised Network Analysis of the Plastic Supraoptic Nucleus Transcriptome Predicts Caprin2 Regulatory Interactions

    PubMed Central

    Jahans-Price, Thomas; Greenwood, Michael P.; Greenwood, Mingkwan; Hoe, See-Ziau; Konopacka, Agnieszka

    2017-01-01

    Abstract The supraoptic nucleus (SON) is a group of neurons in the hypothalamus responsible for the synthesis and secretion of the peptide hormones vasopressin and oxytocin. Following physiological cues, such as dehydration, salt-loading and lactation, the SON undergoes a function related plasticity that we have previously described in the rat at the transcriptome level. Using the unsupervised graphical lasso (Glasso) algorithm, we reconstructed a putative network from 500 plastic SON genes in which genes are the nodes and the edges are the inferred interactions. The most active nodal gene identified within the network was Caprin2. Caprin2 encodes an RNA-binding protein that we have previously shown to be vital for the functioning of osmoregulatory neuroendocrine neurons in the SON of the rat hypothalamus. To test the validity of the Glasso network, we either overexpressed or knocked down Caprin2 transcripts in differentiated rat pheochromocytoma PC12 cells and showed that these manipulations had significant opposite effects on the levels of putative target mRNAs. These studies suggest that the predicative power of the Glasso algorithm within an in vivo system is accurate, and identifies biological targets that may be important to the functional plasticity of the SON. PMID:29279858

  11. Use seismic colored inversion and power law committee machine based on imperial competitive algorithm for improving porosity prediction in a heterogeneous reservoir

    NASA Astrophysics Data System (ADS)

    Ansari, Hamid Reza

    2014-09-01

    In this paper we propose a new method for predicting rock porosity based on a combination of several artificial intelligence systems. The method focuses on one of the Iranian carbonate fields in the Persian Gulf. Because there is strong heterogeneity in carbonate formations, estimation of rock properties experiences more challenge than sandstone. For this purpose, seismic colored inversion (SCI) and a new approach of committee machine are used in order to improve porosity estimation. The study comprises three major steps. First, a series of sample-based attributes is calculated from 3D seismic volume. Acoustic impedance is an important attribute that is obtained by the SCI method in this study. Second, porosity log is predicted from seismic attributes using common intelligent computation systems including: probabilistic neural network (PNN), radial basis function network (RBFN), multi-layer feed forward network (MLFN), ε-support vector regression (ε-SVR) and adaptive neuro-fuzzy inference system (ANFIS). Finally, a power law committee machine (PLCM) is constructed based on imperial competitive algorithm (ICA) to combine the results of all previous predictions in a single solution. This technique is called PLCM-ICA in this paper. The results show that PLCM-ICA model improved the results of neural networks, support vector machine and neuro-fuzzy system.

  12. River suspended sediment estimation by climatic variables implication: Comparative study among soft computing techniques

    NASA Astrophysics Data System (ADS)

    Kisi, Ozgur; Shiri, Jalal

    2012-06-01

    Estimating sediment volume carried by a river is an important issue in water resources engineering. This paper compares the accuracy of three different soft computing methods, Artificial Neural Networks (ANNs), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Gene Expression Programming (GEP), in estimating daily suspended sediment concentration on rivers by using hydro-meteorological data. The daily rainfall, streamflow and suspended sediment concentration data from Eel River near Dos Rios, at California, USA are used as a case study. The comparison results indicate that the GEP model performs better than the other models in daily suspended sediment concentration estimation for the particular data sets used in this study. Levenberg-Marquardt, conjugate gradient and gradient descent training algorithms were used for the ANN models. Out of three algorithms, the Conjugate gradient algorithm was found to be better than the others.

  13. Parallel mutual information estimation for inferring gene regulatory networks on GPUs

    PubMed Central

    2011-01-01

    Background Mutual information is a measure of similarity between two variables. It has been widely used in various application domains including computational biology, machine learning, statistics, image processing, and financial computing. Previously used simple histogram based mutual information estimators lack the precision in quality compared to kernel based methods. The recently introduced B-spline function based mutual information estimation method is competitive to the kernel based methods in terms of quality but at a lower computational complexity. Results We present a new approach to accelerate the B-spline function based mutual information estimation algorithm with commodity graphics hardware. To derive an efficient mapping onto this type of architecture, we have used the Compute Unified Device Architecture (CUDA) programming model to design and implement a new parallel algorithm. Our implementation, called CUDA-MI, can achieve speedups of up to 82 using double precision on a single GPU compared to a multi-threaded implementation on a quad-core CPU for large microarray datasets. We have used the results obtained by CUDA-MI to infer gene regulatory networks (GRNs) from microarray data. The comparisons to existing methods including ARACNE and TINGe show that CUDA-MI produces GRNs of higher quality in less time. Conclusions CUDA-MI is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant speedup over sequential multi-threaded implementation by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs. PMID:21672264

  14. Inference of Spatio-Temporal Functions Over Graphs via Multikernel Kriged Kalman Filtering

    NASA Astrophysics Data System (ADS)

    Ioannidis, Vassilis N.; Romero, Daniel; Giannakis, Georgios B.

    2018-06-01

    Inference of space-time varying signals on graphs emerges naturally in a plethora of network science related applications. A frequently encountered challenge pertains to reconstructing such dynamic processes, given their values over a subset of vertices and time instants. The present paper develops a graph-aware kernel-based kriged Kalman filter that accounts for the spatio-temporal variations, and offers efficient online reconstruction, even for dynamically evolving network topologies. The kernel-based learning framework bypasses the need for statistical information by capitalizing on the smoothness that graph signals exhibit with respect to the underlying graph. To address the challenge of selecting the appropriate kernel, the proposed filter is combined with a multi-kernel selection module. Such a data-driven method selects a kernel attuned to the signal dynamics on-the-fly within the linear span of a pre-selected dictionary. The novel multi-kernel learning algorithm exploits the eigenstructure of Laplacian kernel matrices to reduce computational complexity. Numerical tests with synthetic and real data demonstrate the superior reconstruction performance of the novel approach relative to state-of-the-art alternatives.

  15. Inferring causal molecular networks: empirical assessment through a community-based effort

    PubMed Central

    Hill, Steven M.; Heiser, Laura M.; Cokelaer, Thomas; Unger, Michael; Nesser, Nicole K.; Carlin, Daniel E.; Zhang, Yang; Sokolov, Artem; Paull, Evan O.; Wong, Chris K.; Graim, Kiley; Bivol, Adrian; Wang, Haizhou; Zhu, Fan; Afsari, Bahman; Danilova, Ludmila V.; Favorov, Alexander V.; Lee, Wai Shing; Taylor, Dane; Hu, Chenyue W.; Long, Byron L.; Noren, David P.; Bisberg, Alexander J.; Mills, Gordon B.; Gray, Joe W.; Kellen, Michael; Norman, Thea; Friend, Stephen; Qutub, Amina A.; Fertig, Elana J.; Guan, Yuanfang; Song, Mingzhou; Stuart, Joshua M.; Spellman, Paul T.; Koeppl, Heinz; Stolovitzky, Gustavo; Saez-Rodriguez, Julio; Mukherjee, Sach

    2016-01-01

    Inferring molecular networks is a central challenge in computational biology. However, it has remained unclear whether causal, rather than merely correlational, relationships can be effectively inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge that focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results constitute the most comprehensive assessment of causal network inference in a mammalian setting carried out to date and suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess the causal validity of inferred molecular networks. PMID:26901648

  16. Qualitative Discovery in Medical Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A.

    2000-01-01

    Implication rules have been used in uncertainty reasoning systems to confirm and draw hypotheses or conclusions. However a major bottleneck in developing such systems lies in the elicitation of these rules. This paper empirically examines the performance of evidential inferencing with implication networks generated using a rule induction tool called KAT. KAT utilizes an algorithm for the statistical analysis of empirical case data, and hence reduces the knowledge engineering efforts and biases in subjective implication certainty assignment. The paper describes several experiments in which real-world diagnostic problems were investigated; namely, medical diagnostics. In particular, it attempts to show that: (1) with a limited number of case samples, KAT is capable of inducing implication networks useful for making evidential inferences based on partial observations, and (2) observation driven by a network entropy optimization mechanism is effective in reducing the uncertainty of predicted events.

  17. Method and apparatus for predicting the direction of movement in machine vision

    NASA Technical Reports Server (NTRS)

    Lawton, Teri B. (Inventor)

    1992-01-01

    A computer-simulated cortical network is presented. The network is capable of computing the visibility of shifts in the direction of movement. Additionally, the network can compute the following: (1) the magnitude of the position difference between the test and background patterns; (2) localized contrast differences at different spatial scales analyzed by computing temporal gradients of the difference and sum of the outputs of paired even- and odd-symmetric bandpass filters convolved with the input pattern; and (3) the direction of a test pattern moved relative to a textured background. The direction of movement of an object in the field of view of a robotic vision system is detected in accordance with nonlinear Gabor function algorithms. The movement of objects relative to their background is used to infer the 3-dimensional structure and motion of object surfaces.

  18. Mapping higher-order relations between brain structure and function with embedded vector representations of connectomes.

    PubMed

    Rosenthal, Gideon; Váša, František; Griffa, Alessandra; Hagmann, Patric; Amico, Enrico; Goñi, Joaquín; Avidan, Galia; Sporns, Olaf

    2018-06-05

    Connectomics generates comprehensive maps of brain networks, represented as nodes and their pairwise connections. The functional roles of nodes are defined by their direct and indirect connectivity with the rest of the network. However, the network context is not directly accessible at the level of individual nodes. Similar problems in language processing have been addressed with algorithms such as word2vec that create embeddings of words and their relations in a meaningful low-dimensional vector space. Here we apply this approach to create embedded vector representations of brain networks or connectome embeddings (CE). CE can characterize correspondence relations among brain regions, and can be used to infer links that are lacking from the original structural diffusion imaging, e.g., inter-hemispheric homotopic connections. Moreover, we construct predictive deep models of functional and structural connectivity, and simulate network-wide lesion effects using the face processing system as our application domain. We suggest that CE offers a novel approach to revealing relations between connectome structure and function.

  19. Algorithm of OMA for large-scale orthology inference

    PubMed Central

    Roth, Alexander CJ; Gonnet, Gaston H; Dessimoz, Christophe

    2008-01-01

    Background OMA is a project that aims to identify orthologs within publicly available, complete genomes. With 657 genomes analyzed to date, OMA is one of the largest projects of its kind. Results The algorithm of OMA improves upon standard bidirectional best-hit approach in several respects: it uses evolutionary distances instead of scores, considers distance inference uncertainty, includes many-to-many orthologous relations, and accounts for differential gene losses. Herein, we describe in detail the algorithm for inference of orthology and provide the rationale for parameter selection through multiple tests. Conclusion OMA contains several novel improvement ideas for orthology inference and provides a unique dataset of large-scale orthology assignments. PMID:19055798

  20. Reproducibility of graph metrics of human brain structural networks.

    PubMed

    Duda, Jeffrey T; Cook, Philip A; Gee, James C

    2014-01-01

    Recent interest in human brain connectivity has led to the application of graph theoretical analysis to human brain structural networks, in particular white matter connectivity inferred from diffusion imaging and fiber tractography. While these methods have been used to study a variety of patient populations, there has been less examination of the reproducibility of these methods. A number of tractography algorithms exist and many of these are known to be sensitive to user-selected parameters. The methods used to derive a connectivity matrix from fiber tractography output may also influence the resulting graph metrics. Here we examine how these algorithm and parameter choices influence the reproducibility of proposed graph metrics on a publicly available test-retest dataset consisting of 21 healthy adults. The dice coefficient is used to examine topological similarity of constant density subgraphs both within and between subjects. Seven graph metrics are examined here: mean clustering coefficient, characteristic path length, largest connected component size, assortativity, global efficiency, local efficiency, and rich club coefficient. The reproducibility of these network summary measures is examined using the intraclass correlation coefficient (ICC). Graph curves are created by treating the graph metrics as functions of a parameter such as graph density. Functional data analysis techniques are used to examine differences in graph measures that result from the choice of fiber tracking algorithm. The graph metrics consistently showed good levels of reproducibility as measured with ICC, with the exception of some instability at low graph density levels. The global and local efficiency measures were the most robust to the choice of fiber tracking algorithm.

  1. Event-Driven Random Back-Propagation: Enabling Neuromorphic Deep Learning Machines

    PubMed Central

    Neftci, Emre O.; Augustine, Charles; Paul, Somnath; Detorakis, Georgios

    2017-01-01

    An ongoing challenge in neuromorphic computing is to devise general and computationally efficient models of inference and learning which are compatible with the spatial and temporal constraints of the brain. One increasingly popular and successful approach is to take inspiration from inference and learning algorithms used in deep neural networks. However, the workhorse of deep learning, the gradient descent Gradient Back Propagation (BP) rule, often relies on the immediate availability of network-wide information stored with high-precision memory during learning, and precise operations that are difficult to realize in neuromorphic hardware. Remarkably, recent work showed that exact backpropagated gradients are not essential for learning deep representations. Building on these results, we demonstrate an event-driven random BP (eRBP) rule that uses an error-modulated synaptic plasticity for learning deep representations. Using a two-compartment Leaky Integrate & Fire (I&F) neuron, the rule requires only one addition and two comparisons for each synaptic weight, making it very suitable for implementation in digital or mixed-signal neuromorphic hardware. Our results show that using eRBP, deep representations are rapidly learned, achieving classification accuracies on permutation invariant datasets comparable to those obtained in artificial neural network simulations on GPUs, while being robust to neural and synaptic state quantizations during learning. PMID:28680387

  2. Event-Driven Random Back-Propagation: Enabling Neuromorphic Deep Learning Machines.

    PubMed

    Neftci, Emre O; Augustine, Charles; Paul, Somnath; Detorakis, Georgios

    2017-01-01

    An ongoing challenge in neuromorphic computing is to devise general and computationally efficient models of inference and learning which are compatible with the spatial and temporal constraints of the brain. One increasingly popular and successful approach is to take inspiration from inference and learning algorithms used in deep neural networks. However, the workhorse of deep learning, the gradient descent Gradient Back Propagation (BP) rule, often relies on the immediate availability of network-wide information stored with high-precision memory during learning, and precise operations that are difficult to realize in neuromorphic hardware. Remarkably, recent work showed that exact backpropagated gradients are not essential for learning deep representations. Building on these results, we demonstrate an event-driven random BP (eRBP) rule that uses an error-modulated synaptic plasticity for learning deep representations. Using a two-compartment Leaky Integrate & Fire (I&F) neuron, the rule requires only one addition and two comparisons for each synaptic weight, making it very suitable for implementation in digital or mixed-signal neuromorphic hardware. Our results show that using eRBP, deep representations are rapidly learned, achieving classification accuracies on permutation invariant datasets comparable to those obtained in artificial neural network simulations on GPUs, while being robust to neural and synaptic state quantizations during learning.

  3. Filtering in Hybrid Dynamic Bayesian Networks

    NASA Technical Reports Server (NTRS)

    Andersen, Morten Nonboe; Andersen, Rasmus Orum; Wheeler, Kevin

    2000-01-01

    We implement a 2-time slice dynamic Bayesian network (2T-DBN) framework and make a 1-D state estimation simulation, an extension of the experiment in (v.d. Merwe et al., 2000) and compare different filtering techniques. Furthermore, we demonstrate experimentally that inference in a complex hybrid DBN is possible by simulating fault detection in a watertank system, an extension of the experiment in (Koller & Lerner, 2000) using a hybrid 2T-DBN. In both experiments, we perform approximate inference using standard filtering techniques, Monte Carlo methods and combinations of these. In the watertank simulation, we also demonstrate the use of 'non-strict' Rao-Blackwellisation. We show that the unscented Kalman filter (UKF) and UKF in a particle filtering framework outperform the generic particle filter, the extended Kalman filter (EKF) and EKF in a particle filtering framework with respect to accuracy in terms of estimation RMSE and sensitivity with respect to choice of network structure. Especially we demonstrate the superiority of UKF in a PF framework when our beliefs of how data was generated are wrong. Furthermore, we investigate the influence of data noise in the watertank simulation using UKF and PFUKD and show that the algorithms are more sensitive to changes in the measurement noise level that the process noise level. Theory and implementation is based on (v.d. Merwe et al., 2000).

  4. Plasma and Serum Metabolite Association Networks: Comparability within and between Studies Using NMR and MS Profiling.

    PubMed

    Suarez-Diez, Maria; Adam, Jonathan; Adamski, Jerzy; Chasapi, Styliani A; Luchinat, Claudio; Peters, Annette; Prehn, Cornelia; Santucci, Claudio; Spyridonidis, Alexandros; Spyroulias, Georgios A; Tenori, Leonardo; Wang-Sattler, Rui; Saccenti, Edoardo

    2017-07-07

    Blood is one of the most used biofluids in metabolomics studies, and the serum and plasma fractions are routinely used as a proxy for blood itself. Here we investigated the association networks of an array of 29 metabolites identified and quantified via NMR in the plasma and serum samples of two cohorts of ∼1000 healthy blood donors each. A second study of 377 individuals was used to extract plasma and serum samples from the same individual on which a set of 122 metabolites were detected and quantified using FIA-MS/MS. Four different inference algorithms (ARANCE, CLR, CORR, and PCLRC) were used to obtain consensus networks. The plasma and serum networks obtained from different studies showed different topological properties with the serum network being more connected than the plasma network. On a global level, metabolite association networks from plasma and serum fractions obtained from the same blood sample of healthy people show similar topologies, and at a local level, some differences arise like in the case of amino acids.

  5. Plasma and Serum Metabolite Association Networks: Comparability within and between Studies Using NMR and MS Profiling

    PubMed Central

    2017-01-01

    Blood is one of the most used biofluids in metabolomics studies, and the serum and plasma fractions are routinely used as a proxy for blood itself. Here we investigated the association networks of an array of 29 metabolites identified and quantified via NMR in the plasma and serum samples of two cohorts of ∼1000 healthy blood donors each. A second study of 377 individuals was used to extract plasma and serum samples from the same individual on which a set of 122 metabolites were detected and quantified using FIA–MS/MS. Four different inference algorithms (ARANCE, CLR, CORR, and PCLRC) were used to obtain consensus networks. The plasma and serum networks obtained from different studies showed different topological properties with the serum network being more connected than the plasma network. On a global level, metabolite association networks from plasma and serum fractions obtained from the same blood sample of healthy people show similar topologies, and at a local level, some differences arise like in the case of amino acids. PMID:28517934

  6. An algorithm for direct causal learning of influences on patient outcomes.

    PubMed

    Rathnam, Chandramouli; Lee, Sanghoon; Jiang, Xia

    2017-01-01

    This study aims at developing and introducing a new algorithm, called direct causal learner (DCL), for learning the direct causal influences of a single target. We applied it to both simulated and real clinical and genome wide association study (GWAS) datasets and compared its performance to classic causal learning algorithms. The DCL algorithm learns the causes of a single target from passive data using Bayesian-scoring, instead of using independence checks, and a novel deletion algorithm. We generate 14,400 simulated datasets and measure the number of datasets for which DCL correctly and partially predicts the direct causes. We then compare its performance with the constraint-based path consistency (PC) and conservative PC (CPC) algorithms, the Bayesian-score based fast greedy search (FGS) algorithm, and the partial ancestral graphs algorithm fast causal inference (FCI). In addition, we extend our comparison of all five algorithms to both a real GWAS dataset and real breast cancer datasets over various time-points in order to observe how effective they are at predicting the causal influences of Alzheimer's disease and breast cancer survival. DCL consistently outperforms FGS, PC, CPC, and FCI in discovering the parents of the target for the datasets simulated using a simple network. Overall, DCL predicts significantly more datasets correctly (McNemar's test significance: p<0.0001) than any of the other algorithms for these network types. For example, when assessing overall performance (simple and complex network results combined), DCL correctly predicts approximately 1400 more datasets than the top FGS method, 1600 more datasets than the top CPC method, 4500 more datasets than the top PC method, and 5600 more datasets than the top FCI method. Although FGS did correctly predict more datasets than DCL for the complex networks, and DCL correctly predicted only a few more datasets than CPC for these networks, there is no significant difference in performance between these three algorithms for this network type. However, when we use a more continuous measure of accuracy, we find that all the DCL methods are able to better partially predict more direct causes than FGS and CPC for the complex networks. In addition, DCL consistently had faster runtimes than the other algorithms. In the application to the real datasets, DCL identified rs6784615, located on the NISCH gene, and rs10824310, located on the PRKG1 gene, as direct causes of late onset Alzheimer's disease (LOAD) development. In addition, DCL identified ER category as a direct predictor of breast cancer mortality within 5 years, and HER2 status as a direct predictor of 10-year breast cancer mortality. These predictors have been identified in previous studies to have a direct causal relationship with their respective phenotypes, supporting the predictive power of DCL. When the other algorithms discovered predictors from the real datasets, these predictors were either also found by DCL or could not be supported by previous studies. Our results show that DCL outperforms FGS, PC, CPC, and FCI in almost every case, demonstrating its potential to advance causal learning. Furthermore, our DCL algorithm effectively identifies direct causes in the LOAD and Metabric GWAS datasets, which indicates its potential for clinical applications. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. SABRE: a method for assessing the stability of gene modules in complex tissues and subject populations.

    PubMed

    Shannon, Casey P; Chen, Virginia; Takhar, Mandeep; Hollander, Zsuzsanna; Balshaw, Robert; McManus, Bruce M; Tebbutt, Scott J; Sin, Don D; Ng, Raymond T

    2016-11-14

    Gene network inference (GNI) algorithms can be used to identify sets of coordinately expressed genes, termed network modules from whole transcriptome gene expression data. The identification of such modules has become a popular approach to systems biology, with important applications in translational research. Although diverse computational and statistical approaches have been devised to identify such modules, their performance behavior is still not fully understood, particularly in complex human tissues. Given human heterogeneity, one important question is how the outputs of these computational methods are sensitive to the input sample set, or stability. A related question is how this sensitivity depends on the size of the sample set. We describe here the SABRE (Similarity Across Bootstrap RE-sampling) procedure for assessing the stability of gene network modules using a re-sampling strategy, introduce a novel criterion for identifying stable modules, and demonstrate the utility of this approach in a clinically-relevant cohort, using two different gene network module discovery algorithms. The stability of modules increased as sample size increased and stable modules were more likely to be replicated in larger sets of samples. Random modules derived from permutated gene expression data were consistently unstable, as assessed by SABRE, and provide a useful baseline value for our proposed stability criterion. Gene module sets identified by different algorithms varied with respect to their stability, as assessed by SABRE. Finally, stable modules were more readily annotated in various curated gene set databases. The SABRE procedure and proposed stability criterion may provide guidance when designing systems biology studies in complex human disease and tissues.

  8. CoGAPS matrix factorization algorithm identifies transcriptional changes in AP-2alpha target genes in feedback from therapeutic inhibition of the EGFR network

    PubMed Central

    Thakar, Manjusha; Howard, Jason D.; Kagohara, Luciane T.; Krigsfeld, Gabriel; Ranaweera, Ruchira S.; Hughes, Robert M.; Perez, Jimena; Jones, Siân; Favorov, Alexander V.; Carey, Jacob; Stein-O'Brien, Genevieve; Gaykalova, Daria A.; Ochs, Michael F.; Chung, Christine H.

    2016-01-01

    Patients with oncogene driven tumors are treated with targeted therapeutics including EGFR inhibitors. Genomic data from The Cancer Genome Atlas (TCGA) demonstrates molecular alterations to EGFR, MAPK, and PI3K pathways in previously untreated tumors. Therefore, this study uses bioinformatics algorithms to delineate interactions resulting from EGFR inhibitor use in cancer cells with these genetic alterations. We modify the HaCaT keratinocyte cell line model to simulate cancer cells with constitutive activation of EGFR, HRAS, and PI3K in a controlled genetic background. We then measure gene expression after treating modified HaCaT cells with gefitinib, afatinib, and cetuximab. The CoGAPS algorithm distinguishes a gene expression signature associated with the anticipated silencing of the EGFR network. It also infers a feedback signature with EGFR gene expression itself increasing in cells that are responsive to EGFR inhibitors. This feedback signature has increased expression of several growth factor receptors regulated by the AP-2 family of transcription factors. The gene expression signatures for AP-2alpha are further correlated with sensitivity to cetuximab treatment in HNSCC cell lines and changes in EGFR expression in HNSCC tumors with low CDKN2A gene expression. In addition, the AP-2alpha gene expression signatures are also associated with inhibition of MEK, PI3K, and mTOR pathways in the Library of Integrated Network-Based Cellular Signatures (LINCS) data. These results suggest that AP-2 transcription factors are activated as feedback from EGFR network inhibition and may mediate EGFR inhibitor resistance. PMID:27650546

  9. Segmentation of the hippocampus by transferring algorithmic knowledge for large cohort processing.

    PubMed

    Thyreau, Benjamin; Sato, Kazunori; Fukuda, Hiroshi; Taki, Yasuyuki

    2018-01-01

    The hippocampus is a particularly interesting target for neuroscience research studies due to its essential role within the human brain. In large human cohort studies, bilateral hippocampal structures are frequently identified and measured to gain insight into human behaviour or genomic variability in neuropsychiatric disorders of interest. Automatic segmentation is performed using various algorithms, with FreeSurfer being a popular option. In this manuscript, we present a method to segment the bilateral hippocampus using a deep-learned appearance model. Deep convolutional neural networks (ConvNets) have shown great success in recent years, due to their ability to learn meaningful features from a mass of training data. Our method relies on the following key novelties: (i) we use a wide and variable training set coming from multiple cohorts (ii) our training labels come in part from the output of the FreeSurfer algorithm, and (iii) we include synthetic data and use a powerful data augmentation scheme. Our method proves to be robust, and it has fast inference (<30s total per subject), with trained model available online (https://github.com/bthyreau/hippodeep). We depict illustrative results and show extensive qualitative and quantitative cohort-wide comparisons with FreeSurfer. Our work demonstrates that deep neural-network methods can easily encode, and even improve, existing anatomical knowledge, even when this knowledge exists in algorithmic form. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Cluster and propensity based approximation of a network

    PubMed Central

    2013-01-01

    Background The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets. Results Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network not only generalizes correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood based tests for assessing the significance of an edge after controlling for clustering. We present a novel Majorization-Minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bi-partite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM). Conclusions The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms. The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust. PMID:23497424

  11. MoCha: Molecular Characterization of Unknown Pathways.

    PubMed

    Lobo, Daniel; Hammelman, Jennifer; Levin, Michael

    2016-04-01

    Automated methods for the reverse-engineering of complex regulatory networks are paving the way for the inference of mechanistic comprehensive models directly from experimental data. These novel methods can infer not only the relations and parameters of the known molecules defined in their input datasets, but also unknown components and pathways identified as necessary by the automated algorithms. Identifying the molecular nature of these unknown components is a crucial step for making testable predictions and experimentally validating the models, yet no specific and efficient tools exist to aid in this process. To this end, we present here MoCha (Molecular Characterization), a tool optimized for the search of unknown proteins and their pathways from a given set of known interacting proteins. MoCha uses the comprehensive dataset of protein-protein interactions provided by the STRING database, which currently includes more than a billion interactions from over 2,000 organisms. MoCha is highly optimized, performing typical searches within seconds. We demonstrate the use of MoCha with the characterization of unknown components from reverse-engineered models from the literature. MoCha is useful for working on network models by hand or as a downstream step of a model inference engine workflow and represents a valuable and efficient tool for the characterization of unknown pathways using known data from thousands of organisms. MoCha and its source code are freely available online under the GPLv3 license.

  12. On the Inference of Functional Circadian Networks Using Granger Causality

    PubMed Central

    Pourzanjani, Arya; Herzog, Erik D.; Petzold, Linda R.

    2015-01-01

    Being able to infer one way direct connections in an oscillatory network such as the suprachiastmatic nucleus (SCN) of the mammalian brain using time series data is difficult but crucial to understanding network dynamics. Although techniques have been developed for inferring networks from time series data, there have been no attempts to adapt these techniques to infer directional connections in oscillatory time series, while accurately distinguishing between direct and indirect connections. In this paper an adaptation of Granger Causality is proposed that allows for inference of circadian networks and oscillatory networks in general called Adaptive Frequency Granger Causality (AFGC). Additionally, an extension of this method is proposed to infer networks with large numbers of cells called LASSO AFGC. The method was validated using simulated data from several different networks. For the smaller networks the method was able to identify all one way direct connections without identifying connections that were not present. For larger networks of up to twenty cells the method shows excellent performance in identifying true and false connections; this is quantified by an area-under-the-curve (AUC) 96.88%. We note that this method like other Granger Causality-based methods, is based on the detection of high frequency signals propagating between cell traces. Thus it requires a relatively high sampling rate and a network that can propagate high frequency signals. PMID:26413748

  13. Final Report: Sampling-Based Algorithms for Estimating Structure in Big Data.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matulef, Kevin Michael

    The purpose of this project was to develop sampling-based algorithms to discover hidden struc- ture in massive data sets. Inferring structure in large data sets is an increasingly common task in many critical national security applications. These data sets come from myriad sources, such as network traffic, sensor data, and data generated by large-scale simulations. They are often so large that traditional data mining techniques are time consuming or even infeasible. To address this problem, we focus on a class of algorithms that do not compute an exact answer, but instead use sampling to compute an approximate answer using fewermore » resources. The particular class of algorithms that we focus on are streaming algorithms , so called because they are designed to handle high-throughput streams of data. Streaming algorithms have only a small amount of working storage - much less than the size of the full data stream - so they must necessarily use sampling to approximate the correct answer. We present two results: * A streaming algorithm called HyperHeadTail , that estimates the degree distribution of a graph (i.e., the distribution of the number of connections for each node in a network). The degree distribution is a fundamental graph property, but prior work on estimating the degree distribution in a streaming setting was impractical for many real-world application. We improve upon prior work by developing an algorithm that can handle streams with repeated edges, and graph structures that evolve over time. * An algorithm for the task of maintaining a weighted subsample of items in a stream, when the items must be sampled according to their weight, and the weights are dynamically changing. To our knowledge, this is the first such algorithm designed for dynamically evolving weights. We expect it may be useful as a building block for other streaming algorithms on dynamic data sets.« less

  14. Extraction of Martian valley networks from digital topography

    NASA Technical Reports Server (NTRS)

    Stepinski, T. F.; Collier, M. L.

    2004-01-01

    We have developed a novel method for delineating valley networks on Mars. The valleys are inferred from digital topography by an autonomous computer algorithm as drainage networks, instead of being manually mapped from images. Individual drainage basins are precisely defined and reconstructed to restore flow continuity disrupted by craters. Drainage networks are extracted from their underlying basins using the contributing area threshold method. We demonstrate that such drainage networks coincide with mapped valley networks verifying that valley networks are indeed drainage systems. Our procedure is capable of delineating and analyzing valley networks with unparalleled speed and consistency. We have applied this method to 28 Noachian locations on Mars exhibiting prominent valley networks. All extracted networks have a planar morphology similar to that of terrestrial river networks. They are characterized by a drainage density of approx.0.1/km, low in comparison to the drainage density of terrestrial river networks. Slopes of "streams" in Martian valley networks decrease downstream at a slower rate than slopes of streams in terrestrial river networks. This analysis, based on a sizable data set of valley networks, reveals that although valley networks have some features pointing to their origin by precipitation-fed runoff erosion, their quantitative characteristics suggest that precipitation intensity and/or longevity of past pluvial climate were inadequate to develop mature drainage basins on Mars.

  15. Influence versus intent for predictive analytics in situation awareness

    NASA Astrophysics Data System (ADS)

    Cui, Biru; Yang, Shanchieh J.; Kadar, Ivan

    2013-05-01

    Predictive analytics in situation awareness requires an element to comprehend and anticipate potential adversary activities that might occur in the future. Most work in high level fusion or predictive analytics utilizes machine learning, pattern mining, Bayesian inference, and decision tree techniques to predict future actions or states. The emergence of social computing in broader contexts has drawn interests in bringing the hypotheses and techniques from social theory to algorithmic and computational settings for predictive analytics. This paper aims at answering the question on how influence and attitude (some interpreted such as intent) of adversarial actors can be formulated and computed algorithmically, as a higher level fusion process to provide predictions of future actions. The challenges in this interdisciplinary endeavor include drawing existing understanding of influence and attitude in both social science and computing fields, as well as the mathematical and computational formulation for the specific context of situation to be analyzed. The study of `influence' has resurfaced in recent years due to the emergence of social networks in the virtualized cyber world. Theoretical analysis and techniques developed in this area are discussed in this paper in the context of predictive analysis. Meanwhile, the notion of intent, or `attitude' using social theory terminologies, is a relatively uncharted area in the computing field. Note that a key objective of predictive analytics is to identify impending/planned attacks so their `impact' and `threat' can be prevented. In this spirit, indirect and direct observables are drawn and derived to infer the influence network and attitude to predict future threats. This work proposes an integrated framework that jointly assesses adversarial actors' influence network and their attitudes as a function of past actions and action outcomes. A preliminary set of algorithms are developed and tested using the Global Terrorism Database (GTD). Our results reveals the benefits to perform joint predictive analytics with both attitude and influence. At the same time, we discover significant challenges in deriving influence and attitude from indirect observables for diverse adversarial behavior. These observations warrant further investigation of optimal use of influence and attitude for predictive analytics, as well as the potential inclusion of other environmental or capability elements for the actors.

  16. Efficient probabilistic inference in generic neural networks trained with non-probabilistic feedback.

    PubMed

    Orhan, A Emin; Ma, Wei Ji

    2017-07-26

    Animals perform near-optimal probabilistic inference in a wide range of psychophysical tasks. Probabilistic inference requires trial-to-trial representation of the uncertainties associated with task variables and subsequent use of this representation. Previous work has implemented such computations using neural networks with hand-crafted and task-dependent operations. We show that generic neural networks trained with a simple error-based learning rule perform near-optimal probabilistic inference in nine common psychophysical tasks. In a probabilistic categorization task, error-based learning in a generic network simultaneously explains a monkey's learning curve and the evolution of qualitative aspects of its choice behavior. In all tasks, the number of neurons required for a given level of performance grows sublinearly with the input population size, a substantial improvement on previous implementations of probabilistic inference. The trained networks develop a novel sparsity-based probabilistic population code. Our results suggest that probabilistic inference emerges naturally in generic neural networks trained with error-based learning rules.Behavioural tasks often require probability distributions to be inferred about task specific variables. Here, the authors demonstrate that generic neural networks can be trained using a simple error-based learning rule to perform such probabilistic computations efficiently without any need for task specific operations.

  17. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.

    PubMed

    Schaffter, Thomas; Marbach, Daniel; Floreano, Dario

    2011-08-15

    Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods available to the community as an open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. dario.floreano@epfl.ch.

  18. Efficient Exact Inference With Loss Augmented Objective in Structured Learning.

    PubMed

    Bauer, Alexander; Nakajima, Shinichi; Muller, Klaus-Robert

    2016-08-19

    Structural support vector machine (SVM) is an elegant approach for building complex and accurate models with structured outputs. However, its applicability relies on the availability of efficient inference algorithms--the state-of-the-art training algorithms repeatedly perform inference to compute a subgradient or to find the most violating configuration. In this paper, we propose an exact inference algorithm for maximizing nondecomposable objectives due to special type of a high-order potential having a decomposable internal structure. As an important application, our method covers the loss augmented inference, which enables the slack and margin scaling formulations of structural SVM with a variety of dissimilarity measures, e.g., Hamming loss, precision and recall, Fβ-loss, intersection over union, and many other functions that can be efficiently computed from the contingency table. We demonstrate the advantages of our approach in natural language parsing and sequence segmentation applications.

  19. Joint probabilistic determination of earthquake location and velocity structure: application to local and regional events

    NASA Astrophysics Data System (ADS)

    Beucler, E.; Haugmard, M.; Mocquet, A.

    2016-12-01

    The most widely used inversion schemes to locate earthquakes are based on iterative linearized least-squares algorithms and using an a priori knowledge of the propagation medium. When a small amount of observations is available for moderate events for instance, these methods may lead to large trade-offs between outputs and both the velocity model and the initial set of hypocentral parameters. We present a joint structure-source determination approach using Bayesian inferences. Monte-Carlo continuous samplings, using Markov chains, generate models within a broad range of parameters, distributed according to the unknown posterior distributions. The non-linear exploration of both the seismic structure (velocity and thickness) and the source parameters relies on a fast forward problem using 1-D travel time computations. The a posteriori covariances between parameters (hypocentre depth, origin time and seismic structure among others) are computed and explicitly documented. This method manages to decrease the influence of the surrounding seismic network geometry (sparse and/or azimuthally inhomogeneous) and a too constrained velocity structure by inferring realistic distributions on hypocentral parameters. Our algorithm is successfully used to accurately locate events of the Armorican Massif (western France), which is characterized by moderate and apparently diffuse local seismicity.

  20. Artistic image analysis using graph-based learning approaches.

    PubMed

    Carneiro, Gustavo

    2013-08-01

    We introduce a new methodology for the problem of artistic image analysis, which among other tasks, involves the automatic identification of visual classes present in an art work. In this paper, we advocate the idea that artistic image analysis must explore a graph that captures the network of artistic influences by computing the similarities in terms of appearance and manual annotation. One of the novelties of our methodology is the proposed formulation that is a principled way of combining these two similarities in a single graph. Using this graph, we show that an efficient random walk algorithm based on an inverted label propagation formulation produces more accurate annotation and retrieval results compared with the following baseline algorithms: bag of visual words, label propagation, matrix completion, and structural learning. We also show that the proposed approach leads to a more efficient inference and training procedures. This experiment is run on a database containing 988 artistic images (with 49 visual classification problems divided into a multiclass problem with 27 classes and 48 binary problems), where we show the inference and training running times, and quantitative comparisons with respect to several retrieval and annotation performance measures.

  1. A Framework for Integrating Multiple Biological Networks to Predict MicroRNA-Disease Associations.

    PubMed

    Peng, Wei; Lan, Wei; Yu, Zeng; Wang, Jianxin; Pan, Yi

    2017-03-01

    MicroRNAs have close relationship with human diseases. Therefore, identifying disease related MicroRNAs plays an important role in disease diagnosis, prognosis and therapy. However, designing an effective computational method which can make good use of various biological resources and correctly predict the associations between MicroRNA and disease is still a big challenge. Previous researchers have pointed out that there are complex relationships among microRNAs, diseases and environment factors. There are inter-relationships between microRNAs, diseases or environment factors based on their functional similarity or phenotype similarity or chemical structure similarity and so on. There are also intra-relationships between microRNAs and diseases, microRNAs and environment factors, diseases and environment factors. Moreover, functionally similar microRNAs tend to associate with common diseases and common environment factors. The diseases with similar phenotypes are likely caused by common microRNAs and common environment factors. In this work, we propose a framework namely ThrRWMDE which can integrate these complex relationships to predict microRNA-disease associations. In this framework, microRNA similarity network (MFN), disease similarity network (DSN) and environmental factor similarity network (ESN) are constructed according to certain biological properties. Then, an unbalanced three random walking algorithm is implemented on the three networks so as to obtain information from neighbors in corresponding networks. This algorithm not only can flexibly infer information from different levels of neighbors with respect to the topological and structural differences of the three networks, but also in the course of working the functional information will be transferred from one network to another according to the associations between the nodes in different networks. The results of experiment show that our method achieves better prediction performance than other state-of-the-art methods.

  2. Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees.

    PubMed

    Stolzer, Maureen; Lai, Han; Xu, Minli; Sathaye, Deepa; Vernot, Benjamin; Durand, Dannie

    2012-09-15

    Gene duplication (D), transfer (T), loss (L) and incomplete lineage sorting (I) are crucial to the evolution of gene families and the emergence of novel functions. The history of these events can be inferred via comparison of gene and species trees, a process called reconciliation, yet current reconciliation algorithms model only a subset of these evolutionary processes. We present an algorithm to reconcile a binary gene tree with a nonbinary species tree under a DTLI parsimony criterion. This is the first reconciliation algorithm to capture all four evolutionary processes driving tree incongruence and the first to reconcile non-binary species trees with a transfer model. Our algorithm infers all optimal solutions and reports complete, temporally feasible event histories, giving the gene and species lineages in which each event occurred. It is fixed-parameter tractable, with polytime complexity when the maximum species outdegree is fixed. Application of our algorithms to prokaryotic and eukaryotic data show that use of an incomplete event model has substantial impact on the events inferred and resulting biological conclusions. Our algorithms have been implemented in Notung, a freely available phylogenetic reconciliation software package, available at http://www.cs.cmu.edu/~durand/Notung. mstolzer@andrew.cmu.edu.

  3. Annotation of gene function in citrus using gene expression information and co-expression networks

    PubMed Central

    2014-01-01

    Background The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world’s most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a “guilt-by-association” principle whereby genes encoding proteins involved in similar and/or related biological processes may exhibit similar expression patterns across diverse sets of experimental conditions. While bioinformatics resources such as GCN analysis are widely available for efficient gene function prediction in model plant species including Arabidopsis, soybean and rice, in citrus these tools are not yet developed. Results We have constructed a comprehensive GCN for citrus inferred from 297 publicly available Affymetrix Genechip Citrus Genome microarray datasets, providing gene co-expression relationships at a genome-wide scale (33,000 transcripts). The comprehensive citrus GCN consists of a global GCN (condition-independent) and four condition-dependent GCNs that survey the sweet orange species only, all citrus fruit tissues, all citrus leaf tissues, or stress-exposed plants. All of these GCNs are clustered using genome-wide, gene-centric (guide) and graph clustering algorithms for flexibility of gene function prediction. For each putative cluster, gene ontology (GO) enrichment and gene expression specificity analyses were performed to enhance gene function, expression and regulation pattern prediction. The guide-gene approach was used to infer novel roles of genes involved in disease susceptibility and vitamin C metabolism, and graph-clustering approaches were used to investigate isoprenoid/phenylpropanoid metabolism in citrus peel, and citric acid catabolism via the GABA shunt in citrus fruit. Conclusions Integration of citrus gene co-expression networks, functional enrichment analysis and gene expression information provide opportunities to infer gene function in citrus. We present a publicly accessible tool, Network Inference for Citrus Co-Expression (NICCE, http://citrus.adelaide.edu.au/nicce/home.aspx), for the gene co-expression analysis in citrus. PMID:25023870

  4. minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information.

    PubMed

    Meyer, Patrick E; Lafitte, Frédéric; Bontempi, Gianluca

    2008-10-29

    This paper presents the R/Bioconductor package minet (version 1.1.6) which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes and the weight of an edge quantifies the statistical evidence of a specific (e.g transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. Also, the package integrates accuracy assessment tools, like F-scores, PR-curves and ROC-curves in order to compare the inferred network with a reference one. The package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website.

  5. Financial fluctuations anchored to economic fundamentals: A mesoscopic network approach.

    PubMed

    Sharma, Kiran; Gopalakrishnan, Balagopal; Chakrabarti, Anindya S; Chakraborti, Anirban

    2017-08-14

    We demonstrate the existence of an empirical linkage between nominal financial networks and the underlying economic fundamentals, across countries. We construct the nominal return correlation networks from daily data to encapsulate sector-level dynamics and infer the relative importance of the sectors in the nominal network through measures of centrality and clustering algorithms. Eigenvector centrality robustly identifies the backbone of the minimum spanning tree defined on the return networks as well as the primary cluster in the multidimensional scaling map. We show that the sectors that are relatively large in size, defined with three metrics, viz., market capitalization, revenue and number of employees, constitute the core of the return networks, whereas the periphery is mostly populated by relatively smaller sectors. Therefore, sector-level nominal return dynamics are anchored to the real size effect, which ultimately shapes the optimal portfolios for risk management. Our results are reasonably robust across 27 countries of varying degrees of prosperity and across periods of market turbulence (2008-09) as well as periods of relative calmness (2012-13 and 2015-16).

  6. Markov Task Network: A Framework for Service Composition under Uncertainty in Cyber-Physical Systems.

    PubMed

    Mohammed, Abdul-Wahid; Xu, Yang; Hu, Haixiao; Agyemang, Brighter

    2016-09-21

    In novel collaborative systems, cooperative entities collaborate services to achieve local and global objectives. With the growing pervasiveness of cyber-physical systems, however, such collaboration is hampered by differences in the operations of the cyber and physical objects, and the need for the dynamic formation of collaborative functionality given high-level system goals has become practical. In this paper, we propose a cross-layer automation and management model for cyber-physical systems. This models the dynamic formation of collaborative services pursuing laid-down system goals as an ontology-oriented hierarchical task network. Ontological intelligence provides the semantic technology of this model, and through semantic reasoning, primitive tasks can be dynamically composed from high-level system goals. In dealing with uncertainty, we further propose a novel bridge between hierarchical task networks and Markov logic networks, called the Markov task network. This leverages the efficient inference algorithms of Markov logic networks to reduce both computational and inferential loads in task decomposition. From the results of our experiments, high-precision service composition under uncertainty can be achieved using this approach.

  7. Validating module network learning algorithms using simulated data.

    PubMed

    Michoel, Tom; Maere, Steven; Bonnet, Eric; Joshi, Anagha; Saeys, Yvan; Van den Bulcke, Tim; Van Leemput, Koenraad; van Remortel, Piet; Kuiper, Martin; Marchal, Kathleen; Van de Peer, Yves

    2007-05-03

    In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Despite the demonstrated success of such algorithms in uncovering biologically relevant regulatory relations, further developments in the area are hampered by a lack of tools to compare the performance of alternative module network learning strategies. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance. Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators. We show that data simulators such as SynTReN are very well suited for the purpose of developing, testing and improving module network algorithms. We used SynTReN data to develop and test an alternative module network learning strategy, which is incorporated in the software package LeMoNe, and we provide evidence that this alternative strategy has several advantages with respect to existing methods.

  8. Network inference from multimodal data: A review of approaches from infectious disease transmission.

    PubMed

    Ray, Bisakha; Ghedin, Elodie; Chunara, Rumi

    2016-12-01

    Networks inference problems are commonly found in multiple biomedical subfields such as genomics, metagenomics, neuroscience, and epidemiology. Networks are useful for representing a wide range of complex interactions ranging from those between molecular biomarkers, neurons, and microbial communities, to those found in human or animal populations. Recent technological advances have resulted in an increasing amount of healthcare data in multiple modalities, increasing the preponderance of network inference problems. Multi-domain data can now be used to improve the robustness and reliability of recovered networks from unimodal data. For infectious diseases in particular, there is a body of knowledge that has been focused on combining multiple pieces of linked information. Combining or analyzing disparate modalities in concert has demonstrated greater insight into disease transmission than could be obtained from any single modality in isolation. This has been particularly helpful in understanding incidence and transmission at early stages of infections that have pandemic potential. Novel pieces of linked information in the form of spatial, temporal, and other covariates including high-throughput sequence data, clinical visits, social network information, pharmaceutical prescriptions, and clinical symptoms (reported as free-text data) also encourage further investigation of these methods. The purpose of this review is to provide an in-depth analysis of multimodal infectious disease transmission network inference methods with a specific focus on Bayesian inference. We focus on analytical Bayesian inference-based methods as this enables recovering multiple parameters simultaneously, for example, not just the disease transmission network, but also parameters of epidemic dynamics. Our review studies their assumptions, key inference parameters and limitations, and ultimately provides insights about improving future network inference methods in multiple applications. Copyright © 2016 Elsevier Inc. All rights reserved.

  9. The Manhattan Frame Model-Manhattan World Inference in the Space of Surface Normals.

    PubMed

    Straub, Julian; Freifeld, Oren; Rosman, Guy; Leonard, John J; Fisher, John W

    2018-01-01

    Objects and structures within man-made environments typically exhibit a high degree of organization in the form of orthogonal and parallel planes. Traditional approaches utilize these regularities via the restrictive, and rather local, Manhattan World (MW) assumption which posits that every plane is perpendicular to one of the axes of a single coordinate system. The aforementioned regularities are especially evident in the surface normal distribution of a scene where they manifest as orthogonally-coupled clusters. This motivates the introduction of the Manhattan-Frame (MF) model which captures the notion of an MW in the surface normals space, the unit sphere, and two probabilistic MF models over this space. First, for a single MF we propose novel real-time MAP inference algorithms, evaluate their performance and their use in drift-free rotation estimation. Second, to capture the complexity of real-world scenes at a global scale, we extend the MF model to a probabilistic mixture of Manhattan Frames (MMF). For MMF inference we propose a simple MAP inference algorithm and an adaptive Markov-Chain Monte-Carlo sampling algorithm with Metropolis-Hastings split/merge moves that let us infer the unknown number of mixture components. We demonstrate the versatility of the MMF model and inference algorithm across several scales of man-made environments.

  10. Functional networks inference from rule-based machine learning models.

    PubMed

    Lazzarini, Nicola; Widera, Paweł; Williamson, Stuart; Heer, Rakesh; Krasnogor, Natalio; Bacardit, Jaume

    2016-01-01

    Functional networks play an important role in the analysis of biological processes and systems. The inference of these networks from high-throughput (-omics) data is an area of intense research. So far, the similarity-based inference paradigm (e.g. gene co-expression) has been the most popular approach. It assumes a functional relationship between genes which are expressed at similar levels across different samples. An alternative to this paradigm is the inference of relationships from the structure of machine learning models. These models are able to capture complex relationships between variables, that often are different/complementary to the similarity-based methods. We propose a protocol to infer functional networks from machine learning models, called FuNeL. It assumes, that genes used together within a rule-based machine learning model to classify the samples, might also be functionally related at a biological level. The protocol is first tested on synthetic datasets and then evaluated on a test suite of 8 real-world datasets related to human cancer. The networks inferred from the real-world data are compared against gene co-expression networks of equal size, generated with 3 different methods. The comparison is performed from two different points of view. We analyse the enriched biological terms in the set of network nodes and the relationships between known disease-associated genes in a context of the network topology. The comparison confirms both the biological relevance and the complementary character of the knowledge captured by the FuNeL networks in relation to similarity-based methods and demonstrates its potential to identify known disease associations as core elements of the network. Finally, using a prostate cancer dataset as a case study, we confirm that the biological knowledge captured by our method is relevant to the disease and consistent with the specialised literature and with an independent dataset not used in the inference process. The implementation of our network inference protocol is available at: http://ico2s.org/software/funel.html.

  11. Design of a modified adaptive neuro fuzzy inference system classifier for medical diagnosis of Pima Indians Diabetes

    NASA Astrophysics Data System (ADS)

    Sagir, Abdu Masanawa; Sathasivam, Saratha

    2017-08-01

    Medical diagnosis is the process of determining which disease or medical condition explains a person's determinable signs and symptoms. Diagnosis of most of the diseases is very expensive as many tests are required for predictions. This paper aims to introduce an improved hybrid approach for training the adaptive network based fuzzy inference system with Modified Levenberg-Marquardt algorithm using analytical derivation scheme for computation of Jacobian matrix. The goal is to investigate how certain diseases are affected by patient's characteristics and measurement such as abnormalities or a decision about presence or absence of a disease. To achieve an accurate diagnosis at this complex stage of symptom analysis, the physician may need efficient diagnosis system to classify and predict patient condition by using an adaptive neuro fuzzy inference system (ANFIS) pre-processed by grid partitioning. The proposed hybridised intelligent system was tested with Pima Indian Diabetes dataset obtained from the University of California at Irvine's (UCI) machine learning repository. The proposed method's performance was evaluated based on training and test datasets. In addition, an attempt was done to specify the effectiveness of the performance measuring total accuracy, sensitivity and specificity. In comparison, the proposed method achieves superior performance when compared to conventional ANFIS based gradient descent algorithm and some related existing methods. The software used for the implementation is MATLAB R2014a (version 8.3) and executed in PC Intel Pentium IV E7400 processor with 2.80 GHz speed and 2.0 GB of RAM.

  12. Redrawing the Map of Great Britain from a Network of Human Interactions

    PubMed Central

    Ratti, Carlo; Sobolevsky, Stanislav; Calabrese, Francesco; Andris, Clio; Reades, Jonathan; Martino, Mauro; Claxton, Rob; Strogatz, Steven H.

    2010-01-01

    Do regional boundaries defined by governments respect the more natural ways that people interact across space? This paper proposes a novel, fine-grained approach to regional delineation, based on analyzing networks of billions of individual human transactions. Given a geographical area and some measure of the strength of links between its inhabitants, we show how to partition the area into smaller, non-overlapping regions while minimizing the disruption to each person's links. We tested our method on the largest non-Internet human network, inferred from a large telecommunications database in Great Britain. Our partitioning algorithm yields geographically cohesive regions that correspond remarkably well with administrative regions, while unveiling unexpected spatial structures that had previously only been hypothesized in the literature. We also quantify the effects of partitioning, showing for instance that the effects of a possible secession of Wales from Great Britain would be twice as disruptive for the human network than that of Scotland. PMID:21170390

  13. Convolutional networks for fast, energy-efficient neuromorphic computing

    PubMed Central

    Esser, Steven K.; Merolla, Paul A.; Arthur, John V.; Cassidy, Andrew S.; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J.; McKinstry, Jeffrey L.; Melano, Timothy; Barch, Davis R.; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D.; Modha, Dharmendra S.

    2016-01-01

    Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware’s underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer. PMID:27651489

  14. Convolutional networks for fast, energy-efficient neuromorphic computing.

    PubMed

    Esser, Steven K; Merolla, Paul A; Arthur, John V; Cassidy, Andrew S; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J; McKinstry, Jeffrey L; Melano, Timothy; Barch, Davis R; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D; Modha, Dharmendra S

    2016-10-11

    Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware's underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.

  15. Statistical Signal Models and Algorithms for Image Analysis

    DTIC Science & Technology

    1984-10-25

    In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction

  16. Networking—a statistical physics perspective

    NASA Astrophysics Data System (ADS)

    Yeung, Chi Ho; Saad, David

    2013-03-01

    Networking encompasses a variety of tasks related to the communication of information on networks; it has a substantial economic and societal impact on a broad range of areas including transportation systems, wired and wireless communications and a range of Internet applications. As transportation and communication networks become increasingly more complex, the ever increasing demand for congestion control, higher traffic capacity, quality of service, robustness and reduced energy consumption requires new tools and methods to meet these conflicting requirements. The new methodology should serve for gaining better understanding of the properties of networking systems at the macroscopic level, as well as for the development of new principled optimization and management algorithms at the microscopic level. Methods of statistical physics seem best placed to provide new approaches as they have been developed specifically to deal with nonlinear large-scale systems. This review aims at presenting an overview of tools and methods that have been developed within the statistical physics community and that can be readily applied to address the emerging problems in networking. These include diffusion processes, methods from disordered systems and polymer physics, probabilistic inference, which have direct relevance to network routing, file and frequency distribution, the exploration of network structures and vulnerability, and various other practical networking applications.

  17. Metric Ranking of Invariant Networks with Belief Propagation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tao, Changxia; Ge, Yong; Song, Qinbao

    The management of large-scale distributed information systems relies on the effective use and modeling of monitoring data collected at various points in the distributed information systems. A promising approach is to discover invariant relationships among the monitoring data and generate invariant networks, where a node is a monitoring data source (metric) and a link indicates an invariant relationship between two monitoring data. Such an invariant network representation can help system experts to localize and diagnose the system faults by examining those broken invariant relationships and their related metrics, because system faults usually propagate among the monitoring data and eventually leadmore » to some broken invariant relationships. However, at one time, there are usually a lot of broken links (invariant relationships) within an invariant network. Without proper guidance, it is difficult for system experts to manually inspect this large number of broken links. Thus, a critical challenge is how to effectively and efficiently rank metrics (nodes) of invariant networks according to the anomaly levels of metrics. The ranked list of metrics will provide system experts with useful guidance for them to localize and diagnose the system faults. To this end, we propose to model the nodes and the broken links as a Markov Random Field (MRF), and develop an iteration algorithm to infer the anomaly of each node based on belief propagation (BP). Finally, we validate the proposed algorithm on both realworld and synthetic data sets to illustrate its effectiveness.« less

  18. Inferring Time-Varying Network Topologies from Gene Expression Data

    PubMed Central

    2007-01-01

    Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis which has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying fashion, in order to account for different cellular states affecting the interactions amongst genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster—to infer a network adjacency matrix. We finally indicate our results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence. PMID:18309363

  19. Inferring time-varying network topologies from gene expression data.

    PubMed

    Rao, Arvind; Hero, Alfred O; States, David J; Engel, James Douglas

    2007-01-01

    Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis which has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying fashion, in order to account for different cellular states affecting the interactions amongst genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster--to infer a network adjacency matrix. We finally indicate our results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence.

  20. A linear programming model for protein inference problem in shotgun proteomics.

    PubMed

    Huang, Ting; He, Zengyou

    2012-11-15

    Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find a subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues such as peptide degeneracy still remain unsolved. In this article, we present a linear programming model for protein inference. In this model, we use a transformation of the joint probability that each peptide/protein pair is present in the sample as the variable. Then, both the peptide probability and protein probability can be expressed as a formula in terms of the linear combination of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities under the constraint that the difference between the calculated peptide probability and the peptide probability generated from peptide identification algorithms should be less than some threshold. This model addresses the peptide degeneracy issue by forcing some joint probability variables involving degenerate peptides to be zero in a rigorous manner. The corresponding inference algorithm is named as ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with the state-of-the-art protein inference algorithms. The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/. zyhe@dlut.edu.cn. Supplementary data are available at Bioinformatics Online.

  1. Fast inference of interactions in assemblies of stochastic integrate-and-fire neurons from spike recordings.

    PubMed

    Monasson, Remi; Cocco, Simona

    2011-10-01

    We present two Bayesian procedures to infer the interactions and external currents in an assembly of stochastic integrate-and-fire neurons from the recording of their spiking activity. The first procedure is based on the exact calculation of the most likely time courses of the neuron membrane potentials conditioned by the recorded spikes, and is exact for a vanishing noise variance and for an instantaneous synaptic integration. The second procedure takes into account the presence of fluctuations around the most likely time courses of the potentials, and can deal with moderate noise levels. The running time of both procedures is proportional to the number S of spikes multiplied by the squared number N of neurons. The algorithms are validated on synthetic data generated by networks with known couplings and currents. We also reanalyze previously published recordings of the activity of the salamander retina (including from 32 to 40 neurons, and from 65,000 to 170,000 spikes). We study the dependence of the inferred interactions on the membrane leaking time; the differences and similarities with the classical cross-correlation analysis are discussed.

  2. Theoretic derivation of directed acyclic subgraph algorithm and comparisons with message passing algorithm

    NASA Astrophysics Data System (ADS)

    Ha, Jeongmok; Jeong, Hong

    2016-07-01

    This study investigates the directed acyclic subgraph (DAS) algorithm, which is used to solve discrete labeling problems much more rapidly than other Markov-random-field-based inference methods but at a competitive accuracy. However, the mechanism by which the DAS algorithm simultaneously achieves competitive accuracy and fast execution speed, has not been elucidated by a theoretical derivation. We analyze the DAS algorithm by comparing it with a message passing algorithm. Graphical models, inference methods, and energy-minimization frameworks are compared between DAS and message passing algorithms. Moreover, the performances of DAS and other message passing methods [sum-product belief propagation (BP), max-product BP, and tree-reweighted message passing] are experimentally compared.

  3. Voltage control on a train system

    DOEpatents

    Gordon, Susanna P.; Evans, John A.

    2004-01-20

    The present invention provides methods for preventing low train voltages and managing interference, thereby improving the efficiency, reliability, and passenger comfort associated with commuter trains. An algorithm implementing neural network technology is used to predict low voltages before they occur. Once voltages are predicted, then multiple trains can be controlled to prevent low voltage events. Further, algorithms for managing inference are presented in the present invention. Different types of interference problems are addressed in the present invention such as "Interference During Acceleration", "Interference Near Station Stops", and "Interference During Delay Recovery." Managing such interference avoids unnecessary brake/acceleration cycles during acceleration, immediately before station stops, and after substantial delays. Algorithms are demonstrated to avoid oscillatory brake/acceleration cycles due to interference and to smooth the trajectories of closely following trains. This is achieved by maintaining sufficient following distances to avoid unnecessary braking/accelerating. These methods generate smooth train trajectories, making for a more comfortable ride, and improve train motor reliability by avoiding unnecessary mode-changes between propulsion and braking. These algorithms can also have a favorable impact on traction power system requirements and energy consumption.

  4. Method of managing interference during delay recovery on a train system

    DOEpatents

    Gordon, Susanna P.; Evans, John A.

    2005-12-27

    The present invention provides methods for preventing low train voltages and managing interference, thereby improving the efficiency, reliability, and passenger comfort associated with commuter trains. An algorithm implementing neural network technology is used to predict low voltages before they occur. Once voltages are predicted, then multiple trains can be controlled to prevent low voltage events. Further, algorithms for managing inference are presented in the present invention. Different types of interference problems are addressed in the present invention such as "Interference During Acceleration", "Interference Near Station Stops", and "Interference During Delay Recovery." Managing such interference avoids unnecessary brake/acceleration cycles during acceleration, immediately before station stops, and after substantial delays. Algorithms are demonstrated to avoid oscillatory brake/acceleration cycles due to interference and to smooth the trajectories of closely following trains. This is achieved by maintaining sufficient following distances to avoid unnecessary braking/accelerating. These methods generate smooth train trajectories, making for a more comfortable ride, and improve train motor reliability by avoiding unnecessary mode-changes between propulsion and braking. These algorithms can also have a favorable impact on traction power system requirements and energy consumption.

  5. Efficient high density train operations

    DOEpatents

    Gordon, Susanna P.; Evans, John A.

    2001-01-01

    The present invention provides methods for preventing low train voltages and managing interference, thereby improving the efficiency, reliability, and passenger comfort associated with commuter trains. An algorithm implementing neural network technology is used to predict low voltages before they occur. Once voltages are predicted, then multiple trains can be controlled to prevent low voltage events. Further, algorithms for managing inference are presented in the present invention. Different types of interference problems are addressed in the present invention such as "Interference. During Acceleration", "Interference Near Station Stops", and "Interference During Delay Recovery." Managing such interference avoids unnecessary brake/acceleration cycles during acceleration, immediately before station stops, and after substantial delays. Algorithms are demonstrated to avoid oscillatory brake/acceleration cycles due to interference and to smooth the trajectories of closely following trains. This is achieved by maintaining sufficient following distances to avoid unnecessary braking/accelerating. These methods generate smooth train trajectories, making for a more comfortable ride, and improve train motor reliability by avoiding unnecessary mode-changes between propulsion and braking. These algorithms can also have a favorable impact on traction power system requirements and energy consumption.

  6. Multi-Agent Inference in Social Networks: A Finite Population Learning Approach.

    PubMed

    Fan, Jianqing; Tong, Xin; Zeng, Yao

    When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people's incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning , to address whether with high probability, a large fraction of people in a given finite population network can make "good" inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows.

  7. Reasoning about Causal Relationships: Inferences on Causal Networks

    PubMed Central

    Rottman, Benjamin Margolin; Hastie, Reid

    2013-01-01

    Over the last decade, a normative framework for making causal inferences, Bayesian Probabilistic Causal Networks, has come to dominate psychological studies of inference based on causal relationships. The following causal networks—[X→Y→Z, X←Y→Z, X→Y←Z]—supply answers for questions like, “Suppose both X and Y occur, what is the probability Z occurs?” or “Suppose you intervene and make Y occur, what is the probability Z occurs?” In this review, we provide a tutorial for how normatively to calculate these inferences. Then, we systematically detail the results of behavioral studies comparing human qualitative and quantitative judgments to the normative calculations for many network structures and for several types of inferences on those networks. Overall, when the normative calculations imply that an inference should increase, judgments usually go up; when calculations imply a decrease, judgments usually go down. However, two systematic deviations appear. First, people’s inferences violate the Markov assumption. For example, when inferring Z from the structure X→Y→Z, people think that X is relevant even when Y completely mediates the relationship between X and Z. Second, even when people’s inferences are directionally consistent with the normative calculations, they are often not as sensitive to the parameters and the structure of the network as they should be. We conclude with a discussion of productive directions for future research. PMID:23544658

  8. Uniformly stable backpropagation algorithm to train a feedforward neural network.

    PubMed

    Rubio, José de Jesús; Angelov, Plamen; Pacheco, Jaime

    2011-03-01

    Neural networks (NNs) have numerous applications to online processes, but the problem of stability is rarely discussed. This is an extremely important issue because, if the stability of a solution is not guaranteed, the equipment that is being used can be damaged, which can also cause serious accidents. It is true that in some research papers this problem has been considered, but this concerns continuous-time NN only. At the same time, there are many systems that are better described in the discrete time domain such as population of animals, the annual expenses in an industry, the interest earned by a bank, or the prediction of the distribution of loads stored every hour in a warehouse. Therefore, it is of paramount importance to consider the stability of the discrete-time NN. This paper makes several important contributions. 1) A theorem is stated and proven which guarantees uniform stability of a general discrete-time system. 2) It is proven that the backpropagation (BP) algorithm with a new time-varying rate is uniformly stable for online identification and the identification error converges to a small zone bounded by the uncertainty. 3) It is proven that the weights' error is bounded by the initial weights' error, i.e., overfitting is eliminated in the proposed algorithm. 4) The BP algorithm is applied to predict the distribution of loads that a transelevator receives from a trailer and places in the deposits in a warehouse every hour, so that the deposits in the warehouse are reserved in advance using the prediction results. 5) The BP algorithm is compared with the recursive least square (RLS) algorithm and with the Takagi-Sugeno type fuzzy inference system in the problem of predicting the distribution of loads in a warehouse, giving that the first and the second are stable and the third is unstable. 6) The BP algorithm is compared with the RLS algorithm and with the Kalman filter algorithm in a synthetic example.

  9. Inferring monopartite projections of bipartite networks: an entropy-based approach

    NASA Astrophysics Data System (ADS)

    Saracco, Fabio; Straka, Mika J.; Di Clemente, Riccardo; Gabrielli, Andrea; Caldarelli, Guido; Squartini, Tiziano

    2017-05-01

    Bipartite networks are currently regarded as providing a major insight into the organization of many real-world systems, unveiling the mechanisms driving the interactions occurring between distinct groups of nodes. One of the most important issues encountered when modeling bipartite networks is devising a way to obtain a (monopartite) projection on the layer of interest, which preserves as much as possible the information encoded into the original bipartite structure. In the present paper we propose an algorithm to obtain statistically-validated projections of bipartite networks, according to which any two nodes sharing a statistically-significant number of neighbors are linked. Since assessing the statistical significance of nodes similarity requires a proper statistical benchmark, here we consider a set of four null models, defined within the exponential random graph framework. Our algorithm outputs a matrix of link-specific p-values, from which a validated projection is straightforwardly obtainable, upon running a multiple hypothesis testing procedure. Finally, we test our method on an economic network (i.e. the countries-products World Trade Web representation) and a social network (i.e. MovieLens, collecting the users’ ratings of a list of movies). In both cases non-trivial communities are detected: while projecting the World Trade Web on the countries layer reveals modules of similarly-industrialized nations, projecting it on the products layer allows communities characterized by an increasing level of complexity to be detected; in the second case, projecting MovieLens on the films layer allows clusters of movies whose affinity cannot be fully accounted for by genre similarity to be individuated.

  10. Inferring explicit weighted consensus networks to represent alternative evolutionary histories

    PubMed Central

    2013-01-01

    Background The advent of molecular biology techniques and constant increase in availability of genetic material have triggered the development of many phylogenetic tree inference methods. However, several reticulate evolution processes, such as horizontal gene transfer and hybridization, have been shown to blur the species evolutionary history by causing discordance among phylogenies inferred from different genes. Methods To tackle this problem, we hereby describe a new method for inferring and representing alternative (reticulate) evolutionary histories of species as an explicit weighted consensus network which can be constructed from a collection of gene trees with or without prior knowledge of the species phylogeny. Results We provide a way of building a weighted phylogenetic network for each of the following reticulation mechanisms: diploid hybridization, intragenic recombination and complete or partial horizontal gene transfer. We successfully tested our method on some synthetic and real datasets to infer the above-mentioned evolutionary events which may have influenced the evolution of many species. Conclusions Our weighted consensus network inference method allows one to infer, visualize and validate statistically major conflicting signals induced by the mechanisms of reticulate evolution. The results provided by the new method can be used to represent the inferred conflicting signals by means of explicit and easy-to-interpret phylogenetic networks. PMID:24359207

  11. Annealed Importance Sampling for Neural Mass Models

    PubMed Central

    Penny, Will; Sengupta, Biswa

    2016-01-01

    Neural Mass Models provide a compact description of the dynamical activity of cell populations in neocortical regions. Moreover, models of regional activity can be connected together into networks, and inferences made about the strength of connections, using M/EEG data and Bayesian inference. To date, however, Bayesian methods have been largely restricted to the Variational Laplace (VL) algorithm which assumes that the posterior distribution is Gaussian and finds model parameters that are only locally optimal. This paper explores the use of Annealed Importance Sampling (AIS) to address these restrictions. We implement AIS using proposals derived from Langevin Monte Carlo (LMC) which uses local gradient and curvature information for efficient exploration of parameter space. In terms of the estimation of Bayes factors, VL and AIS agree about which model is best but report different degrees of belief. Additionally, AIS finds better model parameters and we find evidence of non-Gaussianity in their posterior distribution. PMID:26942606

  12. Reconstruction of Horizontal Plasma Motions at the Photosphere from Intensitygrams: A Comparison Between DeepVel, LCT, FLCT, and CST

    NASA Astrophysics Data System (ADS)

    Tremblay, Benoit; Roudier, Thierry; Rieutord, Michel; Vincent, Alain

    2018-04-01

    Direct measurements of plasma motions in the photosphere are limited to the line-of-sight component of the velocity. Several algorithms have therefore been developed to reconstruct the transverse components from observed continuum images or magnetograms. We compare the space and time averages of horizontal velocity fields in the photosphere inferred from pairs of consecutive intensitygrams by the LCT, FLCT, and CST methods and the DeepVel neural network in order to identify the method that is best suited for generating synthetic observations to be used for data assimilation. The Stein and Nordlund ( Astrophys. J. Lett. 753, L13, 2012) magnetoconvection simulation is used to generate synthetic SDO/HMI intensitygrams and reference flows to train DeepVel. Inferred velocity fields show that DeepVel performs best at subgranular and granular scales and is second only to FLCT at mesogranular and supergranular scales.

  13. Morphological inversion of complex diffusion

    NASA Astrophysics Data System (ADS)

    Nguyen, V. A. T.; Vural, D. C.

    2017-09-01

    Epidemics, neural cascades, power failures, and many other phenomena can be described by a diffusion process on a network. To identify the causal origins of a spread, it is often necessary to identify the triggering initial node. Here, we define a new morphological operator and use it to detect the origin of a diffusive front, given the final state of a complex network. Our method performs better than algorithms based on distance (closeness) and Jordan centrality. More importantly, our method is applicable regardless of the specifics of the forward model, and therefore can be applied to a wide range of systems such as identifying the patient zero in an epidemic, pinpointing the neuron that triggers a cascade, identifying the original malfunction that causes a catastrophic infrastructure failure, and inferring the ancestral species from which a heterogeneous population evolves.

  14. Max-margin weight learning for medical knowledge network.

    PubMed

    Jiang, Jingchi; Xie, Jing; Zhao, Chao; Su, Jia; Guan, Yi; Yu, Qiubin

    2018-03-01

    The application of medical knowledge strongly affects the performance of intelligent diagnosis, and method of learning the weights of medical knowledge plays a substantial role in probabilistic graphical models (PGMs). The purpose of this study is to investigate a discriminative weight-learning method based on a medical knowledge network (MKN). We propose a training model called the maximum margin medical knowledge network (M 3 KN), which is strictly derived for calculating the weight of medical knowledge. Using the definition of a reasonable margin, the weight learning can be transformed into a margin optimization problem. To solve the optimization problem, we adopt a sequential minimal optimization (SMO) algorithm and the clique property of a Markov network. Ultimately, M 3 KN not only incorporates the inference ability of PGMs but also deals with high-dimensional logic knowledge. The experimental results indicate that M 3 KN obtains a higher F-measure score than the maximum likelihood learning algorithm of MKN for both Chinese Electronic Medical Records (CEMRs) and Blood Examination Records (BERs). Furthermore, the proposed approach is obviously superior to some classical machine learning algorithms for medical diagnosis. To adequately manifest the importance of domain knowledge, we numerically verify that the diagnostic accuracy of M 3 KN is gradually improved as the number of learned CEMRs increase, which contain important medical knowledge. Our experimental results show that the proposed method performs reliably for learning the weights of medical knowledge. M 3 KN outperforms other existing methods by achieving an F-measure of 0.731 for CEMRs and 0.4538 for BERs. This further illustrates that M 3 KN can facilitate the investigations of intelligent healthcare. Copyright © 2018 Elsevier B.V. All rights reserved.

  15. Pedestrian detection in video surveillance using fully convolutional YOLO neural network

    NASA Astrophysics Data System (ADS)

    Molchanov, V. V.; Vishnyakov, B. V.; Vizilter, Y. V.; Vishnyakova, O. V.; Knyaz, V. A.

    2017-06-01

    More than 80% of video surveillance systems are used for monitoring people. Old human detection algorithms, based on background and foreground modelling, could not even deal with a group of people, to say nothing of a crowd. Recent robust and highly effective pedestrian detection algorithms are a new milestone of video surveillance systems. Based on modern approaches in deep learning, these algorithms produce very discriminative features that can be used for getting robust inference in real visual scenes. They deal with such tasks as distinguishing different persons in a group, overcome problem with sufficient enclosures of human bodies by the foreground, detect various poses of people. In our work we use a new approach which enables to combine detection and classification tasks into one challenge using convolution neural networks. As a start point we choose YOLO CNN, whose authors propose a very efficient way of combining mentioned above tasks by learning a single neural network. This approach showed competitive results with state-of-the-art models such as FAST R-CNN, significantly overcoming them in speed, which allows us to apply it in real time video surveillance and other video monitoring systems. Despite all advantages it suffers from some known drawbacks, related to the fully-connected layers that obstruct applying the CNN to images with different resolution. Also it limits the ability to distinguish small close human figures in groups which is crucial for our tasks since we work with rather low quality images which often include dense small groups of people. In this work we gradually change network architecture to overcome mentioned above problems, train it on a complex pedestrian dataset and finally get the CNN detecting small pedestrians in real scenes.

  16. Fundamentals and Recent Developments in Approximate Bayesian Computation

    PubMed Central

    Lintusaari, Jarno; Gutmann, Michael U.; Dutta, Ritabrata; Kaski, Samuel; Corander, Jukka

    2017-01-01

    Abstract Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.] PMID:28175922

  17. Real-time prediction and gating of respiratory motion in 3D space using extended Kalman filters and Gaussian process regression network

    NASA Astrophysics Data System (ADS)

    Bukhari, W.; Hong, S.-M.

    2016-03-01

    The prediction as well as the gating of respiratory motion have received much attention over the last two decades for reducing the targeting error of the radiation treatment beam due to respiratory motion. In this article, we present a real-time algorithm for predicting respiratory motion in 3D space and realizing a gating function without pre-specifying a particular phase of the patient’s breathing cycle. The algorithm, named EKF-GPRN+ , first employs an extended Kalman filter (EKF) independently along each coordinate to predict the respiratory motion and then uses a Gaussian process regression network (GPRN) to correct the prediction error of the EKF in 3D space. The GPRN is a nonparametric Bayesian algorithm for modeling input-dependent correlations between the output variables in multi-output regression. Inference in GPRN is intractable and we employ variational inference with mean field approximation to compute an approximate predictive mean and predictive covariance matrix. The approximate predictive mean is used to correct the prediction error of the EKF. The trace of the approximate predictive covariance matrix is utilized to capture the uncertainty in EKF-GPRN+ prediction error and systematically identify breathing points with a higher probability of large prediction error in advance. This identification enables us to pause the treatment beam over such instances. EKF-GPRN+ implements a gating function by using simple calculations based on the trace of the predictive covariance matrix. Extensive numerical experiments are performed based on a large database of 304 respiratory motion traces to evaluate EKF-GPRN+ . The experimental results show that the EKF-GPRN+ algorithm reduces the patient-wise prediction error to 38%, 40% and 40% in root-mean-square, compared to no prediction, at lookahead lengths of 192 ms, 384 ms and 576 ms, respectively. The EKF-GPRN+ algorithm can further reduce the prediction error by employing the gating function, albeit at the cost of reduced duty cycle. The error reduction allows the clinical target volume to planning target volume (CTV-PTV) margin to be reduced, leading to decreased normal-tissue toxicity and possible dose escalation. The CTV-PTV margin is also evaluated to quantify clinical benefits of EKF-GPRN+ prediction.

  18. Bayesian Inference and Online Learning in Poisson Neuronal Networks.

    PubMed

    Huang, Yanping; Rao, Rajesh P N

    2016-08-01

    Motivated by the growing evidence for Bayesian computation in the brain, we show how a two-layer recurrent network of Poisson neurons can perform both approximate Bayesian inference and learning for any hidden Markov model. The lower-layer sensory neurons receive noisy measurements of hidden world states. The higher-layer neurons infer a posterior distribution over world states via Bayesian inference from inputs generated by sensory neurons. We demonstrate how such a neuronal network with synaptic plasticity can implement a form of Bayesian inference similar to Monte Carlo methods such as particle filtering. Each spike in a higher-layer neuron represents a sample of a particular hidden world state. The spiking activity across the neural population approximates the posterior distribution over hidden states. In this model, variability in spiking is regarded not as a nuisance but as an integral feature that provides the variability necessary for sampling during inference. We demonstrate how the network can learn the likelihood model, as well as the transition probabilities underlying the dynamics, using a Hebbian learning rule. We present results illustrating the ability of the network to perform inference and learning for arbitrary hidden Markov models.

  19. Inferring causal molecular networks: empirical assessment through a community-based effort.

    PubMed

    Hill, Steven M; Heiser, Laura M; Cokelaer, Thomas; Unger, Michael; Nesser, Nicole K; Carlin, Daniel E; Zhang, Yang; Sokolov, Artem; Paull, Evan O; Wong, Chris K; Graim, Kiley; Bivol, Adrian; Wang, Haizhou; Zhu, Fan; Afsari, Bahman; Danilova, Ludmila V; Favorov, Alexander V; Lee, Wai Shing; Taylor, Dane; Hu, Chenyue W; Long, Byron L; Noren, David P; Bisberg, Alexander J; Mills, Gordon B; Gray, Joe W; Kellen, Michael; Norman, Thea; Friend, Stephen; Qutub, Amina A; Fertig, Elana J; Guan, Yuanfang; Song, Mingzhou; Stuart, Joshua M; Spellman, Paul T; Koeppl, Heinz; Stolovitzky, Gustavo; Saez-Rodriguez, Julio; Mukherjee, Sach

    2016-04-01

    It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.

  20. Online monitoring of Mezcal fermentation based on redox potential measurements.

    PubMed

    Escalante-Minakata, P; Ibarra-Junquera, V; Rosu, H C; De León-Rodríguez, A; González-García, R

    2009-01-01

    We describe an algorithm for the continuous monitoring of the biomass and ethanol concentrations as well as the growth rate in the Mezcal fermentation process. The algorithm performs its task having available only the online measurements of the redox potential. The procedure combines an artificial neural network (ANN) that relates the redox potential to the ethanol and biomass concentrations with a nonlinear observer-based algorithm that uses the ANN biomass estimations to infer the growth rate of this fermentation process. The results show that the redox potential is a valuable indicator of the metabolic activity of the microorganisms during Mezcal fermentation. In addition, the estimated growth rate can be considered as a direct evidence of the presence of mixed culture growth in the process. Usually, mixtures of microorganisms could be intuitively clear in this kind of processes; however, the total biomass data do not provide definite evidence by themselves. In this paper, the detailed design of the software sensor as well as its experimental application is presented at the laboratory level.

  1. Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.

    PubMed

    Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi

    2013-01-01

    The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data. Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.

  2. Inferring Network Controls from Topology Using the Chomp Database

    DTIC Science & Technology

    2015-12-03

    AFRL-AFOSR-VA-TR-2016-0033 INFERRING NETWORK CONTROLS FROM TOPOLOGY USING THE CHOMP DATABASE John Harer DUKE UNIVERSITY Final Report 12/03/2015...INFERRING NETWORK CONTROLS FROM TOPOLOGY USING THE CHOMP DATABASE 5a. CONTRACT NUMBER 5b. GRANT NUMBER FA9550-10-1-0436 5c. PROGRAM ELEMENT NUMBER 6...area of Topological Data Analysis (TDA) and it’s application to dynamical systems. The role of this work in the Complex Networks program is based on

  3. An integrative approach to inferring biologically meaningful gene modules.

    PubMed

    Cho, Ji-Hoon; Wang, Kai; Galas, David J

    2011-07-26

    The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM) that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level.

  4. Boolean dynamics of genetic regulatory networks inferred from microarray time series data

    DOE PAGES

    Martin, Shawn; Zhang, Zhaoduo; Martino, Anthony; ...

    2007-01-31

    Methods available for the inference of genetic regulatory networks strive to produce a single network, usually by optimizing some quantity to fit the experimental observations. In this paper we investigate the possibility that multiple networks can be inferred, all resulting in similar dynamics. This idea is motivated by theoretical work which suggests that biological networks are robust and adaptable to change, and that the overall behavior of a genetic regulatory network might be captured in terms of dynamical basins of attraction. We have developed and implemented a method for inferring genetic regulatory networks for time series microarray data. Our methodmore » first clusters and discretizes the gene expression data using k-means and support vector regression. We then enumerate Boolean activation–inhibition networks to match the discretized data. In conclusion, the dynamics of the Boolean networks are examined. We have tested our method on two immunology microarray datasets: an IL-2-stimulated T cell response dataset and a LPS-stimulated macrophage response dataset. In both cases, we discovered that many networks matched the data, and that most of these networks had similar dynamics.« less

  5. Evaluation of artificial time series microarray data for dynamic gene regulatory network inference.

    PubMed

    Xenitidis, P; Seimenis, I; Kakolyris, S; Adamopoulos, A

    2017-08-07

    High-throughput technology like microarrays is widely used in the inference of gene regulatory networks (GRNs). We focused on time series data since we are interested in the dynamics of GRNs and the identification of dynamic networks. We evaluated the amount of information that exists in artificial time series microarray data and the ability of an inference process to produce accurate models based on them. We used dynamic artificial gene regulatory networks in order to create artificial microarray data. Key features that characterize microarray data such as the time separation of directly triggered genes, the percentage of directly triggered genes and the triggering function type were altered in order to reveal the limits that are imposed by the nature of microarray data on the inference process. We examined the effect of various factors on the inference performance such as the network size, the presence of noise in microarray data, and the network sparseness. We used a system theory approach and examined the relationship between the pole placement of the inferred system and the inference performance. We examined the relationship between the inference performance in the time domain and the true system parameter identification. Simulation results indicated that time separation and the percentage of directly triggered genes are crucial factors. Also, network sparseness, the triggering function type and noise in input data affect the inference performance. When two factors were simultaneously varied, it was found that variation of one parameter significantly affects the dynamic response of the other. Crucial factors were also examined using a real GRN and acquired results confirmed simulation findings with artificial data. Different initial conditions were also used as an alternative triggering approach. Relevant results confirmed that the number of datasets constitutes the most significant parameter with regard to the inference performance. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Explicit-Duration Hidden Markov Model Inference of UP-DOWN States from Continuous Signals

    PubMed Central

    McFarland, James M.; Hahn, Thomas T. G.; Mehta, Mayank R.

    2011-01-01

    Neocortical neurons show UP-DOWN state (UDS) oscillations under a variety of conditions. These UDS have been extensively studied because of the insight they can yield into the functioning of cortical networks, and their proposed role in putative memory formation. A key element in these studies is determining the precise duration and timing of the UDS. These states are typically determined from the membrane potential of one or a small number of cells, which is often not sufficient to reliably estimate the state of an ensemble of neocortical neurons. The local field potential (LFP) provides an attractive method for determining the state of a patch of cortex with high spatio-temporal resolution; however current methods for inferring UDS from LFP signals lack the robustness and flexibility to be applicable when UDS properties may vary substantially within and across experiments. Here we present an explicit-duration hidden Markov model (EDHMM) framework that is sufficiently general to allow statistically principled inference of UDS from different types of signals (membrane potential, LFP, EEG), combinations of signals (e.g., multichannel LFP recordings) and signal features over long recordings where substantial non-stationarities are present. Using cortical LFPs recorded from urethane-anesthetized mice, we demonstrate that the proposed method allows robust inference of UDS. To illustrate the flexibility of the algorithm we show that it performs well on EEG recordings as well. We then validate these results using simultaneous recordings of the LFP and membrane potential (MP) of nearby cortical neurons, showing that our method offers significant improvements over standard methods. These results could be useful for determining functional connectivity of different brain regions, as well as understanding network dynamics. PMID:21738730

  7. Modeling Belt-Servomechanism by Chebyshev Functional Recurrent Neuro-Fuzzy Network

    NASA Astrophysics Data System (ADS)

    Huang, Yuan-Ruey; Kang, Yuan; Chu, Ming-Hui; Chang, Yeon-Pun

    A novel Chebyshev functional recurrent neuro-fuzzy (CFRNF) network is developed from a combination of the Takagi-Sugeno-Kang (TSK) fuzzy model and the Chebyshev recurrent neural network (CRNN). The CFRNF network can emulate the nonlinear dynamics of a servomechanism system. The system nonlinearity is addressed by enhancing the input dimensions of the consequent parts in the fuzzy rules due to functional expansion of a Chebyshev polynomial. The back propagation algorithm is used to adjust the parameters of the antecedent membership functions as well as those of consequent functions. To verify the performance of the proposed CFRNF, the experiment of the belt servomechanism is presented in this paper. Both of identification methods of adaptive neural fuzzy inference system (ANFIS) and recurrent neural network (RNN) are also studied for modeling of the belt servomechanism. The analysis and comparison results indicate that CFRNF makes identification of complex nonlinear dynamic systems easier. It is verified that the accuracy and convergence of the CFRNF are superior to those of ANFIS and RNN by the identification results of a belt servomechanism.

  8. Inferring Centrality from Network Snapshots

    PubMed Central

    Shao, Haibin; Mesbahi, Mehran; Li, Dewei; Xi, Yugeng

    2017-01-01

    The topology and dynamics of a complex network shape its functionality. However, the topologies of many large-scale networks are either unavailable or incomplete. Without the explicit knowledge of network topology, we show how the data generated from the network dynamics can be utilised to infer the tempo centrality, which is proposed to quantify the influence of nodes in a consensus network. We show that the tempo centrality can be used to construct an accurate estimate of both the propagation rate of influence exerted on consensus networks and the Kirchhoff index of the underlying graph. Moreover, the tempo centrality also encodes the disturbance rejection of nodes in a consensus network. Our findings provide an approach to infer the performance of a consensus network from its temporal data. PMID:28098166

  9. Inferring Centrality from Network Snapshots

    NASA Astrophysics Data System (ADS)

    Shao, Haibin; Mesbahi, Mehran; Li, Dewei; Xi, Yugeng

    2017-01-01

    The topology and dynamics of a complex network shape its functionality. However, the topologies of many large-scale networks are either unavailable or incomplete. Without the explicit knowledge of network topology, we show how the data generated from the network dynamics can be utilised to infer the tempo centrality, which is proposed to quantify the influence of nodes in a consensus network. We show that the tempo centrality can be used to construct an accurate estimate of both the propagation rate of influence exerted on consensus networks and the Kirchhoff index of the underlying graph. Moreover, the tempo centrality also encodes the disturbance rejection of nodes in a consensus network. Our findings provide an approach to infer the performance of a consensus network from its temporal data.

  10. Poisson-Based Inference for Perturbation Models in Adaptive Spelling Training

    ERIC Educational Resources Information Center

    Baschera, Gian-Marco; Gross, Markus

    2010-01-01

    We present an inference algorithm for perturbation models based on Poisson regression. The algorithm is designed to handle unclassified input with multiple errors described by independent mal-rules. This knowledge representation provides an intelligent tutoring system with local and global information about a student, such as error classification…

  11. A Hierarchical Poisson Log-Normal Model for Network Inference from RNA Sequencing Data

    PubMed Central

    Gallopin, Mélina; Rau, Andrea; Jaffrézic, Florence

    2013-01-01

    Gene network inference from transcriptomic data is an important methodological challenge and a key aspect of systems biology. Although several methods have been proposed to infer networks from microarray data, there is a need for inference methods able to model RNA-seq data, which are count-based and highly variable. In this work we propose a hierarchical Poisson log-normal model with a Lasso penalty to infer gene networks from RNA-seq data; this model has the advantage of directly modelling discrete data and accounting for inter-sample variance larger than the sample mean. Using real microRNA-seq data from breast cancer tumors and simulations, we compare this method to a regularized Gaussian graphical model on log-transformed data, and a Poisson log-linear graphical model with a Lasso penalty on power-transformed data. For data simulated with large inter-sample dispersion, the proposed model performs better than the other methods in terms of sensitivity, specificity and area under the ROC curve. These results show the necessity of methods specifically designed for gene network inference from RNA-seq data. PMID:24147011

  12. CCLasso: correlation inference for compositional data through Lasso.

    PubMed

    Fang, Huaying; Huang, Chengcheng; Zhao, Hongyu; Deng, Minghua

    2015-10-01

    Direct analysis of microbial communities in the environment and human body has become more convenient and reliable owing to the advancements of high-throughput sequencing techniques for 16S rRNA gene profiling. Inferring the correlation relationship among members of microbial communities is of fundamental importance for genomic survey study. Traditional Pearson correlation analysis treating the observed data as absolute abundances of the microbes may lead to spurious results because the data only represent relative abundances. Special care and appropriate methods are required prior to correlation analysis for these compositional data. In this article, we first discuss the correlation definition of latent variables for compositional data. We then propose a novel method called CCLasso based on least squares with [Formula: see text] penalty to infer the correlation network for latent variables of compositional data from metagenomic data. An effective alternating direction algorithm from augmented Lagrangian method is used to solve the optimization problem. The simulation results show that CCLasso outperforms existing methods, e.g. SparCC, in edge recovery for compositional data. It also compares well with SparCC in estimating correlation network of microbe species from the Human Microbiome Project. CCLasso is open source and freely available from https://github.com/huayingfang/CCLasso under GNU LGPL v3. dengmh@pku.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. Optimized face recognition algorithm using radial basis function neural networks and its practical applications.

    PubMed

    Yoo, Sung-Hoon; Oh, Sung-Kwun; Pedrycz, Witold

    2015-09-01

    In this study, we propose a hybrid method of face recognition by using face region information extracted from the detected face region. In the preprocessing part, we develop a hybrid approach based on the Active Shape Model (ASM) and the Principal Component Analysis (PCA) algorithm. At this step, we use a CCD (Charge Coupled Device) camera to acquire a facial image by using AdaBoost and then Histogram Equalization (HE) is employed to improve the quality of the image. ASM extracts the face contour and image shape to produce a personal profile. Then we use a PCA method to reduce dimensionality of face images. In the recognition part, we consider the improved Radial Basis Function Neural Networks (RBF NNs) to identify a unique pattern associated with each person. The proposed RBF NN architecture consists of three functional modules realizing the condition phase, the conclusion phase, and the inference phase completed with the help of fuzzy rules coming in the standard 'if-then' format. In the formation of the condition part of the fuzzy rules, the input space is partitioned with the use of Fuzzy C-Means (FCM) clustering. In the conclusion part of the fuzzy rules, the connections (weights) of the RBF NNs are represented by four kinds of polynomials such as constant, linear, quadratic, and reduced quadratic. The values of the coefficients are determined by running a gradient descent method. The output of the RBF NNs model is obtained by running a fuzzy inference method. The essential design parameters of the network (including learning rate, momentum coefficient and fuzzification coefficient used by the FCM) are optimized by means of Differential Evolution (DE). The proposed P-RBF NNs (Polynomial based RBF NNs) are applied to facial recognition and its performance is quantified from the viewpoint of the output performance and recognition rate. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. CompareSVM: supervised, Support Vector Machine (SVM) inference of gene regularity networks.

    PubMed

    Gillani, Zeeshan; Akash, Muhammad Sajid Hamid; Rahaman, M D Matiur; Chen, Ming

    2014-11-30

    Predication of gene regularity network (GRN) from expression data is a challenging task. There are many methods that have been developed to address this challenge ranging from supervised to unsupervised methods. Most promising methods are based on support vector machine (SVM). There is a need for comprehensive analysis on prediction accuracy of supervised method SVM using different kernels on different biological experimental conditions and network size. We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRN. Using CompareSVM, we investigated and evaluated different SVM kernel methods on simulated datasets of microarray of different sizes in detail. The results obtained from CompareSVM showed that accuracy of inference method depends upon the nature of experimental condition and size of the network. For network with nodes (<200) and average (over all sizes of networks), SVM Gaussian kernel outperform on knockout, knockdown, and multifactorial datasets compared to all the other inference methods. For network with large number of nodes (~500), choice of inference method depend upon nature of experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/ .

  15. Multi-Agent Inference in Social Networks: A Finite Population Learning Approach

    PubMed Central

    Tong, Xin; Zeng, Yao

    2016-01-01

    When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people’s incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning, to address whether with high probability, a large fraction of people in a given finite population network can make “good” inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows. PMID:27076691

  16. Gene network inference by fusing data from diverse distributions

    PubMed Central

    Žitnik, Marinka; Zupan, Blaž

    2015-01-01

    Motivation: Markov networks are undirected graphical models that are widely used to infer relations between genes from experimental data. Their state-of-the-art inference procedures assume the data arise from a Gaussian distribution. High-throughput omics data, such as that from next generation sequencing, often violates this assumption. Furthermore, when collected data arise from multiple related but otherwise nonidentical distributions, their underlying networks are likely to have common features. New principled statistical approaches are needed that can deal with different data distributions and jointly consider collections of datasets. Results: We present FuseNet, a Markov network formulation that infers networks from a collection of nonidentically distributed datasets. Our approach is computationally efficient and general: given any number of distributions from an exponential family, FuseNet represents model parameters through shared latent factors that define neighborhoods of network nodes. In a simulation study, we demonstrate good predictive performance of FuseNet in comparison to several popular graphical models. We show its effectiveness in an application to breast cancer RNA-sequencing and somatic mutation data, a novel application of graphical models. Fusion of datasets offers substantial gains relative to inference of separate networks for each dataset. Our results demonstrate that network inference methods for non-Gaussian data can help in accurate modeling of the data generated by emergent high-throughput technologies. Availability and implementation: Source code is at https://github.com/marinkaz/fusenet. Contact: blaz.zupan@fri.uni-lj.si Supplementary information: Supplementary information is available at Bioinformatics online. PMID:26072487

  17. A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI.

    PubMed

    Avendi, M R; Kheradvar, Arash; Jafarkhani, Hamid

    2016-05-01

    Segmentation of the left ventricle (LV) from cardiac magnetic resonance imaging (MRI) datasets is an essential step for calculation of clinical indices such as ventricular volume and ejection fraction. In this work, we employ deep learning algorithms combined with deformable models to develop and evaluate a fully automatic LV segmentation tool from short-axis cardiac MRI datasets. The method employs deep learning algorithms to learn the segmentation task from the ground true data. Convolutional networks are employed to automatically detect the LV chamber in MRI dataset. Stacked autoencoders are used to infer the LV shape. The inferred shape is incorporated into deformable models to improve the accuracy and robustness of the segmentation. We validated our method using 45 cardiac MR datasets from the MICCAI 2009 LV segmentation challenge and showed that it outperforms the state-of-the art methods. Excellent agreement with the ground truth was achieved. Validation metrics, percentage of good contours, Dice metric, average perpendicular distance and conformity, were computed as 96.69%, 0.94, 1.81 mm and 0.86, versus those of 79.2-95.62%, 0.87-0.9, 1.76-2.97 mm and 0.67-0.78, obtained by other methods, respectively. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. Integration of multi-omics data for integrative gene regulatory network inference.

    PubMed

    Zarayeneh, Neda; Ko, Euiseong; Oh, Jung Hun; Suh, Sang; Liu, Chunyu; Gao, Jean; Kim, Donghyun; Kang, Mingon

    2017-01-01

    Gene regulatory networks provide comprehensive insights and indepth understanding of complex biological processes. The molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data in most research. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylations. The recent rapid advances of high-throughput omics technologies enable one to measure multiple types of omics data, called 'multi-omics data', that represent the various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expressions, copy number variations and DNA methylations were considered for multi-omics data in this paper. The intensive experiments were carried out with simulation data, where iGRN's capability that infers the integrative gene regulatory network is assessed. Through the experiments, iGRN shows its better performance on model representation and interpretation than other integrative methods in gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed.

  19. Integration of multi-omics data for integrative gene regulatory network inference

    PubMed Central

    Zarayeneh, Neda; Ko, Euiseong; Oh, Jung Hun; Suh, Sang; Liu, Chunyu; Gao, Jean; Kim, Donghyun

    2017-01-01

    Gene regulatory networks provide comprehensive insights and indepth understanding of complex biological processes. The molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data in most research. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylations. The recent rapid advances of high-throughput omics technologies enable one to measure multiple types of omics data, called ‘multi-omics data’, that represent the various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expressions, copy number variations and DNA methylations were considered for multi-omics data in this paper. The intensive experiments were carried out with simulation data, where iGRN’s capability that infers the integrative gene regulatory network is assessed. Through the experiments, iGRN shows its better performance on model representation and interpretation than other integrative methods in gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed. PMID:29354189

  20. Probabilistic Inference in General Graphical Models through Sampling in Stochastic Networks of Spiking Neurons

    PubMed Central

    Pecevski, Dejan; Buesing, Lars; Maass, Wolfgang

    2011-01-01

    An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows (“explaining away”) and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons. PMID:22219717

  1. Inferring pregnancy episodes and outcomes within a network of observational databases

    PubMed Central

    Ryan, Patrick; Fife, Daniel; Gifkins, Dina; Knoll, Chris; Friedman, Andrew

    2018-01-01

    Administrative claims and electronic health records are valuable resources for evaluating pharmaceutical effects during pregnancy. However, direct measures of gestational age are generally not available. Establishing a reliable approach to infer the duration and outcome of a pregnancy could improve pharmacovigilance activities. We developed and applied an algorithm to define pregnancy episodes in four observational databases: three US-based claims databases: Truven MarketScan® Commercial Claims and Encounters (CCAE), Truven MarketScan® Multi-state Medicaid (MDCD), and the Optum ClinFormatics® (Optum) database and one non-US database, the United Kingdom (UK) based Clinical Practice Research Datalink (CPRD). Pregnancy outcomes were classified as live births, stillbirths, abortions and ectopic pregnancies. Start dates were estimated using a derived hierarchy of available pregnancy markers, including records such as last menstrual period and nuchal ultrasound dates. Validation included clinical adjudication of 700 electronic Optum and CPRD pregnancy episode profiles to assess the operating characteristics of the algorithm, and a comparison of the algorithm’s Optum pregnancy start estimates to starts based on dates of assisted conception procedures. Distributions of pregnancy outcome types were similar across all four data sources and pregnancy episode lengths found were as expected for all outcomes, excepting term lengths in episodes that used amenorrhea and urine pregnancy tests for start estimation. Validation survey results found highest agreement between reviewer chosen and algorithm operating characteristics for questions assessing pregnancy status and accuracy of outcome category with 99–100% agreement for Optum and CPRD. Outcome date agreement within seven days in either direction ranged from 95–100%, while start date agreement within seven days in either direction ranged from 90–97%. In Optum validation sensitivity analysis, a total of 73% of algorithm estimated starts for live births were in agreement with fertility procedure estimated starts within two weeks in either direction; ectopic pregnancy 77%, stillbirth 47%, and abortion 36%. An algorithm to infer live birth and ectopic pregnancy episodes and outcomes can be applied to multiple observational databases with acceptable accuracy for further epidemiologic research. Less accuracy was found for start date estimations in stillbirth and abortion outcomes in our sensitivity analysis, which may be expected given the nature of the outcomes. PMID:29389968

  2. Inference of Vohradský's Models of Genetic Networks by Solving Two-Dimensional Function Optimization Problems

    PubMed Central

    Kimura, Shuhei; Sato, Masanao; Okada-Hatakeyama, Mariko

    2013-01-01

    The inference of a genetic network is a problem in which mutual interactions among genes are inferred from time-series of gene expression levels. While a number of models have been proposed to describe genetic networks, this study focuses on a mathematical model proposed by Vohradský. Because of its advantageous features, several researchers have proposed the inference methods based on Vohradský's model. When trying to analyze large-scale networks consisting of dozens of genes, however, these methods must solve high-dimensional non-linear function optimization problems. In order to resolve the difficulty of estimating the parameters of the Vohradský's model, this study proposes a new method that defines the problem as several two-dimensional function optimization problems. Through numerical experiments on artificial genetic network inference problems, we showed that, although the computation time of the proposed method is not the shortest, the method has the ability to estimate parameters of Vohradský's models more effectively with sufficiently short computation times. This study then applied the proposed method to an actual inference problem of the bacterial SOS DNA repair system, and succeeded in finding several reasonable regulations. PMID:24386175

  3. The Role of Probability-Based Inference in an Intelligent Tutoring System.

    ERIC Educational Resources Information Center

    Mislevy, Robert J.; Gitomer, Drew H.

    Probability-based inference in complex networks of interdependent variables is an active topic in statistical research, spurred by such diverse applications as forecasting, pedigree analysis, troubleshooting, and medical diagnosis. This paper concerns the role of Bayesian inference networks for updating student models in intelligent tutoring…

  4. Detection of Cheating by Decimation Algorithm

    NASA Astrophysics Data System (ADS)

    Yamanaka, Shogo; Ohzeki, Masayuki; Decelle, Aurélien

    2015-02-01

    We expand the item response theory to study the case of "cheating students" for a set of exams, trying to detect them by applying a greedy algorithm of inference. This extended model is closely related to the Boltzmann machine learning. In this paper we aim to infer the correct biases and interactions of our model by considering a relatively small number of sets of training data. Nevertheless, the greedy algorithm that we employed in the present study exhibits good performance with a few number of training data. The key point is the sparseness of the interactions in our problem in the context of the Boltzmann machine learning: the existence of cheating students is expected to be very rare (possibly even in real world). We compare a standard approach to infer the sparse interactions in the Boltzmann machine learning to our greedy algorithm and we find the latter to be superior in several aspects.

  5. Approximate, computationally efficient online learning in Bayesian spiking neurons.

    PubMed

    Kuhlmann, Levin; Hauser-Raspe, Michael; Manton, Jonathan H; Grayden, David B; Tapson, Jonathan; van Schaik, André

    2014-03-01

    Bayesian spiking neurons (BSNs) provide a probabilistic interpretation of how neurons perform inference and learning. Online learning in BSNs typically involves parameter estimation based on maximum-likelihood expectation-maximization (ML-EM) which is computationally slow and limits the potential of studying networks of BSNs. An online learning algorithm, fast learning (FL), is presented that is more computationally efficient than the benchmark ML-EM for a fixed number of time steps as the number of inputs to a BSN increases (e.g., 16.5 times faster run times for 20 inputs). Although ML-EM appears to converge 2.0 to 3.6 times faster than FL, the computational cost of ML-EM means that ML-EM takes longer to simulate to convergence than FL. FL also provides reasonable convergence performance that is robust to initialization of parameter estimates that are far from the true parameter values. However, parameter estimation depends on the range of true parameter values. Nevertheless, for a physiologically meaningful range of parameter values, FL gives very good average estimation accuracy, despite its approximate nature. The FL algorithm therefore provides an efficient tool, complementary to ML-EM, for exploring BSN networks in more detail in order to better understand their biological relevance. Moreover, the simplicity of the FL algorithm means it can be easily implemented in neuromorphic VLSI such that one can take advantage of the energy-efficient spike coding of BSNs.

  6. Augmenting Microarray Data with Literature-Based Knowledge to Enhance Gene Regulatory Network Inference

    PubMed Central

    Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C.

    2014-01-01

    Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductile carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology. PMID:24921649

  7. Augmenting microarray data with literature-based knowledge to enhance gene regulatory network inference.

    PubMed

    Chen, Guocai; Cairelli, Michael J; Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C

    2014-06-01

    Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductile carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.

  8. Developing JSequitur to Study the Hierarchical Structure of Biological Sequences in a Grammatical Inference Framework of String Compression Algorithms.

    PubMed

    Galbadrakh, Bulgan; Lee, Kyung-Eun; Park, Hyun-Seok

    2012-12-01

    Grammatical inference methods are expected to find grammatical structures hidden in biological sequences. One hopes that studies of grammar serve as an appropriate tool for theory formation. Thus, we have developed JSequitur for automatically generating the grammatical structure of biological sequences in an inference framework of string compression algorithms. Our original motivation was to find any grammatical traits of several cancer genes that can be detected by string compression algorithms. Through this research, we could not find any meaningful unique traits of the cancer genes yet, but we could observe some interesting traits in regards to the relationship among gene length, similarity of sequences, the patterns of the generated grammar, and compression rate.

  9. GPU Computing in Bayesian Inference of Realized Stochastic Volatility Model

    NASA Astrophysics Data System (ADS)

    Takaishi, Tetsuya

    2015-01-01

    The realized stochastic volatility (RSV) model that utilizes the realized volatility as additional information has been proposed to infer volatility of financial time series. We consider the Bayesian inference of the RSV model by the Hybrid Monte Carlo (HMC) algorithm. The HMC algorithm can be parallelized and thus performed on the GPU for speedup. The GPU code is developed with CUDA Fortran. We compare the computational time in performing the HMC algorithm on GPU (GTX 760) and CPU (Intel i7-4770 3.4GHz) and find that the GPU can be up to 17 times faster than the CPU. We also code the program with OpenACC and find that appropriate coding can achieve the similar speedup with CUDA Fortran.

  10. Differentiable cortical networks for inferences concerning people’s intentions versus physical causality

    PubMed Central

    Mason, Robert A.; Just, Marcel Adam

    2010-01-01

    Cortical activity associated with generating an inference was measured using fMRI. Participants read three-sentence passages that differed in whether or not an inference needed to be drawn to understand them. The inference was based on either a protagonist’s intention or a physical consequence of a character’s action. Activation was expected in Theory of Mind brain regions for the passages based on protagonists’ intentions but not for the physical consequence passages. The activation measured in the right temporo-parietal junction was greater in the intentional passages than in the consequence passages, consistent with predictions from a Theory of Mind perspective. In contrast, there was increased occipital activation in the physical inference passages. For both types of passage, the cortical activity related to the reading of the critical inference sentence demonstrated a recruitment of a common inference cortical network. This general inference-related activation appeared bilaterally in the language processing areas (the inferior frontal gyrus, the temporal gyrus, and the angular gyrus), as well as in the medial to superior frontal gyrus, which has been found to be active in Theory of Mind tasks. These findings are consistent with the hypothesis that component areas of the discourse processing network are recruited as needed based on the nature of the inference. A Protagonist monitoring and synthesis network is proposed as a more accurate account for Theory of Mind activation during narrative comprehension. PMID:21229617

  11. Social networks help to infer causality in the tumor microenvironment.

    PubMed

    Crespo, Isaac; Doucey, Marie-Agnès; Xenarios, Ioannis

    2016-03-15

    Networks have become a popular way to conceptualize a system of interacting elements, such as electronic circuits, social communication, metabolism or gene regulation. Network inference, analysis, and modeling techniques have been developed in different areas of science and technology, such as computer science, mathematics, physics, and biology, with an active interdisciplinary exchange of concepts and approaches. However, some concepts seem to belong to a specific field without a clear transferability to other domains. At the same time, it is increasingly recognized that within some biological systems--such as the tumor microenvironment--where different types of resident and infiltrating cells interact to carry out their functions, the complexity of the system demands a theoretical framework, such as statistical inference, graph analysis and dynamical models, in order to asses and study the information derived from high-throughput experimental technologies. In this article we propose to adopt and adapt the concepts of influence and investment from the world of social network analysis to biological problems, and in particular to apply this approach to infer causality in the tumor microenvironment. We showed that constructing a bidirectional network of influence between cell and cell communication molecules allowed us to determine the direction of inferred regulations at the expression level and correctly recapitulate cause-effect relationships described in literature. This work constitutes an example of a transfer of knowledge and concepts from the world of social network analysis to biomedical research, in particular to infer network causality in biological networks. This causality elucidation is essential to model the homeostatic response of biological systems to internal and external factors, such as environmental conditions, pathogens or treatments.

  12. Construction of an integrated gene regulatory network link to stress-related immune system in cattle.

    PubMed

    Behdani, Elham; Bakhtiarizadeh, Mohammad Reza

    2017-10-01

    The immune system is an important biological system that is negatively impacted by stress. This study constructed an integrated regulatory network to enhance our understanding of the regulatory gene network used in the stress-related immune system. Module inference was used to construct modules of co-expressed genes with bovine leukocyte RNA-Seq data. Transcription factors (TFs) were then assigned to these modules using Lemon-Tree algorithms. In addition, the TFs assigned to each module were confirmed using the promoter analysis and protein-protein interactions data. Therefore, our integrated method identified three TFs which include one TF that is previously known to be involved in immune response (MYBL2) and two TFs (E2F8 and FOXS1) that had not been recognized previously and were identified for the first time in this study as novel regulatory candidates in immune response. This study provides valuable insights on the regulatory programs of genes involved in the stress-related immune system.

  13. Molecular phylogenetic trees - On the validity of the Goodman-Moore augmentation algorithm

    NASA Technical Reports Server (NTRS)

    Holmquist, R.

    1979-01-01

    A response is made to the reply of Nei and Tateno (1979) to the letter of Holmquist (1978) supporting the validity of the augmentation algorithm of Moore (1977) in reconstructions of nucleotide substitutions by means of the maximum parsimony principle. It is argued that the overestimation of the augmented numbers of nucleotide substitutions (augmented distances) found by Tateno and Nei (1978) is due to an unrepresentative data sample and that it is only necessary that evolution be stochastically uniform in different regions of the phylogenetic network for the augmentation method to be useful. The importance of the average value of the true distance over all links is explained, and the relative variances of the true and augmented distances are calculated to be almost identical. The effects of topological changes in the phylogenetic tree on the augmented distance and the question of the correctness of ancestral sequences inferred by the method of parsimony are also clarified.

  14. Don't Fear Optimality: Sampling for Probabilistic-Logic Sequence Models

    NASA Astrophysics Data System (ADS)

    Thon, Ingo

    One of the current challenges in artificial intelligence is modeling dynamic environments that change due to the actions or activities undertaken by people or agents. The task of inferring hidden states, e.g. the activities or intentions of people, based on observations is called filtering. Standard probabilistic models such as Dynamic Bayesian Networks are able to solve this task efficiently using approximative methods such as particle filters. However, these models do not support logical or relational representations. The key contribution of this paper is the upgrade of a particle filter algorithm for use with a probabilistic logical representation through the definition of a proposal distribution. The performance of the algorithm depends largely on how well this distribution fits the target distribution. We adopt the idea of logical compilation into Binary Decision Diagrams for sampling. This allows us to use the optimal proposal distribution which is normally prohibitively slow.

  15. Hybrid grammar-based approach to nonlinear dynamical system identification from biological time series

    NASA Astrophysics Data System (ADS)

    McKinney, B. A.; Crowe, J. E., Jr.; Voss, H. U.; Crooke, P. S.; Barney, N.; Moore, J. H.

    2006-02-01

    We introduce a grammar-based hybrid approach to reverse engineering nonlinear ordinary differential equation models from observed time series. This hybrid approach combines a genetic algorithm to search the space of model architectures with a Kalman filter to estimate the model parameters. Domain-specific knowledge is used in a context-free grammar to restrict the search space for the functional form of the target model. We find that the hybrid approach outperforms a pure evolutionary algorithm method, and we observe features in the evolution of the dynamical models that correspond with the emergence of favorable model components. We apply the hybrid method to both artificially generated time series and experimentally observed protein levels from subjects who received the smallpox vaccine. From the observed data, we infer a cytokine protein interaction network for an individual’s response to the smallpox vaccine.

  16. Metis: A Pure Metropolis Markov Chain Monte Carlo Bayesian Inference Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bates, Cameron Russell; Mckigney, Edward Allen

    The use of Bayesian inference in data analysis has become the standard for large scienti c experiments [1, 2]. The Monte Carlo Codes Group(XCP-3) at Los Alamos has developed a simple set of algorithms currently implemented in C++ and Python to easily perform at-prior Markov Chain Monte Carlo Bayesian inference with pure Metropolis sampling. These implementations are designed to be user friendly and extensible for customization based on speci c application requirements. This document describes the algorithmic choices made and presents two use cases.

  17. Classification of complex information: inference of co-occurring affective states from their expressions in speech.

    PubMed

    Sobol-Shikler, Tal; Robinson, Peter

    2010-07-01

    We present a classification algorithm for inferring affective states (emotions, mental states, attitudes, and the like) from their nonverbal expressions in speech. It is based on the observations that affective states can occur simultaneously and different sets of vocal features, such as intonation and speech rate, distinguish between nonverbal expressions of different affective states. The input to the inference system was a large set of vocal features and metrics that were extracted from each utterance. The classification algorithm conducted independent pairwise comparisons between nine affective-state groups. The classifier used various subsets of metrics of the vocal features and various classification algorithms for different pairs of affective-state groups. Average classification accuracy of the 36 pairwise machines was 75 percent, using 10-fold cross validation. The comparison results were consolidated into a single ranked list of the nine affective-state groups. This list was the output of the system and represented the inferred combination of co-occurring affective states for the analyzed utterance. The inference accuracy of the combined machine was 83 percent. The system automatically characterized over 500 affective state concepts from the Mind Reading database. The inference of co-occurring affective states was validated by comparing the inferred combinations to the lexical definitions of the labels of the analyzed sentences. The distinguishing capabilities of the system were comparable to human performance.

  18. Integrated Module and Gene-Specific Regulatory Inference Implicates Upstream Signaling Networks

    PubMed Central

    Roy, Sushmita; Lagree, Stephen; Hou, Zhonggang; Thomson, James A.; Stewart, Ron; Gasch, Audrey P.

    2013-01-01

    Regulatory networks that control gene expression are important in diverse biological contexts including stress response and development. Each gene's regulatory program is determined by module-level regulation (e.g. co-regulation via the same signaling system), as well as gene-specific determinants that can fine-tune expression. We present a novel approach, Modular regulatory network learning with per gene information (MERLIN), that infers regulatory programs for individual genes while probabilistically constraining these programs to reveal module-level organization of regulatory networks. Using edge-, regulator- and module-based comparisons of simulated networks of known ground truth, we find MERLIN reconstructs regulatory programs of individual genes as well or better than existing approaches of network reconstruction, while additionally identifying modular organization of the regulatory networks. We use MERLIN to dissect global transcriptional behavior in two biological contexts: yeast stress response and human embryonic stem cell differentiation. Regulatory modules inferred by MERLIN capture co-regulatory relationships between signaling proteins and downstream transcription factors thereby revealing the upstream signaling systems controlling transcriptional responses. The inferred networks are enriched for regulators with genetic or physical interactions, supporting the inference, and identify modules of functionally related genes bound by the same transcriptional regulators. Our method combines the strengths of per-gene and per-module methods to reveal new insights into transcriptional regulation in stress and development. PMID:24146602

  19. Consistency of biological networks inferred from microarray and sequencing data.

    PubMed

    Vinciotti, Veronica; Wit, Ernst C; Jansen, Rick; de Geus, Eco J C N; Penninx, Brenda W J H; Boomsma, Dorret I; 't Hoen, Peter A C

    2016-06-24

    Sparse Gaussian graphical models are popular for inferring biological networks, such as gene regulatory networks. In this paper, we investigate the consistency of these models across different data platforms, such as microarray and next generation sequencing, on the basis of a rich dataset containing samples that are profiled under both techniques as well as a large set of independent samples. Our analysis shows that individual node variances can have a remarkable effect on the connectivity of the resulting network. Their inconsistency across platforms and the fact that the variability level of a node may not be linked to its regulatory role mean that, failing to scale the data prior to the network analysis, leads to networks that are not reproducible across different platforms and that may be misleading. Moreover, we show how the reproducibility of networks across different platforms is significantly higher if networks are summarised in terms of enrichment amongst functional groups of interest, such as pathways, rather than at the level of individual edges. Careful pre-processing of transcriptional data and summaries of networks beyond individual edges can improve the consistency of network inference across platforms. However, caution is needed at this stage in the (over)interpretation of gene regulatory networks inferred from biological data.

  20. Side-by-side ANFIS as a useful tool for estimating correlated thermophysical properties

    NASA Astrophysics Data System (ADS)

    Grieu, Stéphane; Faugeroux, Olivier; Traoré, Adama; Claudet, Bernard; Bodnar, Jean-Luc

    2015-12-01

    In the present paper, an artificial intelligence-based approach dealing with the estimation of correlated thermophysical properties is designed and evaluated. This new and "intelligent" approach makes use of photothermal responses obtained when homogeneous materials are subjected to a light flux. Commonly, gradient-based algorithms are used as parameter estimation techniques. Unfortunately, such algorithms show instabilities leading to non-convergence in case of correlated properties to be estimated from a rebuilt impulse response. So, the main objective of the present work was to simultaneously estimate both the thermal diffusivity and conductivity of homogeneous materials, from front-face or rear-face photothermal responses to pseudo random binary signals. To this end, we used side-by-side neuro-fuzzy systems (adaptive network-based fuzzy inference systems) trained with a hybrid algorithm. We focused on the impact on generalization of both the examples used during training and the fuzzification process. In addition, computation time was a key point to consider. That is why the developed algorithm is computationally tractable and allows both the thermal diffusivity and conductivity of homogeneous materials to be simultaneously estimated with very good accuracy (the generalization error ranges between 4.6% and 6.2%).

  1. An integrative approach to inferring biologically meaningful gene modules

    PubMed Central

    2011-01-01

    Background The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. Results We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM) that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. Conclusions The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level. PMID:21791051

  2. A Component-Based Diffusion Model With Structural Diversity for Social Networks.

    PubMed

    Qing Bao; Cheung, William K; Yu Zhang; Jiming Liu

    2017-04-01

    Diffusion on social networks refers to the process where opinions are spread via the connected nodes. Given a set of observed information cascades, one can infer the underlying diffusion process for social network analysis. The independent cascade model (IC model) is a widely adopted diffusion model where a node is assumed to be activated independently by any one of its neighbors. In reality, how a node will be activated also depends on how its neighbors are connected and activated. For instance, the opinions from the neighbors of the same social group are often similar and thus redundant. In this paper, we extend the IC model by considering that: 1) the information coming from the connected neighbors are similar and 2) the underlying redundancy can be modeled using a dynamic structural diversity measure of the neighbors. Our proposed model assumes each node to be activated independently by different communities (or components) of its parent nodes, each weighted by its effective size. An expectation maximization algorithm is derived to infer the model parameters. We compare the performance of the proposed model with the basic IC model and its variants using both synthetic data sets and a real-world data set containing news stories and Web blogs. Our empirical results show that incorporating the community structure of neighbors and the structural diversity measure into the diffusion model significantly improves the accuracy of the model, at the expense of only a reasonable increase in run-time.

  3. Identifying significant genetic regulatory networks in the prostate cancer from microarray data based on transcription factor analysis and conditional independency.

    PubMed

    Yeh, Hsiang-Yuan; Cheng, Shih-Wu; Lin, Yu-Chun; Yeh, Cheng-Yu; Lin, Shih-Fang; Soo, Von-Wun

    2009-12-21

    Prostate cancer is a world wide leading cancer and it is characterized by its aggressive metastasis. According to the clinical heterogeneity, prostate cancer displays different stages and grades related to the aggressive metastasis disease. Although numerous studies used microarray analysis and traditional clustering method to identify the individual genes during the disease processes, the important gene regulations remain unclear. We present a computational method for inferring genetic regulatory networks from micorarray data automatically with transcription factor analysis and conditional independence testing to explore the potential significant gene regulatory networks that are correlated with cancer, tumor grade and stage in the prostate cancer. To deal with missing values in microarray data, we used a K-nearest-neighbors (KNN) algorithm to determine the precise expression values. We applied web services technology to wrap the bioinformatics toolkits and databases to automatically extract the promoter regions of DNA sequences and predicted the transcription factors that regulate the gene expressions. We adopt the microarray datasets consists of 62 primary tumors, 41 normal prostate tissues from Stanford Microarray Database (SMD) as a target dataset to evaluate our method. The predicted results showed that the possible biomarker genes related to cancer and denoted the androgen functions and processes may be in the development of the prostate cancer and promote the cell death in cell cycle. Our predicted results showed that sub-networks of genes SREBF1, STAT6 and PBX1 are strongly related to a high extent while ETS transcription factors ELK1, JUN and EGR2 are related to a low extent. Gene SLC22A3 may explain clinically the differentiation associated with the high grade cancer compared with low grade cancer. Enhancer of Zeste Homolg 2 (EZH2) regulated by RUNX1 and STAT3 is correlated to the pathological stage. We provide a computational framework to reconstruct the genetic regulatory network from the microarray data using biological knowledge and constraint-based inferences. Our method is helpful in verifying possible interaction relations in gene regulatory networks and filtering out incorrect relations inferred by imperfect methods. We predicted not only individual gene related to cancer but also discovered significant gene regulation networks. Our method is also validated in several enriched published papers and databases and the significant gene regulatory networks perform critical biological functions and processes including cell adhesion molecules, androgen and estrogen metabolism, smooth muscle contraction, and GO-annotated processes. Those significant gene regulations and the critical concept of tumor progression are useful to understand cancer biology and disease treatment.

  4. A Network Selection Algorithm Considering Power Consumption in Hybrid Wireless Networks

    NASA Astrophysics Data System (ADS)

    Joe, Inwhee; Kim, Won-Tae; Hong, Seokjoon

    In this paper, we propose a novel network selection algorithm considering power consumption in hybrid wireless networks for vertical handover. CDMA, WiBro, WLAN networks are candidate networks for this selection algorithm. This algorithm is composed of the power consumption prediction algorithm and the final network selection algorithm. The power consumption prediction algorithm estimates the expected lifetime of the mobile station based on the current battery level, traffic class and power consumption for each network interface card of the mobile station. If the expected lifetime of the mobile station in a certain network is not long enough compared the handover delay, this particular network will be removed from the candidate network list, thereby preventing unnecessary handovers in the preprocessing procedure. On the other hand, the final network selection algorithm consists of AHP (Analytic Hierarchical Process) and GRA (Grey Relational Analysis). The global factors of the network selection structure are QoS, cost and lifetime. If user preference is lifetime, our selection algorithm selects the network that offers longest service duration due to low power consumption. Also, we conduct some simulations using the OPNET simulation tool. The simulation results show that the proposed algorithm provides longer lifetime in the hybrid wireless network environment.

  5. Process Mining for Individualized Behavior Modeling Using Wireless Tracking in Nursing Homes

    PubMed Central

    Fernández-Llatas, Carlos; Benedi, José-Miguel; García-Gómez, Juan M.; Traver, Vicente

    2013-01-01

    The analysis of human behavior patterns is increasingly used for several research fields. The individualized modeling of behavior using classical techniques requires too much time and resources to be effective. A possible solution would be the use of pattern recognition techniques to automatically infer models to allow experts to understand individual behavior. However, traditional pattern recognition algorithms infer models that are not readily understood by human experts. This limits the capacity to benefit from the inferred models. Process mining technologies can infer models as workflows, specifically designed to be understood by experts, enabling them to detect specific behavior patterns in users. In this paper, the eMotiva process mining algorithms are presented. These algorithms filter, infer and visualize workflows. The workflows are inferred from the samples produced by an indoor location system that stores the location of a resident in a nursing home. The visualization tool is able to compare and highlight behavior patterns in order to facilitate expert understanding of human behavior. This tool was tested with nine real users that were monitored for a 25-week period. The results achieved suggest that the behavior of users is continuously evolving and changing and that this change can be measured, allowing for behavioral change detection. PMID:24225907

  6. Statistical mechanics of unsupervised feature learning in a restricted Boltzmann machine with binary synapses

    NASA Astrophysics Data System (ADS)

    Huang, Haiping

    2017-05-01

    Revealing hidden features in unlabeled data is called unsupervised feature learning, which plays an important role in pretraining a deep neural network. Here we provide a statistical mechanics analysis of the unsupervised learning in a restricted Boltzmann machine with binary synapses. A message passing equation to infer the hidden feature is derived, and furthermore, variants of this equation are analyzed. A statistical analysis by replica theory describes the thermodynamic properties of the model. Our analysis confirms an entropy crisis preceding the non-convergence of the message passing equation, suggesting a discontinuous phase transition as a key characteristic of the restricted Boltzmann machine. Continuous phase transition is also confirmed depending on the embedded feature strength in the data. The mean-field result under the replica symmetric assumption agrees with that obtained by running message passing algorithms on single instances of finite sizes. Interestingly, in an approximate Hopfield model, the entropy crisis is absent, and a continuous phase transition is observed instead. We also develop an iterative equation to infer the hyper-parameter (temperature) hidden in the data, which in physics corresponds to iteratively imposing Nishimori condition. Our study provides insights towards understanding the thermodynamic properties of the restricted Boltzmann machine learning, and moreover important theoretical basis to build simplified deep networks.

  7. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks

    PubMed Central

    Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis

    2012-01-01

    Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606

  8. Extracting neuronal functional network dynamics via adaptive Granger causality analysis.

    PubMed

    Sheikhattar, Alireza; Miran, Sina; Liu, Ji; Fritz, Jonathan B; Shamma, Shihab A; Kanold, Patrick O; Babadi, Behtash

    2018-04-24

    Quantifying the functional relations between the nodes in a network based on local observations is a key challenge in studying complex systems. Most existing time series analysis techniques for this purpose provide static estimates of the network properties, pertain to stationary Gaussian data, or do not take into account the ubiquitous sparsity in the underlying functional networks. When applied to spike recordings from neuronal ensembles undergoing rapid task-dependent dynamics, they thus hinder a precise statistical characterization of the dynamic neuronal functional networks underlying adaptive behavior. We develop a dynamic estimation and inference paradigm for extracting functional neuronal network dynamics in the sense of Granger, by integrating techniques from adaptive filtering, compressed sensing, point process theory, and high-dimensional statistics. We demonstrate the utility of our proposed paradigm through theoretical analysis, algorithm development, and application to synthetic and real data. Application of our techniques to two-photon Ca 2+ imaging experiments from the mouse auditory cortex reveals unique features of the functional neuronal network structures underlying spontaneous activity at unprecedented spatiotemporal resolution. Our analysis of simultaneous recordings from the ferret auditory and prefrontal cortical areas suggests evidence for the role of rapid top-down and bottom-up functional dynamics across these areas involved in robust attentive behavior.

  9. Reasoning and Knowledge Acquisition Framework for 5G Network Analytics

    PubMed Central

    2017-01-01

    Autonomic self-management is a key challenge for next-generation networks. This paper proposes an automated analysis framework to infer knowledge in 5G networks with the aim to understand the network status and to predict potential situations that might disrupt the network operability. The framework is based on the Endsley situational awareness model, and integrates automated capabilities for metrics discovery, pattern recognition, prediction techniques and rule-based reasoning to infer anomalous situations in the current operational context. Those situations should then be mitigated, either proactive or reactively, by a more complex decision-making process. The framework is driven by a use case methodology, where the network administrator is able to customize the knowledge inference rules and operational parameters. The proposal has also been instantiated to prove its adaptability to a real use case. To this end, a reference network traffic dataset was used to identify suspicious patterns and to predict the behavior of the monitored data volume. The preliminary results suggest a good level of accuracy on the inference of anomalous traffic volumes based on a simple configuration. PMID:29065473

  10. Reasoning and Knowledge Acquisition Framework for 5G Network Analytics.

    PubMed

    Sotelo Monge, Marco Antonio; Maestre Vidal, Jorge; García Villalba, Luis Javier

    2017-10-21

    Autonomic self-management is a key challenge for next-generation networks. This paper proposes an automated analysis framework to infer knowledge in 5G networks with the aim to understand the network status and to predict potential situations that might disrupt the network operability. The framework is based on the Endsley situational awareness model, and integrates automated capabilities for metrics discovery, pattern recognition, prediction techniques and rule-based reasoning to infer anomalous situations in the current operational context. Those situations should then be mitigated, either proactive or reactively, by a more complex decision-making process. The framework is driven by a use case methodology, where the network administrator is able to customize the knowledge inference rules and operational parameters. The proposal has also been instantiated to prove its adaptability to a real use case. To this end, a reference network traffic dataset was used to identify suspicious patterns and to predict the behavior of the monitored data volume. The preliminary results suggest a good level of accuracy on the inference of anomalous traffic volumes based on a simple configuration.

  11. Application of dynamic uncertain causality graph in spacecraft fault diagnosis: Logic cycle

    NASA Astrophysics Data System (ADS)

    Yao, Quanying; Zhang, Qin; Liu, Peng; Yang, Ping; Zhu, Ma; Wang, Xiaochen

    2017-04-01

    Intelligent diagnosis system are applied to fault diagnosis in spacecraft. Dynamic Uncertain Causality Graph (DUCG) is a new probability graphic model with many advantages. In the knowledge expression of spacecraft fault diagnosis, feedback among variables is frequently encountered, which may cause directed cyclic graphs (DCGs). Probabilistic graphical models (PGMs) such as bayesian network (BN) have been widely applied in uncertain causality representation and probabilistic reasoning, but BN does not allow DCGs. In this paper, DUGG is applied to fault diagnosis in spacecraft: introducing the inference algorithm for the DUCG to deal with feedback. Now, DUCG has been tested in 16 typical faults with 100% diagnosis accuracy.

  12. Brain networks for confidence weighting and hierarchical inference during probabilistic learning.

    PubMed

    Meyniel, Florent; Dehaene, Stanislas

    2017-05-09

    Learning is difficult when the world fluctuates randomly and ceaselessly. Classical learning algorithms, such as the delta rule with constant learning rate, are not optimal. Mathematically, the optimal learning rule requires weighting prior knowledge and incoming evidence according to their respective reliabilities. This "confidence weighting" implies the maintenance of an accurate estimate of the reliability of what has been learned. Here, using fMRI and an ideal-observer analysis, we demonstrate that the brain's learning algorithm relies on confidence weighting. While in the fMRI scanner, human adults attempted to learn the transition probabilities underlying an auditory or visual sequence, and reported their confidence in those estimates. They knew that these transition probabilities could change simultaneously at unpredicted moments, and therefore that the learning problem was inherently hierarchical. Subjective confidence reports tightly followed the predictions derived from the ideal observer. In particular, subjects managed to attach distinct levels of confidence to each learned transition probability, as required by Bayes-optimal inference. Distinct brain areas tracked the likelihood of new observations given current predictions, and the confidence in those predictions. Both signals were combined in the right inferior frontal gyrus, where they operated in agreement with the confidence-weighting model. This brain region also presented signatures of a hierarchical process that disentangles distinct sources of uncertainty. Together, our results provide evidence that the sense of confidence is an essential ingredient of probabilistic learning in the human brain, and that the right inferior frontal gyrus hosts a confidence-based statistical learning algorithm for auditory and visual sequences.

  13. Brain networks for confidence weighting and hierarchical inference during probabilistic learning

    PubMed Central

    Meyniel, Florent; Dehaene, Stanislas

    2017-01-01

    Learning is difficult when the world fluctuates randomly and ceaselessly. Classical learning algorithms, such as the delta rule with constant learning rate, are not optimal. Mathematically, the optimal learning rule requires weighting prior knowledge and incoming evidence according to their respective reliabilities. This “confidence weighting” implies the maintenance of an accurate estimate of the reliability of what has been learned. Here, using fMRI and an ideal-observer analysis, we demonstrate that the brain’s learning algorithm relies on confidence weighting. While in the fMRI scanner, human adults attempted to learn the transition probabilities underlying an auditory or visual sequence, and reported their confidence in those estimates. They knew that these transition probabilities could change simultaneously at unpredicted moments, and therefore that the learning problem was inherently hierarchical. Subjective confidence reports tightly followed the predictions derived from the ideal observer. In particular, subjects managed to attach distinct levels of confidence to each learned transition probability, as required by Bayes-optimal inference. Distinct brain areas tracked the likelihood of new observations given current predictions, and the confidence in those predictions. Both signals were combined in the right inferior frontal gyrus, where they operated in agreement with the confidence-weighting model. This brain region also presented signatures of a hierarchical process that disentangles distinct sources of uncertainty. Together, our results provide evidence that the sense of confidence is an essential ingredient of probabilistic learning in the human brain, and that the right inferior frontal gyrus hosts a confidence-based statistical learning algorithm for auditory and visual sequences. PMID:28439014

  14. IMNN: Information Maximizing Neural Networks

    NASA Astrophysics Data System (ADS)

    Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.

    2018-04-01

    This software trains artificial neural networks to find non-linear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). As compressing large data sets vastly simplifies both frequentist and Bayesian inference, important information may be inadvertently missed. Likelihood-free inference based on automatically derived IMNN summaries produces summaries that are good approximations to sufficient statistics. IMNNs are robustly capable of automatically finding optimal, non-linear summaries of the data even in cases where linear compression fails: inferring the variance of Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima.

  15. MultiNest: Efficient and Robust Bayesian Inference

    NASA Astrophysics Data System (ADS)

    Feroz, F.; Hobson, M. P.; Bridges, M.

    2011-09-01

    We present further development and the first public release of our multimodal nested sampling algorithm, called MultiNest. This Bayesian inference tool calculates the evidence, with an associated error estimate, and produces posterior samples from distributions that may contain multiple modes and pronounced (curving) degeneracies in high dimensions. The developments presented here lead to further substantial improvements in sampling efficiency and robustness, as compared to the original algorithm presented in Feroz & Hobson (2008), which itself significantly outperformed existing MCMC techniques in a wide range of astrophysical inference problems. The accuracy and economy of the MultiNest algorithm is demonstrated by application to two toy problems and to a cosmological inference problem focusing on the extension of the vanilla LambdaCDM model to include spatial curvature and a varying equation of state for dark energy. The MultiNest software is fully parallelized using MPI and includes an interface to CosmoMC. It will also be released as part of the SuperBayeS package, for the analysis of supersymmetric theories of particle physics, at this http URL.

  16. Bayesian inference of interaction properties of noisy dynamical systems with time-varying coupling: capabilities and limitations

    NASA Astrophysics Data System (ADS)

    Wilting, Jens; Lehnertz, Klaus

    2015-08-01

    We investigate a recently published analysis framework based on Bayesian inference for the time-resolved characterization of interaction properties of noisy, coupled dynamical systems. It promises wide applicability and a better time resolution than well-established methods. At the example of representative model systems, we show that the analysis framework has the same weaknesses as previous methods, particularly when investigating interacting, structurally different non-linear oscillators. We also inspect the tracking of time-varying interaction properties and propose a further modification of the algorithm, which improves the reliability of obtained results. We exemplarily investigate the suitability of this algorithm to infer strength and direction of interactions between various regions of the human brain during an epileptic seizure. Within the limitations of the applicability of this analysis tool, we show that the modified algorithm indeed allows a better time resolution through Bayesian inference when compared to previous methods based on least square fits.

  17. Recursive algorithms for phylogenetic tree counting.

    PubMed

    Gavryushkina, Alexandra; Welch, David; Drummond, Alexei J

    2013-10-28

    In Bayesian phylogenetic inference we are interested in distributions over a space of trees. The number of trees in a tree space is an important characteristic of the space and is useful for specifying prior distributions. When all samples come from the same time point and no prior information available on divergence times, the tree counting problem is easy. However, when fossil evidence is used in the inference to constrain the tree or data are sampled serially, new tree spaces arise and counting the number of trees is more difficult. We describe an algorithm that is polynomial in the number of sampled individuals for counting of resolutions of a constraint tree assuming that the number of constraints is fixed. We generalise this algorithm to counting resolutions of a fully ranked constraint tree. We describe a quadratic algorithm for counting the number of possible fully ranked trees on n sampled individuals. We introduce a new type of tree, called a fully ranked tree with sampled ancestors, and describe a cubic time algorithm for counting the number of such trees on n sampled individuals. These algorithms should be employed for Bayesian Markov chain Monte Carlo inference when fossil data are included or data are serially sampled.

  18. CI-KNOW: Cyberinfrastructure Knowledge Networks on the Web. A Social Network Enabled Recommender System for Locating Resources in Cyberinfrastructures

    NASA Astrophysics Data System (ADS)

    Green, H. D.; Contractor, N. S.; Yao, Y.

    2006-12-01

    A knowledge network is a multi-dimensional network created from the interactions and interconnections among the scientists, documents, data, analytic tools, and interactive collaboration spaces (like forums and wikis) associated with a collaborative environment. CI-KNOW is a suite of software tools that leverages automated data collection, social network theories, analysis techniques and algorithms to infer an individual's interests and expertise based on their interactions and activities within a knowledge network. The CI-KNOW recommender system mines the knowledge network associated with a scientific community's use of cyberinfrastructure tools and uses relational metadata to record connections among entities in the knowledge network. Recent developments in social network theories and methods provide the backbone for a modular system that creates recommendations from relational metadata. A network navigation portlet allows users to locate colleagues, documents, data or analytic tools in the knowledge network and to explore their networks through a visual, step-wise process. An internal auditing portlet offers administrators diagnostics to assess the growth and health of the entire knowledge network. The first instantiation of the prototype CI-KNOW system is part of the Environmental Cyberinfrastructure Demonstration project at the National Center for Supercomputing Applications, which supports the activities of hydrologic and environmental science communities (CLEANER and CUAHSI) under the umbrella of the WATERS network environmental observatory planning activities (http://cleaner.ncsa.uiuc.edu). This poster summarizes the key aspects of the CI-KNOW system, highlighting the key inputs, calculation mechanisms, and output modalities.

  19. SoyNet: a database of co-functional networks for soybean Glycine max.

    PubMed

    Kim, Eiru; Hwang, Sohyun; Lee, Insuk

    2017-01-04

    Soybean (Glycine max) is a legume crop with substantial economic value, providing a source of oil and protein for humans and livestock. More than 50% of edible oils consumed globally are derived from this crop. Soybean plants are also important for soil fertility, as they fix atmospheric nitrogen by symbiosis with microorganisms. The latest soybean genome annotation (version 2.0) lists 56 044 coding genes, yet their functional contributions to crop traits remain mostly unknown. Co-functional networks have proven useful for identifying genes that are involved in a particular pathway or phenotype with various network algorithms. Here, we present SoyNet (available at www.inetbio.org/soynet), a database of co-functional networks for G. max and a companion web server for network-based functional predictions. SoyNet maps 1 940 284 co-functional links between 40 812 soybean genes (72.8% of the coding genome), which were inferred from 21 distinct types of genomics data including 734 microarrays and 290 RNA-seq samples from soybean. SoyNet provides a new route to functional investigation of the soybean genome, elucidating genes and pathways of agricultural importance. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. A tree-like Bayesian structure learning algorithm for small-sample datasets from complex biological model systems.

    PubMed

    Yin, Weiwei; Garimalla, Swetha; Moreno, Alberto; Galinski, Mary R; Styczynski, Mark P

    2015-08-28

    There are increasing efforts to bring high-throughput systems biology techniques to bear on complex animal model systems, often with a goal of learning about underlying regulatory network structures (e.g., gene regulatory networks). However, complex animal model systems typically have significant limitations on cohort sizes, number of samples, and the ability to perform follow-up and validation experiments. These constraints are particularly problematic for many current network learning approaches, which require large numbers of samples and may predict many more regulatory relationships than actually exist. Here, we test the idea that by leveraging the accuracy and efficiency of classifiers, we can construct high-quality networks that capture important interactions between variables in datasets with few samples. We start from a previously-developed tree-like Bayesian classifier and generalize its network learning approach to allow for arbitrary depth and complexity of tree-like networks. Using four diverse sample networks, we demonstrate that this approach performs consistently better at low sample sizes than the Sparse Candidate Algorithm, a representative approach for comparison because it is known to generate Bayesian networks with high positive predictive value. We develop and demonstrate a resampling-based approach to enable the identification of a viable root for the learned tree-like network, important for cases where the root of a network is not known a priori. We also develop and demonstrate an integrated resampling-based approach to the reduction of variable space for the learning of the network. Finally, we demonstrate the utility of this approach via the analysis of a transcriptional dataset of a malaria challenge in a non-human primate model system, Macaca mulatta, suggesting the potential to capture indicators of the earliest stages of cellular differentiation during leukopoiesis. We demonstrate that by starting from effective and efficient approaches for creating classifiers, we can identify interesting tree-like network structures with significant ability to capture the relationships in the training data. This approach represents a promising strategy for inferring networks with high positive predictive value under the constraint of small numbers of samples, meeting a need that will only continue to grow as more high-throughput studies are applied to complex model systems.

  1. A method for inferring regional origins of neurodegeneration.

    PubMed

    Torok, Justin; Maia, Pedro D; Powell, Fon; Pandya, Sneha; Raj, Ashish

    2018-02-02

    Alzheimer's disease, the most common form of dementia, is characterized by the emergence and spread of senile plaques and neurofibrillary tangles, causing widespread neurodegeneration. Though the progression of Alzheimer's disease is considered to be stereotyped, the significant variability within clinical populations obscures this interpretation on the individual level. Of particular clinical importance is understanding where exactly pathology, e.g. tau, emerges in each patient and how the incipient atrophy pattern relates to future spread of disease. Here we demonstrate a newly developed graph theoretical method of inferring prior disease states in patients with Alzheimer's disease and mild cognitive impairment using an established network diffusion model and an L1-penalized optimization algorithm. Although the 'seeds' of origin using our inference method successfully reproduce known trends in Alzheimer's disease staging on a population level, we observed that the high degree of heterogeneity between patients at baseline is also reflected in their seeds. Additionally, the individualized seeds are significantly more predictive of future atrophy than a single seed placed at the hippocampus. Our findings illustrate that understanding where disease originates in individuals is critical to determining how it progresses and that our method allows us to infer early stages of disease from atrophy patterns observed at diagnosis. © The Author(s) (2018). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  2. Differentiable cortical networks for inferences concerning people's intentions versus physical causality.

    PubMed

    Mason, Robert A; Just, Marcel Adam

    2011-02-01

    Cortical activity associated with generating an inference was measured using fMRI. Participants read three-sentence passages that differed in whether or not an inference needed to be drawn to understand them. The inference was based on either a protagonist's intention or a physical consequence of a character's action. Activation was expected in Theory of Mind brain regions for the passages based on protagonists' intentions but not for the physical consequence passages. The activation measured in the right temporo-parietal junction was greater in the intentional passages than in the consequence passages, consistent with predictions from a Theory of Mind perspective. In contrast, there was increased occipital activation in the physical inference passages. For both types of passage, the cortical activity related to the reading of the critical inference sentence demonstrated a recruitment of a common inference cortical network. This general inference-related activation appeared bilaterally in the language processing areas (the inferior frontal gyrus, the temporal gyrus, and the angular gyrus), as well as in the medial to superior frontal gyrus, which has been found to be active in Theory of Mind tasks. These findings are consistent with the hypothesis that component areas of the discourse processing network are recruited as needed based on the nature of the inference. A Protagonist monitoring and synthesis network is proposed as a more accurate account for Theory of Mind activation during narrative comprehension. Copyright © 2010 Wiley-Liss, Inc.

  3. Modeling spatial decisions with graph theory: logging roads and forest fragmentation in the Brazilian Amazon.

    PubMed

    Walker, Robert; Arima, Eugenio; Messina, Joe; Soares-Filho, Britaldo; Perz, Stephen; Vergara, Dante; Sales, Marcio; Pereira, Ritaumaria; Castro, Williams

    2013-01-01

    This article addresses the spatial decision-making of loggers and implications for forest fragmentation in the Amazon basin. It provides a behavioral explanation for fragmentation by modeling how loggers build road networks, typically abandoned upon removal of hardwoods. Logging road networks provide access to land, and the settlers who take advantage of them clear fields and pastures that accentuate their spatial signatures. In shaping agricultural activities, these networks organize emergent patterns of forest fragmentation, even though the loggers move elsewhere. The goal of the article is to explicate how loggers shape their road networks, in order to theoretically explain an important type of forest fragmentation found in the Amazon basin, particularly in Brazil. This is accomplished by adapting graph theory to represent the spatial decision-making of loggers, and by implementing computational algorithms that build graphs interpretable as logging road networks. The economic behavior of loggers is conceptualized as a profit maximization problem, and translated into spatial decision-making by establishing a formal correspondence between mathematical graphs and road networks. New computational approaches, adapted from operations research, are used to construct graphs and simulate spatial decision-making as a function of discount rates, land tenure, and topographic constraints. The algorithms employed bracket a range of behavioral settings appropriate for areas of terras de volutas, public lands that have not been set aside for environmental protection, indigenous peoples, or colonization. The simulation target sites are located in or near so-called Terra do Meio, once a major logging frontier in the lower Amazon Basin. Simulation networks are compared to empirical ones identified by remote sensing and then used to draw inferences about factors influencing the spatial behavior of loggers. Results overall suggest that Amazonia's logging road networks induce more fragmentation than necessary to access fixed quantities of wood. The paper concludes by considering implications of the approach and findings for Brazil's move to a system of concession logging.

  4. An Intuitive Dashboard for Bayesian Network Inference

    NASA Astrophysics Data System (ADS)

    Reddy, Vikas; Charisse Farr, Anna; Wu, Paul; Mengersen, Kerrie; Yarlagadda, Prasad K. D. V.

    2014-03-01

    Current Bayesian network software packages provide good graphical interface for users who design and develop Bayesian networks for various applications. However, the intended end-users of these networks may not necessarily find such an interface appealing and at times it could be overwhelming, particularly when the number of nodes in the network is large. To circumvent this problem, this paper presents an intuitive dashboard, which provides an additional layer of abstraction, enabling the end-users to easily perform inferences over the Bayesian networks. Unlike most software packages, which display the nodes and arcs of the network, the developed tool organises the nodes based on the cause-and-effect relationship, making the user-interaction more intuitive and friendly. In addition to performing various types of inferences, the users can conveniently use the tool to verify the behaviour of the developed Bayesian network. The tool has been developed using QT and SMILE libraries in C++.

  5. Short-Term Solar Forecasting Performance of Popular Machine Learning Algorithms: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Florita, Anthony R; Elgindy, Tarek; Hodge, Brian S

    A framework for assessing the performance of short-term solar forecasting is presented in conjunction with a range of numerical results using global horizontal irradiation (GHI) from the open-source Surface Radiation Budget (SURFRAD) data network. A suite of popular machine learning algorithms is compared according to a set of statistically distinct metrics and benchmarked against the persistence-of-cloudiness forecast and a cloud motion forecast. Results show significant improvement compared to the benchmarks with trade-offs among the machine learning algorithms depending on the desired error metric. Training inputs include time series observations of GHI for a history of years, historical weather and atmosphericmore » measurements, and corresponding date and time stamps such that training sensitivities might be inferred. Prediction outputs are GHI forecasts for 1, 2, 3, and 4 hours ahead of the issue time, and they are made for every month of the year for 7 locations. Photovoltaic power and energy outputs can then be made using the solar forecasts to better understand power system impacts.« less

  6. Visual Inference Programming

    NASA Technical Reports Server (NTRS)

    Wheeler, Kevin; Timucin, Dogan; Rabbette, Maura; Curry, Charles; Allan, Mark; Lvov, Nikolay; Clanton, Sam; Pilewskie, Peter

    2002-01-01

    The goal of visual inference programming is to develop a software framework data analysis and to provide machine learning algorithms for inter-active data exploration and visualization. The topics include: 1) Intelligent Data Understanding (IDU) framework; 2) Challenge problems; 3) What's new here; 4) Framework features; 5) Wiring diagram; 6) Generated script; 7) Results of script; 8) Initial algorithms; 9) Independent Component Analysis for instrument diagnosis; 10) Output sensory mapping virtual joystick; 11) Output sensory mapping typing; 12) Closed-loop feedback mu-rhythm control; 13) Closed-loop training; 14) Data sources; and 15) Algorithms. This paper is in viewgraph form.

  7. Inference, simulation, modeling, and analysis of complex networks, with special emphasis on complex networks in systems biology

    NASA Astrophysics Data System (ADS)

    Christensen, Claire Petra

    Across diverse fields ranging from physics to biology, sociology, and economics, the technological advances of the past decade have engendered an unprecedented explosion of data on highly complex systems with thousands, if not millions of interacting components. These systems exist at many scales of size and complexity, and it is becoming ever-more apparent that they are, in fact, universal, arising in every field of study. Moreover, they share fundamental properties---chief among these, that the individual interactions of their constituent parts may be well-understood, but the characteristic behaviour produced by the confluence of these interactions---by these complex networks---is unpredictable; in a nutshell, the whole is more than the sum of its parts. There is, perhaps, no better illustration of this concept than the discoveries being made regarding complex networks in the biological sciences. In particular, though the sequencing of the human genome in 2003 was a remarkable feat, scientists understand that the "cellular-level blueprints" for the human being are cellular-level parts lists, but they say nothing (explicitly) about cellular-level processes. The challenge of modern molecular biology is to understand these processes in terms of the networks of parts---in terms of the interactions among proteins, enzymes, genes, and metabolites---as it is these processes that ultimately differentiate animate from inanimate, giving rise to life! It is the goal of systems biology---an umbrella field encapsulating everything from molecular biology to epidemiology in social systems---to understand processes in terms of fundamental networks of core biological parts, be they proteins or people. By virtue of the fact that there are literally countless complex systems, not to mention tools and techniques used to infer, simulate, analyze, and model these systems, it is impossible to give a truly comprehensive account of the history and study of complex systems. The author's own publications have contributed network inference, simulation, modeling, and analysis methods to the much larger body of work in systems biology, and indeed, in network science. The aim of this thesis is therefore twofold: to present this original work in the historical context of network science, but also to provide sufficient review and reference regarding complex systems (with an emphasis on complex networks in systems biology) and tools and techniques for their inference, simulation, analysis, and modeling, such that the reader will be comfortable in seeking out further information on the subject. The review-like Chapters 1, 2, and 4 are intended to convey the co-evolution of network science and the slow but noticeable breakdown of boundaries between disciplines in academia as research and comparison of diverse systems has brought to light the shared properties of these systems. It is the author's hope that theses chapters impart some sense of the remarkable and rapid progress in complex systems research that has led to this unprecedented academic synergy. Chapters 3 and 5 detail the author's original work in the context of complex systems research. Chapter 3 presents the methods and results of a two-stage modeling process that generates candidate gene-regulatory networks of the bacterium B.subtilis from experimentally obtained, yet mathematically underdetermined microchip array data. These networks are then analyzed from a graph theoretical perspective, and their biological viability is critiqued by comparing the networks' graph theoretical properties to those of other biological systems. The results of topological perturbation analyses revealing commonalities in behavior at multiple levels of complexity are also presented, and are shown to be an invaluable means by which to ascertain the level of complexity to which the network inference process is robust to noise. Chapter 5 outlines a learning algorithm for the development of a realistic, evolving social network (a city) into which a disease is introduced. The results of simulations in populations spanning two orders of magnitude are compared to prevaccine era measles data for England and Wales and demonstrate that the simulations are able to capture the quantitative and qualitative features of epidemics in populations as small as 10,000 people. The work presented in Chapter 5 validates the utility of network simulation in concurrently probing contact network dynamics and disease dynamics.

  8. Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks.

    PubMed

    Oh, S June; Joung, Je-Gun; Chang, Jeong-Ho; Zhang, Byoung-Tak

    2006-06-06

    To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides information about the genetic relationships between different organisms. In contrast, comparative analysis of metabolic pathways in different organisms can yield insights into their functional relationships under different physiological conditions. However, evaluating the similarities or differences between metabolic networks is a computationally challenging problem, and systematic methods of doing this are desirable. Here we introduce a graph-kernel method for computing the similarity between metabolic networks in polynomial time, and use it to profile metabolic pathways and to construct phylogenetic trees. To compare the structures of metabolic networks in organisms, we adopted the exponential graph kernel, which is a kernel-based approach with a labeled graph that includes a label matrix and an adjacency matrix. To construct the phylogenetic trees, we used an unweighted pair-group method with arithmetic mean, i.e., a hierarchical clustering algorithm. We applied the kernel-based network profiling method in a comparative analysis of nine carbohydrate metabolic networks from 81 biological species encompassing Archaea, Eukaryota, and Eubacteria. The resulting phylogenetic hierarchies generally support the tripartite scheme of three domains rather than the two domains of prokaryotes and eukaryotes. By combining the kernel machines with metabolic information, the method infers the context of biosphere development that covers physiological events required for adaptation by genetic reconstruction. The results show that one may obtain a global view of the tree of life by comparing the metabolic pathway structures using meta-level information rather than sequence information. This method may yield further information about biological evolution, such as the history of horizontal transfer of each gene, by studying the detailed structure of the phylogenetic tree constructed by the kernel-based method.

  9. Mean field analysis of algorithms for scale-free networks in molecular biology

    PubMed Central

    2017-01-01

    The sampling of scale-free networks in Molecular Biology is usually achieved by growing networks from a seed using recursive algorithms with elementary moves which include the addition and deletion of nodes and bonds. These algorithms include the Barabási-Albert algorithm. Later algorithms, such as the Duplication-Divergence algorithm, the Solé algorithm and the iSite algorithm, were inspired by biological processes underlying the evolution of protein networks, and the networks they produce differ essentially from networks grown by the Barabási-Albert algorithm. In this paper the mean field analysis of these algorithms is reconsidered, and extended to variant and modified implementations of the algorithms. The degree sequences of scale-free networks decay according to a powerlaw distribution, namely P(k) ∼ k−γ, where γ is a scaling exponent. We derive mean field expressions for γ, and test these by numerical simulations. Generally, good agreement is obtained. We also found that some algorithms do not produce scale-free networks (for example some variant Barabási-Albert and Solé networks). PMID:29272285

  10. Mean field analysis of algorithms for scale-free networks in molecular biology.

    PubMed

    Konini, S; Janse van Rensburg, E J

    2017-01-01

    The sampling of scale-free networks in Molecular Biology is usually achieved by growing networks from a seed using recursive algorithms with elementary moves which include the addition and deletion of nodes and bonds. These algorithms include the Barabási-Albert algorithm. Later algorithms, such as the Duplication-Divergence algorithm, the Solé algorithm and the iSite algorithm, were inspired by biological processes underlying the evolution of protein networks, and the networks they produce differ essentially from networks grown by the Barabási-Albert algorithm. In this paper the mean field analysis of these algorithms is reconsidered, and extended to variant and modified implementations of the algorithms. The degree sequences of scale-free networks decay according to a powerlaw distribution, namely P(k) ∼ k-γ, where γ is a scaling exponent. We derive mean field expressions for γ, and test these by numerical simulations. Generally, good agreement is obtained. We also found that some algorithms do not produce scale-free networks (for example some variant Barabási-Albert and Solé networks).

  11. Analysis of Bioactive Amino Acids from Fish Hydrolysates with a New Bioinformatic Intelligent System Approach.

    PubMed

    Elaziz, Mohamed Abd; Hemdan, Ahmed Monem; Hassanien, AboulElla; Oliva, Diego; Xiong, Shengwu

    2017-09-07

    The current economics of the fish protein industry demand rapid, accurate and expressive prediction algorithms at every step of protein production especially with the challenge of global climate change. This help to predict and analyze functional and nutritional quality then consequently control food allergies in hyper allergic patients. As, it is quite expensive and time-consuming to know these concentrations by the lab experimental tests, especially to conduct large-scale projects. Therefore, this paper introduced a new intelligent algorithm using adaptive neuro-fuzzy inference system based on whale optimization algorithm. This algorithm is used to predict the concentration levels of bioactive amino acids in fish protein hydrolysates at different times during the year. The whale optimization algorithm is used to determine the optimal parameters in adaptive neuro-fuzzy inference system. The results of proposed algorithm are compared with others and it is indicated the higher performance of the proposed algorithm.

  12. A Neural Network Approach to Infer Optical Depth of Thick Ice Clouds at Night

    NASA Technical Reports Server (NTRS)

    Minnis, P.; Hong, G.; Sun-Mack, S.; Chen, Yan; Smith, W. L., Jr.

    2016-01-01

    One of the roadblocks to continuously monitoring cloud properties is the tendency of clouds to become optically black at cloud optical depths (COD) of 6 or less. This constraint dramatically reduces the quantitative information content at night. A recent study found that because of their diffuse nature, ice clouds remain optically gray, to some extent, up to COD of 100 at certain wavelengths. Taking advantage of this weak dependency and the availability of COD retrievals from CloudSat, an artificial neural network algorithm was developed to estimate COD values up to 70 from common satellite imager infrared channels. The method was trained using matched 2007 CloudSat and Aqua MODIS data and is tested using similar data from 2008. The results show a significant improvement over the use of default values at night with high correlation. This paper summarizes the results and suggests paths for future improvement.

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pichara, Karim; Protopapas, Pavlos

    We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks and a probabilistic graphical model that allows us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilizes sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model, we use three catalogs with missing data (SAGE, Two Micron All Sky Survey, and UBVI) and one complete catalog (MACHO). We examine howmore » classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches, and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent and by 15% for quasar detection while keeping the computational cost the same.« less

  14. Stan: Statistical inference

    NASA Astrophysics Data System (ADS)

    Stan Development Team

    2018-01-01

    Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.

  15. Generative models for network neuroscience: prospects and promise

    PubMed Central

    Betzel, Richard F.

    2017-01-01

    Network neuroscience is the emerging discipline concerned with investigating the complex patterns of interconnections found in neural systems, and identifying principles with which to understand them. Within this discipline, one particularly powerful approach is network generative modelling, in which wiring rules are algorithmically implemented to produce synthetic network architectures with the same properties as observed in empirical network data. Successful models can highlight the principles by which a network is organized and potentially uncover the mechanisms by which it grows and develops. Here, we review the prospects and promise of generative models for network neuroscience. We begin with a primer on network generative models, with a discussion of compressibility and predictability, and utility in intuiting mechanisms, followed by a short history on their use in network science, broadly. We then discuss generative models in practice and application, paying particular attention to the critical need for cross-validation. Next, we review generative models of biological neural networks, both at the cellular and large-scale level, and across a variety of species including Caenorhabditis elegans, Drosophila, mouse, rat, cat, macaque and human. We offer a careful treatment of a few relevant distinctions, including differences between generative models and null models, sufficiency and redundancy, inferring and claiming mechanism, and functional and structural connectivity. We close with a discussion of future directions, outlining exciting frontiers both in empirical data collection efforts as well as in method and theory development that, together, further the utility of the generative network modelling approach for network neuroscience. PMID:29187640

  16. Semantic Health Knowledge Graph: Semantic Integration of Heterogeneous Medical Knowledge and Services.

    PubMed

    Shi, Longxiang; Li, Shijian; Yang, Xiaoran; Qi, Jiaheng; Pan, Gang; Zhou, Binbin

    2017-01-01

    With the explosion of healthcare information, there has been a tremendous amount of heterogeneous textual medical knowledge (TMK), which plays an essential role in healthcare information systems. Existing works for integrating and utilizing the TMK mainly focus on straightforward connections establishment and pay less attention to make computers interpret and retrieve knowledge correctly and quickly. In this paper, we explore a novel model to organize and integrate the TMK into conceptual graphs. We then employ a framework to automatically retrieve knowledge in knowledge graphs with a high precision. In order to perform reasonable inference on knowledge graphs, we propose a contextual inference pruning algorithm to achieve efficient chain inference. Our algorithm achieves a better inference result with precision and recall of 92% and 96%, respectively, which can avoid most of the meaningless inferences. In addition, we implement two prototypes and provide services, and the results show our approach is practical and effective.

  17. Semantic Health Knowledge Graph: Semantic Integration of Heterogeneous Medical Knowledge and Services

    PubMed Central

    Yang, Xiaoran; Qi, Jiaheng; Pan, Gang; Zhou, Binbin

    2017-01-01

    With the explosion of healthcare information, there has been a tremendous amount of heterogeneous textual medical knowledge (TMK), which plays an essential role in healthcare information systems. Existing works for integrating and utilizing the TMK mainly focus on straightforward connections establishment and pay less attention to make computers interpret and retrieve knowledge correctly and quickly. In this paper, we explore a novel model to organize and integrate the TMK into conceptual graphs. We then employ a framework to automatically retrieve knowledge in knowledge graphs with a high precision. In order to perform reasonable inference on knowledge graphs, we propose a contextual inference pruning algorithm to achieve efficient chain inference. Our algorithm achieves a better inference result with precision and recall of 92% and 96%, respectively, which can avoid most of the meaningless inferences. In addition, we implement two prototypes and provide services, and the results show our approach is practical and effective. PMID:28299322

  18. An inference engine for embedded diagnostic systems

    NASA Technical Reports Server (NTRS)

    Fox, Barry R.; Brewster, Larry T.

    1987-01-01

    The implementation of an inference engine for embedded diagnostic systems is described. The system consists of two distinct parts. The first is an off-line compiler which accepts a propositional logical statement of the relationship between facts and conclusions and produces data structures required by the on-line inference engine. The second part consists of the inference engine and interface routines which accept assertions of fact and return the conclusions which necessarily follow. Given a set of assertions, it will generate exactly the conclusions which logically follow. At the same time, it will detect any inconsistencies which may propagate from an inconsistent set of assertions or a poorly formulated set of rules. The memory requirements are fixed and the worst case execution times are bounded at compile time. The data structures and inference algorithms are very simple and well understood. The data structures and algorithms are described in detail. The system has been implemented on Lisp, Pascal, and Modula-2.

  19. Portfolios in Stochastic Local Search: Efficiently Computing Most Probable Explanations in Bayesian Networks

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole J.; Roth, Dan; Wilkins, David C.

    2001-01-01

    Portfolio methods support the combination of different algorithms and heuristics, including stochastic local search (SLS) heuristics, and have been identified as a promising approach to solve computationally hard problems. While successful in experiments, theoretical foundations and analytical results for portfolio-based SLS heuristics are less developed. This article aims to improve the understanding of the role of portfolios of heuristics in SLS. We emphasize the problem of computing most probable explanations (MPEs) in Bayesian networks (BNs). Algorithmically, we discuss a portfolio-based SLS algorithm for MPE computation, Stochastic Greedy Search (SGS). SGS supports the integration of different initialization operators (or initialization heuristics) and different search operators (greedy and noisy heuristics), thereby enabling new analytical and experimental results. Analytically, we introduce a novel Markov chain model tailored to portfolio-based SLS algorithms including SGS, thereby enabling us to analytically form expected hitting time results that explain empirical run time results. For a specific BN, we show the benefit of using a homogenous initialization portfolio. To further illustrate the portfolio approach, we consider novel additive search heuristics for handling determinism in the form of zero entries in conditional probability tables in BNs. Our additive approach adds rather than multiplies probabilities when computing the utility of an explanation. We motivate the additive measure by studying the dramatic impact of zero entries in conditional probability tables on the number of zero-probability explanations, which again complicates the search process. We consider the relationship between MAXSAT and MPE, and show that additive utility (or gain) is a generalization, to the probabilistic setting, of MAXSAT utility (or gain) used in the celebrated GSAT and WalkSAT algorithms and their descendants. Utilizing our Markov chain framework, we show that expected hitting time is a rational function - i.e. a ratio of two polynomials - of the probability of applying an additive search operator. Experimentally, we report on synthetically generated BNs as well as BNs from applications, and compare SGSs performance to that of Hugin, which performs BN inference by compilation to and propagation in clique trees. On synthetic networks, SGS speeds up computation by approximately two orders of magnitude compared to Hugin. In application networks, our approach is highly competitive in Bayesian networks with a high degree of determinism. In addition to showing that stochastic local search can be competitive with clique tree clustering, our empirical results provide an improved understanding of the circumstances under which portfolio-based SLS outperforms clique tree clustering and vice versa.

  20. Explaining Inference on a Population of Independent Agents Using Bayesian Networks

    ERIC Educational Resources Information Center

    Sutovsky, Peter

    2013-01-01

    The main goal of this research is to design, implement, and evaluate a novel explanation method, the hierarchical explanation method (HEM), for explaining Bayesian network (BN) inference when the network is modeling a population of conditionally independent agents, each of which is modeled as a subnetwork. For example, consider disease-outbreak…

  1. Ontology Mapping Neural Network: An Approach to Learning and Inferring Correspondences among Ontologies

    ERIC Educational Resources Information Center

    Peng, Yefei

    2010-01-01

    An ontology mapping neural network (OMNN) is proposed in order to learn and infer correspondences among ontologies. It extends the Identical Elements Neural Network (IENN)'s ability to represent and map complex relationships. The learning dynamics of simultaneous (interlaced) training of similar tasks interact at the shared connections of the…

  2. Inference of cancer-specific gene regulatory networks using soft computing rules.

    PubMed

    Wang, Xiaosheng; Gotoh, Osamu

    2010-03-24

    Perturbations of gene regulatory networks are essentially responsible for oncogenesis. Therefore, inferring the gene regulatory networks is a key step to overcoming cancer. In this work, we propose a method for inferring directed gene regulatory networks based on soft computing rules, which can identify important cause-effect regulatory relations of gene expression. First, we identify important genes associated with a specific cancer (colon cancer) using a supervised learning approach. Next, we reconstruct the gene regulatory networks by inferring the regulatory relations among the identified genes, and their regulated relations by other genes within the genome. We obtain two meaningful findings. One is that upregulated genes are regulated by more genes than downregulated ones, while downregulated genes regulate more genes than upregulated ones. The other one is that tumor suppressors suppress tumor activators and activate other tumor suppressors strongly, while tumor activators activate other tumor activators and suppress tumor suppressors weakly, indicating the robustness of biological systems. These findings provide valuable insights into the pathogenesis of cancer.

  3. Unraveling gene regulatory networks from time-resolved gene expression data -- a measures comparison study

    PubMed Central

    2011-01-01

    Background Inferring regulatory interactions between genes from transcriptomics time-resolved data, yielding reverse engineered gene regulatory networks, is of paramount importance to systems biology and bioinformatics studies. Accurate methods to address this problem can ultimately provide a deeper insight into the complexity, behavior, and functions of the underlying biological systems. However, the large number of interacting genes coupled with short and often noisy time-resolved read-outs of the system renders the reverse engineering a challenging task. Therefore, the development and assessment of methods which are computationally efficient, robust against noise, applicable to short time series data, and preferably capable of reconstructing the directionality of the regulatory interactions remains a pressing research problem with valuable applications. Results Here we perform the largest systematic analysis of a set of similarity measures and scoring schemes within the scope of the relevance network approach which are commonly used for gene regulatory network reconstruction from time series data. In addition, we define and analyze several novel measures and schemes which are particularly suitable for short transcriptomics time series. We also compare the considered 21 measures and 6 scoring schemes according to their ability to correctly reconstruct such networks from short time series data by calculating summary statistics based on the corresponding specificity and sensitivity. Our results demonstrate that rank and symbol based measures have the highest performance in inferring regulatory interactions. In addition, the proposed scoring scheme by asymmetric weighting has shown to be valuable in reducing the number of false positive interactions. On the other hand, Granger causality as well as information-theoretic measures, frequently used in inference of regulatory networks, show low performance on the short time series analyzed in this study. Conclusions Our study is intended to serve as a guide for choosing a particular combination of similarity measures and scoring schemes suitable for reconstruction of gene regulatory networks from short time series data. We show that further improvement of algorithms for reverse engineering can be obtained if one considers measures that are rooted in the study of symbolic dynamics or ranks, in contrast to the application of common similarity measures which do not consider the temporal character of the employed data. Moreover, we establish that the asymmetric weighting scoring scheme together with symbol based measures (for low noise level) and rank based measures (for high noise level) are the most suitable choices. PMID:21771321

  4. CrosstalkNet: A Visualization Tool for Differential Co-expression Networks and Communities.

    PubMed

    Manem, Venkata; Adam, George Alexandru; Gruosso, Tina; Gigoux, Mathieu; Bertos, Nicholas; Park, Morag; Haibe-Kains, Benjamin

    2018-04-15

    Variations in physiological conditions can rewire molecular interactions between biological compartments, which can yield novel insights into gain or loss of interactions specific to perturbations of interest. Networks are a promising tool to elucidate intercellular interactions, yet exploration of these large-scale networks remains a challenge due to their high dimensionality. To retrieve and mine interactions, we developed CrosstalkNet, a user friendly, web-based network visualization tool that provides a statistical framework to infer condition-specific interactions coupled with a community detection algorithm for bipartite graphs to identify significantly dense subnetworks. As a case study, we used CrosstalkNet to mine a set of 54 and 22 gene-expression profiles from breast tumor and normal samples, respectively, with epithelial and stromal compartments extracted via laser microdissection. We show how CrosstalkNet can be used to explore large-scale co-expression networks and to obtain insights into the biological processes that govern cross-talk between different tumor compartments. Significance: This web application enables researchers to mine complex networks and to decipher novel biological processes in tumor epithelial-stroma cross-talk as well as in other studies of intercompartmental interactions. Cancer Res; 78(8); 2140-3. ©2018 AACR . ©2018 American Association for Cancer Research.

  5. Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical gaussian models and bayesian networks.

    PubMed

    Werhli, Adriano V; Grzegorczyk, Marco; Husmeier, Dirk

    2006-10-15

    An important problem in systems biology is the inference of biochemical pathways and regulatory networks from postgenomic data. Various reverse engineering methods have been proposed in the literature, and it is important to understand their relative merits and shortcomings. In the present paper, we compare the accuracy of reconstructing gene regulatory networks with three different modelling and inference paradigms: (1) Relevance networks (RNs): pairwise association scores independent of the remaining network; (2) graphical Gaussian models (GGMs): undirected graphical models with constraint-based inference, and (3) Bayesian networks (BNs): directed graphical models with score-based inference. The evaluation is carried out on the Raf pathway, a cellular signalling network describing the interaction of 11 phosphorylated proteins and phospholipids in human immune system cells. We use both laboratory data from cytometry experiments as well as data simulated from the gold-standard network. We also compare passive observations with active interventions. On Gaussian observational data, BNs and GGMs were found to outperform RNs. The difference in performance was not significant for the non-linear simulated data and the cytoflow data, though. Also, we did not observe a significant difference between BNs and GGMs on observational data in general. However, for interventional data, BNs outperform GGMs and RNs, especially when taking the edge directions rather than just the skeletons of the graphs into account. This suggests that the higher computational costs of inference with BNs over GGMs and RNs are not justified when using only passive observations, but that active interventions in the form of gene knockouts and over-expressions are required to exploit the full potential of BNs. Data, software and supplementary material are available from http://www.bioss.sari.ac.uk/staff/adriano/research.html

  6. Analytic continuation of quantum Monte Carlo data by stochastic analytical inference.

    PubMed

    Fuchs, Sebastian; Pruschke, Thomas; Jarrell, Mark

    2010-05-01

    We present an algorithm for the analytic continuation of imaginary-time quantum Monte Carlo data which is strictly based on principles of Bayesian statistical inference. Within this framework we are able to obtain an explicit expression for the calculation of a weighted average over possible energy spectra, which can be evaluated by standard Monte Carlo simulations, yielding as by-product also the distribution function as function of the regularization parameter. Our algorithm thus avoids the usual ad hoc assumptions introduced in similar algorithms to fix the regularization parameter. We apply the algorithm to imaginary-time quantum Monte Carlo data and compare the resulting energy spectra with those from a standard maximum-entropy calculation.

  7. Inferring Plasmodium vivax Transmission Networks from Tempo-Spatial Surveillance Data

    PubMed Central

    Shi, Benyun; Liu, Jiming; Zhou, Xiao-Nong; Yang, Guo-Jing

    2014-01-01

    Background The transmission networks of Plasmodium vivax characterize how the parasite transmits from one location to another, which are informative and insightful for public health policy makers to accurately predict the patterns of its geographical spread. However, such networks are not apparent from surveillance data because P. vivax transmission can be affected by many factors, such as the biological characteristics of mosquitoes and the mobility of human beings. Here, we pay special attention to the problem of how to infer the underlying transmission networks of P. vivax based on available tempo-spatial patterns of reported cases. Methodology We first define a spatial transmission model, which involves representing both the heterogeneous transmission potential of P. vivax at individual locations and the mobility of infected populations among different locations. Based on the proposed transmission model, we further introduce a recurrent neural network model to infer the transmission networks from surveillance data. Specifically, in this model, we take into account multiple real-world factors, including the length of P. vivax incubation period, the impact of malaria control at different locations, and the total number of imported cases. Principal Findings We implement our proposed models by focusing on the P. vivax transmission among 62 towns in Yunnan province, People's Republic China, which have been experiencing high malaria transmission in the past years. By conducting scenario analysis with respect to different numbers of imported cases, we can (i) infer the underlying P. vivax transmission networks, (ii) estimate the number of imported cases for each individual town, and (iii) quantify the roles of individual towns in the geographical spread of P. vivax. Conclusion The demonstrated models have presented a general means for inferring the underlying transmission networks from surveillance data. The inferred networks will offer new insights into how to improve the predictability of P. vivax transmission. PMID:24516684

  8. Discovery of Intramolecular Signal Transduction Network Based on a New Protein Dynamics Model of Energy Dissipation

    PubMed Central

    Ma, Cheng-Wei; Xiu, Zhi-Long; Zeng, An-Ping

    2012-01-01

    A novel approach to reveal intramolecular signal transduction network is proposed in this work. To this end, a new algorithm of network construction is developed, which is based on a new protein dynamics model of energy dissipation. A key feature of this approach is that direction information is specified after inferring protein residue-residue interaction network involved in the process of signal transduction. This enables fundamental analysis of the regulation hierarchy and identification of regulation hubs of the signaling network. A well-studied allosteric enzyme, E. coli aspartokinase III, is used as a model system to demonstrate the new method. Comparison with experimental results shows that the new approach is able to predict all the sites that have been experimentally proved to desensitize allosteric regulation of the enzyme. In addition, the signal transduction network shows a clear preference for specific structural regions, secondary structural types and residue conservation. Occurrence of super-hubs in the network indicates that allosteric regulation tends to gather residues with high connection ability to collectively facilitate the signaling process. Furthermore, a new parameter of propagation coefficient is defined to determine the propagation capability of residues within a signal transduction network. In conclusion, the new approach is useful for fundamental understanding of the process of intramolecular signal transduction and thus has significant impact on rational design of novel allosteric proteins. PMID:22363664

  9. Gene network analysis: from heart development to cardiac therapy.

    PubMed

    Ferrazzi, Fulvia; Bellazzi, Riccardo; Engel, Felix B

    2015-03-01

    Networks offer a flexible framework to represent and analyse the complex interactions between components of cellular systems. In particular gene networks inferred from expression data can support the identification of novel hypotheses on regulatory processes. In this review we focus on the use of gene network analysis in the study of heart development. Understanding heart development will promote the elucidation of the aetiology of congenital heart disease and thus possibly improve diagnostics. Moreover, it will help to establish cardiac therapies. For example, understanding cardiac differentiation during development will help to guide stem cell differentiation required for cardiac tissue engineering or to enhance endogenous repair mechanisms. We introduce different methodological frameworks to infer networks from expression data such as Boolean and Bayesian networks. Then we present currently available temporal expression data in heart development and discuss the use of network-based approaches in published studies. Collectively, our literature-based analysis indicates that gene network analysis constitutes a promising opportunity to infer therapy-relevant regulatory processes in heart development. However, the use of network-based approaches has so far been limited by the small amount of samples in available datasets. Thus, we propose to acquire high-resolution temporal expression data to improve the mathematical descriptions of regulatory processes obtained with gene network inference methodologies. Especially probabilistic methods that accommodate the intrinsic variability of biological systems have the potential to contribute to a deeper understanding of heart development.

  10. On an adaptive preconditioned Crank-Nicolson MCMC algorithm for infinite dimensional Bayesian inference

    NASA Astrophysics Data System (ADS)

    Hu, Zixi; Yao, Zhewei; Li, Jinglai

    2017-03-01

    Many scientific and engineering problems require to perform Bayesian inference for unknowns of infinite dimension. In such problems, many standard Markov Chain Monte Carlo (MCMC) algorithms become arbitrary slow under the mesh refinement, which is referred to as being dimension dependent. To this end, a family of dimensional independent MCMC algorithms, known as the preconditioned Crank-Nicolson (pCN) methods, were proposed to sample the infinite dimensional parameters. In this work we develop an adaptive version of the pCN algorithm, where the covariance operator of the proposal distribution is adjusted based on sampling history to improve the simulation efficiency. We show that the proposed algorithm satisfies an important ergodicity condition under some mild assumptions. Finally we provide numerical examples to demonstrate the performance of the proposed method.

  11. Sensor fusion methods for reducing false alarms in heart rate monitoring.

    PubMed

    Borges, Gabriel; Brusamarello, Valner

    2016-12-01

    Automatic patient monitoring is an essential resource in hospitals for good health care management. While alarms caused by abnormal physiological conditions are important for the delivery of fast treatment, they can be also a source of unnecessary noise because of false alarms caused by electromagnetic interference or motion artifacts. One significant source of false alarms is related to heart rate, which is triggered when the heart rhythm of the patient is too fast or too slow. In this work, the fusion of different physiological sensors is explored in order to create a robust heart rate estimation. A set of algorithms using heart rate variability index, Bayesian inference, neural networks, fuzzy logic and majority voting is proposed to fuse the information from the electrocardiogram, arterial blood pressure and photoplethysmogram. Three kinds of information are extracted from each source, namely, heart rate variability, the heart rate difference between sensors and the spectral analysis of low and high noise of each sensor. This information is used as input to the algorithms. Twenty recordings selected from the MIMIC database were used to validate the system. The results showed that neural networks fusion had the best false alarm reduction of 92.5 %, while the Bayesian technique had a reduction of 84.3 %, fuzzy logic 80.6 %, majority voter 72.5 % and the heart rate variability index 67.5 %. Therefore, the proposed algorithms showed good performance and could be useful in bedside monitors.

  12. Column generation algorithms for virtual network embedding in flexi-grid optical networks.

    PubMed

    Lin, Rongping; Luo, Shan; Zhou, Jingwei; Wang, Sheng; Chen, Bin; Zhang, Xiaoning; Cai, Anliang; Zhong, Wen-De; Zukerman, Moshe

    2018-04-16

    Network virtualization provides means for efficient management of network resources by embedding multiple virtual networks (VNs) to share efficiently the same substrate network. Such virtual network embedding (VNE) gives rise to a challenging problem of how to optimize resource allocation to VNs and to guarantee their performance requirements. In this paper, we provide VNE algorithms for efficient management of flexi-grid optical networks. We provide an exact algorithm aiming to minimize the total embedding cost in terms of spectrum cost and computation cost for a single VN request. Then, to achieve scalability, we also develop a heuristic algorithm for the same problem. We apply these two algorithms for a dynamic traffic scenario where many VN requests arrive one-by-one. We first demonstrate by simulations for the case of a six-node network that the heuristic algorithm obtains very close blocking probabilities to exact algorithm (about 0.2% higher). Then, for a network of realistic size (namely, USnet) we demonstrate that the blocking probability of our new heuristic algorithm is about one magnitude lower than a simpler heuristic algorithm, which was a component of an earlier published algorithm.

  13. Efficient fuzzy Bayesian inference algorithms for incorporating expert knowledge in parameter estimation

    NASA Astrophysics Data System (ADS)

    Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad

    2016-05-01

    Bayesian inference has traditionally been conceived as the proper framework for the formal incorporation of expert knowledge in parameter estimation of groundwater models. However, conventional Bayesian inference is incapable of taking into account the imprecision essentially embedded in expert provided information. In order to solve this problem, a number of extensions to conventional Bayesian inference have been introduced in recent years. One of these extensions is 'fuzzy Bayesian inference' which is the result of integrating fuzzy techniques into Bayesian statistics. Fuzzy Bayesian inference has a number of desirable features which makes it an attractive approach for incorporating expert knowledge in the parameter estimation process of groundwater models: (1) it is well adapted to the nature of expert provided information, (2) it allows to distinguishably model both uncertainty and imprecision, and (3) it presents a framework for fusing expert provided information regarding the various inputs of the Bayesian inference algorithm. However an important obstacle in employing fuzzy Bayesian inference in groundwater numerical modeling applications is the computational burden, as the required number of numerical model simulations often becomes extremely exhaustive and often computationally infeasible. In this paper, a novel approach of accelerating the fuzzy Bayesian inference algorithm is proposed which is based on using approximate posterior distributions derived from surrogate modeling, as a screening tool in the computations. The proposed approach is first applied to a synthetic test case of seawater intrusion (SWI) in a coastal aquifer. It is shown that for this synthetic test case, the proposed approach decreases the number of required numerical simulations by an order of magnitude. Then the proposed approach is applied to a real-world test case involving three-dimensional numerical modeling of SWI in Kish Island, located in the Persian Gulf. An expert elicitation methodology is developed and applied to the real-world test case in order to provide a road map for the use of fuzzy Bayesian inference in groundwater modeling applications.

  14. Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks.

    PubMed

    Deeter, Anthony; Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui

    2017-01-01

    The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS- Bcl-2 and RAS-AKT, and found significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.

  15. A Topological Criterion for Filtering Information in Complex Brain Networks

    PubMed Central

    Latora, Vito; Chavez, Mario

    2017-01-01

    In many biological systems, the network of interactions between the elements can only be inferred from experimental measurements. In neuroscience, non-invasive imaging tools are extensively used to derive either structural or functional brain networks in-vivo. As a result of the inference process, we obtain a matrix of values corresponding to a fully connected and weighted network. To turn this into a useful sparse network, thresholding is typically adopted to cancel a percentage of the weakest connections. The structural properties of the resulting network depend on how much of the inferred connectivity is eventually retained. However, how to objectively fix this threshold is still an open issue. We introduce a criterion, the efficiency cost optimization (ECO), to select a threshold based on the optimization of the trade-off between the efficiency of a network and its wiring cost. We prove analytically and we confirm through numerical simulations that the connection density maximizing this trade-off emphasizes the intrinsic properties of a given network, while preserving its sparsity. Moreover, this density threshold can be determined a-priori, since the number of connections to filter only depends on the network size according to a power-law. We validate this result on several brain networks, from micro- to macro-scales, obtained with different imaging modalities. Finally, we test the potential of ECO in discriminating brain states with respect to alternative filtering methods. ECO advances our ability to analyze and compare biological networks, inferred from experimental data, in a fast and principled way. PMID:28076353

  16. Dynamic modelling of microRNA regulation during mesenchymal stem cell differentiation.

    PubMed

    Weber, Michael; Sotoca, Ana M; Kupfer, Peter; Guthke, Reinhard; van Zoelen, Everardus J

    2013-11-12

    Network inference from gene expression data is a typical approach to reconstruct gene regulatory networks. During chondrogenic differentiation of human mesenchymal stem cells (hMSCs), a complex transcriptional network is active and regulates the temporal differentiation progress. As modulators of transcriptional regulation, microRNAs (miRNAs) play a critical role in stem cell differentiation. Integrated network inference aimes at determining interrelations between miRNAs and mRNAs on the basis of expression data as well as miRNA target predictions. We applied the NetGenerator tool in order to infer an integrated gene regulatory network. Time series experiments were performed to measure mRNA and miRNA abundances of TGF-beta1+BMP2 stimulated hMSCs. Network nodes were identified by analysing temporal expression changes, miRNA target gene predictions, time series correlation and literature knowledge. Network inference was performed using NetGenerator to reconstruct a dynamical regulatory model based on the measured data and prior knowledge. The resulting model is robust against noise and shows an optimal trade-off between fitting precision and inclusion of prior knowledge. It predicts the influence of miRNAs on the expression of chondrogenic marker genes and therefore proposes novel regulatory relations in differentiation control. By analysing the inferred network, we identified a previously unknown regulatory effect of miR-524-5p on the expression of the transcription factor SOX9 and the chondrogenic marker genes COL2A1, ACAN and COL10A1. Genome-wide exploration of miRNA-mRNA regulatory relationships is a reasonable approach to identify miRNAs which have so far not been associated with the investigated differentiation process. The NetGenerator tool is able to identify valid gene regulatory networks on the basis of miRNA and mRNA time series data.

  17. Anomaly Detection in Dynamic Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Turcotte, Melissa

    2014-10-14

    Anomaly detection in dynamic communication networks has many important security applications. These networks can be extremely large and so detecting any changes in their structure can be computationally challenging; hence, computationally fast, parallelisable methods for monitoring the network are paramount. For this reason the methods presented here use independent node and edge based models to detect locally anomalous substructures within communication networks. As a first stage, the aim is to detect changes in the data streams arising from node or edge communications. Throughout the thesis simple, conjugate Bayesian models for counting processes are used to model these data streams. Amore » second stage of analysis can then be performed on a much reduced subset of the network comprising nodes and edges which have been identified as potentially anomalous in the first stage. The first method assumes communications in a network arise from an inhomogeneous Poisson process with piecewise constant intensity. Anomaly detection is then treated as a changepoint problem on the intensities. The changepoint model is extended to incorporate seasonal behavior inherent in communication networks. This seasonal behavior is also viewed as a changepoint problem acting on a piecewise constant Poisson process. In a static time frame, inference is made on this extended model via a Gibbs sampling strategy. In a sequential time frame, where the data arrive as a stream, a novel, fast Sequential Monte Carlo (SMC) algorithm is introduced to sample from the sequence of posterior distributions of the change points over time. A second method is considered for monitoring communications in a large scale computer network. The usage patterns in these types of networks are very bursty in nature and don’t fit a Poisson process model. For tractable inference, discrete time models are considered, where the data are aggregated into discrete time periods and probability models are fitted to the communication counts. In a sequential analysis, anomalous behavior is then identified from outlying behavior with respect to the fitted predictive probability models. Seasonality is again incorporated into the model and is treated as a changepoint model on the transition probabilities of a discrete time Markov process. Second stage analytics are then developed which combine anomalous edges to identify anomalous substructures in the network.« less

  18. A Sparse Reconstruction Approach for Identifying Gene Regulatory Networks Using Steady-State Experiment Data

    PubMed Central

    Zhang, Wanhong; Zhou, Tong

    2015-01-01

    Motivation Identifying gene regulatory networks (GRNs) which consist of a large number of interacting units has become a problem of paramount importance in systems biology. Situations exist extensively in which causal interacting relationships among these units are required to be reconstructed from measured expression data and other a priori information. Though numerous classical methods have been developed to unravel the interactions of GRNs, these methods either have higher computing complexities or have lower estimation accuracies. Note that great similarities exist between identification of genes that directly regulate a specific gene and a sparse vector reconstruction, which often relates to the determination of the number, location and magnitude of nonzero entries of an unknown vector by solving an underdetermined system of linear equations y = Φx. Based on these similarities, we propose a novel framework of sparse reconstruction to identify the structure of a GRN, so as to increase accuracy of causal regulation estimations, as well as to reduce their computational complexity. Results In this paper, a sparse reconstruction framework is proposed on basis of steady-state experiment data to identify GRN structure. Different from traditional methods, this approach is adopted which is well suitable for a large-scale underdetermined problem in inferring a sparse vector. We investigate how to combine the noisy steady-state experiment data and a sparse reconstruction algorithm to identify causal relationships. Efficiency of this method is tested by an artificial linear network, a mitogen-activated protein kinase (MAPK) pathway network and the in silico networks of the DREAM challenges. The performance of the suggested approach is compared with two state-of-the-art algorithms, the widely adopted total least-squares (TLS) method and those available results on the DREAM project. Actual results show that, with a lower computational cost, the proposed method can significantly enhance estimation accuracy and greatly reduce false positive and negative errors. Furthermore, numerical calculations demonstrate that the proposed algorithm may have faster convergence speed and smaller fluctuation than other methods when either estimate error or estimate bias is considered. PMID:26207991

  19. Image feature based GPS trace filtering for road network generation and road segmentation

    DOE PAGES

    Yuan, Jiangye; Cheriyadat, Anil M.

    2015-10-19

    We propose a new method to infer road networks from GPS trace data and accurately segment road regions in high-resolution aerial images. Unlike previous efforts that rely on GPS traces alone, we exploit image features to infer road networks from noisy trace data. The inferred road network is used to guide road segmentation. We show that the number of image segments spanned by the traces and the trace orientation validated with image features are important attributes for identifying GPS traces on road regions. Based on filtered traces , we construct road networks and integrate them with image features to segmentmore » road regions. Lastly, our experiments show that the proposed method produces more accurate road networks than the leading method that uses GPS traces alone, and also achieves high accuracy in segmenting road regions even with very noisy GPS data.« less

  20. Image feature based GPS trace filtering for road network generation and road segmentation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yuan, Jiangye; Cheriyadat, Anil M.

    We propose a new method to infer road networks from GPS trace data and accurately segment road regions in high-resolution aerial images. Unlike previous efforts that rely on GPS traces alone, we exploit image features to infer road networks from noisy trace data. The inferred road network is used to guide road segmentation. We show that the number of image segments spanned by the traces and the trace orientation validated with image features are important attributes for identifying GPS traces on road regions. Based on filtered traces , we construct road networks and integrate them with image features to segmentmore » road regions. Lastly, our experiments show that the proposed method produces more accurate road networks than the leading method that uses GPS traces alone, and also achieves high accuracy in segmenting road regions even with very noisy GPS data.« less

  1. Biological network motif detection and evaluation

    PubMed Central

    2011-01-01

    Background Molecular level of biological data can be constructed into system level of data as biological networks. Network motifs are defined as over-represented small connected subgraphs in networks and they have been used for many biological applications. Since network motif discovery involves computationally challenging processes, previous algorithms have focused on computational efficiency. However, we believe that the biological quality of network motifs is also very important. Results We define biological network motifs as biologically significant subgraphs and traditional network motifs are differentiated as structural network motifs in this paper. We develop five algorithms, namely, EDGEGO-BNM, EDGEBETWEENNESS-BNM, NMF-BNM, NMFGO-BNM and VOLTAGE-BNM, for efficient detection of biological network motifs, and introduce several evaluation measures including motifs included in complex, motifs included in functional module and GO term clustering score in this paper. Experimental results show that EDGEGO-BNM and EDGEBETWEENNESS-BNM perform better than existing algorithms and all of our algorithms are applicable to find structural network motifs as well. Conclusion We provide new approaches to finding network motifs in biological networks. Our algorithms efficiently detect biological network motifs and further improve existing algorithms to find high quality structural network motifs, which would be impossible using existing algorithms. The performances of the algorithms are compared based on our new evaluation measures in biological contexts. We believe that our work gives some guidelines of network motifs research for the biological networks. PMID:22784624

  2. The NIFTy way of Bayesian signal inference

    NASA Astrophysics Data System (ADS)

    Selig, Marco

    2014-12-01

    We introduce NIFTy, "Numerical Information Field Theory", a software package for the development of Bayesian signal inference algorithms that operate independently from any underlying spatial grid and its resolution. A large number of Bayesian and Maximum Entropy methods for 1D signal reconstruction, 2D imaging, as well as 3D tomography, appear formally similar, but one often finds individualized implementations that are neither flexible nor easily transferable. Signal inference in the framework of NIFTy can be done in an abstract way, such that algorithms, prototyped in 1D, can be applied to real world problems in higher-dimensional settings. NIFTy as a versatile library is applicable and already has been applied in 1D, 2D, 3D and spherical settings. A recent application is the D3PO algorithm targeting the non-trivial task of denoising, deconvolving, and decomposing photon observations in high energy astronomy.

  3. An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

    PubMed

    Liang, Ying; Liao, Bo; Zhu, Wen

    2017-01-01

    Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures multiple gene copy number of hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic tree based on FISH platform. The topology analysis of tumor progression tree shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. And the classification experiment shows that tree-based features are better than data-based features in distinguishing tumor. The constructed phylogenetic trees have great performance in characterizing tumor development process, which outperforms other similar algorithms.

  4. Using MOEA with Redistribution and Consensus Branches to Infer Phylogenies.

    PubMed

    Min, Xiaoping; Zhang, Mouzhao; Yuan, Sisi; Ge, Shengxiang; Liu, Xiangrong; Zeng, Xiangxiang; Xia, Ningshao

    2017-12-26

    In recent years, to infer phylogenies, which are NP-hard problems, more and more research has focused on using metaheuristics. Maximum Parsimony and Maximum Likelihood are two effective ways to conduct inference. Based on these methods, which can also be considered as the optimal criteria for phylogenies, various kinds of multi-objective metaheuristics have been used to reconstruct phylogenies. However, combining these two time-consuming methods results in those multi-objective metaheuristics being slower than a single objective. Therefore, we propose a novel, multi-objective optimization algorithm, MOEA-RC, to accelerate the processes of rebuilding phylogenies using structural information of elites in current populations. We compare MOEA-RC with two representative multi-objective algorithms, MOEA/D and NAGA-II, and a non-consensus version of MOEA-RC on three real-world datasets. The result is, within a given number of iterations, MOEA-RC achieves better solutions than the other algorithms.

  5. Community detection, link prediction, and layer interdependence in multilayer networks.

    PubMed

    De Bacco, Caterina; Power, Eleanor A; Larremore, Daniel B; Moore, Cristopher

    2017-04-01

    Complex systems are often characterized by distinct types of interactions between the same entities. These can be described as a multilayer network where each layer represents one type of interaction. These layers may be interdependent in complicated ways, revealing different kinds of structure in the network. In this work we present a generative model, and an efficient expectation-maximization algorithm, which allows us to perform inference tasks such as community detection and link prediction in this setting. Our model assumes overlapping communities that are common between the layers, while allowing these communities to affect each layer in a different way, including arbitrary mixtures of assortative, disassortative, or directed structure. It also gives us a mathematically principled way to define the interdependence between layers, by measuring how much information about one layer helps us predict links in another layer. In particular, this allows us to bundle layers together to compress redundant information and identify small groups of layers which suffice to predict the remaining layers accurately. We illustrate these findings by analyzing synthetic data and two real multilayer networks, one representing social support relationships among villagers in South India and the other representing shared genetic substring material between genes of the malaria parasite.

  6. Community detection, link prediction, and layer interdependence in multilayer networks

    NASA Astrophysics Data System (ADS)

    De Bacco, Caterina; Power, Eleanor A.; Larremore, Daniel B.; Moore, Cristopher

    2017-04-01

    Complex systems are often characterized by distinct types of interactions between the same entities. These can be described as a multilayer network where each layer represents one type of interaction. These layers may be interdependent in complicated ways, revealing different kinds of structure in the network. In this work we present a generative model, and an efficient expectation-maximization algorithm, which allows us to perform inference tasks such as community detection and link prediction in this setting. Our model assumes overlapping communities that are common between the layers, while allowing these communities to affect each layer in a different way, including arbitrary mixtures of assortative, disassortative, or directed structure. It also gives us a mathematically principled way to define the interdependence between layers, by measuring how much information about one layer helps us predict links in another layer. In particular, this allows us to bundle layers together to compress redundant information and identify small groups of layers which suffice to predict the remaining layers accurately. We illustrate these findings by analyzing synthetic data and two real multilayer networks, one representing social support relationships among villagers in South India and the other representing shared genetic substring material between genes of the malaria parasite.

  7. Detection of inter-turn short-circuit at start-up of induction machine based on torque analysis

    NASA Astrophysics Data System (ADS)

    Pietrowski, Wojciech; Górny, Konrad

    2017-12-01

    Recently, interest in new diagnostics methods in a field of induction machines was observed. Research presented in the paper shows the diagnostics of induction machine based on torque pulsation, under inter-turn short-circuit, during start-up of a machine. In the paper three numerical techniques were used: finite element analysis, signal analysis and artificial neural networks (ANN). The elaborated numerical model of faulty machine consists of field, circuit and motion equations. Voltage excited supply allowed to determine the torque waveform during start-up. The inter-turn short-circuit was treated as a galvanic connection between two points of the stator winding. The waveforms were calculated for different amounts of shorted-turns from 0 to 55. Due to the non-stationary waveforms a wavelet packet decomposition was used to perform an analysis of the torque. The obtained results of analysis were used as input vector for ANN. The response of the neural network was the number of shorted-turns in the stator winding. Special attention was paid to compare response of general regression neural network (GRNN) and multi-layer perceptron neural network (MLP). Based on the results of the research, the efficiency of the developed algorithm can be inferred.

  8. Integrating Information in Biological Ontologies and Molecular Networks to Infer Novel Terms.

    PubMed

    Li, Le; Yip, Kevin Y

    2016-12-15

    Currently most terms and term-term relationships in Gene Ontology (GO) are defined manually, which creates cost, consistency and completeness issues. Recent studies have demonstrated the feasibility of inferring GO automatically from biological networks, which represents an important complementary approach to GO construction. These methods (NeXO and CliXO) are unsupervised, which means 1) they cannot use the information contained in existing GO, 2) the way they integrate biological networks may not optimize the accuracy, and 3) they are not customized to infer the three different sub-ontologies of GO. Here we present a semi-supervised method called Unicorn that extends these previous methods to tackle the three problems. Unicorn uses a sub-tree of an existing GO sub-ontology as training part to learn parameters in integrating multiple networks. Cross-validation results show that Unicorn reliably inferred the left-out parts of each specific GO sub-ontology. In addition, by training Unicorn with an old version of GO together with biological networks, it successfully re-discovered some terms and term-term relationships present only in a new version of GO. Unicorn also successfully inferred some novel terms that were not contained in GO but have biological meanings well-supported by the literature. Source code of Unicorn is available at http://yiplab.cse.cuhk.edu.hk/unicorn/.

  9. Adaptive neuro fuzzy inference system-based power estimation method for CMOS VLSI circuits

    NASA Astrophysics Data System (ADS)

    Vellingiri, Govindaraj; Jayabalan, Ramesh

    2018-03-01

    Recent advancements in very large scale integration (VLSI) technologies have made it feasible to integrate millions of transistors on a single chip. This greatly increases the circuit complexity and hence there is a growing need for less-tedious and low-cost power estimation techniques. The proposed work employs Back-Propagation Neural Network (BPNN) and Adaptive Neuro Fuzzy Inference System (ANFIS), which are capable of estimating the power precisely for the complementary metal oxide semiconductor (CMOS) VLSI circuits, without requiring any knowledge on circuit structure and interconnections. The ANFIS to power estimation application is relatively new. Power estimation using ANFIS is carried out by creating initial FIS modes using hybrid optimisation and back-propagation (BP) techniques employing constant and linear methods. It is inferred that ANFIS with the hybrid optimisation technique employing the linear method produces better results in terms of testing error that varies from 0% to 0.86% when compared to BPNN as it takes the initial fuzzy model and tunes it by means of a hybrid technique combining gradient descent BP and mean least-squares optimisation algorithms. ANFIS is the best suited for power estimation application with a low RMSE of 0.0002075 and a high coefficient of determination (R) of 0.99961.

  10. Bayesian inference based on stationary Fokker-Planck sampling.

    PubMed

    Berrones, Arturo

    2010-06-01

    A novel formalism for bayesian learning in the context of complex inference models is proposed. The method is based on the use of the stationary Fokker-Planck (SFP) approach to sample from the posterior density. Stationary Fokker-Planck sampling generalizes the Gibbs sampler algorithm for arbitrary and unknown conditional densities. By the SFP procedure, approximate analytical expressions for the conditionals and marginals of the posterior can be constructed. At each stage of SFP, the approximate conditionals are used to define a Gibbs sampling process, which is convergent to the full joint posterior. By the analytical marginals efficient learning methods in the context of artificial neural networks are outlined. Offline and incremental bayesian inference and maximum likelihood estimation from the posterior are performed in classification and regression examples. A comparison of SFP with other Monte Carlo strategies in the general problem of sampling from arbitrary densities is also presented. It is shown that SFP is able to jump large low-probability regions without the need of a careful tuning of any step-size parameter. In fact, the SFP method requires only a small set of meaningful parameters that can be selected following clear, problem-independent guidelines. The computation cost of SFP, measured in terms of loss function evaluations, grows linearly with the given model's dimension.

  11. Social representations and contextual adjustments as two distinct components of the Theory of Mind brain network: Evidence from the REMICS task.

    PubMed

    Lavoie, Marie-Audrey; Vistoli, Damien; Sutliff, Stephanie; Jackson, Philip L; Achim, Amélie M

    2016-08-01

    Theory of mind (ToM) refers to the ability to infer the mental states of others. Behavioral measures of ToM usually present information about both a character and the context in which this character is placed, and these different pieces of information can be used to infer the character's mental states. A set of brain regions designated as the ToM brain network is recognized to support (ToM) inferences. Different brain regions within that network could however support different ToM processes. This functional magnetic resonance imaging (fMRI) study aimed to distinguish the brain regions supporting two aspects inherent to many ToM tasks, i.e., the ability to infer or represent mental states and the ability to use the context to adjust these inferences. Nineteen healthy subjects were scanned during the REMICS task, a novel task designed to orthogonally manipulate mental state inferences (as opposed to physical inferences) and contextual adjustments of inferences (as opposed to inferences that do not require contextual adjustments). We observed that mental state inferences and contextual adjustments, which are important aspects of most behavioral ToM tasks, rely on distinct brain regions or subregions within the classical brain network activated in previous ToM research. Notably, an interesting dissociation emerged within the medial prefrontal cortex (mPFC) and temporo-parietal junctions (TPJ) such that the inferior part of these brain regions responded to mental state inferences while the superior part of these brain regions responded to the requirement for contextual adjustments. This study provides evidence that the overall set of brain regions activated during ToM tasks supports different processes, and highlights that cognitive processes related to contextual adjustments have an important role in ToM and should be further studied. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. Automatic Road Gap Detection Using Fuzzy Inference System

    NASA Astrophysics Data System (ADS)

    Hashemi, S.; Valadan Zoej, M. J.; Mokhtarzadeh, M.

    2011-09-01

    Automatic feature extraction from aerial and satellite images is a high-level data processing which is still one of the most important research topics of the field. In this area, most of the researches are focused on the early step of road detection, where road tracking methods, morphological analysis, dynamic programming and snakes, multi-scale and multi-resolution methods, stereoscopic and multi-temporal analysis, hyper spectral experiments, are some of the mature methods in this field. Although most researches are focused on detection algorithms, none of them can extract road network perfectly. On the other hand, post processing algorithms accentuated on the refining of road detection results, are not developed as well. In this article, the main is to design an intelligent method to detect and compensate road gaps remained on the early result of road detection algorithms. The proposed algorithm consists of five main steps as follow: 1) Short gap coverage: In this step, a multi-scale morphological is designed that covers short gaps in a hierarchical scheme. 2) Long gap detection: In this step, the long gaps, could not be covered in the previous stage, are detected using a fuzzy inference system. for this reason, a knowledge base consisting of some expert rules are designed which are fired on some gap candidates of the road detection results. 3) Long gap coverage: In this stage, detected long gaps are compensated by two strategies of linear and polynomials for this reason, shorter gaps are filled by line fitting while longer ones are compensated by polynomials.4) Accuracy assessment: In order to evaluate the obtained results, some accuracy assessment criteria are proposed. These criteria are obtained by comparing the obtained results with truly compensated ones produced by a human expert. The complete evaluation of the obtained results whit their technical discussions are the materials of the full paper.

  13. The discovery of structural form

    PubMed Central

    Kemp, Charles; Tenenbaum, Joshua B.

    2008-01-01

    Algorithms for finding structure in data have become increasingly important both as tools for scientific data analysis and as models of human learning, yet they suffer from a critical limitation. Scientists discover qualitatively new forms of structure in observed data: For instance, Linnaeus recognized the hierarchical organization of biological species, and Mendeleev recognized the periodic structure of the chemical elements. Analogous insights play a pivotal role in cognitive development: Children discover that object category labels can be organized into hierarchies, friendship networks are organized into cliques, and comparative relations (e.g., “bigger than” or “better than”) respect a transitive order. Standard algorithms, however, can only learn structures of a single form that must be specified in advance: For instance, algorithms for hierarchical clustering create tree structures, whereas algorithms for dimensionality-reduction create low-dimensional spaces. Here, we present a computational model that learns structures of many different forms and that discovers which form is best for a given dataset. The model makes probabilistic inferences over a space of graph grammars representing trees, linear orders, multidimensional spaces, rings, dominance hierarchies, cliques, and other forms and successfully discovers the underlying structure of a variety of physical, biological, and social domains. Our approach brings structure learning methods closer to human abilities and may lead to a deeper computational understanding of cognitive development. PMID:18669663

  14. Algorithms for database-dependent search of MS/MS data.

    PubMed

    Matthiesen, Rune

    2013-01-01

    The frequent used bottom-up strategy for identification of proteins and their associated modifications generate nowadays typically thousands of MS/MS spectra that normally are matched automatically against a protein sequence database. Search engines that take as input MS/MS spectra and a protein sequence database are referred as database-dependent search engines. Many programs both commercial and freely available exist for database-dependent search of MS/MS spectra and most of the programs have excellent user documentation. The aim here is therefore to outline the algorithm strategy behind different search engines rather than providing software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have been put in to comparing results from different software rather than discussing the underlining algorithms. Such practical comparisons can be cluttered by suboptimal implementation and the observed differences are frequently caused by software parameters settings which have not been set proper to allow even comparison. In other words an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim in this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference are much less developed for most search engines and is in many cases performed by an external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program SIR for protein inference that can import a Mascot search result.

  15. Analysis and Design of Complex Network Environments

    DTIC Science & Technology

    2014-02-01

    entanglements among un- measured variables. This “potential entanglement ” type of network complexity is previously unaddressed in the literature, yet it...Appreciating the power of structural representations that allow for potential entanglement among unmeasured variables to simplify network inference problems...rely on the idea of subsystems and allows for potential entanglement among unmeasured states. As a result, inferring a system’s signal structure

  16. Inferring topologies via driving-based generalized synchronization of two-layer networks

    NASA Astrophysics Data System (ADS)

    Wang, Yingfei; Wu, Xiaoqun; Feng, Hui; Lu, Jun-an; Xu, Yuhua

    2016-05-01

    The interaction topology among the constituents of a complex network plays a crucial role in the network’s evolutionary mechanisms and functional behaviors. However, some network topologies are usually unknown or uncertain. Meanwhile, coupling delays are ubiquitous in various man-made and natural networks. Hence, it is necessary to gain knowledge of the whole or partial topology of a complex dynamical network by taking into consideration communication delay. In this paper, topology identification of complex dynamical networks is investigated via generalized synchronization of a two-layer network. Particularly, based on the LaSalle-type invariance principle of stochastic differential delay equations, an adaptive control technique is proposed by constructing an auxiliary layer and designing proper control input and updating laws so that the unknown topology can be recovered upon successful generalized synchronization. Numerical simulations are provided to illustrate the effectiveness of the proposed method. The technique provides a certain theoretical basis for topology inference of complex networks. In particular, when the considered network is composed of systems with high-dimension or complicated dynamics, a simpler response layer can be constructed, which is conducive to circuit design. Moreover, it is practical to take into consideration perturbations caused by control input. Finally, the method is applicable to infer topology of a subnetwork embedded within a complex system and locate hidden sources. We hope the results can provide basic insight into further research endeavors on understanding practical and economical topology inference of networks.

  17. Prediction of silicon oxynitride plasma etching using a generalized regression neural network

    NASA Astrophysics Data System (ADS)

    Kim, Byungwhan; Lee, Byung Teak

    2005-08-01

    A prediction model of silicon oxynitride (SiON) etching was constructed using a neural network. Model prediction performance was improved by means of genetic algorithm. The etching was conducted in a C2F6 inductively coupled plasma. A 24 full factorial experiment was employed to systematically characterize parameter effects on SiON etching. The process parameters include radio frequency source power, bias power, pressure, and C2F6 flow rate. To test the appropriateness of the trained model, additional 16 experiments were conducted. For comparison, four types of statistical regression models were built. Compared to the best regression model, the optimized neural network model demonstrated an improvement of about 52%. The optimized model was used to infer etch mechanisms as a function of parameters. The pressure effect was noticeably large only as relatively large ion bombardment was maintained in the process chamber. Ion-bombardment-activated polymer deposition played the most significant role in interpreting the complex effect of bias power or C2F6 flow rate. Moreover, [CF2] was expected to be the predominant precursor to polymer deposition.

  18. Pathway analysis of high-throughput biological data within a Bayesian network framework.

    PubMed

    Isci, Senol; Ozturk, Cengizhan; Jones, Jon; Otu, Hasan H

    2011-06-15

    Most current approaches to high-throughput biological data (HTBD) analysis either perform individual gene/protein analysis or, gene/protein set enrichment analysis for a list of biologically relevant molecules. Bayesian Networks (BNs) capture linear and non-linear interactions, handle stochastic events accounting for noise, and focus on local interactions, which can be related to causal inference. Here, we describe for the first time an algorithm that models biological pathways as BNs and identifies pathways that best explain given HTBD by scoring fitness of each network. Proposed method takes into account the connectivity and relatedness between nodes of the pathway through factoring pathway topology in its model. Our simulations using synthetic data demonstrated robustness of our approach. We tested proposed method, Bayesian Pathway Analysis (BPA), on human microarray data regarding renal cell carcinoma (RCC) and compared our results with gene set enrichment analysis. BPA was able to find broader and more specific pathways related to RCC. Accompanying BPA software (BPAS) package is freely available for academic use at http://bumil.boun.edu.tr/bpa.

  19. Memristor-Based Analog Computation and Neural Network Classification with a Dot Product Engine.

    PubMed

    Hu, Miao; Graves, Catherine E; Li, Can; Li, Yunning; Ge, Ning; Montgomery, Eric; Davila, Noraica; Jiang, Hao; Williams, R Stanley; Yang, J Joshua; Xia, Qiangfei; Strachan, John Paul

    2018-03-01

    Using memristor crossbar arrays to accelerate computations is a promising approach to efficiently implement algorithms in deep neural networks. Early demonstrations, however, are limited to simulations or small-scale problems primarily due to materials and device challenges that limit the size of the memristor crossbar arrays that can be reliably programmed to stable and analog values, which is the focus of the current work. High-precision analog tuning and control of memristor cells across a 128 × 64 array is demonstrated, and the resulting vector matrix multiplication (VMM) computing precision is evaluated. Single-layer neural network inference is performed in these arrays, and the performance compared to a digital approach is assessed. Memristor computing system used here reaches a VMM accuracy equivalent of 6 bits, and an 89.9% recognition accuracy is achieved for the 10k MNIST handwritten digit test set. Forecasts show that with integrated (on chip) and scaled memristors, a computational efficiency greater than 100 trillion operations per second per Watt is possible. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Inference of financial networks using the normalised mutual information rate.

    PubMed

    Goh, Yong Kheng; Hasim, Haslifah M; Antonopoulos, Chris G

    2018-01-01

    In this paper, we study data from financial markets, using the normalised Mutual Information Rate. We show how to use it to infer the underlying network structure of interrelations in the foreign currency exchange rates and stock indices of 15 currency areas. We first present the mathematical method and discuss its computational aspects, and apply it to artificial data from chaotic dynamics and to correlated normal-variates data. We then apply the method to infer the structure of the financial system from the time-series of currency exchange rates and stock indices. In particular, we study and reveal the interrelations among the various foreign currency exchange rates and stock indices in two separate networks, of which we also study their structural properties. Our results show that both inferred networks are small-world networks, sharing similar properties and having differences in terms of assortativity. Importantly, our work shows that global economies tend to connect with other economies world-wide, rather than creating small groups of local economies. Finally, the consistent interrelations depicted among the 15 currency areas are further supported by a discussion from the viewpoint of economics.

  1. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  2. Inference of financial networks using the normalised mutual information rate

    PubMed Central

    2018-01-01

    In this paper, we study data from financial markets, using the normalised Mutual Information Rate. We show how to use it to infer the underlying network structure of interrelations in the foreign currency exchange rates and stock indices of 15 currency areas. We first present the mathematical method and discuss its computational aspects, and apply it to artificial data from chaotic dynamics and to correlated normal-variates data. We then apply the method to infer the structure of the financial system from the time-series of currency exchange rates and stock indices. In particular, we study and reveal the interrelations among the various foreign currency exchange rates and stock indices in two separate networks, of which we also study their structural properties. Our results show that both inferred networks are small-world networks, sharing similar properties and having differences in terms of assortativity. Importantly, our work shows that global economies tend to connect with other economies world-wide, rather than creating small groups of local economies. Finally, the consistent interrelations depicted among the 15 currency areas are further supported by a discussion from the viewpoint of economics. PMID:29420644

  3. A sub-space greedy search method for efficient Bayesian Network inference.

    PubMed

    Zhang, Qing; Cao, Yong; Li, Yong; Zhu, Yanming; Sun, Samuel S M; Guo, Dianjing

    2011-09-01

    Bayesian network (BN) has been successfully used to infer the regulatory relationships of genes from microarray dataset. However, one major limitation of BN approach is the computational cost because the calculation time grows more than exponentially with the dimension of the dataset. In this paper, we propose a sub-space greedy search method for efficient Bayesian Network inference. Particularly, this method limits the greedy search space by only selecting gene pairs with higher partial correlation coefficients. Using both synthetic and real data, we demonstrate that the proposed method achieved comparable results with standard greedy search method yet saved ∼50% of the computational time. We believe that sub-space search method can be widely used for efficient BN inference in systems biology. Copyright © 2011 Elsevier Ltd. All rights reserved.

  4. Mobile Context Provider for Social Networking

    NASA Astrophysics Data System (ADS)

    Santos, André C.; Cardoso, João M. P.; Ferreira, Diogo R.; Diniz, Pedro C.

    The ability to infer user context based on a mobile device together with a set of external sensors opens up the way to new context-aware services and applications. In this paper, we describe a mobile context provider that makes use of sensors available in a smartphone as well as sensors externally connected via bluetooth. We describe the system architecture from sensor data acquisition to feature extraction, context inference and the publication of context information to well-known social networking services such as Twitter and Hi5. In the current prototype, context inference is based on decision trees, but the middleware allows the integration of other inference engines. Experimental results suggest that the proposed solution is a promising approach to provide user context to both local and network-level services.

  5. Identifying significant genetic regulatory networks in the prostate cancer from microarray data based on transcription factor analysis and conditional independency

    PubMed Central

    2009-01-01

    Background Prostate cancer is a world wide leading cancer and it is characterized by its aggressive metastasis. According to the clinical heterogeneity, prostate cancer displays different stages and grades related to the aggressive metastasis disease. Although numerous studies used microarray analysis and traditional clustering method to identify the individual genes during the disease processes, the important gene regulations remain unclear. We present a computational method for inferring genetic regulatory networks from micorarray data automatically with transcription factor analysis and conditional independence testing to explore the potential significant gene regulatory networks that are correlated with cancer, tumor grade and stage in the prostate cancer. Results To deal with missing values in microarray data, we used a K-nearest-neighbors (KNN) algorithm to determine the precise expression values. We applied web services technology to wrap the bioinformatics toolkits and databases to automatically extract the promoter regions of DNA sequences and predicted the transcription factors that regulate the gene expressions. We adopt the microarray datasets consists of 62 primary tumors, 41 normal prostate tissues from Stanford Microarray Database (SMD) as a target dataset to evaluate our method. The predicted results showed that the possible biomarker genes related to cancer and denoted the androgen functions and processes may be in the development of the prostate cancer and promote the cell death in cell cycle. Our predicted results showed that sub-networks of genes SREBF1, STAT6 and PBX1 are strongly related to a high extent while ETS transcription factors ELK1, JUN and EGR2 are related to a low extent. Gene SLC22A3 may explain clinically the differentiation associated with the high grade cancer compared with low grade cancer. Enhancer of Zeste Homolg 2 (EZH2) regulated by RUNX1 and STAT3 is correlated to the pathological stage. Conclusions We provide a computational framework to reconstruct the genetic regulatory network from the microarray data using biological knowledge and constraint-based inferences. Our method is helpful in verifying possible interaction relations in gene regulatory networks and filtering out incorrect relations inferred by imperfect methods. We predicted not only individual gene related to cancer but also discovered significant gene regulation networks. Our method is also validated in several enriched published papers and databases and the significant gene regulatory networks perform critical biological functions and processes including cell adhesion molecules, androgen and estrogen metabolism, smooth muscle contraction, and GO-annotated processes. Those significant gene regulations and the critical concept of tumor progression are useful to understand cancer biology and disease treatment. PMID:20025723

  6. Network inference and network response identification: moving genome-scale data to the next level of biological discovery

    PubMed Central

    Veiga, Diogo F. T.; Dutta, Bhaskar; Balaźsi, Gábor

    2011-01-01

    The escalating amount of genome-scale data demands a pragmatic stance from the research community. How can we utilize this deluge of information to better understand biology, cure diseases, or engage cells in bioremediation or biomaterial production for various purposes? A research pipeline moving new sequence, expression and binding data towards practical end goals seems to be necessary. While most individual researchers are not motivated by such well-articulated pragmatic end goals, the scientific community has already self-organized itself to successfully convert genomic data into fundamentally new biological knowledge and practical applications. Here we review two important steps in this workflow: network inference and network response identification, applied to transcriptional regulatory networks. Among network inference methods, we concentrate on relevance networks due to their conceptual simplicity. We classify and discuss network response identification approaches as either data-centric or network-centric. Finally, we conclude with an outlook on what is still missing from these approaches and what may be ahead on the road to biological discovery. PMID:20174676

  7. Computational statistics using the Bayesian Inference Engine

    NASA Astrophysics Data System (ADS)

    Weinberg, Martin D.

    2013-09-01

    This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.

  8. Analyzing the genes related to Alzheimer's disease via a network and pathway-based approach.

    PubMed

    Hu, Yan-Shi; Xin, Juncai; Hu, Ying; Zhang, Lei; Wang, Ju

    2017-04-27

    Our understanding of the molecular mechanisms underlying Alzheimer's disease (AD) remains incomplete. Previous studies have revealed that genetic factors provide a significant contribution to the pathogenesis and development of AD. In the past years, numerous genes implicated in this disease have been identified via genetic association studies on candidate genes or at the genome-wide level. However, in many cases, the roles of these genes and their interactions in AD are still unclear. A comprehensive and systematic analysis focusing on the biological function and interactions of these genes in the context of AD will therefore provide valuable insights to understand the molecular features of the disease. In this study, we collected genes potentially associated with AD by screening publications on genetic association studies deposited in PubMed. The major biological themes linked with these genes were then revealed by function and biochemical pathway enrichment analysis, and the relation between the pathways was explored by pathway crosstalk analysis. Furthermore, the network features of these AD-related genes were analyzed in the context of human interactome and an AD-specific network was inferred using the Steiner minimal tree algorithm. We compiled 430 human genes reported to be associated with AD from 823 publications. Biological theme analysis indicated that the biological processes and biochemical pathways related to neurodevelopment, metabolism, cell growth and/or survival, and immunology were enriched in these genes. Pathway crosstalk analysis then revealed that the significantly enriched pathways could be grouped into three interlinked modules-neuronal and metabolic module, cell growth/survival and neuroendocrine pathway module, and immune response-related module-indicating an AD-specific immune-endocrine-neuronal regulatory network. Furthermore, an AD-specific protein network was inferred and novel genes potentially associated with AD were identified. By means of network and pathway-based methodology, we explored the pathogenetic mechanism underlying AD at a systems biology level. Results from our work could provide valuable clues for understanding the molecular mechanism underlying AD. In addition, the framework proposed in this study could be used to investigate the pathological molecular network and genes relevant to other complex diseases or phenotypes.

  9. Mapping edge-based traffic measurements onto the internal links in MPLS network

    NASA Astrophysics Data System (ADS)

    Zhao, Guofeng; Tang, Hong; Zhang, Yi

    2004-09-01

    Applying multi-protocol label switching techniques to IP-based backbone for traffic engineering goals has shown advantageous. Obtaining a volume of load on each internal link of the network is crucial for traffic engineering applying. Though collecting can be available for each link, such as applying traditional SNMP scheme, the approach may cause heavy processing load and sharply degrade the throughput of the core routers. Then monitoring merely at the edge of the network and mapping the measurements onto the core provides a good alternative way. In this paper, we explore a scheme for traffic mapping with edge-based measurements in MPLS network. It is supposed that the volume of traffic on each internal link over the domain would be mapped onto by measurements available only at ingress nodes. We apply path-based measurements at ingress nodes without enabling measurements in the core of the network. We propose a method that can infer a path from the ingress to the egress node using label distribution protocol without collecting routing data from core routers. Based on flow theory and queuing theory, we prove that our approach is effective and present the algorithm for traffic mapping. We also show performance simulation results that indicate potential of our approach.

  10. Identity-by-Descent-Based Phasing and Imputation in Founder Populations Using Graphical Models

    PubMed Central

    Palin, Kimmo; Campbell, Harry; Wright, Alan F; Wilson, James F; Durbin, Richard

    2011-01-01

    Accurate knowledge of haplotypes, the combination of alleles co-residing on a single copy of a chromosome, enables powerful gene mapping and sequence imputation methods. Since humans are diploid, haplotypes must be derived from genotypes by a phasing process. In this study, we present a new computational model for haplotype phasing based on pairwise sharing of haplotypes inferred to be Identical-By-Descent (IBD). We apply the Bayesian network based model in a new phasing algorithm, called systematic long-range phasing (SLRP), that can capitalize on the close genetic relationships in isolated founder populations, and show with simulated and real genome-wide genotype data that SLRP substantially reduces the rate of phasing errors compared to previous phasing algorithms. Furthermore, the method accurately identifies regions of IBD, enabling linkage-like studies without pedigrees, and can be used to impute most genotypes with very low error rate. Genet. Epidemiol. 2011. © 2011 Wiley Periodicals, Inc.35:853-860, 2011 PMID:22006673

  11. Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks

    PubMed Central

    Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui

    2017-01-01

    The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS- Bcl-2 and RAS-AKT, and found significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways. PMID:29049295

  12. A New Modified Histogram Matching Normalization for Time Series Microarray Analysis.

    PubMed

    Astola, Laura; Molenaar, Jaap

    2014-07-01

    Microarray data is often utilized in inferring regulatory networks. Quantile normalization (QN) is a popular method to reduce array-to-array variation. We show that in the context of time series measurements QN may not be the best choice for this task, especially not if the inference is based on continuous time ODE model. We propose an alternative normalization method that is better suited for network inference from time series data.

  13. Fragmenting networks by targeting collective influencers at a mesoscopic level.

    PubMed

    Kobayashi, Teruyoshi; Masuda, Naoki

    2016-11-25

    A practical approach to protecting networks against epidemic processes such as spreading of infectious diseases, malware, and harmful viral information is to remove some influential nodes beforehand to fragment the network into small components. Because determining the optimal order to remove nodes is a computationally hard problem, various approximate algorithms have been proposed to efficiently fragment networks by sequential node removal. Morone and Makse proposed an algorithm employing the non-backtracking matrix of given networks, which outperforms various existing algorithms. In fact, many empirical networks have community structure, compromising the assumption of local tree-like structure on which the original algorithm is based. We develop an immunization algorithm by synergistically combining the Morone-Makse algorithm and coarse graining of the network in which we regard a community as a supernode. In this way, we aim to identify nodes that connect different communities at a reasonable computational cost. The proposed algorithm works more efficiently than the Morone-Makse and other algorithms on networks with community structure.

  14. Fragmenting networks by targeting collective influencers at a mesoscopic level

    NASA Astrophysics Data System (ADS)

    Kobayashi, Teruyoshi; Masuda, Naoki

    2016-11-01

    A practical approach to protecting networks against epidemic processes such as spreading of infectious diseases, malware, and harmful viral information is to remove some influential nodes beforehand to fragment the network into small components. Because determining the optimal order to remove nodes is a computationally hard problem, various approximate algorithms have been proposed to efficiently fragment networks by sequential node removal. Morone and Makse proposed an algorithm employing the non-backtracking matrix of given networks, which outperforms various existing algorithms. In fact, many empirical networks have community structure, compromising the assumption of local tree-like structure on which the original algorithm is based. We develop an immunization algorithm by synergistically combining the Morone-Makse algorithm and coarse graining of the network in which we regard a community as a supernode. In this way, we aim to identify nodes that connect different communities at a reasonable computational cost. The proposed algorithm works more efficiently than the Morone-Makse and other algorithms on networks with community structure.

  15. Fragmenting networks by targeting collective influencers at a mesoscopic level

    PubMed Central

    Kobayashi, Teruyoshi; Masuda, Naoki

    2016-01-01

    A practical approach to protecting networks against epidemic processes such as spreading of infectious diseases, malware, and harmful viral information is to remove some influential nodes beforehand to fragment the network into small components. Because determining the optimal order to remove nodes is a computationally hard problem, various approximate algorithms have been proposed to efficiently fragment networks by sequential node removal. Morone and Makse proposed an algorithm employing the non-backtracking matrix of given networks, which outperforms various existing algorithms. In fact, many empirical networks have community structure, compromising the assumption of local tree-like structure on which the original algorithm is based. We develop an immunization algorithm by synergistically combining the Morone-Makse algorithm and coarse graining of the network in which we regard a community as a supernode. In this way, we aim to identify nodes that connect different communities at a reasonable computational cost. The proposed algorithm works more efficiently than the Morone-Makse and other algorithms on networks with community structure. PMID:27886251

  16. Underwater Sensor Network Redeployment Algorithm Based on Wolf Search

    PubMed Central

    Jiang, Peng; Feng, Yang; Wu, Feng

    2016-01-01

    This study addresses the optimization of node redeployment coverage in underwater wireless sensor networks. Given that nodes could easily become invalid under a poor environment and the large scale of underwater wireless sensor networks, an underwater sensor network redeployment algorithm was developed based on wolf search. This study is to apply the wolf search algorithm combined with crowded degree control in the deployment of underwater wireless sensor networks. The proposed algorithm uses nodes to ensure coverage of the events, and it avoids the prematurity of the nodes. The algorithm has good coverage effects. In addition, considering that obstacles exist in the underwater environment, nodes are prevented from being invalid by imitating the mechanism of avoiding predators. Thus, the energy consumption of the network is reduced. Comparative analysis shows that the algorithm is simple and effective in wireless sensor network deployment. Compared with the optimized artificial fish swarm algorithm, the proposed algorithm exhibits advantages in network coverage, energy conservation, and obstacle avoidance. PMID:27775659

  17. Community detection in complex networks by using membrane algorithm

    NASA Astrophysics Data System (ADS)

    Liu, Chuang; Fan, Linan; Liu, Zhou; Dai, Xiang; Xu, Jiamei; Chang, Baoren

    Community detection in complex networks is a key problem of network analysis. In this paper, a new membrane algorithm is proposed to solve the community detection in complex networks. The proposed algorithm is based on membrane systems, which consists of objects, reaction rules, and a membrane structure. Each object represents a candidate partition of a complex network, and the quality of objects is evaluated according to network modularity. The reaction rules include evolutionary rules and communication rules. Evolutionary rules are responsible for improving the quality of objects, which employ the differential evolutionary algorithm to evolve objects. Communication rules implement the information exchanged among membranes. Finally, the proposed algorithm is evaluated on synthetic, real-world networks with real partitions known and the large-scaled networks with real partitions unknown. The experimental results indicate the superior performance of the proposed algorithm in comparison with other experimental algorithms.

  18. Genome-Wide Networks of Amino Acid Covariances Are Common among Viruses

    PubMed Central

    Donlin, Maureen J.; Szeto, Brandon; Gohara, David W.; Aurora, Rajeev

    2012-01-01

    Coordinated variation among positions in amino acid sequence alignments can reveal genetic dependencies at noncontiguous positions, but methods to assess these interactions are incompletely developed. Previously, we found genome-wide networks of covarying residue positions in the hepatitis C virus genome (R. Aurora, M. J. Donlin, N. A. Cannon, and J. E. Tavis, J. Clin. Invest. 119:225–236, 2009). Here, we asked whether such networks are present in a diverse set of viruses and, if so, what they may imply about viral biology. Viral sequences were obtained for 16 viruses in 13 species from 9 families. The entire viral coding potential for each virus was aligned, all possible amino acid covariances were identified using the observed-minus-expected-squared algorithm at a false-discovery rate of ≤1%, and networks of covariances were assessed using standard methods. Covariances that spanned the viral coding potential were common in all viruses. In all cases, the covariances formed a single network that contained essentially all of the covariances. The hepatitis C virus networks had hub-and-spoke topologies, but all other networks had random topologies with an unusually large number of highly connected nodes. These results indicate that genome-wide networks of genetic associations and the coordinated evolution they imply are very common in viral genomes, that the networks rarely have the hub-and-spoke topology that dominates other biological networks, and that network topologies can vary substantially even within a given viral group. Five examples with hepatitis B virus and poliovirus are presented to illustrate how covariance network analysis can lead to inferences about viral biology. PMID:22238298

  19. Control Algorithms For Liquid-Cooled Garments

    NASA Technical Reports Server (NTRS)

    Drew, B.; Harner, K.; Hodgson, E.; Homa, J.; Jennings, D.; Yanosy, J.

    1988-01-01

    Three algorithms developed for control of cooling in protective garments. Metabolic rate inferred from temperatures of cooling liquid outlet and inlet, suitably filtered to account for thermal lag of human body. Temperature at inlet adjusted to value giving maximum comfort at inferred metabolic rate. Applicable to space suits, used for automatic control of cooling in suits worn by workers in radioactive, polluted, or otherwise hazardous environments. More effective than manual control, subject to frequent, overcompensated adjustments as level of activity varies.

  20. Quantitative assessment of protein activity in orphan tissues and single cells using the metaVIPER algorithm. | Office of Cancer Genomics

    Cancer.gov

    We and others have shown that transition and maintenance of biological states is controlled by master regulator proteins, which can be inferred by interrogating tissue-specific regulatory models (interactomes) with transcriptional signatures, using the VIPER algorithm. Yet, some tissues may lack molecular profiles necessary for interactome inference (orphan tissues), or, as for single cells isolated from heterogeneous samples, their tissue context may be undetermined.

Top