GeNets: a unified web platform for network-based genomic analyses.
Li, Taibo; Kim, April; Rosenbluh, Joseph; Horn, Heiko; Greenfeld, Liraz; An, David; Zimmer, Andrew; Liberzon, Arthur; Bistline, Jon; Natoli, Ted; Li, Yang; Tsherniak, Aviad; Narayan, Rajiv; Subramanian, Aravind; Liefeld, Ted; Wong, Bang; Thompson, Dawn; Calvo, Sarah; Carr, Steve; Boehm, Jesse; Jaffe, Jake; Mesirov, Jill; Hacohen, Nir; Regev, Aviv; Lage, Kasper
2018-06-18
Functional genomics networks are widely used to identify unexpected pathway relationships in large genomic datasets. However, it is challenging to compare the signal-to-noise ratios of different networks and to identify the optimal network with which to interpret a particular genetic dataset. We present GeNets, a platform in which users can train a machine-learning model (Quack) to carry out these comparisons and execute, store, and share analyses of genetic and RNA-sequencing datasets.
Hamilton, Joshua J.; Reed, Jennifer L.
2012-01-01
Genome-scale network reconstructions are useful tools for understanding cellular metabolism, and comparisons of such reconstructions can provide insight into metabolic differences between organisms. Recent efforts toward comparing genome-scale models have focused primarily on aligning metabolic networks at the reaction level and then looking at differences and similarities in reaction and gene content. However, these reaction comparison approaches are time-consuming and do not identify the effect network differences have on the functional states of the network. We have developed a bilevel mixed-integer programming approach, CONGA, to identify functional differences between metabolic networks by comparing network reconstructions aligned at the gene level. We first identify orthologous genes across two reconstructions and then use CONGA to identify conditions under which differences in gene content give rise to differences in metabolic capabilities. By seeking genes whose deletion in one or both models disproportionately changes flux through a selected reaction (e.g., growth or by-product secretion) in one model over another, we are able to identify structural metabolic network differences enabling unique metabolic capabilities. Using CONGA, we explore functional differences between two metabolic reconstructions of Escherichia coli and identify a set of reactions responsible for chemical production differences between the two models. We also use this approach to aid in the development of a genome-scale model of Synechococcus sp. PCC 7002. Finally, we propose potential antimicrobial targets in Mycobacterium tuberculosis and Staphylococcus aureus based on differences in their metabolic capabilities. Through these examples, we demonstrate that a gene-centric approach to comparing metabolic networks allows for a rapid comparison of metabolic models at a functional level. 
Using CONGA, we can identify differences in reaction and gene content which give rise to different functional predictions. Because CONGA provides a general framework, it can be applied to find functional differences across models and biological systems beyond those presented here. PMID:22666308
Integrated Approach to Reconstruction of Microbial Regulatory Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rodionov, Dmitry A; Novichkov, Pavel S
2013-11-04
The goal of this project was to develop an integrated bioinformatics platform for genome-scale inference and visualization of transcriptional regulatory networks (TRNs) in bacterial genomes. The work was done at Sanford-Burnham Medical Research Institute (SBMRI, P.I. D.A. Rodionov) and Lawrence Berkeley National Laboratory (LBNL, co-P.I. P.S. Novichkov). The developed computational resources include: (1) the RegPredict web platform for TRN inference and regulon reconstruction in microbial genomes, and (2) the RegPrecise database for collection, visualization and comparative analysis of transcriptional regulons reconstructed by comparative genomics. These analytical resources were selected as key components in the DOE Systems Biology KnowledgeBase (SBKB). The high-quality data accumulated in RegPrecise will provide essential datasets of reference regulons in diverse microbes to enable automatic reconstruction of draft TRNs in newly sequenced genomes. We outline our progress toward the three aims of this grant proposal, which were to: develop an integrated platform for genome-scale regulon reconstruction; infer regulatory annotations in several groups of bacteria and build reference collections of microbial regulons; and develop a KnowledgeBase on microbial transcriptional regulation.
Benedict, Matthew N.; Mundy, Michael B.; Henry, Christopher S.; ...
2014-10-16
Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. 
This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface.
Benedict, Matthew N.; Mundy, Michael B.; Henry, Christopher S.; Chia, Nicholas; Price, Nathan D.
2014-01-01
Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. 
This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface. PMID:25329157
Weighill, Deborah A.; Jacobson, Daniel A.
2015-03-27
Herein we present and develop the theory of 3-way networks, a type of hypergraph in which each edge models relationships between triplets of objects, as opposed to the pairs of objects modeled by standard networks. We explore approaches for pruning these 3-way networks, illustrate their utility in comparative genomics, and demonstrate, using a phylogenomic dataset of 211 bacterial genomes, how they find relationships that would be missed by standard 2-way network models.
Weighill, Deborah A; Jacobson, Daniel A
2015-01-01
We present and develop the theory of 3-way networks, a type of hypergraph in which each edge models relationships between triplets of objects, as opposed to the pairs of objects modeled by standard networks. We explore approaches for pruning these 3-way networks, illustrate their utility in comparative genomics, and demonstrate, using a phylogenomic dataset of 211 bacterial genomes, how they find relationships that would be missed by standard 2-way network models. PMID:25815802
Constructing an integrated gene similarity network for the identification of disease genes.
Tian, Zhen; Guo, Maozu; Wang, Chunyu; Xing, LinLin; Wang, Lei; Zhang, Yin
2017-09-20
Discovering novel genes that are involved in human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and cover less than half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. We propose a novel method, named RWRB, to infer causal genes of diseases of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed using the similarity network fusion (SNF) method. Finally, we employ the random walk with restart algorithm on the phenotype-gene bilayer network, which combines the phenotype similarity network, the IGSN, and the phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB in inferring phenotype-gene relationships through leave-one-out cross-validation. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB benefits from the IGSN, which has wider coverage and higher reliability compared with current PPI networks. Moreover, we conduct a comprehensive case study for Alzheimer's disease and predict some novel disease genes that are supported by the literature. RWRB is an effective and reliable algorithm for prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/ .
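The random walk with restart step named in the abstract above can be illustrated on a toy graph. This is only a minimal sketch on a single-layer network: RWRB itself walks a phenotype-gene bilayer network, which is not reproduced here. The function name, the `restart` value of 0.7, the node labels, and the dict-of-dicts graph format are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until the scores stop changing.

    adjacency: dict mapping node -> {neighbour: edge weight} (hypothetical format).
    seeds: set of nodes the walker restarts from (e.g. known disease genes).
    """
    nodes = sorted(adjacency)
    # Normalise outgoing weights so each node's transition probabilities sum to 1.
    # Nodes without neighbours would leak probability mass; the toy graph has none.
    transition = {}
    for u in nodes:
        total = sum(adjacency[u].values())
        transition[u] = {v: w / total for v, w in adjacency[u].items()} if total else {}
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            for v, prob in transition[u].items():
                nxt[v] += (1.0 - restart) * p[u] * prob
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy undirected gene similarity network with unit weights (made-up labels).
graph = {
    "g1": {"g2": 1.0, "g3": 1.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g1": 1.0, "g2": 1.0, "g4": 1.0},
    "g4": {"g3": 1.0, "g5": 1.0},
    "g5": {"g4": 1.0},
}
scores = random_walk_with_restart(graph, seeds={"g1"})
# Candidate genes ranked by steady-state score, excluding the seed itself.
ranking = sorted((n for n in graph if n != "g1"), key=scores.get, reverse=True)
print(ranking)
```

Candidates closer to the seed in the network receive higher scores, which is the property a prioritization method of this kind relies on.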
2012-01-01
Background: The first draft assembly and gene prediction of the grapevine genome (8X base coverage) was made available to the scientific community in 2007, and functional annotation was developed on this gene prediction. Since then additional Sanger sequences were added to the 8X sequences pool and a new version of the genomic sequence with superior base coverage (12X) was produced. Results: In order to more efficiently annotate the function of the genes predicted in the new assembly, it is important to build on as much of the previous work as possible, by transferring 8X annotation of the genome to the 12X version. The 8X and 12X assemblies and gene predictions of the grapevine genome were compared to answer the question, “Can we uniquely map 8X predicted genes to 12X predicted genes?” The results show that while the assemblies and gene structure predictions are too different to make a complete mapping between them, most genes (18,725) showed a one-to-one relationship between 8X predicted genes and the last version of 12X predicted genes. In addition, reshuffled genomic sequence structures appeared. These highlight regions of the genome where the gene predictions need to be taken with caution. Based on the new grapevine gene functional annotation and in-depth functional categorization, twenty-eight new molecular networks have been created for VitisNet while the existing networks were updated. Conclusions: The outcomes of this study provide a functional annotation of the 12X genes, an update of VitisNet, the system of the grapevine molecular networks, and a new functional categorization of genes. Data are available at the VitisNet website (http://www.sdstate.edu/ps/research/vitis/pathways.cfm). PMID:22554261
Toward the automated generation of genome-scale metabolic networks in the SEED.
DeJongh, Matthew; Formsma, Kevin; Boillot, Paul; Gould, John; Rycenga, Matthew; Best, Aaron
2007-04-26
Current methods for the automated generation of genome-scale metabolic networks focus on genome annotation and preliminary biochemical reaction network assembly, but do not adequately address the process of identifying and filling gaps in the reaction network, and verifying that the network is suitable for systems level analysis. Thus, current methods are only sufficient for generating draft-quality networks, and refinement of the reaction network is still largely a manual, labor-intensive process. We have developed a method for generating genome-scale metabolic networks that produces substantially complete reaction networks, suitable for systems level analysis. Our method partitions the reaction space of central and intermediary metabolism into discrete, interconnected components that can be assembled and verified in isolation from each other, and then integrated and verified at the level of their interconnectivity. We have developed a database of components that are common across organisms, and have created tools for automatically assembling appropriate components for a particular organism based on the metabolic pathways encoded in the organism's genome. This focuses manual efforts on that portion of an organism's metabolism that is not yet represented in the database. We have demonstrated the efficacy of our method by reverse-engineering and automatically regenerating the reaction network from a published genome-scale metabolic model for Staphylococcus aureus. Additionally, we have verified that our method capitalizes on the database of common reaction network components created for S. aureus, by using these components to generate substantially complete reconstructions of the reaction networks from three other published metabolic models (Escherichia coli, Helicobacter pylori, and Lactococcus lactis). We have implemented our tools and database within the SEED, an open-source software environment for comparative genome annotation and analysis. 
Our method sets the stage for the automated generation of substantially complete metabolic networks for over 400 complete genome sequences currently in the SEED. With each genome that is processed using our tools, the database of common components grows to cover more of the diversity of metabolic pathways. This increases the likelihood that components of reaction networks for subsequently processed genomes can be retrieved from the database, rather than assembled and verified manually.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; ...
2015-04-09
Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.
Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S.; Dorrestein, Pieter C.; Jensen, Paul R.
2015-01-01
Summary Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. Here we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. These efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches. PMID:25865308
Network-based machine learning and graph theory algorithms for precision oncology.
Zhang, Wei; Chien, Jeremy; Yong, Jeongsik; Kuang, Rui
2017-01-01
Network-based analytics plays an increasingly important role in precision oncology. Growing evidence in recent studies suggests that cancer can be better understood through mutated or dysregulated pathways or networks rather than individual mutations and that the efficacy of repositioned drugs can be inferred from disease modules in molecular networks. This article reviews network-based machine learning and graph theory algorithms for integrative analysis of personal genomic data and biomedical knowledge bases to identify tumor-specific molecular mechanisms, candidate targets and repositioned drugs for personalized treatment. The review focuses on the algorithmic design and mathematical formulation of these methods to facilitate applications and implementations of network-based analysis in the practice of precision oncology. We review the methods applied in three scenarios to integrate genomic data and network models in different analysis pipelines, and we examine three categories of network-based approaches for repositioning drugs in drug-disease-gene networks. In addition, we perform a comprehensive subnetwork/pathway analysis of mutations in 31 cancer genome projects in the Cancer Genome Atlas and present a detailed case study on ovarian cancer. Finally, we discuss interesting observations, potential pitfalls and future directions in network-based precision oncology.
ITEP: an integrated toolkit for exploration of microbial pan-genomes.
Benedict, Matthew N; Henriksen, James R; Metcalf, William W; Whitaker, Rachel J; Price, Nathan D
2014-01-03
Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for their integrated analysis. In particular, accurate annotations and identification of gene presence and absence are critical for understanding and modeling the cellular physiology of newly sequenced genomes. Although many tools are available to compare the gene contents of related genomes, new tools are necessary to enable close examination and curation of protein families from large numbers of closely related organisms, to integrate curation with the analysis of gain and loss, and to generate metabolic networks linking the annotations to observed phenotypes. We have developed ITEP, an Integrated Toolkit for Exploration of microbial Pan-genomes, to curate protein families, compute similarities to externally-defined domains, analyze gene gain and loss, and generate draft metabolic networks from one or more curated reference network reconstructions in groups of related microbial species among which the combination of core and variable genes constitutes their "pan-genomes". The ITEP toolkit consists of: (1) a series of modular command-line scripts for identification, comparison, curation, and analysis of protein families and their distribution across many genomes; (2) a set of Python libraries for programmatic access to the same data; and (3) pre-packaged scripts to perform common analysis workflows on a collection of genomes. ITEP's capabilities include de novo protein family prediction, ortholog detection, analysis of functional domains, identification of core and variable genes and gene regions, sequence alignments and tree generation, annotation curation, and the integration of cross-genome analysis and metabolic networks for study of metabolic network evolution. 
ITEP is a powerful, flexible toolkit for generation and curation of protein families. ITEP's modular design allows for straightforward extension as analysis methods and tools evolve. By integrating comparative genomics with the development of draft metabolic networks, ITEP harnesses the power of comparative genomics to build confidence in links between genotype and phenotype and helps disambiguate gene annotations when they are evaluated in both evolutionary and metabolic network contexts.
Zheng, Guangyong; Xu, Yaochen; Zhang, Xiujun; Liu, Zhi-Ping; Wang, Zhuo; Chen, Luonan; Zhu, Xin-Guang
2016-12-23
A gene regulatory network (GRN) represents interactions of genes inside a cell or tissue, in which vertices and edges stand for genes and their regulatory interactions, respectively. Reconstruction of gene regulatory networks, in particular genome-scale networks, is essential for comparative exploration of different species and mechanistic investigation of biological processes. Currently, most network inference methods are computationally intensive: they are usually effective for small-scale tasks (e.g., networks with a few hundred genes) but struggle to construct GRNs at genome scale. Here, we present a software package for gene regulatory network reconstruction at a genomic level, in which gene interaction is measured by conditional mutual information using a parallel computing framework (so the package is named CMIP). The package is a greatly improved implementation of our previous PCA-CMI algorithm. In CMIP, we provide not only an automatic threshold determination method but also an effective parallel computing framework for network inference. Performance tests on benchmark datasets show that the accuracy of CMIP is comparable to most current network inference methods. Moreover, running tests on synthetic datasets demonstrate that CMIP can handle large datasets, especially genome-wide datasets, within an acceptable time period. In addition, successful application to a real genomic dataset confirms the practical applicability of the package. This new software package provides a powerful tool for genomic network reconstruction to the biological community. The software can be accessed at http://www.picb.ac.cn/CMIP/ .
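To make the conditional mutual information idea above concrete, the sketch below computes a plug-in estimate of I(X; Y | Z) for discrete samples. It is an illustration of the quantity only, under the assumption that expression values have already been discretized; the abstract does not state which estimator CMIP uses, and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import log


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from equal-length discrete sequences."""
    n = len(xs)
    if n == 0 or not (n == len(ys) == len(zs)):
        raise ValueError("xs, ys and zs must be non-empty and of equal length")
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ];
        # the 1/n factors cancel inside the logarithm, so raw counts are used there.
        cmi += (n_xyz / n) * log(c_z[z] * n_xyz / (c_xz[(x, z)] * c_yz[(y, z)]))
    return cmi


# Case 1: X and Y both simply copy a regulator Z, so they carry no extra
# information about each other once Z is known (an indirect association).
z_reg = [0, 0, 1, 1]
cmi_indirect = conditional_mutual_information(z_reg, z_reg, z_reg)

# Case 2: X and Y agree with each other regardless of Z (a direct association).
xy = [0, 0, 1, 1]
z_other = [0, 1, 0, 1]
cmi_direct = conditional_mutual_information(xy, xy, z_other)

print(cmi_indirect, cmi_direct)
```

In a network inference setting, an edge whose conditional mutual information falls below a chosen threshold would be treated as indirect and removed; how that threshold is determined automatically in CMIP is not described in the abstract.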
Comparing Mycobacterium tuberculosis genomes using genome topology networks.
Jiang, Jianping; Gu, Jianlei; Zhang, Liang; Zhang, Chenyi; Deng, Xiao; Dou, Tonghai; Zhao, Guoping; Zhou, Yan
2015-02-14
Over the last decade, emerging research methods, such as comparative genomic analysis and phylogenetic study, have yielded new insights into genotypes and phenotypes of closely related bacterial strains. Several findings have revealed that genomic structural variations (SVs), including gene gain/loss, gene duplication and genome rearrangement, can lead to different phenotypes among strains, and an investigation of genes affected by SVs may extend our knowledge of the relationships between SVs and phenotypes in microbes, especially in pathogenic bacteria. In this work, we introduce a 'Genome Topology Network' (GTN) method based on gene homology and gene locations to analyze genomic SVs and perform phylogenetic analysis. Furthermore, the concept of 'unfixed ortholog' has been proposed, whose members are affected by SVs in genome topology among close species. To improve the precision of 'unfixed ortholog' recognition, a strategy to detect annotation differences and complete gene annotation was applied. To assess the GTN method, a set of thirteen complete M. tuberculosis genomes was analyzed as a case study. GTNs with two different gene homology-assigning methods were built, the Clusters of Orthologous Groups (COG) method and the orthoMCL clustering method, and two phylogenetic trees were constructed accordingly, which may provide additional insights into whole genome-based phylogenetic analysis. We obtained 24 unfixable COG groups, of which most members were related to immunogenicity and drug resistance, such as PPE-repeat proteins (COG5651) and transcriptional regulator TetR gene family members (COG1309). The GTN method has been implemented in PERL and released on our website. The tool can be downloaded from http://homepage.fudan.edu.cn/zhouyan/gtn/ , and allows re-annotating the 'lost' genes among closely related genomes, analyzing genes affected by SVs, and performing phylogenetic analysis. 
With this tool, many immunogenic-related and drug resistance-related genes were found to be affected by SVs in M. tuberculosis genomes. We believe that the GTN method will be suitable for the exploration of genomic SVs in connection with biological features of bacterial strains, and that GTN-based phylogenetic analysis will provide additional insights into whole genome-based phylogenetic analysis.
Metabolism and evolution: A comparative study of reconstructed genome-level metabolic networks
NASA Astrophysics Data System (ADS)
Almaas, Eivind
2008-03-01
The availability of high-quality annotations of sequenced genomes has made it possible to generate organism-specific comprehensive maps of cellular metabolism. Currently, more than twenty such metabolic reconstructions are publicly available, with the majority focused on bacteria. A typical metabolic reconstruction for a bacterium results in a complex network containing hundreds of metabolites (nodes) and reactions (links), while some even contain more than a thousand. The constraint-based optimization approach of flux-balance analysis (FBA) is used to investigate the functional characteristics of such large-scale metabolic networks, making it possible to estimate an organism's growth behavior in a wide variety of nutrient environments, as well as its robustness to gene loss. We have recently completed the genome-level metabolic reconstruction of Yersinia pseudotuberculosis, as well as the three Yersinia pestis biovars Antiqua, Mediaevalis, and Orientalis. While Y. pseudotuberculosis typically only causes fever and abdominal pain that can mimic appendicitis, the evolutionarily closely related Y. pestis strains are the aetiological agents of the bubonic plague. In this presentation, I will discuss our results and conclusions from a comparative study on the evolution of metabolic function in the four Yersiniae networks using FBA and related techniques, and I will give particular focus to the interplay between metabolic network topology and evolutionary flexibility.
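The flux-balance analysis mentioned in the abstract above is usually posed as a linear program. A standard textbook form is shown below; the symbols are chosen here for illustration and do not appear in the abstract.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S v = 0, \\
& l \le v \le u
\end{aligned}
```

Here S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the objective (typically a weight of 1 on the biomass reaction), and l and u are lower and upper flux bounds that encode the nutrient environment. Under this formulation, robustness to gene loss is commonly probed by setting the bounds of the reactions that depend on the deleted gene to zero and re-solving.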
The Use of Weighted Graphs for Large-Scale Genome Analysis
Zhou, Fang; Toivonen, Hannu; King, Ross D.
2014-01-01
There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution. PMID:24619061
Descriptive vs. mechanistic network models in plant development in the post-genomic era.
Davila-Velderrain, J; Martinez-Garcia, J C; Alvarez-Buylla, E R
2015-01-01
Network modeling is now a widespread practice in systems biology, as well as in integrative genomics, and it constitutes a rich and diverse scientific research field. A conceptually clear understanding of the reasoning behind the main existing modeling approaches, and their associated technical terminologies, is required to avoid confusion and accelerate the transition towards an undeniably necessary, more quantitative, multidisciplinary approach to biology. Herein, we focus on two main network-based modeling approaches that are commonly used depending on the information available and the intended goals: inference-based methods and system dynamics approaches. As far as data-based network inference methods are concerned, they enable the discovery of potential functional influences among molecular components. On the other hand, experimentally grounded network dynamical models have been shown to be perfectly suited for the mechanistic study of developmental processes. How do these two perspectives relate to each other? In this chapter, we describe and compare both approaches and then apply them to a given specific developmental module. Along with the step-by-step practical implementation of each approach, we also focus on discussing their respective goals, utility, assumptions, and associated limitations. We use the gene regulatory network (GRN) involved in Arabidopsis thaliana Root Stem Cell Niche patterning as our illustrative example. We show that descriptive models based on functional genomics data can provide important background information consistent with experimentally supported functional relationships integrated in mechanistic GRN models. The rationale of analysis and modeling can be applied to any other well-characterized functional developmental module in multicellular organisms, like plants and animals.
Genomic connectivity networks based on the BrainSpan atlas of the developing human brain
NASA Astrophysics Data System (ADS)
Mahfouz, Ahmed; Ziats, Mark N.; Rennert, Owen M.; Lelieveldt, Boudewijn P. F.; Reinders, Marcel J. T.
2014-03-01
The human brain comprises systems of networks that span the molecular, cellular, anatomic and functional levels. Molecular studies of the developing brain have focused on elucidating networks among gene products that may drive cellular brain development by functioning together in biological pathways. On the other hand, studies of the brain connectome attempt to determine how anatomically distinct brain regions are connected to each other, either anatomically (diffusion tensor imaging) or functionally (functional MRI and EEG), and how they change over development. A global examination of the relationship between gene expression and connectivity in the developing human brain is necessary to understand how the genetic signature of different brain regions instructs connections to other regions. Furthermore, analyzing the development of connectivity networks based on the spatio-temporal dynamics of gene expression provides a new insight into the effect of neurodevelopmental disease genes on brain networks. In this work, we construct connectivity networks between brain regions based on the similarity of their gene expression signature, termed "Genomic Connectivity Networks" (GCNs). Genomic connectivity networks were constructed using data from the BrainSpan Transcriptional Atlas of the Developing Human Brain. Our goal was to understand how the genetic signatures of anatomically distinct brain regions relate to each other across development. We assessed the neurodevelopmental changes in connectivity patterns of brain regions when networks were constructed with genes implicated in the neurodevelopmental disorder autism (autism spectrum disorder; ASD). Using graph theory metrics to characterize the GCNs, we show that ASD-GCNs are relatively less connected later in development with the cerebellum showing a very distinct expression of ASD-associated genes compared to other brain regions.
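The construction described above can be illustrated with a minimal sketch: brain regions are nodes, and two regions are connected when their expression signatures are similar. The region names, expression values, the use of Pearson correlation, and the 0.8 threshold are all assumptions chosen for illustration; the abstract does not specify the similarity measure or cut-off.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def connectivity_network(signatures, threshold=0.8):
    """Return edges (region_a, region_b, r) whose correlation is >= threshold."""
    edges = []
    for a, b in combinations(sorted(signatures), 2):
        r = pearson(signatures[a], signatures[b])
        if r >= threshold:
            edges.append((a, b, r))
    return edges

# Hypothetical expression signatures (one value per gene) for three regions.
signatures = {
    "cortex": [1.0, 2.0, 3.0, 4.0],
    "striatum": [2.0, 4.0, 6.0, 8.1],
    "cerebellum": [4.0, 1.0, 3.5, 0.5],
}
edges = connectivity_network(signatures)

# Node degree is one simple graph-theory metric for comparing regions.
degree = {region: 0 for region in signatures}
for a, b, _ in edges:
    degree[a] += 1
    degree[b] += 1
```

In this toy input the cerebellum signature is anti-correlated with the other two, so it ends up with degree zero, which loosely mirrors the kind of "distinct expression" pattern a degree-based comparison can expose.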
Baumbach, Jan; Brinkrolf, Karina; Czaja, Lisa F; Rahmann, Sven; Tauch, Andreas
2006-02-14
The application of DNA microarray technology in post-genomic analysis of bacterial genome sequences has allowed the generation of huge amounts of data related to regulatory networks. This data along with literature-derived knowledge on regulation of gene expression has opened the way for genome-wide reconstruction of transcriptional regulatory networks. These large-scale reconstructions can be converted into in silico models of bacterial cells that allow a systematic analysis of network behavior in response to changing environmental conditions. CoryneRegNet was designed to facilitate the genome-wide reconstruction of transcriptional regulatory networks of corynebacteria relevant in biotechnology and human medicine. During the import and integration process of data derived from experimental studies or literature knowledge CoryneRegNet generates links to genome annotations, to identified transcription factors and to the corresponding cis-regulatory elements. CoryneRegNet is based on a multi-layered, hierarchical and modular concept of transcriptional regulation and was implemented by using the relational database management system MySQL and an ontology-based data structure. Reconstructed regulatory networks can be visualized by using the yFiles JAVA graph library. As an application example of CoryneRegNet, we have reconstructed the global transcriptional regulation of a cellular module involved in SOS and stress response of corynebacteria. CoryneRegNet is an ontology-based data warehouse that allows a pertinent data management of regulatory interactions along with the genome-scale reconstruction of transcriptional regulatory networks. These models can further be combined with metabolic networks to build integrated models of cellular function including both metabolism and its transcriptional regulation.
Optimal knockout strategies in genome-scale metabolic networks using particle swarm optimization.
Nair, Govind; Jungreuthmayer, Christian; Zanghellini, Jürgen
2017-02-01
Knockout strategies, particularly the concept of constrained minimal cut sets (cMCSs), are an important part of the arsenal of tools used in manipulating metabolic networks. Given a specific design, cMCSs can be calculated even in genome-scale networks. We would however like to find not only the optimal intervention strategy for a given design but the best possible design too. Our solution (PSOMCS) is to use particle swarm optimization (PSO) along with the direct calculation of cMCSs from the stoichiometric matrix to obtain optimal designs satisfying multiple objectives. To illustrate the working of PSOMCS, we apply it to a toy network. Next we show its superiority by comparing its performance against other comparable methods on a medium-sized E. coli core metabolic network. PSOMCS not only finds solutions comparable to previously published results but is also orders of magnitude faster. Finally, we use PSOMCS to predict knockouts satisfying multiple objectives in a genome-scale metabolic model of E. coli and compare it with OptKnock and RobustKnock. PSOMCS finds competitive knockout strategies and designs compared to other current methods and is in some cases significantly faster. It can be used in identifying knockouts which will force optimal desired behaviors in large and genome-scale metabolic networks. It will be even more useful as larger metabolic models of industrially relevant organisms become available.
Ulas, Thomas; Riemer, S. Alexander; Zaparty, Melanie; Siebers, Bettina; Schomburg, Dietmar
2012-01-01
We describe the reconstruction of a genome-scale metabolic model of the crenarchaeon Sulfolobus solfataricus, a hyperthermoacidophilic microorganism. It grows in terrestrial volcanic hot springs with growth occurring at pH 2–4 (optimum 3.5) and a temperature of 75–80°C (optimum 80°C). The genome of Sulfolobus solfataricus P2 contains 2,992,245 bp on a single circular chromosome and encodes 2,977 proteins and a number of RNAs. The network comprises 718 metabolic and 58 transport/exchange reactions and 705 unique metabolites, based on the annotated genome and available biochemical data. Using the model in conjunction with constraint-based methods, we simulated the metabolic fluxes induced by different environmental and genetic conditions. The predictions were compared to experimental measurements and phenotypes of S. solfataricus. Furthermore, the performance of the network for 35 different carbon sources known for S. solfataricus from the literature was simulated. Comparing the growth on different carbon sources revealed that glycerol is the carbon source with the highest biomass flux per imported carbon atom (75% higher than glucose). Experimental data was also used to fit the model to phenotypic observations. In addition to the commonly known heterotrophic growth of S. solfataricus, the crenarchaeon is also able to grow autotrophically using the hydroxypropionate-hydroxybutyrate cycle for bicarbonate fixation. We integrated this pathway into our model and compared bicarbonate fixation with growth on glucose as sole carbon source. Finally, we tested the robustness of the metabolism with respect to gene deletions using the method of Minimization of Metabolic Adjustment (MOMA), which predicted that 18% of all possible single gene deletions would be lethal for the organism. PMID:22952675
Systems biology-based approaches toward understanding drought tolerance in food crops.
Jogaiah, Sudisha; Govind, Sharathchandra Ramsandra; Tran, Lam-Son Phan
2013-03-01
Economically important crops, such as maize, wheat, rice, barley, and other food crops are affected by even small changes in water potential at important growth stages. Developing a comprehensive understanding of host response to drought requires a global view of the complex mechanisms involved. Research on drought tolerance has generally been conducted using discipline-specific approaches. However, plant stress response is complex and interlinked to a point where discipline-specific approaches do not give a complete global analysis of all the interlinked mechanisms. A systems biology perspective is needed to understand the genome-scale networks required for building long-lasting drought resistance. Network maps have been constructed by integrating multiple functional genomics data with both model plants, such as Arabidopsis thaliana, Lotus japonicus, and Medicago truncatula, and various food crops, such as rice and soybean. Useful functional genomics data have been obtained from genome-wide comparative transcriptome and proteome analyses of drought responses from different crops. This integrative approach used by many groups has led to identification of commonly regulated signaling pathways and genes following exposure to drought. The combination of functional genomics and systems biology is very useful for comparative analysis of other food crops and has the potential to help develop stable food systems worldwide. In addition, studying desiccation tolerance in resurrection plants will unravel how a combination of molecular genetic and metabolic processes interact to produce a resurrection phenotype. Systems biology-based approaches have helped in understanding how these individual factors and mechanisms (biochemical, molecular, and metabolic) "interact" spatially and temporally. Signaling network maps of such interactions are needed to design better engineering strategies for improving drought tolerance of important crop species.
Wang, Edwin; Zou, Jinfeng; Zaman, Naif; Beitel, Lenore K; Trifiro, Mark; Paliouras, Miltiadis
2013-08-01
Recent tumor genome sequencing confirmed that one tumor often consists of multiple cell subpopulations (clones) which bear different, but related, genetic profiles such as mutation and copy number variation profiles. Thus far, one tumor has been viewed as a whole entity in cancer functional studies. With the advances of genome sequencing and computational analysis, we are able to quantify and computationally dissect clones from tumors, and then conduct clone-based analysis. Emerging technologies such as single-cell genome sequencing and RNA-Seq could profile tumor clones. Thus, we should reconsider how to conduct cancer systems biology studies in the genome sequencing era. We will outline new directions for conducting cancer systems biology by considering that genome sequencing technology can be used for dissecting, quantifying and genetically characterizing clones from tumors. Topics discussed in Part 1 of this review include computational quantification of tumor subpopulations, clone-based network modeling, cancer hallmark-based networks and their high-order rewiring principles, and the principles of cell survival networks of fast-growing clones. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
Avsec, Žiga; Cheng, Jun; Gagneur, Julien
2018-01-01
Motivation: Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results: Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Availability and implementation: Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at https://github.com/gagneurlab/Manuscript_Avsec_Bioinformatics_2017. Contact: avsec@in.tum.de or gagneur@in.tum.de. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29155928
Network-assisted crop systems genetics: network inference and integrative analysis.
Lee, Tak; Kim, Hyojin; Lee, Insuk
2015-04-01
Although next-generation sequencing (NGS) technology has enabled the decoding of many crop species genomes, most of the underlying genetic components for economically important crop traits remain to be determined. Network approaches have proven useful for the study of the reference plant, Arabidopsis thaliana, and the success of network-based crop genetics will also require the availability of genome-scale functional networks for crop species. In this review, we discuss how to construct functional networks and elucidate the holistic view of a crop system. The crop gene network can then be used for gene prioritization and the analysis of resequencing-based genome-wide association study (GWAS) data, the amount of which will rapidly grow in the field of crop science in the coming years. Copyright © 2015 Elsevier Ltd. All rights reserved.
Baumbach, Jan; Brinkrolf, Karina; Czaja, Lisa F; Rahmann, Sven; Tauch, Andreas
2006-01-01
Background: The application of DNA microarray technology in post-genomic analysis of bacterial genome sequences has allowed the generation of huge amounts of data related to regulatory networks. This data along with literature-derived knowledge on regulation of gene expression has opened the way for genome-wide reconstruction of transcriptional regulatory networks. These large-scale reconstructions can be converted into in silico models of bacterial cells that allow a systematic analysis of network behavior in response to changing environmental conditions. Description: CoryneRegNet was designed to facilitate the genome-wide reconstruction of transcriptional regulatory networks of corynebacteria relevant in biotechnology and human medicine. During the import and integration process of data derived from experimental studies or literature knowledge CoryneRegNet generates links to genome annotations, to identified transcription factors and to the corresponding cis-regulatory elements. CoryneRegNet is based on a multi-layered, hierarchical and modular concept of transcriptional regulation and was implemented by using the relational database management system MySQL and an ontology-based data structure. Reconstructed regulatory networks can be visualized by using the yFiles JAVA graph library. As an application example of CoryneRegNet, we have reconstructed the global transcriptional regulation of a cellular module involved in SOS and stress response of corynebacteria. Conclusion: CoryneRegNet is an ontology-based data warehouse that allows a pertinent data management of regulatory interactions along with the genome-scale reconstruction of transcriptional regulatory networks. These models can further be combined with metabolic networks to build integrated models of cellular function including both metabolism and its transcriptional regulation. PMID:16478536
Strain, Errol; Melka, David; Bunning, Kelly; Musser, Steven M.; Brown, Eric W.; Timme, Ruth
2016-01-01
The FDA has created a United States-based open-source whole-genome sequencing network of state, federal, international, and commercial partners. The GenomeTrakr network represents a first-of-its-kind distributed genomic food shield for characterizing and tracing foodborne outbreak pathogens back to their sources. The GenomeTrakr network is leading investigations of outbreaks of foodborne illnesses and compliance actions with more accurate and rapid recalls of contaminated foods as well as more effective monitoring of preventive controls for food manufacturing environments. An expanded network would serve to provide an international rapid surveillance system for pathogen traceback, which is critical to support an effective public health response to bacterial outbreaks. PMID:27008877
Genomic Approaches to Zebrafish Cancer
2017-01-01
The zebrafish has emerged as an important model for studying cancer biology. Identification of DNA, RNA and chromatin abnormalities can give profound insight into the mechanisms of tumorigenesis, and there are many techniques for analyzing the genomes of these tumors. Here, I present an overview of the available technologies for analyzing tumor genomes in the zebrafish, including array-based methods as well as next-generation sequencing technologies. I also discuss the ways in which zebrafish tumor genomes can be compared to human genomes using cross-species oncogenomics, which acts to filter genomic noise and ultimately uncover central drivers of malignancy. Finally, I discuss downstream analytic tools, including network analysis, that can help to organize the alterations into coherent biological frameworks that can then be investigated further. PMID:27165352
Misra, Sanchit; Pamnany, Kiran; Aluru, Srinivas
2015-01-01
Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can do so require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing for assessing statistical significance. We demonstrate the first ever inference of a plant whole-genome regulatory network on a single chip by constructing a 15,575 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor holds out lessons that are applicable to other domains.
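The statistical core mentioned above, mutual information scored against a permutation null, can be sketched serially in a few lines. This is only an illustration under stated assumptions: it uses a plug-in estimator over equal-width bins rather than whatever estimator TINGe actually uses, the function names and the two toy expression profiles are hypothetical, and none of the parallelization described in the abstract is attempted.

```python
import random
from collections import Counter
from math import log2

def discretize(values, bins=3):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant input
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Observed MI and the fraction of shuffles reaching at least that MI."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any dependence between x and y
        if mutual_information(x, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical expression profiles of two genes across 12 experiments.
gene_a = [i / 11 for i in range(12)]
gene_b = [2 * v + 1 for v in gene_a]
mi, p = permutation_p_value(discretize(gene_a), discretize(gene_b))
```

An edge between the two genes would be kept when `p` falls below a chosen significance level; a whole-genome network repeats this test for every gene pair, which is why the computation benefits from the many-core hardware discussed in the abstract.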
A dictionary based informational genome analysis
2012-01-01
Background: In the post-genomic era, several methods of computational genomics are emerging to understand how information is structured within whole genomes. The literature of the last five years reports several alignment-free methods that have arisen as alternative metrics for the dissimilarity of biological sequences. Among these, recent approaches are based on empirical frequencies of DNA k-mers in whole genomes. Results: Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces a local sequence analysis. A software prototype applying the methodology outlined here carried out computations on genomic data. We computed informational indexes and built genomic dictionaries of different sizes, along with frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these in a database, and index analysis and visualization. The validation was done by investigating genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of keen interest in biology (for example, to compute over-represented functional sequences, such as promoters), was discussed and suggested a method to define synthetic genetic networks. Conclusions: We introduced a methodology based on dictionaries, and an efficient motif-finding software application for comparative genomics. This approach could be extended along many lines of investigation, for example exported to other contexts of computational genomics as a basis for discrimination of genomic pathologies. PMID:22985068
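To make the notion of a genomic dictionary concrete, the following minimal sketch counts all k-length words in a sequence and derives one possible informational index from the counts. The abstract does not say which indexes were used, so the choice of Shannon entropy here is an assumption, as are the function names and the toy sequence.

```python
from collections import Counter
from math import log2

def genomic_dictionary(genome, k):
    """Count every word of length k occurring in the sequence (overlapping)."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

def empirical_entropy(dictionary):
    """Shannon entropy (bits) of the empirical k-mer frequency distribution."""
    total = sum(dictionary.values())
    return -sum((c / total) * log2(c / total) for c in dictionary.values())

def repeats(dictionary):
    """Words occurring more than once, i.e. candidate genomic repeats."""
    return {word: c for word, c in dictionary.items() if c > 1}

toy_genome = "ACGTACGTAC"  # hypothetical sequence, not real data
d2 = genomic_dictionary(toy_genome, 2)
h2 = empirical_entropy(d2)
```

Rebuilding the dictionary for several values of `k` and comparing the resulting frequency distributions is one way to read the "dictionaries with different sizes" mentioned in the abstract.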
Comparative analysis of gene regulatory networks: from network reconstruction to evolution.
Thompson, Dawn; Regev, Aviv; Roy, Sushmita
2015-01-01
Regulation of gene expression is central to many biological processes. Although reconstruction of regulatory circuits from genomic data alone is therefore desirable, this remains a major computational challenge. Comparative approaches that examine the conservation and divergence of circuits and their components across strains and species can help reconstruct circuits as well as provide insights into the evolution of gene regulatory processes and their adaptive contribution. In recent years, advances in genomic and computational tools have led to a wealth of methods for such analysis at the sequence, expression, pathway, module, and entire network level. Here, we review computational methods developed to study transcriptional regulatory networks using comparative genomics, from sequence to functional data. We highlight how these methods use evolutionary conservation and divergence to reliably detect regulatory components as well as estimate the extent and rate of divergence. Finally, we discuss the promise and open challenges in linking regulatory divergence to phenotypic divergence and adaptation.
Recapitulating phylogenies using k-mers: from trees to networks.
Bernard, Guillaume; Ragan, Mark A; Chan, Cheong Xin
2016-01-01
Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences at fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel's idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
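The idea of relating genomes by shared k-mers can be sketched as follows. This is a deliberately simplified illustration: it connects two genomes when the raw number of distinct k-mers they share reaches a threshold, whereas the study may well use a normalized alignment-free measure. The genome names, sequences, `k`, and `min_shared` are hypothetical.

```python
from itertools import combinations

def kmer_set(sequence, k):
    """Set of distinct subsequences of fixed length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def shared_kmer_network(genomes, k, min_shared=1):
    """Edges (a, b, n_shared) for genome pairs sharing >= min_shared k-mers."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    edges = []
    for a, b in combinations(sorted(sets), 2):
        shared = len(sets[a] & sets[b])
        if shared >= min_shared:
            edges.append((a, b, shared))
    return edges

# Hypothetical toy genomes; real inputs would be whole-genome sequences.
genomes = {
    "g1": "ACGTACGT",
    "g2": "CGTACGTT",
    "g3": "GGGGCCCC",
}
network = shared_kmer_network(genomes, k=4, min_shared=2)
```

Raising or lowering `min_shared` changes which edges survive, which is one simple way to explore such a network at different levels of relatedness without committing to a single tree.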
Allard, Marc W; Strain, Errol; Melka, David; Bunning, Kelly; Musser, Steven M; Brown, Eric W; Timme, Ruth
2016-08-01
The FDA has created a United States-based open-source whole-genome sequencing network of state, federal, international, and commercial partners. The GenomeTrakr network represents a first-of-its-kind distributed genomic food shield for characterizing and tracing foodborne outbreak pathogens back to their sources. The GenomeTrakr network is leading investigations of outbreaks of foodborne illnesses and compliance actions with more accurate and rapid recalls of contaminated foods as well as more effective monitoring of preventive controls for food manufacturing environments. An expanded network would serve to provide an international rapid surveillance system for pathogen traceback, which is critical to support an effective public health response to bacterial outbreaks. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
QuIN: A Web Server for Querying and Visualizing Chromatin Interaction Networks.
Thibodeau, Asa; Márquez, Eladio J; Luo, Oscar; Ruan, Yijun; Menghi, Francesca; Shin, Dong-Guk; Stitzel, Michael L; Vera-Licona, Paola; Ucar, Duygu
2016-06-01
Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and HiC, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network-based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. QuIN's web server is available at http://quin.jax.org. QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database, and the source code is available on GitHub under the GPLv3 license: https://github.com/UcarLab/QuIN/.
Dimitrova, N; Nagaraj, A B; Razi, A; Singh, S; Kamalakaran, S; Banerjee, N; Joseph, P; Mankovich, A; Mittal, P; DiFeo, A; Varadan, V
2017-04-27
Characterizing the complex interplay of cellular processes in cancer would enable the discovery of key mechanisms underlying its development and progression. Published approaches to decipher driver mechanisms do not explicitly model tissue-specific changes in pathway networks and the regulatory disruptions related to genomic aberrations in cancers. We therefore developed InFlo, a novel systems biology approach for characterizing complex biological processes using a unique multidimensional framework integrating transcriptomic, genomic and/or epigenomic profiles for any given cancer sample. We show that InFlo robustly characterizes tissue-specific differences in activities of signalling networks on a genome scale using unique probabilistic models of molecular interactions on a per-sample basis. Using large-scale multi-omics cancer datasets, we show that InFlo exhibits higher sensitivity and specificity in detecting pathway networks associated with specific disease states when compared to published pathway network modelling approaches. Furthermore, InFlo's ability to infer the activity of unmeasured signalling network components was also validated using orthogonal gene expression signatures. We then evaluated multi-omics profiles of primary high-grade serous ovarian cancer tumours (N=357) to delineate mechanisms underlying resistance to frontline platinum-based chemotherapy. InFlo was the only algorithm to identify hyperactivation of the cAMP-CREB1 axis as a key mechanism associated with resistance to platinum-based therapy, a finding that we subsequently experimentally validated. We confirmed that inhibition of CREB1 phosphorylation potently sensitized resistant cells to platinum therapy and was effective in killing ovarian cancer stem cells that contribute to both platinum-resistance and tumour recurrence. 
Thus, we propose InFlo as a scalable, widely applicable, and robust integrative network modelling framework for the discovery of evidence-based biomarkers and therapeutic targets.
Fredlake, Christopher P; Hert, Daniel G; Kan, Cheuk-Wai; Chiesl, Thomas N; Root, Brian E; Forster, Ryan E; Barron, Annelise E
2008-01-15
To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require approximately 70 min to deliver approximately 650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered "hybrid" mechanism of DNA electromigration, in which DNA molecules alternate rapidly between reptating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs. PMID:18184818
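The read-speed comparison quoted above can be checked with a few lines of arithmetic. The sketch below uses only the approximate figures from the abstract (about 650 bases in about 70 min for capillary systems, up to 600 bases in 6.5 min on the microchip); the variable names are illustrative, and the result is a per-read rate, not a statement about overall instrument throughput or cost.

```python
# Approximate figures quoted in the abstract above.
capillary_bases, capillary_minutes = 650, 70.0
chip_bases, chip_minutes = 600, 6.5

capillary_rate = capillary_bases / capillary_minutes   # bases per minute
chip_rate = chip_bases / chip_minutes                   # bases per minute
speedup = chip_rate / capillary_rate                    # ratio of per-read rates

print(f"capillary: {capillary_rate:.1f} bases/min")
print(f"microchip: {chip_rate:.1f} bases/min")
print(f"per-read speedup: {speedup:.1f}x")
```

On these numbers the microchip run delivers bases roughly ten times faster per read than the capillary figure.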
2014-01-01
Automatic reconstruction of metabolic pathways for an organism from genomics and transcriptomics data has been a challenging and important problem in bioinformatics. Traditionally, known reference pathways can be mapped onto organism-specific ones based on genome annotation and protein homology. However, this simple knowledge-based mapping method might produce incomplete pathways and generally cannot predict unknown new relations and reactions. In contrast, ab initio metabolic network construction methods can predict novel reactions and interactions, but their accuracy tends to be low, leading to many false positives. Here we combine existing pathway knowledge and a new ab initio Bayesian probabilistic graphical model in a novel fashion to improve automatic reconstruction of metabolic networks. Specifically, we built a knowledge database containing known, individual gene/protein interactions and metabolic reactions extracted from existing reference pathways. Known reactions and interactions were then used as constraints for Bayesian network learning methods to predict metabolic pathways. Using individual reactions and interactions extracted from different pathways of many organisms to guide pathway construction is new and improves both the coverage and accuracy of metabolic pathway construction. We applied this probabilistic knowledge-based approach to construct metabolic networks from yeast gene expression data and compared its results with 62 known metabolic networks in the KEGG database. The experiment showed that the method improved the coverage of metabolic network construction over the traditional reference pathway mapping method and was more accurate than pure ab initio methods. PMID:25374614
A mixed-integer linear programming approach to the reduction of genome-scale metabolic networks.
Röhl, Annika; Bockmayr, Alexander
2017-01-03
Constraint-based analysis has become a widely used method to study metabolic networks. While some of the associated algorithms can be applied to genome-scale network reconstructions with several thousands of reactions, others are limited to small or medium-sized models. In 2015, Erdrich et al. introduced a method called NetworkReducer, which reduces large metabolic networks to smaller subnetworks, while preserving a set of biological requirements that can be specified by the user. As early as 2001, Burgard et al. developed a mixed-integer linear programming (MILP) approach for computing minimal reaction sets under a given growth requirement. Here we present an MILP approach for computing minimum subnetworks with the given properties. Minimality (with respect to the number of active reactions) is not guaranteed by NetworkReducer, while the method by Burgard et al. does not allow specifying the different biological requirements. Our procedure is about 5-10 times faster than NetworkReducer and can enumerate all minimum subnetworks when several exist. This allows identifying common reactions that are present in all subnetworks, and reactions appearing in alternative pathways. Applying complex analysis methods to genome-scale metabolic networks is often not possible in practice. Thus it may become necessary to reduce the size of the network while keeping important functionalities. We propose an MILP solution to this problem. Compared to previous work, our approach is more efficient and allows computing not only one, but even all minimum subnetworks satisfying the required properties.
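To make "enumerate all minimum subnetworks" concrete, here is a minimal sketch on a toy network. It is not the authors' MILP: it swaps in brute-force enumeration, and it replaces the flux-balance growth requirement with plain reachability, which is only valid under the assumption made here that every reaction is irreversible and converts exactly one metabolite into one other. The network, reaction names, and function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy network: each irreversible reaction converts one
# metabolite into one other ("ext" = medium, "bio" = biomass sink).
REACTIONS = {
    "uptake": ("ext", "A"),
    "r1": ("A", "B"),
    "r2": ("B", "C"),
    "r3": ("A", "D"),
    "r4": ("D", "C"),
    "r5": ("A", "E"),      # dead end, never needed for growth
    "growth": ("C", "bio"),
}

def supports_growth(active):
    """True if 'bio' is reachable from 'ext' using only the active reactions."""
    reached, frontier = {"ext"}, ["ext"]
    while frontier:
        met = frontier.pop()
        for name in active:
            src, dst = REACTIONS[name]
            if src == met and dst not in reached:
                reached.add(dst)
                frontier.append(dst)
    return "bio" in reached

def minimum_subnetworks():
    """Return every smallest reaction set that still supports growth."""
    names = sorted(REACTIONS)
    for size in range(1, len(names) + 1):
        hits = [set(c) for c in combinations(names, size) if supports_growth(c)]
        if hits:            # first size with any feasible set is the minimum
            return hits
    return []

subnets = minimum_subnetworks()
common = set.intersection(*subnets)           # reactions in every minimum subnetwork
alternative = set.union(*subnets) - common    # reactions from alternative pathways
print(subnets, common, alternative)
```

Brute force scales exponentially with the number of reactions, which is why a solver-based formulation is needed at genome scale; the sketch only illustrates what the common and alternative reaction sets mean.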
Genome network medicine: innovation to overcome huge challenges in cancer therapy.
Roukos, Dimitrios H
2014-01-01
The post-ENCODE era is now shaping a new biomedical research direction for understanding the transcriptional and signaling networks that drive gene expression and core cellular processes such as cell fate, survival, and apoptosis. Over the past half century, Francis Crick's 'central dogma' of a single gene/protein-phenotype (trait/disease) relationship has defined biology, human physiology, disease, diagnostics, and drug discovery. However, the ENCODE project and several other genomic studies using high-throughput sequencing technologies, computational strategies, and imaging techniques to visualize regulatory networks provide evidence that transcription and gene expression are regulated by highly complex, dynamic molecular and signaling networks. This Focus article describes the limitations of linear, experimentation-based diagnostics and therapeutics in curing advanced cancer and the need to move from reductionist to network-based approaches. Given the evident wide genomic heterogeneity, the power and challenges of next-generation sequencing (NGS) technologies for identifying a patient's personal mutational landscape and tailoring the best targeted drugs to the individual patient are discussed. However, the available drugs are not capable of targeting aberrant signaling networks, and functional transcriptional heterogeneity and functional genome organization are poorly understood. Therefore, future clinical genome network medicine, which aims to overcome multiple problems in the new fields of regulatory DNA mapping, noncoding RNA, enhancer RNAs, and the dynamic complexity of transcriptional circuitry, is also discussed, in anticipation of new innovative technology and a strong appreciation of clinical data and evidence-based medicine. The problems and potential solutions in the discovery of next-generation molecular and signaling circuitry-based biomarkers and drugs are explored. © 2013 Wiley Periodicals, Inc.
BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions
2010-01-01
Background: Genome-scale metabolic reconstructions under the Constraint Based Reconstruction and Analysis (COBRA) framework are valuable tools for analyzing the metabolic capabilities of organisms and interpreting experimental data. As the number of such reconstructions and analysis methods increases, there is a greater need for data uniformity and ease of distribution and use. Description: We describe BiGG, a knowledgebase of Biochemically, Genetically and Genomically structured genome-scale metabolic network reconstructions. BiGG integrates several published genome-scale metabolic networks into one resource with standard nomenclature which allows components to be compared across different organisms. BiGG can be used to browse model content, visualize metabolic pathway maps, and export SBML files of the models for further analysis by external software packages. Users may follow links from BiGG to several external databases to obtain additional information on genes, proteins, reactions, metabolites and citations of interest. Conclusions: BiGG addresses a need in the systems biology community to have access to high quality curated metabolic models and reconstructions. It is freely available for academic use at http://bigg.ucsd.edu. PMID:20426874
Cyanobacterial Biofuels: Strategies and Developments on Network and Modeling.
Klanchui, Amornpan; Raethong, Nachon; Prommeenate, Peerada; Vongsangnak, Wanwipa; Meechai, Asawin
Cyanobacteria, phototrophic microorganisms, have recently attracted much attention as a promising source for environmentally sustainable biofuel production. However, the main barrier to commercialization of cyanobacteria-based biofuels is economic feasibility. Miscellaneous strategies for improving the production performance of cyanobacteria have thus been developed. Among these, simple ad hoc strategies, which fail to fully optimize cell growth coupled with the desired product yield, are explored. With the advancement of genomics and systems biology, a new paradigm of systems metabolic engineering has been recognized. In particular, genome-scale metabolic network reconstruction and modeling is a crucial systems-based tool for whole-cell investigation and prediction. In this review, the cyanobacterial genome-scale metabolic models, which offer a system-level understanding of cyanobacterial metabolism, are described. The main processes of metabolic network reconstruction and modeling of cyanobacteria are summarized. Strategies and developments in genome-scale network reconstruction and modeling through the systems metabolic engineering approach are advanced and applied to efficient cyanobacteria-based biofuel production.
Conserved noncoding sequences conserve biological networks and influence genome evolution.
Xie, Jianbo; Qian, Kecheng; Si, Jingna; Xiao, Liang; Ci, Dong; Zhang, Deqiang
2018-05-01
Comparative genomics approaches have identified numerous conserved cis-regulatory sequences near genes in plant genomes. Despite the identification of these conserved noncoding sequences (CNSs), our knowledge of their functional importance and selection remains limited. Here, we used a combination of DNA methylome analysis, microarray expression analyses, and functional annotation to study these sequences in the model tree Populus trichocarpa. Methylation in CG contexts and non-CG contexts was lower in CNSs, particularly CNSs in the 5'-upstream regions of genes, compared with other sites in the genome. We observed that CNSs are enriched in genes with transcription and binding functions, and this is also associated with syntenic genes and those from whole-genome duplications, suggesting that cis-regulatory sequences play a key role in genome evolution. We detected a significant positive correlation between CNS number and protein interactions, suggesting that CNSs may have roles in the evolution and maintenance of biological networks. The divergence of CNSs indicates that duplication-degeneration-complementation drives the subfunctionalization of a proportion of duplicated genes from whole-genome duplication. Furthermore, population genomics confirmed that most CNSs are under strong purifying selection and only a small subset of CNSs shows evidence of adaptive evolution. These findings provide a foundation for future studies exploring these key genomic features in the maintenance of biological networks, local adaptation, and transcription.
RegPrecise 3.0--a resource for genome-scale exploration of transcriptional regulation in bacteria.
Novichkov, Pavel S; Kazakov, Alexey E; Ravcheev, Dmitry A; Leyn, Semen A; Kovaleva, Galina Y; Sutormin, Roman A; Kazanov, Marat D; Riehl, William; Arkin, Adam P; Dubchak, Inna; Rodionov, Dmitry A
2013-11-01
Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in prokaryotes is one of the critical tasks of modern genomics. Bacteria from different taxonomic groups, whose lifestyles and natural environments are substantially different, possess highly diverged transcriptional regulatory networks. Comparative genomics approaches are useful for in silico reconstruction of bacterial regulons and networks operated by both transcription factors (TFs) and RNA regulatory elements (riboswitches). RegPrecise (http://regprecise.lbl.gov) is a web resource for collection, visualization and analysis of transcriptional regulons reconstructed by comparative genomics. We significantly expanded a reference collection of manually curated regulons we introduced earlier. RegPrecise 3.0 provides access to inferred regulatory interactions organized by phylogenetic, structural and functional properties. Taxonomy-specific collections include 781 TF regulogs inferred in more than 160 genomes representing 14 taxonomic groups of Bacteria. TF-specific collections include regulogs for a selected subset of 40 TFs reconstructed across more than 30 taxonomic lineages. Novel collections of regulons operated by RNA regulatory elements (riboswitches) include nearly 400 regulogs inferred in 24 bacterial lineages. RegPrecise 3.0 provides four classifications of the reference regulons implemented as controlled vocabularies: 55 TF protein families; 43 RNA motif families; ~150 biological processes or metabolic pathways; and ~200 effectors or environmental signals. Genome-wide visualizations of regulatory networks and metabolic pathways covered by the reference regulons are available for all studied genomes. A separate section of RegPrecise 3.0 contains draft regulatory networks in 640 genomes obtained by a conservative propagation of the reference regulons to closely related genomes.
RegPrecise 3.0 gives access to the transcriptional regulons reconstructed in bacterial genomes. Analytical capabilities include exploration of: regulon content, structure and function; TF binding site motifs; conservation and variations in genome-wide regulatory networks across all taxonomic groups of Bacteria. RegPrecise 3.0 was selected as a core resource on transcriptional regulation of the Department of Energy Systems Biology Knowledgebase, an emerging software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions, perform large-scale analyses, and model interactions in microbes, plants, and their communities.
QuIN: A Web Server for Querying and Visualizing Chromatin Interaction Networks
Thibodeau, Asa; Márquez, Eladio J.; Luo, Oscar; Ruan, Yijun; Shin, Dong-Guk; Stitzel, Michael L.; Ucar, Duygu
2016-01-01
Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and HiC, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network-based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. AVAILABILITY: QuIN’s web server is available at http://quin.jax.org. QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database, and the source code is available under the GPLv3 license on GitHub: https://github.com/UcarLab/QuIN/. PMID:27336171
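As a rough illustration of what "network-based measures" and "direct and indirect interactions" can mean for a chromatin interaction network, the sketch below ranks nodes by degree and lists two-step partners. It is written in Python for brevity and does not reproduce QuIN's Java implementation or its actual measures; the anchor labels and interaction pairs are invented.

```python
from collections import defaultdict

# Hypothetical interaction pairs between genomic anchors (labels invented).
interactions = [
    ("enhancer_1", "promoter_A"),
    ("enhancer_2", "promoter_A"),
    ("enhancer_3", "promoter_A"),
    ("enhancer_3", "promoter_B"),
    ("promoter_B", "promoter_C"),
]

# Undirected adjacency: each anchor maps to its direct interaction partners.
neighbors = defaultdict(set)
for a, b in interactions:
    neighbors[a].add(b)
    neighbors[b].add(a)

def indirect_partners(node):
    """Nodes exactly two interaction steps away from `node`."""
    direct = set(neighbors[node])
    two_step = set()
    for n in direct:
        two_step |= neighbors[n]
    return two_step - direct - {node}

# Rank anchors by degree (number of direct partners), ties broken by name.
ranked = sorted(neighbors, key=lambda n: (-len(neighbors[n]), n))
print(ranked[0], sorted(neighbors[ranked[0]]), sorted(indirect_partners(ranked[0])))
```

Degree is only the simplest centrality; the same adjacency structure could feed other measures (for example betweenness) if a graph library is available.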
Kweon, Ohgew; Kim, Seong-Jae; Blom, Jochen; Kim, Sung-Kwan; Kim, Bong-Soo; Baek, Dong-Heon; Park, Su Inn; Sutherland, John B; Cerniglia, Carl E
2015-02-14
The bacterial genus Mycobacterium is of great interest in the medical and biotechnological fields. Despite a flood of genome sequencing and functional genomics data, significant gaps in knowledge between genome and phenome seriously hinder efforts toward the treatment of mycobacterial diseases and practical biotechnological applications. In this study, we propose the use of systematic, comparative functional pan-genomic analysis to build connections between genomic dynamics and phenotypic evolution in polycyclic aromatic hydrocarbon (PAH) metabolism in the genus Mycobacterium. Phylogenetic, phenotypic, and genomic information for 27 completely genome-sequenced mycobacteria was systematically integrated to reconstruct a mycobacterial phenotype network (MPN) with a pan-genomic concept at a network level. In the MPN, mycobacterial phenotypes show typical scale-free relationships. PAH degradation is an isolated phenotype with the lowest connection degree, consistent with phylogenetic and environmental isolation of PAH degraders. A series of functional pan-genomic analyses provide conserved and unique types of genomic evidence for strong epistatic and pleiotropic impacts on evolutionary trajectories of the PAH-degrading phenotype. Under strong natural selection, the detailed gene gain/loss patterns from horizontal gene transfer (HGT)/deletion events hypothesize a plausible evolutionary path, an epistasis-based birth and pleiotropy-dependent death, for PAH metabolism in the genus Mycobacterium. This study generated a practical mycobacterial compendium of phenotypic and genomic changes, focusing on the PAH-degrading phenotype, with a pan-genomic perspective of the evolutionary events and the environmental challenges. 
Our findings suggest that when selection acts on PAH metabolism, only a small fraction of possible trajectories is likely to be observed, owing mainly to a combination of the ambiguous phenotypic effects of PAHs and the corresponding pleiotropy- and epistasis-dependent evolutionary adaptation. Evolutionary constraints on the selection of trajectories, like those seen in PAH-degrading phenotypes, are likely to apply to the evolution of other phenotypes in the genus Mycobacterium.
The IGNITE network: a model for genomic medicine implementation and research.
Weitzel, Kristin Wiisanen; Alexander, Madeline; Bernhardt, Barbara A; Calman, Neil; Carey, David J; Cavallari, Larisa H; Field, Julie R; Hauser, Diane; Junkins, Heather A; Levin, Phillip A; Levy, Kenneth; Madden, Ebony B; Manolio, Teri A; Odgis, Jacqueline; Orlando, Lori A; Pyeritz, Reed; Wu, R Ryanne; Shuldiner, Alan R; Bottinger, Erwin P; Denny, Joshua C; Dexter, Paul R; Flockhart, David A; Horowitz, Carol R; Johnson, Julie A; Kimmel, Stephen E; Levy, Mia A; Pollin, Toni I; Ginsburg, Geoffrey S
2016-01-05
Patients, clinicians, researchers and payers are seeking to understand the value of using genomic information (as reflected by genotyping, sequencing, family history or other data) to inform clinical decision-making. However, challenges exist to widespread clinical implementation of genomic medicine, a prerequisite for developing evidence of its real-world utility. To address these challenges, the National Institutes of Health-funded IGNITE (Implementing GeNomics In pracTicE; www.ignite-genomics.org) Network, comprising six projects and a coordinating center, was established in 2013 to support the development, investigation and dissemination of genomic medicine practice models that seamlessly integrate genomic data into the electronic health record and that deploy tools for point-of-care decision making. IGNITE site projects are aligned in their purpose of testing these models, but individual projects vary in scope and design, including exploring genetic markers for disease risk prediction and prevention, developing tools for using family history data, incorporating pharmacogenomic data into clinical care, refining disease diagnosis using sequence-based mutation discovery, and creating novel educational approaches. This paper describes the IGNITE Network and member projects, including network structure, collaborative initiatives, clinical decision support strategies, methods for return of genomic test results, and educational initiatives for patients and providers. Clinical and outcomes data from individual sites and network-wide projects are expected to be published over the next few years.
The IGNITE Network is an innovative series of projects and pilot demonstrations aiming to enhance translation of validated actionable genomic information into clinical settings and develop and use measures of outcome in response to genome-based clinical interventions using a pragmatic framework to provide early data and proofs of concept on the utility of these interventions. Through these efforts and collaboration with other stakeholders, IGNITE is poised to have a significant impact on the acceleration of genomic information into medical practice.
Robinson, Sean; Nevalainen, Jaakko; Pinna, Guillaume; Campalans, Anna; Radicella, J. Pablo; Guyon, Laurent
2017-01-01
Motivation: Incorporating gene interaction data into the identification of ‘hit’ genes in genomic experiments is a well-established approach leveraging the ‘guilt by association’ assumption to obtain a network-based hit list of functionally related genes. We aim to develop a method to allow for multivariate gene scores and multiple hit labels in order to extend the analysis of genomic screening data within such an approach. Results: We propose a Markov random field-based method to achieve our aim and show that the particular advantages of our method compared with those currently used lead to new insights in previously analysed data as well as for our own motivating data. Our method additionally achieves the best performance in an independent simulation experiment. The real data applications we consider comprise a survival analysis and differential expression experiment and a cell-based RNA interference functional screen. Availability and implementation: We provide all of the data and code related to the results in the paper. Contact: sean.j.robinson@utu.fi or laurent.guyon@cea.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28881978
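The 'guilt by association' idea behind a Markov random field hit list can be sketched in a much-reduced form. The example below is not the authors' method: it assumes univariate scores, two labels, a squared-error unary term around assumed class means, a fixed disagreement penalty between interacting genes, and iterated conditional modes (ICM) as a simple stand-in for inference. Gene names, scores, edges, and parameter values are all hypothetical.

```python
# Hypothetical gene scores (e.g. z-scores) and interaction edges.
scores = {"g1": 2.4, "g2": 0.8, "g3": 2.0, "g4": 0.1, "g5": -0.3}
edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g4", "g5")]
MEANS = {0: 0.0, 1: 2.0}   # assumed class means: 0 = non-hit, 1 = hit
BETA = 0.8                 # assumed penalty when neighbours carry different labels

neighbors = {g: set() for g in scores}
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

def nearest_mean(gene):
    """Score-only label: the class whose mean is closest to the gene's score."""
    return min(MEANS, key=lambda l: (scores[gene] - MEANS[l]) ** 2)

def local_energy(gene, label, labels):
    """Unary misfit plus a penalty for each neighbour with a different label."""
    unary = (scores[gene] - MEANS[label]) ** 2
    pairwise = BETA * sum(labels[n] != label for n in neighbors[gene])
    return unary + pairwise

def icm(max_sweeps=20):
    """Iterated conditional modes: greedily relabel genes until nothing changes."""
    labels = {g: nearest_mean(g) for g in scores}
    for _ in range(max_sweeps):
        changed = False
        for g in sorted(scores):
            best = min(MEANS, key=lambda l: local_energy(g, l, labels))
            if best != labels[g]:
                labels[g], changed = best, True
        if not changed:
            break
    return labels

score_only = {g: nearest_mean(g) for g in scores}
network_aware = icm()
print(score_only)
print(network_aware)
```

In this toy setting the borderline gene g2 is a non-hit on its score alone but is relabelled a hit because both of its interaction partners are hits, which is the behaviour the network-based approach is meant to capture.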
Whole-genome sequence of Schistosoma haematobium.
Young, Neil D; Jex, Aaron R; Li, Bo; Liu, Shiping; Yang, Linfeng; Xiong, Zijun; Li, Yingrui; Cantacessi, Cinzia; Hall, Ross S; Xu, Xun; Chen, Fangyuan; Wu, Xuan; Zerlotini, Adhemar; Oliveira, Guilherme; Hofmann, Andreas; Zhang, Guojie; Fang, Xiaodong; Kang, Yi; Campbell, Bronwyn E; Loukas, Alex; Ranganathan, Shoba; Rollinson, David; Rinaldi, Gabriel; Brindley, Paul J; Yang, Huanming; Wang, Jun; Wang, Jian; Gasser, Robin B
2012-01-15
Schistosomiasis is a neglected tropical disease caused by blood flukes (genus Schistosoma; schistosomes) and affecting 200 million people worldwide. No vaccines are available, and treatment relies on one drug, praziquantel. Schistosoma haematobium has come into the spotlight as a major cause of urogenital disease, as an agent linked to bladder cancer and as a predisposing factor for HIV/AIDS. The parasite is transmitted to humans from freshwater snails. Worms dwell in blood vessels and release eggs that become embedded in the bladder wall to elicit chronic immune-mediated disease and induce squamous cell carcinoma. Here we sequenced the 385-Mb genome of S. haematobium using Illumina-based technology at 74-fold coverage and compared it to sequences from related parasites. We included genome annotation based on function, gene ontology, networking and pathway mapping. This genome now provides an unprecedented resource for many fundamental research areas and shows great promise for the design of new disease interventions.
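The sequencing depth quoted above translates directly into a raw data volume via coverage = total sequenced bases / genome size. The short calculation below uses the 385-Mb genome size and 74-fold coverage from the abstract; the 100-bp read length is an assumption added purely for illustration and is not stated in the source.

```python
genome_size_bp = 385_000_000   # 385-Mb genome, from the abstract
coverage = 74                  # 74-fold coverage, from the abstract
read_length_bp = 100           # assumption for illustration only

total_bases = genome_size_bp * coverage          # bases sequenced in total
reads_needed = total_bases // read_length_bp     # reads at the assumed length

print(f"total sequenced: {total_bases / 1e9:.2f} Gb")
print(f"reads at {read_length_bp} bp: {reads_needed / 1e6:.1f} million")
```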
The genomic applications in practice and prevention network.
Khoury, Muin J; Feero, W Gregory; Reyes, Michele; Citrin, Toby; Freedman, Andrew; Leonard, Debra; Burke, Wylie; Coates, Ralph; Croyle, Robert T; Edwards, Karen; Kardia, Sharon; McBride, Colleen; Manolio, Teri; Randhawa, Gurvaneet; Rasooly, Rebekah; St Pierre, Jeannette; Terry, Sharon
2009-07-01
The authors describe the rationale and initial development of a new collaborative initiative, the Genomic Applications in Practice and Prevention Network. The network convened by the Centers for Disease Control and Prevention and the National Institutes of Health includes multiple stakeholders from academia, government, health care, public health, industry and consumers. The premise of Genomic Applications in Practice and Prevention Network is that there is an unaddressed chasm between gene discoveries and demonstration of their clinical validity and utility. This chasm is due to the lack of readily accessible information about the utility of most genomic applications and the lack of necessary knowledge by consumers and providers to implement what is known. The mission of Genomic Applications in Practice and Prevention Network is to accelerate and streamline the effective integration of validated genomic knowledge into the practice of medicine and public health, by empowering and sponsoring research, evaluating research findings, and disseminating high quality information on candidate genomic applications in practice and prevention. Genomic Applications in Practice and Prevention Network will develop a process that links ongoing collection of information on candidate genomic applications to four crucial domains: (1) knowledge synthesis and dissemination for new and existing technologies, and the identification of knowledge gaps, (2) a robust evidence-based recommendation development process, (3) translation research to evaluate validity, utility and impact in the real world and how to disseminate and implement recommended genomic applications, and (4) programs to enhance practice, education, and surveillance.
From genomics to chemical genomics: new developments in KEGG
Kanehisa, Minoru; Goto, Susumu; Hattori, Masahiro; Aoki-Kinoshita, Kiyoko F.; Itoh, Masumi; Kawashima, Shuichi; Katayama, Toshiaki; Araki, Michihiro; Hirakawa, Mika
2006-01-01
The increasing amount of genomic and molecular information is the basis for understanding higher-order biological systems, such as the cell and the organism, and their interactions with the environment, as well as for medical, industrial and other practical applications. The KEGG resource provides a reference knowledge base for linking genomes to biological systems, categorized as building blocks in the genomic space (KEGG GENES) and the chemical space (KEGG LIGAND), and wiring diagrams of interaction networks and reaction networks (KEGG PATHWAY). A fourth component, KEGG BRITE, has been formally added to the KEGG suite of databases. This reflects our attempt to computerize functional interpretations as part of the pathway reconstruction process based on the hierarchically structured knowledge about the genomic, chemical and network spaces. In accordance with the new chemical genomics initiatives, the scope of KEGG LIGAND has been significantly expanded to cover both endogenous and exogenous molecules. Specifically, RPAIR contains curated chemical structure transformation patterns extracted from known enzymatic reactions, which would enable analysis of genome-environment interactions, such as the prediction of new reactions and new enzyme genes that would degrade new environmental compounds. Additionally, drug information is now stored separately and linked to new KEGG DRUG structure maps. PMID:16381885
Yang, Q; Siganos, G; Faloutsos, M; Lonardi, S
2006-01-01
Recent research efforts have made available genome-wide, high-throughput protein-protein interaction (PPI) maps for several model organisms. This has enabled the systematic analysis of PPI networks, which has become one of the primary challenges for the systems biology community. In this study, we attempt to better understand the topological structure of PPI networks by comparing them against man-made communication networks, and more specifically, the Internet. Our comparative study is based on a comprehensive set of graph metrics. Our results exhibit an interesting dichotomy. On the one hand, both networks share several macroscopic properties such as scale-free and small-world properties. On the other hand, the two networks exhibit significant topological differences, such as the cliqueishness of the highest degree nodes. We attribute these differences to the distinct design principles and constraints that both networks are assumed to satisfy. We speculate that the evolutionary constraints that favor survivability and diversification are behind the building process of PPI networks, whereas the leading force in shaping the Internet topology is a decentralized optimization process geared towards efficient node communication.
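Two of the graph metrics that such a comparison typically relies on, node degree and the local clustering coefficient (one way to quantify how cliquish a node's neighbourhood is), can be computed as follows. This is a minimal sketch on an invented five-node graph, not the study's metric suite or data.

```python
from itertools import combinations

# Hypothetical undirected toy graph (node names invented).
edge_list = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("D", "E")]

adj = {}
for u, v in edge_list:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def clustering_coefficient(node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

degrees = {n: len(adj[n]) for n in adj}
hub = max(sorted(adj), key=lambda n: degrees[n])      # highest-degree node
mean_cc = sum(clustering_coefficient(n) for n in adj) / len(adj)
print(hub, degrees[hub], round(clustering_coefficient(hub), 3), round(mean_cc, 3))
```

Comparing the clustering coefficient of the highest-degree nodes with the network average is one simple way to ask whether hubs sit inside dense cliques or bridge otherwise unconnected neighbours.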
NASA Astrophysics Data System (ADS)
Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng
2016-01-01
The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.
Monteiro, Pedro Tiago; Pais, Pedro; Costa, Catarina; Manna, Sauvagya; Sá-Correia, Isabel; Teixeira, Miguel Cacho
2017-01-04
We present the PATHOgenic YEAst Search for Transcriptional Regulators And Consensus Tracking (PathoYeastract - http://pathoyeastract.org) database, a tool for the analysis and prediction of transcription regulatory associations at the gene and genomic levels in the pathogenic yeasts Candida albicans and C. glabrata. Upon data retrieval from hundreds of publications, followed by curation, the database currently includes 28 000 unique documented regulatory associations between transcription factors (TF) and target genes and 107 DNA binding sites, considering 134 TFs in both species. Following the structure used for the YEASTRACT database, PathoYeastract makes available bioinformatics tools that enable the user to exploit the existing information to predict the TFs involved in the regulation of a gene or genome-wide transcriptional response, while ranking those TFs in order of their relative importance. Each search can be filtered based on the selection of specific environmental conditions, experimental evidence or positive/negative regulatory effect. Promoter analysis tools and interactive visualization tools for the representation of TF regulatory networks are also provided. The PathoYeastract database further provides simple tools for the prediction of gene and genomic regulation based on orthologous regulatory associations described for other yeast species, a comparative genomics setup for the study of cross-species evolution of regulatory networks. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
NIBBS-search for fast and accurate prediction of phenotype-biased metabolic systems.
Schmidt, Matthew C; Rocha, Andrea M; Padmanabhan, Kanchana; Shpanskaya, Yekaterina; Banfield, Jill; Scott, Kathleen; Mihelcic, James R; Samatova, Nagiza F
2012-01-01
Understanding genotype-phenotype associations is not only important for furthering our knowledge of internal cellular processes but also essential for providing the foundation necessary for genetic engineering of microorganisms for industrial use (e.g., production of bioenergy or biofuels). However, genotype-phenotype associations alone do not provide enough information to alter an organism's genome to either suppress or exhibit a phenotype. It is important to look at the phenotype-related genes in the context of the genome-scale network to understand how the genes interact with other genes in the organism. Identification of metabolic subsystems involved in the expression of the phenotype is one way of placing the phenotype-related genes in the context of the entire network. A metabolic system refers to a metabolic network subgraph; nodes are compounds and edge labels are the enzymes that catalyze the reaction. The metabolic subsystem could be part of a single metabolic pathway or span parts of multiple pathways. Arguably, comparative genome-scale metabolic network analysis is a promising strategy to identify these phenotype-related metabolic subsystems. Network Instance-Based Biased Subgraph Search (NIBBS) is a graph-theoretic method for genome-scale metabolic network comparative analysis that can identify metabolic systems that are statistically biased toward phenotype-expressing organismal networks. We set up experiments with target phenotypes like hydrogen production, TCA expression, and acid-tolerance. We show via extensive literature search that some of the resulting metabolic subsystems are indeed phenotype-related and formulate hypotheses for other systems in terms of their role in phenotype expression. NIBBS is also orders of magnitude faster than MULE, one of the most efficient maximal frequent subgraph mining algorithms that could be adjusted for this problem.
Also, the set of phenotype-biased metabolic systems output by NIBBS comes very close to the set of phenotype-biased subgraphs output by an exact maximally-biased subgraph enumeration algorithm (MBS-Enum). The code (NIBBS and the module to visualize the identified subsystems) is available at http://freescience.org/cs/NIBBS. PMID:22589706
Reticulate classification of mosaic microbial genomes using NeAT website.
Lima-Mendez, Gipsi
2012-01-01
The tree of life is the classical representation of the evolutionary relationships between extant species. A tree is appropriate to display the divergence of species through mutation, i.e., by vertical descent. However, lateral gene transfer (LGT) is excluded from such representations. When the contribution of LGT to genome evolution cannot be neglected (e.g., for prokaryotes and mobile genetic elements), the tree becomes misleading. Networks appear as an intuitive way to represent both vertical and horizontal relationships, while overlapping groups within such graphs are more suitable for their classification. Here, we describe a method to represent both vertical and horizontal relationships. We start with a set of genomes whose coded proteins have been grouped into families based on sequence similarity. Next, all pairs of genomes are compared, counting the number of proteins classified into the same family. From this comparison, we derive a weighted graph where genomes with a significant number of similar proteins are linked. Finally, we apply a two-step clustering of this graph to produce a classification where nodes can be assigned to multiple clusters. The procedure can be performed using the Network Analysis Tools (NeAT) website.
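The pairwise comparison step described in this abstract can be sketched in a few lines. This is a minimal illustration and not the NeAT implementation: the function name `shared_family_graph`, the toy genomes, and the fixed `min_shared` cutoff are all assumptions. The cutoff stands in for the significance test the abstract mentions, and the sketch counts shared families rather than individual proteins.

```python
from itertools import combinations


def shared_family_graph(genome_families, min_shared=2):
    """Return {(genome_a, genome_b): shared_count} for linked genome pairs.

    genome_families maps a genome name to the set of protein families it
    encodes. min_shared is a hypothetical fixed cutoff used here in place
    of a statistical significance test.
    """
    edges = {}
    for a, b in combinations(sorted(genome_families), 2):
        shared = len(genome_families[a] & genome_families[b])
        if shared >= min_shared:
            edges[(a, b)] = shared  # edge weight = number of shared families
    return edges


# Toy input (made-up names): three genomes and their protein families.
genomes = {
    "phageA": {"F1", "F2", "F3"},
    "phageB": {"F2", "F3", "F4"},
    "phageC": {"F4", "F5"},
}
graph = shared_family_graph(genomes, min_shared=2)
```

With this toy input only the pair sharing two families is linked; lowering `min_shared` to 1 also links the pair sharing a single family. The two-step clustering into overlapping groups is not shown.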
Marcelletti, Simone; Scortichini, Marco
2016-10-01
A total of 21 Xylella fastidiosa strains were assessed by comparing their genomes to infer their taxonomic relationships. Whole-genome-based average nucleotide identity and tetranucleotide frequency correlation coefficient analyses were performed. In addition, a consensus tree based on comparisons of 956 core gene families, and a genome-wide phylogenetic tree and a Neighbor-net network were constructed with 820,088 nucleotides (i.e., approximately 30-33% of the entire X. fastidiosa genome). All approaches revealed the occurrence of three well-demarcated genetic clusters that represent X. fastidiosa subspecies fastidiosa, multiplex and pauca, with the latter appearing to diverge. We suggest that the proposed but never formally described subspecies 'sandyi' and 'morus' are instead members of the subspecies fastidiosa. These analyses support the view that the Xylella strain isolated from Pyrus pyrifolia in Taiwan is likely to be a new species. A widely used multilocus sequence typing analysis yielded conflicting results.
Li, Cheng-Wei; Chen, Bor-Sen
2016-01-01
Epigenetic and microRNA (miRNA) regulation are associated with carcinogenesis and the development of cancer. By using the available omics data, including those from next-generation sequencing (NGS), genome-wide methylation profiling, candidate integrated genetic and epigenetic network (IGEN) analysis, and drug response genome-wide microarray analysis, we constructed an IGEN system based on three coupling regression models that characterize protein-protein interaction networks (PPINs), gene regulatory networks (GRNs), miRNA regulatory networks (MRNs), and epigenetic regulatory networks (ERNs). By applying system identification method and principal genome-wide network projection (PGNP) to IGEN analysis, we identified the core network biomarkers to investigate bladder carcinogenic mechanisms and design multiple drug combinations for treating bladder cancer with minimal side-effects. The progression of DNA repair and cell proliferation in stage 1 bladder cancer ultimately results not only in the derepression of miR-200a and miR-200b but also in the regulation of the TNF pathway to metastasis-related genes or proteins, cell proliferation, and DNA repair in stage 4 bladder cancer. We designed a multiple drug combination comprising gefitinib, estradiol, yohimbine, and fulvestrant for treating stage 1 bladder cancer with minimal side-effects, and another multiple drug combination comprising gefitinib, estradiol, chlorpromazine, and LY294002 for treating stage 4 bladder cancer with minimal side-effects.
Krumholz, Elias W.; Libourel, Igor G. L.
2015-01-01
Genome-scale metabolic models are central in connecting genotypes to metabolic phenotypes. However, even for well studied organisms, such as Escherichia coli, draft networks do not contain a complete biochemical network. Missing reactions are referred to as gaps. These gaps need to be filled to enable functional analysis, and gap-filling choices influence model predictions. To investigate whether functional networks existed where all gap-filling reactions were supported by sequence similarity to annotated enzymes, four draft networks were supplemented with all reactions from the Model SEED database for which minimal sequence similarity was found in their genomes. Quadratic programming revealed that the number of reactions that could partake in a gap-filling solution was vast: 3,270 in the case of E. coli, where 72% of the metabolites in the draft network could connect a gap-filling solution. Nonetheless, no network could be completed without the inclusion of orphaned enzymes, suggesting that parts of the biochemistry integral to biomass precursor formation are uncharacterized. However, many gap-filling reactions were well determined, and the resulting networks showed improved prediction of gene essentiality compared with networks generated through canonical gap filling. In addition, gene essentiality predictions that were sensitive to poorly determined gap-filling reactions were of poor quality, suggesting that damage to the network structure resulting from the inclusion of erroneous gap-filling reactions may be predictable. PMID:26041773
Genome-wide protein-protein interactions and protein function exploration in cyanobacteria
Lv, Qi; Ma, Weimin; Liu, Hui; Li, Jiang; Wang, Huan; Lu, Fang; Zhao, Chen; Shi, Tieliu
2015-01-01
Genome-wide network analysis is a well-established way to study proteins of unknown function. Here, we explored protein functions and biological mechanisms based on an inferred high-confidence protein-protein interaction (PPI) network in cyanobacteria. We integrated data from seven different sources and predicted 1,997 PPIs, which were evaluated against experimental evidence of molecular mechanism, direct or indirect evidence text-mined from the literature, and conservation as “interologs”. Combining the predicted PPIs with known PPIs, we obtained 4,715 non-redundant PPIs (involving 3,231 proteins covering over 90% of the genome) to generate the PPI network. Based on the PPI network, Gene Ontology (GO) terms were assigned to proteins of unknown function. Functional modules were identified by dissecting the PPI network into sub-networks and analyzing pathway enrichment, with which we investigated novel functions of the underlying proteins in protein complexes and pathways. Examples of photosynthesis and DNA repair indicate that the network approach is a powerful tool in protein function analysis. Overall, this systems biology approach provides new insight into posterior functional analysis of PPIs in cyanobacteria. PMID:26490033
Systems biology of the structural proteome.
Brunk, Elizabeth; Mih, Nathan; Monk, Jonathan; Zhang, Zhen; O'Brien, Edward J; Bliven, Spencer E; Chen, Ke; Chang, Roger L; Bourne, Philip E; Palsson, Bernhard O
2016-03-11
The success of genome-scale models (GEMs) can be attributed to the high-quality, bottom-up reconstructions of metabolic, protein synthesis, and transcriptional regulatory networks on an organism-specific basis. Such reconstructions are biochemically, genetically, and genomically structured knowledge bases that can be converted into a mathematical format to enable a myriad of computational biological studies. In recent years, genome-scale reconstructions have been extended to include protein structural information, which has opened up new vistas in systems biology research and empowered applications in structural systems biology and systems pharmacology. Here, we present the generation, application, and dissemination of genome-scale models with protein structures (GEM-PRO) for Escherichia coli and Thermotoga maritima. We show the utility of integrating molecular scale analyses with systems biology approaches by discussing several comparative analyses on the temperature dependence of growth, the distribution of protein fold families, substrate specificity, and characteristic features of whole cell proteomes. Finally, to aid in the grand challenge of big data to knowledge, we provide several explicit tutorials of how protein-related information can be linked to genome-scale models in a public GitHub repository ( https://github.com/SBRG/GEMPro/tree/master/GEMPro_recon/). Translating genome-scale, protein-related information to structured data in the format of a GEM provides a direct mapping of gene to gene-product to protein structure to biochemical reaction to network states to phenotypic function. Integration of molecular-level details of individual proteins, such as their physical, chemical, and structural properties, further expands the description of biochemical network-level properties, and can ultimately influence how to model and predict whole cell phenotypes as well as perform comparative systems biology approaches to study differences between organisms. 
GEM-PRO offers insight into the physical embodiment of an organism's genotype, and its use in this comparative framework enables exploration of adaptive strategies for these organisms, opening the door to many new lines of research. With these provided tools, tutorials, and background, the reader will be in a position to run GEM-PRO for their own purposes.
Hartzler, Andrea; McCarty, Catherine A.; Rasmussen, Luke V.; Williams, Marc S.; Brilliant, Murray; Bowton, Erica A.; Clayton, Ellen Wright; Faucett, William A.; Ferryman, Kadija; Field, Julie R.; Fullerton, Stephanie M.; Horowitz, Carol R.; Koenig, Barbara A.; McCormick, Jennifer B.; Ralston, James D.; Sanderson, Saskia C.; Smith, Maureen E.; Trinidad, Susan Brown
2014-01-01
Integrating genomic information into clinical care and the electronic health record can facilitate personalized medicine through genetically guided clinical decision support. Stakeholder involvement is critical to the success of these implementation efforts. Prior work on implementation of clinical information systems provides broad guidance to inform effective engagement strategies. We add to this evidence-based recommendations that are specific to issues at the intersection of genomics and the electronic health record. We describe stakeholder engagement strategies employed by the Electronic Medical Records and Genomics Network, a national consortium of US research institutions funded by the National Human Genome Research Institute to develop, disseminate, and apply approaches that combine genomic and electronic health record data. Through select examples drawn from sites of the Electronic Medical Records and Genomics Network, we illustrate a continuum of engagement strategies to inform genomic integration into commercial and homegrown electronic health records across a range of health-care settings. We frame engagement as activities to consult, involve, and partner with key stakeholder groups throughout specific phases of health information technology implementation. Our aim is to provide insights into engagement strategies to guide genomic integration based on our unique network experiences and lessons learned within the broader context of implementation research in biomedical informatics. On the basis of our collective experience, we describe key stakeholder practices, challenges, and considerations for successful genomic integration to support personalized medicine. PMID:24030437
Network-Based Identification and Prioritization of Key Regulators of Coronary Artery Disease Loci
Zhao, Yuqi; Chen, Jing; Freudenberg, Johannes M.; Meng, Qingying; Rajpal, Deepak K.; Yang, Xia
2017-01-01
Objective: Recent genome-wide association studies of coronary artery disease (CAD) have revealed 58 genome-wide significant and 148 suggestive genetic loci. However, the molecular mechanisms through which they contribute to CAD and the clinical implications of these findings remain largely unknown. We aim to retrieve gene subnetworks of the 206 CAD loci and identify and prioritize candidate regulators to better understand the biological mechanisms underlying the genetic associations. Approach and Results: We devised a new integrative genomics approach that incorporated (1) candidate genes from the top CAD loci, (2) the complete genetic association results from the 1000 genomes-based CAD genome-wide association studies from the Coronary Artery Disease Genome Wide Replication and Meta-Analysis Plus the Coronary Artery Disease consortium, (3) tissue-specific gene regulatory networks that depict the potential relationship and interactions between genes, and (4) tissue-specific gene expression patterns between CAD patients and controls. The networks and top-ranked regulators according to these data-driven criteria were further queried against literature, experimental evidence, and drug information to evaluate their disease relevance and potential as drug targets. Our analysis uncovered several potential novel regulators of CAD such as LUM and STAT3, which possess properties suitable as drug targets. We also revealed molecular relations and potential mechanisms through which the top CAD loci operate. Furthermore, we found that multiple CAD-relevant biological processes such as extracellular matrix, inflammatory and immune pathways, complement and coagulation cascades, and lipid metabolism interact in the CAD networks.
Conclusions: Our data-driven integrative genomics framework unraveled tissue-specific relations among the candidate genes of the CAD genome-wide association studies loci and prioritized novel network regulatory genes orchestrating biological processes relevant to CAD. PMID:26966275
Vivek-Ananth, R P; Samal, Areejit
2016-09-01
A major goal of systems biology is to build predictive computational models of cellular metabolism. Availability of complete genome sequences and wealth of legacy biochemical information has led to the reconstruction of genome-scale metabolic networks in the last 15 years for several organisms across the three domains of life. Due to paucity of information on kinetic parameters associated with metabolic reactions, the constraint-based modelling approach, flux balance analysis (FBA), has proved to be a vital alternative to investigate the capabilities of reconstructed metabolic networks. In parallel, advent of high-throughput technologies has led to the generation of massive amounts of omics data on transcriptional regulation comprising mRNA transcript levels and genome-wide binding profile of transcriptional regulators. A frontier area in metabolic systems biology has been the development of methods to integrate the available transcriptional regulatory information into constraint-based models of reconstructed metabolic networks in order to increase the predictive capabilities of computational models and understand the regulation of cellular metabolism. Here, we review the existing methods to integrate transcriptional regulatory information into constraint-based models of metabolic networks. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
The post-genomic era of biological network alignment.
Faisal, Fazle E; Meng, Lei; Crawford, Joseph; Milenković, Tijana
2015-12-01
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches' biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.
Network analysis of transcriptomics expands regulatory landscapes in Synechococcus sp. PCC 7002
DOE Office of Scientific and Technical Information (OSTI.GOV)
McClure, Ryan S.; Overall, Christopher C.; McDermott, Jason E.
Cyanobacterial regulation of gene expression must contend with a genome organization that lacks apparent functional context, as the majority of cellular processes and metabolic pathways are encoded by genes found at disparate locations across the genome. In addition, the fact that coordinated regulation of cyanobacterial cellular machinery takes place with significantly fewer transcription factors, compared to other Eubacteria, suggests the involvement of post-transcriptional mechanisms and regulatory adaptations which are not fully understood. Global transcript abundance from model cyanobacterium Synechococcus sp. PCC 7002 grown under 42 different conditions was analyzed using context-likelihood of relatedness. The resulting 903-gene network, which was organized into 11 modules, not only allowed classification of cyanobacterial responses to specific environmental variables but provided insight into the transcriptional network topology and led to the expansion of predicted regulons. When used in conjunction with genome sequence, the global transcript abundance allowed identification of putative post-transcriptional changes in expression as well as novel potential targets of both DNA binding proteins and asRNA regulators. The results offer a new perspective into the multi-level regulation that governs cellular adaptations of fast-growing physiologically robust cyanobacterium Synechococcus sp. PCC 7002 to changing environmental variables. It also extends a methodological knowledge-based framework for studying multi-scale regulatory mechanisms that operate in cyanobacteria. Finally, it provides valuable context for integrating systems-level data to enhance evidence-driven genomic annotation, especially in organisms where traditional context analyses cannot be implemented due to lack of operon-based functional organization.
Tian, Xinyu; Wang, Xuefeng; Chen, Jun
2014-01-01
The classic multinomial logit model, commonly used in multiclass regression problems, is restricted to few predictors and does not take into account the relationships among variables. It has limited use for genomic data, where the number of genomic features far exceeds the sample size. Genomic features such as gene expression levels are usually related by an underlying biological network. Efficient use of the network information is important to improve classification performance as well as biological interpretability. We proposed a multinomial logit model that is capable of addressing both the high dimensionality of predictors and the underlying network information. Group lasso was used to induce model sparsity, and a network constraint was imposed to induce smoothness of the coefficients with respect to the underlying network structure. To deal with the non-smoothness of the objective function in optimization, we developed a proximal gradient algorithm for efficient computation. The proposed model was compared to models with no prior structure information in both simulations and a problem of cancer subtype prediction with real TCGA (The Cancer Genome Atlas) gene expression data. The network-constrained model outperformed the traditional ones in both cases.
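To make the proximal step concrete, the sketch below shows the standard group-lasso proximal operator (block soft-thresholding), which is the usual way a proximal gradient method handles a non-smooth group penalty. It is an illustration under assumptions and not the authors' code: each row of `coefs` is taken to hold one feature's coefficients across classes, the names `group_soft_threshold` and `prox_group_lasso` are hypothetical, and the network-constraint term is assumed to be smooth and handled in the gradient step, which is not shown.

```python
import math


def group_soft_threshold(row, threshold):
    """Shrink one coefficient group toward zero by its Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in row))
    if norm <= threshold:
        # The whole group is zeroed, i.e. the feature is dropped.
        return [0.0 for _ in row]
    scale = 1.0 - threshold / norm
    return [scale * v for v in row]


def prox_group_lasso(coefs, step, lam):
    """Apply block soft-thresholding to every group (row) of coefs.

    step is the gradient step size and lam the group-lasso weight;
    both are illustrative parameters.
    """
    return [group_soft_threshold(row, step * lam) for row in coefs]


# Toy coefficients: two features, two classes.
coefs = [[3.0, 4.0], [0.3, 0.4]]
shrunk = prox_group_lasso(coefs, step=1.0, lam=1.0)
```

In this toy case the first group (norm 5) is scaled by 0.8, while the second group (norm 0.5, below the threshold of 1) is set to zero as a whole, which is how group lasso removes a feature from all classes at once.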
NASA Astrophysics Data System (ADS)
Champeimont, Raphaël; Laine, Elodie; Hu, Shuang-Wei; Penin, Francois; Carbone, Alessandra
2016-05-01
A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks.
An efficient graph theory based method to identify every minimal reaction set in a metabolic network
2014-01-01
Background: Development of cells with minimal metabolic functionality is gaining importance due to their efficiency in producing chemicals and fuels. Existing computational methods to identify minimal reaction sets in metabolic networks are computationally expensive. Further, they identify only one of the several possible minimal reaction sets. Results: In this paper, we propose an efficient graph theory based recursive optimization approach to identify all minimal reaction sets. Graph theoretical insights offer systematic methods to not only reduce the number of variables in math programming and increase its computational efficiency, but also provide efficient ways to find multiple optimal solutions. The efficacy of the proposed approach is demonstrated using case studies from Escherichia coli and Saccharomyces cerevisiae. In case study 1, the proposed method identified three minimal reaction sets, each containing 38 reactions, in the Escherichia coli central metabolic network with 77 reactions. Analysis of these three minimal reaction sets revealed that one of them is more suitable for developing a minimal-metabolism cell than the other two because of its practically achievable internal flux distribution. In case study 2, the proposed method identified 256 minimal reaction sets from the Saccharomyces cerevisiae genome-scale metabolic network with 620 reactions. The proposed method required only 4.5 hours to identify all 256 minimal reaction sets and showed a significant reduction (approximately 80%) in solution time compared with existing methods for finding minimal reaction sets. Conclusions: Identification of all minimal reaction sets in metabolic networks is essential since different minimal reaction sets have different properties that affect bioprocess development. The proposed method correctly identified all minimal reaction sets in both case studies.
The proposed method is computationally efficient compared to other methods for finding minimal reaction sets and is useful to employ with genome-scale metabolic networks. PMID:24594118
Social networks to biological networks: systems biology of Mycobacterium tuberculosis.
Vashisht, Rohit; Bhardwaj, Anshu; Osdd Consortium; Brahmachari, Samir K
2013-07-01
Contextualizing relevant information to construct a network that represents a given biological process presents a fundamental challenge in the network science of biology. The quality of the network for the organism of interest is critically dependent on the extent of functional annotation of its genome. Most automated annotation pipelines do not account for unstructured information present in volumes of literature, and hence a large fraction of the genome remains poorly annotated. However, if used, this information could substantially enhance the functional annotation of a genome, aiding the development of a more comprehensive network. Mining unstructured information buried in volumes of literature often requires manual intervention to a great extent and thus becomes a bottleneck for most of the automated pipelines. In this review, we discuss the potential of scientific social networking as a solution for systematic manual mining of data. Focusing on Mycobacterium tuberculosis as a case study, we discuss our open innovative approach for the functional annotation of its genome. Furthermore, we highlight the strength of such collated structured data in the context of drug target prediction based on systems level analysis of the pathogen.
Ma, Hong-Wu; Zhao, Xue-Ming; Yuan, Ying-Jin; Zeng, An-Ping
2004-08-12
Metabolic networks are organized in a modular, hierarchical manner. Methods for a rational decomposition of the metabolic network into relatively independent functional subsets are essential to better understand the modularity and organization principle of a large-scale, genome-wide network. Network decomposition is also necessary for functional analysis of metabolism by pathway analysis methods that are often hampered by the problem of combinatorial explosion due to the complexity of metabolic networks. Decomposition methods proposed in the literature are mainly based on the connection degree of metabolites. To obtain a more reasonable decomposition, the global connectivity structure of metabolic networks should be taken into account. In this work, we use a reaction graph representation of a metabolic network for the identification of its global connectivity structure and for decomposition. A bow-tie connectivity structure similar to that previously discovered for the metabolite graph is found also to exist in the reaction graph. Based on this bow-tie structure, a new decomposition method is proposed, which uses a distance definition derived from the path length between two reactions. A hierarchical classification tree is first constructed from the distance matrix among the reactions in the giant strong component of the bow-tie structure. These reactions are then grouped into different subsets based on the hierarchical tree. Reactions in the IN and OUT subsets of the bow-tie structure are subsequently placed in the corresponding subsets according to a 'majority rule'. Compared with the decomposition methods proposed in the literature, ours is based on combined properties of the global network structure and local reaction connectivity rather than, primarily, on the connection degree of metabolites. The method is applied to decompose the metabolic network of Escherichia coli. Eleven subsets are obtained. 
More detailed investigations of the subsets show that reactions in the same subset are really functionally related. The rational decomposition of metabolic networks, and subsequent studies of the subsets, make it more amenable to understand the inherent organization and functionality of metabolic networks at the modular level. http://genome.gbf.de/bioinformatics/
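The bow-tie partition described in the abstract above can be illustrated with a minimal sketch. It assumes the reaction graph is given as a list of directed edges between hypothetical reaction identifiers, and it covers only the split into the giant strong component (GSC), IN, and OUT sets. The distance-based hierarchical tree and the 'majority rule' step are not reproduced here.

```python
from collections import defaultdict


def reachable(adj, sources):
    """Return every node reachable from `sources` by following edges in `adj`."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bow_tie(edges):
    """Split a directed graph into (GSC, IN, OUT, others).

    GSC is the largest strongly connected component, IN holds nodes that can
    reach the GSC, OUT holds nodes reachable from it, and `others` is the rest.
    """
    fwd, rev = defaultdict(set), defaultdict(set)
    nodes = set()
    for u, v in edges:
        fwd[u].add(v)
        rev[v].add(u)
        nodes.update((u, v))

    # The strong component of a seed is the intersection of the nodes it can
    # reach and the nodes that can reach it (simple, not the fastest method).
    gsc = set()
    unassigned = set(nodes)
    while unassigned:
        seed = next(iter(unassigned))
        component = reachable(fwd, [seed]) & reachable(rev, [seed])
        unassigned -= component
        if len(component) > len(gsc):
            gsc = component

    in_set = reachable(rev, gsc) - gsc
    out_set = reachable(fwd, gsc) - gsc
    others = nodes - gsc - in_set - out_set
    return gsc, in_set, out_set, others


# Toy reaction graph (hypothetical identifiers, not E. coli data).
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"),
             ("x", "a"), ("c", "y"), ("p", "q")]
gsc, in_set, out_set, others = bow_tie(toy_edges)
```

The repeated reachability search is quadratic in the worst case; for a genome-scale graph a linear-time strong-component algorithm such as Tarjan's would be the more usual choice.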
USDA-ARS's Scientific Manuscript database
Functional annotations of large plant genome projects mostly provide information on gene function and gene families based on the presence of protein domains and gene homology, but not necessarily in association with gene expression or metabolic and regulatory networks. These additional annotations a...
Emergent adaptive behaviour of GRN-controlled simulated robots in a changing environment.
Yao, Yao; Storme, Veronique; Marchal, Kathleen; Van de Peer, Yves
2016-01-01
We developed a bio-inspired robot controller combining an artificial genome with an agent-based control system. The genome encodes a gene regulatory network (GRN) that is switched on by environmental cues and, following the rules of transcriptional regulation, provides output signals to actuators. Whereas the genome represents the full encoding of the transcriptional network, the agent-based system mimics the active regulatory network and signal transduction system also present in naturally occurring biological systems. Using such a design that separates the static from the conditionally active part of the gene regulatory network contributes to a better general adaptive behaviour. Here, we have explored the potential of our platform with respect to the evolution of adaptive behaviour, such as preying when food becomes scarce, in a complex and changing environment and show through simulations of swarm robots in an A-life environment that evolution of collective behaviour likely can be attributed to bio-inspired evolutionary processes acting at different levels, from the gene and the genome to the individual robot and robot population. PMID:28028477
Dutta, B; Pusztai, L; Qi, Y; André, F; Lazar, V; Bianchini, G; Ueno, N; Agarwal, R; Wang, B; Shiang, C Y; Hortobagyi, G N; Mills, G B; Symmans, W F; Balázsi, G
2012-01-01
Background: The rapid collection of diverse genome-scale data raises the urgent need to integrate and utilise these resources for biological discovery or biomedical applications. For example, diverse transcriptomic and gene copy number variation data are currently collected for various cancers, but relatively few current methods are capable of utilising the emerging information. Methods: We developed and tested a data-integration method to identify gene networks that drive the biology of breast cancer clinical subtypes. The method simultaneously overlays gene expression and gene copy number data on protein–protein interaction, transcriptional-regulatory and signalling networks by identifying coincident genomic and transcriptional disturbances in local network neighborhoods. Results: We identified distinct driver-networks for each of the three common clinical breast cancer subtypes: oestrogen receptor (ER)+, human epidermal growth factor receptor 2 (HER2)+, and triple receptor-negative breast cancers (TNBC) from patient and cell line data sets. Driver-networks inferred from independent datasets were significantly reproducible. We also confirmed the functional relevance of a subset of randomly selected driver-network members for TNBC in gene knockdown experiments in vitro. We found that TNBC driver-network member genes have increased functional specificity to TNBC cell lines and higher functional sensitivity compared with genes selected by differential expression alone. Conclusion: Clinical subtype-specific driver-networks identified through data integration are reproducible and functionally important. PMID:22343619
Chang, Xiao; Wang, Zhuo; Hao, Pei; Li, Yuan-Yuan; Li, Yi-Xue
2010-06-01
The endosymbiotic theory proposed that mitochondrial genomes are derived from an alpha-proteobacterium-like endosymbiont, which was concluded from sequence analysis. We rebuilt the metabolic networks of mitochondria and 22 relative species, and studied the evolution of mitochondrial metabolism at the level of enzyme content and network topology. Our phylogenetic results based on network alignment and motif identification supported the endosymbiotic theory from the point of view of systems biology for the first time. It was found that the mitochondrial metabolic network was much more compact than those of the relative species, probably related to the higher efficiency of oxidative phosphorylation of the specialized organelle, and the network is highly clustered around the TCA cycle. Moreover, the mitochondrial metabolic network exhibited high functional specificity to the modules. This work provided insight into mitochondrial evolution and the organization principle of the mitochondrial metabolic network at the network level. Copyright 2010 Elsevier Inc. All rights reserved.
Genome-wide inference of regulatory networks in Streptomyces coelicolor.
Castro-Melchor, Marlene; Charaniya, Salim; Karypis, George; Takano, Eriko; Hu, Wei-Shou
2010-10-18
The onset of antibiotics production in Streptomyces species is co-ordinated with differentiation events. An understanding of the genetic circuits that regulate these coupled biological phenomena is essential to discover and engineer the pharmacologically important natural products made by these species. The availability of genomic tools and access to a large warehouse of transcriptome data for the model organism, Streptomyces coelicolor, provides incentive to decipher the intricacies of the regulatory cascades and develop biologically meaningful hypotheses. In this study, more than 500 samples of genome-wide temporal transcriptome data, comprising wild-type and more than 25 regulatory gene mutants of Streptomyces coelicolor probed across multiple stress and medium conditions, were investigated. Information based on transcript and functional similarity was used to update a previously-predicted whole-genome operon map and further applied to predict transcriptional networks constituting modules enriched in diverse functions such as secondary metabolism, and sigma factor. The predicted network displays a scale-free architecture with a small-world property observed in many biological networks. The networks were further investigated to identify functionally-relevant modules that exhibit functional coherence and a consensus motif in the promoter elements indicative of DNA-binding elements. Despite the enormous experimental as well as computational challenges, a systems approach for integrating diverse genome-scale datasets to elucidate complex regulatory networks is beginning to emerge. We present an integrated analysis of transcriptome data and genomic features to refine a whole-genome operon map and to construct regulatory networks at the cistron level in Streptomyces coelicolor. The functionally-relevant modules identified in this study pose as potential targets for further studies and verification.
Krumholz, Elias W; Libourel, Igor G L
2015-07-31
Genome-scale metabolic models are central in connecting genotypes to metabolic phenotypes. However, even for well studied organisms, such as Escherichia coli, draft networks do not contain a complete biochemical network. Missing reactions are referred to as gaps. These gaps need to be filled to enable functional analysis, and gap-filling choices influence model predictions. To investigate whether functional networks existed where all gap-filling reactions were supported by sequence similarity to annotated enzymes, four draft networks were supplemented with all reactions from the Model SEED database for which minimal sequence similarity was found in their genomes. Quadratic programming revealed that the number of reactions that could partake in a gap-filling solution was vast: 3,270 in the case of E. coli, where 72% of the metabolites in the draft network could connect a gap-filling solution. Nonetheless, no network could be completed without the inclusion of orphaned enzymes, suggesting that parts of the biochemistry integral to biomass precursor formation are uncharacterized. However, many gap-filling reactions were well determined, and the resulting networks showed improved prediction of gene essentiality compared with networks generated through canonical gap filling. In addition, gene essentiality predictions that were sensitive to poorly determined gap-filling reactions were of poor quality, suggesting that damage to the network structure resulting from the inclusion of erroneous gap-filling reactions may be predictable. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Xu, Zixiang; Zheng, Ping; Sun, Jibin; Ma, Yanhe
2013-01-01
Gene knockout has been used as a common strategy to improve microbial strains for producing chemicals. Several algorithms are available to predict the target reactions to be deleted. Most of them apply mixed integer bi-level linear programming (MIBLP) based on metabolic networks, and use duality theory to transform the bi-level optimization problem of large-scale MIBLP into single-level programming. However, the validity of the transformation was not proved. The solution of MIBLP depends on the structure of the inner problem. If the inner problem is continuous, the Karush-Kuhn-Tucker (KKT) method can be used to reformulate the MIBLP as a single-level one. We adopt the KKT technique in our algorithm ReacKnock to attack the intractable problem of solving MIBLP, demonstrated with the genome-scale metabolic network model of E. coli for producing various chemicals such as succinate, ethanol, and threonine. Compared to the previous methods, our algorithm is fast, stable and reliable in finding the optimal solutions for all the chemical products tested, and able to provide all the alternative deletion strategies which lead to the same industrial objective. PMID:24348984
Hsiao, Tzu-Hung; Chiu, Yu-Chiao; Hsu, Pei-Yin; Lu, Tzu-Pin; Lai, Liang-Chuan; Tsai, Mong-Hsun; Huang, Tim H.-M.; Chuang, Eric Y.; Chen, Yidong
2016-01-01
Several mutual information (MI)-based algorithms have been developed to identify dynamic gene-gene and function-function interactions governed by key modulators (genes, proteins, etc.). Due to intensive computation, however, these methods rely heavily on prior knowledge and are limited in genome-wide analysis. We present the modulated gene/gene set interaction (MAGIC) analysis to systematically identify genome-wide modulation of interaction networks. Based on a novel statistical test employing conjugate Fisher transformations of correlation coefficients, MAGIC features fast computation and adaptation to variations of clinical cohorts. In simulated datasets MAGIC achieved greatly improved computation efficiency and overall superior performance compared with the MI-based method. We applied MAGIC to construct the estrogen receptor (ER) modulated gene and gene set (representing biological function) interaction networks in breast cancer. Several novel interaction hubs and functional interactions were discovered. ER+ dependent interaction between TGFβ and NFκB was further shown to be associated with patient survival. The findings were verified in independent datasets. Using MAGIC, we also assessed the essential roles of ER modulation in another hormonal cancer, ovarian cancer. Overall, MAGIC is a systematic framework for comprehensively identifying and constructing the modulated interaction networks in a whole-genome landscape. MATLAB implementation of MAGIC is available for academic uses at https://github.com/chiuyc/MAGIC. PMID:26972162
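As background to the Fisher-transformation idea mentioned in the abstract above, the sketch below shows the classic two-sample Fisher z-test for a difference between two correlation coefficients, for example a gene pair's correlation in modulator-positive versus modulator-negative samples. This is the textbook test, not MAGIC's own statistic, which the abstract describes only as using conjugate Fisher transformations. The function name and the example numbers are hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Classic Fisher z-test for the difference between two correlations.

    r1, r2: Pearson correlations of the same variable pair in two sample groups.
    n1, n2: group sizes (each must exceed 3).
    Returns the z statistic and its two-sided normal p-value.
    """
    z1 = math.atanh(r1)  # Fisher transformation, 0.5 * ln((1 + r) / (1 - r))
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical example: strong correlation in one group, weak in the other.
z, p = fisher_z_difference(0.8, 100, 0.1, 100)
```

A large absolute z suggests that the correlation between the two genes depends on the state of the modulator. Genome-wide use would also need a multiple-testing correction, which is omitted here.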
Cytoscape: the network visualization tool for GenomeSpace workflows
Demchak, Barry; Hull, Tim; Reich, Michael; Liefeld, Ted; Smoot, Michael; Ideker, Trey; Mesirov, Jill P.
2014-01-01
Modern genomic analysis often requires workflows incorporating multiple best-of-breed tools. GenomeSpace is a web-based visual workbench that combines a selection of these tools with mechanisms that create data flows between them. One such tool is Cytoscape 3, a popular application that enables analysis and visualization of graph-oriented genomic networks. As Cytoscape runs on the desktop, and not in a web browser, integrating it into GenomeSpace required special care in creating a seamless user experience and enabling appropriate data flows. In this paper, we present the design and operation of the Cytoscape GenomeSpace app, which accomplishes this integration, thereby providing critical analysis and visualization functionality for GenomeSpace users. It has been downloaded over 850 times since the release of its first version in September, 2013. PMID:25165537
GTA: a game theoretic approach to identifying cancer subnetwork markers.
Farahmand, S; Goliaei, S; Ansari-Pour, N; Razaghi-Moghadam, Z
2016-03-01
The identification of genetic markers (e.g. genes, pathways and subnetworks) for cancer has been one of the most challenging research areas in recent years. A subset of these studies attempt to analyze genome-wide expression profiles to identify markers with high reliability and reusability across independent whole-transcriptome microarray datasets. Therefore, the functional relationships of genes are integrated with their expression data. However, for a more accurate representation of the functional relationships among genes, utilization of the protein-protein interaction network (PPIN) seems to be necessary. Herein, a novel game theoretic approach (GTA) is proposed for the identification of cancer subnetwork markers by integrating genome-wide expression profiles and PPIN. The GTA method was applied to three distinct whole-transcriptome breast cancer datasets to identify the subnetwork markers associated with metastasis. To evaluate the performance of our approach, the identified subnetwork markers were compared with gene-based, pathway-based and network-based markers. We show that GTA is not only capable of identifying robust metastatic markers but also provides a higher classification performance. In addition, based on these GTA-based subnetworks, we identified a new bona fide candidate gene for breast cancer susceptibility.
Integrated Genomic and Network-Based Analyses of Complex Diseases and Human Disease Network.
Al-Harazi, Olfat; Al Insaif, Sadiq; Al-Ajlan, Monirah A; Kaya, Namik; Dzimiri, Nduna; Colak, Dilek
2016-06-20
A disease phenotype generally reflects various pathobiological processes that interact in a complex network. The highly interconnected nature of the human protein interaction network (interactome) indicates that, at the molecular level, it is difficult to consider diseases as being independent of one another. Recently, genome-wide molecular measurements, data mining and bioinformatics approaches have provided the means to explore human diseases from a molecular basis. The exploration of diseases and a system of disease relationships based on the integration of genome-wide molecular data with the human interactome could offer a powerful perspective for understanding the molecular architecture of diseases. Recently, subnetwork markers have proven to be more robust and reliable than individual biomarker genes selected based on gene expression profiles alone, and achieve higher accuracy in disease classification. We have applied one of these methodologies to idiopathic dilated cardiomyopathy (IDCM) data that we have generated using a microarray and identified significant subnetworks associated with the disease. In this paper, we review the recent endeavours in this direction, and summarize the existing methodologies and computational tools for network-based analysis of complex diseases and molecular relationships among apparently different disorders and human disease network. We also discuss the future research trends and topics of this promising field. Copyright © 2015 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Ltd. All rights reserved.
Investigation of a protein complex network
NASA Astrophysics Data System (ADS)
Mashaghi, A. R.; Ramezanpour, A.; Karimipour, V.
2004-09-01
The budding yeast Saccharomyces cerevisiae is the first eukaryote whose genome has been completely sequenced. It is also the first eukaryotic cell whose proteome (the set of all proteins) and interactome (the network of all mutual interactions between proteins) have been analyzed. In this paper we study the structure of the yeast protein complex network in which weighted edges between complexes represent the number of shared proteins. It is found that the network of protein complexes is a small world network with scale free behavior for many of its distributions. However, we find that there are no strong correlations between the weights and degrees of neighboring complexes. To reveal non-random features of the network we also compare it with a null model in which the complexes randomly select their proteins. Finally, we propose a simple evolutionary model based on duplication and divergence of proteins.
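The network construction stated in the abstract above (complexes as nodes, edge weight equal to the number of shared proteins) can be sketched as follows. The input format, a dictionary mapping complex names to protein sets, and all names in the toy data are assumptions made for illustration; they do not come from the paper.

```python
from itertools import combinations


def complex_network(complexes):
    """Build a weighted complex-complex network.

    complexes: dict mapping a complex name to an iterable of protein names.
    Returns a dict {(complex_a, complex_b): number_of_shared_proteins}
    containing only pairs that share at least one protein.
    """
    members = {name: set(proteins) for name, proteins in complexes.items()}
    edges = {}
    for a, b in combinations(sorted(members), 2):
        shared = len(members[a] & members[b])
        if shared:
            edges[(a, b)] = shared
    return edges


def degrees(edges):
    """Number of neighbouring complexes for each complex that has an edge."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


# Toy data with hypothetical complex and protein names.
toy = {
    "C1": {"P1", "P2", "P3"},
    "C2": {"P2", "P3"},
    "C3": {"P3", "P4"},
    "C4": {"P5"},
}
toy_edges = complex_network(toy)
toy_degrees = degrees(toy_edges)
```

A null model of the kind the abstract mentions could be approximated by reassigning proteins to complexes at random while keeping complex sizes fixed, then rebuilding the network with the same function.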
Mosaic Graphs and Comparative Genomics in Phage Communities
Belcaid, Mahdi; Bergeron, Anne
2010-01-01
Comparing the genomes of two closely related viruses often produces mosaics where nearly identical sequences alternate with sequences that are unique to each genome. When several closely related genomes are compared, the unique sequences are likely to be shared with third genomes, leading to virus mosaic communities. Here we present comparative analysis of sets of Staphylococcus aureus phages that share large identical sequences with up to three other genomes, and with different partners along their genomes. We introduce mosaic graphs to represent these complex recombination events, and use them to illustrate the breadth and depth of sequence sharing: some genomes are almost completely made up of shared sequences, while genomes that share very large identical sequences can adopt alternate functional modules. Mosaic graphs also allow us to identify breakpoints that could eventually be used for the construction of recombination networks. These findings have several implications on phage metagenomics assembly, on the horizontal gene transfer paradigm, and more generally on the understanding of the composition and evolutionary dynamics of virus communities. PMID:20874413
Towards Breaking the Histone Code – Bayesian Graphical Models for Histone Modifications
Mitra, Riten; Müller, Peter; Liang, Shoudan; Xu, Yanxun; Ji, Yuan
2013-01-01
Background Histones are proteins that wrap DNA around in small spherical structures called nucleosomes. Histone modifications (HMs) refer to the post-translational modifications to the histone tails. At a particular genomic locus, each of these HMs can either be present or absent, and the combinatory patterns of the presence or absence of multiple HMs, or the ‘histone codes,’ are believed to co-regulate important biological processes. We aim to use raw data on HM markers at different genomic loci to (1) decode the complex biological network of HMs in a single region and (2) demonstrate how the HM networks differ in different regulatory regions. We suggest that these differences in network attributes form a significant link between histones and genomic functions. Methods and Results We develop a powerful graphical model under Bayesian paradigm. Posterior inference is fully probabilistic, allowing us to compute the probabilities of distinct dependence patterns of the HMs using graphs. Furthermore, our model-based framework allows for easy but important extensions for inference on differential networks under various conditions, such as the different annotations of the genomic locations (e.g., promoters versus insulators). We applied these models to ChIP-Seq data based on CD4+ T lymphocytes. The results confirmed many existing findings and provided a unified tool to generate various promising hypotheses. Differential network analyses revealed new insights on co-regulation of HMs of transcriptional activities in different genomic regions. Conclusions The use of Bayesian graphical models and borrowing strength across different conditions provide high power to infer histone networks and their differences. PMID:23748248
Ding, Dewu; Sun, Xiao
2018-01-16
Shewanella oneidensis MR-1 can transfer electrons from the intracellular environment to the extracellular space of the cells to reduce the extracellular insoluble electron acceptors (Extracellular Electron Transfer, EET). Benefiting from this EET capability, Shewanella has been widely used in different areas, such as energy production, wastewater treatment, and bioremediation. Genome-wide proteomics data was used to determine the active proteins involved in activating the EET process. We identified 1012 proteins with decreased expression and 811 proteins with increased expression when the EET process changed from inactivation to activation. We then networked these proteins to construct the active protein networks, and identified the top 20 key active proteins by network centralization analysis, including metabolism- and energy-related proteins, signal and transcriptional regulatory proteins, translation-related proteins, and the EET-related proteins. We also constructed the integrated protein interaction and transcriptional regulatory networks for the active proteins, then found three exclusive active network motifs involved in activating the EET process, namely Bi-feedforward Loop, Regulatory Cascade with a Feedback, and Feedback with a Protein-Protein Interaction (PPI), and identified the active proteins involved in these motifs. Both enrichment analysis and comparative analysis to the whole-genome data implicated the multiheme c-type cytochromes and multiple signal processing proteins involved in the process. Furthermore, the interactions of these motif-guided active proteins and the involved functional modules were discussed. Collectively, by using network-based methods, this work reported a proteome-wide search for the key active proteins that potentially activate the EET process.
NASA Astrophysics Data System (ADS)
Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.
2017-05-01
Selection in plant breeding could be more effective and more efficient if it is based on genomic data. Genomic selection (GS) is a new approach for plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most GP models use linear methods that ignore interactions among genes and higher-order nonlinearities. Deep belief network (DBN), one of the architectures in deep learning, is able to model data at a high level of abstraction that captures nonlinear effects in the data. This study implemented DBN for developing a GP model utilizing whole-genome Single Nucleotide Polymorphisms (SNPs) as data for training and testing. The case study was a set of traits in maize. The maize dataset was acquired from CIMMYT’s (International Maize and Wheat Improvement Center) Global Maize program. Based on Pearson correlation, DBN outperformed the other methods, namely reproducing kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), and best linear unbiased predictor (BLUP), for traits presumed to be non-additive. DBN achieved a correlation of 0.579 (on a scale from -1 to 1).
Reverse engineering and analysis of large genome-scale gene networks
Aluru, Maneesha; Zola, Jaroslaw; Nettleton, Dan; Aluru, Srinivas
2013-01-01
Reverse engineering the whole-genome networks of complex multicellular organisms continues to remain a challenge. While simpler models easily scale to large number of genes and gene expression datasets, more accurate models are compute intensive limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software Gene Network Analyzer (GeNA) for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web. PMID:23042249
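To make the mutual-information (MI) criterion behind MI-based network inference concrete, here is a minimal sketch that estimates MI between two expression profiles using plain equal-width binning. This is a deliberate simplification and an assumption of this example: the abstract above states that TINGe uses a B-spline-based formulation and permutation testing, neither of which is reproduced here. The function name, the bin count, and the toy profiles are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=4):
    """Estimate mutual information (in nats) between two equal-length profiles.

    Values are discretized into `bins` equal-width bins per profile, then
    MI = sum over bin pairs of p(i, j) * log(p(i, j) / (p(i) * p(j))).
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")

    def discretize(values):
        low, high = min(values), max(values)
        width = (high - low) / bins or 1.0  # guard against constant profiles
        return [min(int((v - low) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    marginal_x = Counter(bx)
    marginal_y = Counter(by)

    mi = 0.0
    for (i, j), count in joint.items():
        # count/n is p(i, j); the ratio inside the log simplifies to
        # count * n / (marginal_x[i] * marginal_y[j]).
        mi += (count / n) * math.log(count * n / (marginal_x[i] * marginal_y[j]))
    return mi


# Toy profiles: gene_b tracks gene_a exactly, gene_c is constant.
gene_a = [0.0, 1.0, 2.0, 3.0] * 5
gene_b = list(gene_a)
gene_c = [1.0] * 20
mi_ab = mutual_information(gene_a, gene_b)
mi_ac = mutual_information(gene_a, gene_c)
```

In a network-inference setting, an edge would typically be kept only when the MI for a gene pair exceeds a threshold derived from permuted data; that thresholding step is left out of this sketch.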
García-Alonso, Luz; Alonso, Roberto; Vidal, Enrique; Amadoz, Alicia; de María, Alejandro; Minguez, Pablo; Medina, Ignacio; Dopazo, Joaquín
2012-01-01
Genomic experiments (e.g. differential gene expression, single-nucleotide polymorphism association) typically produce ranked lists of genes. We present a simple but powerful approach which uses protein–protein interaction data to detect sub-networks within such ranked lists of genes or proteins. We performed an exhaustive study of network parameters that allowed us to conclude that the average number of components and the average number of nodes per component are the parameters that best discriminate between real and random networks. A novel aspect that increases the efficiency of this strategy in finding sub-networks is that, in addition to direct connections, connections mediated by intermediate nodes are also considered to build up the sub-networks. The possibility of using such intermediate nodes makes this approach more robust to noise. It also overcomes some limitations intrinsic to experimental designs based on differential expression, in which some nodes are invariant across conditions. The proposed approach can also be used for candidate disease-gene prioritization. Here, we demonstrate the usefulness of the approach by means of several case examples that include a differential expression analysis in Fanconi Anemia, a genome-wide association study of bipolar disorder and a genome-scale study of essentiality in cancer genes. An efficient and easy-to-use web interface (available at http://www.babelomics.org) based on HTML5 technologies is also provided to run the algorithm and represent the network. PMID:22844098
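The two discriminating parameters named in the abstract above, the number of components and the average number of nodes per component, can be computed with a short sketch. The rule used here for intermediate nodes (a protein outside the gene list is admitted when it directly links at least two list genes) is an assumption of this example and not necessarily the rule used by the authors. All gene names are hypothetical.

```python
def subnetwork_stats(interactions, genes, allow_intermediate=True):
    """Return (number of components, average nodes per component).

    interactions: iterable of undirected (protein_a, protein_b) pairs.
    genes: the gene list of interest, e.g. the top of a ranked list.
    allow_intermediate: if True, admit non-list proteins that directly
        connect at least two list genes (an assumed rule for this sketch).
    """
    adjacency = {}
    for a, b in interactions:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    gene_set = set(genes)
    nodes = set(gene_set)
    if allow_intermediate:
        for node, neighbours in adjacency.items():
            if node not in gene_set and len(neighbours & gene_set) >= 2:
                nodes.add(node)

    # Connected components of the sub-network induced by `nodes`.
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        stack = [start]
        while stack:
            current = stack.pop()
            for neighbour in adjacency.get(current, ()):
                if neighbour in nodes and neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    stack.append(neighbour)
        components.append(component)

    count = len(components)
    average_size = sum(len(c) for c in components) / count if count else 0.0
    return count, average_size


# Toy interactome: X is not in the gene list but bridges A and B.
toy_interactions = [("A", "X"), ("X", "B"), ("C", "D")]
toy_genes = ["A", "B", "C", "D", "E"]
direct_only = subnetwork_stats(toy_interactions, toy_genes, allow_intermediate=False)
with_bridges = subnetwork_stats(toy_interactions, toy_genes, allow_intermediate=True)
```

Comparing these statistics for the observed gene list against the same statistics for many random gene lists of equal size would give an empirical significance estimate; that comparison is outside this sketch.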
2011-01-01
Background Gene regulatory networks play essential roles in living organisms to control growth, keep internal metabolism running and respond to external environmental changes. Understanding the connections and the activity levels of regulators is important for the research of gene regulatory networks. While relevance score based algorithms that reconstruct gene regulatory networks from transcriptome data can infer genome-wide gene regulatory networks, they are unfortunately prone to false positive results. Transcription factor activities (TFAs) quantitatively reflect the ability of the transcription factor to regulate target genes. However, classic relevance score based gene regulatory network reconstruction algorithms use models that do not include the TFA layer, thus missing a key regulatory element. Results This work integrates TFA prediction algorithms with relevance score based network reconstruction algorithms to reconstruct gene regulatory networks with improved accuracy over classic relevance score based algorithms. This method is called Gene expression and Transcription factor activity based Relevance Network (GTRNetwork). Different combinations of TFA prediction algorithms and relevance score functions have been applied to find the most efficient combination. When the integrated GTRNetwork method was applied to E. coli data, the reconstructed genome-wide gene regulatory network predicted 381 new regulatory links. This reconstructed gene regulatory network, including the predicted new regulatory links, shows promising biological significance. Many of the new links are verified by known TF binding site information, and many other links can be verified from the literature and databases such as EcoCyc. The reconstructed gene regulatory network is applied to a recent transcriptome analysis of E. coli during isobutanol stress. 
In addition to the 16 significantly changed TFAs detected in the original paper, another 7 significantly changed TFAs have been detected by using our reconstructed network. Conclusions The GTRNetwork algorithm introduces the hidden layer TFA into classic relevance score-based gene regulatory network reconstruction processes. Integrating the TFA biological information with regulatory network reconstruction algorithms significantly improves detection of new links and reduces the rate of false positives. The application of GTRNetwork on E. coli gene transcriptome data gives a set of potential regulatory links with promising biological significance for isobutanol stress and other conditions. PMID:21668997
NASA Astrophysics Data System (ADS)
Hwang, Sohyun; Kim, Chan Yeong; Ji, Sun-Gou; Go, Junhyeok; Kim, Hanhae; Yang, Sunmo; Kim, Hye Jin; Cho, Ara; Yoon, Sang Sun; Lee, Insuk
2016-05-01
Pseudomonas aeruginosa is a Gram-negative bacterium of clinical significance. Although the genome of PAO1, a prototype strain of P. aeruginosa, has been extensively studied, approximately one-third of the functional genome remains unknown. With the emergence of antibiotic-resistant strains of P. aeruginosa, there is an urgent need to develop novel antibiotic and anti-virulence strategies, which may be facilitated by an approach that explores P. aeruginosa gene function in systems-level models. Here, we present a genome-wide functional network of P. aeruginosa genes, PseudomonasNet, which covers 98% of the coding genome, and a companion web server to generate functional hypotheses using various network-search algorithms. We demonstrate that PseudomonasNet-assisted predictions can effectively identify novel genes involved in virulence and antibiotic resistance. Moreover, an antibiotic-resistance network based on PseudomonasNet reveals that P. aeruginosa has common modular genetic organisations that confer increased or decreased resistance to diverse antibiotics, which accounts for the pervasiveness of cross-resistance across multiple drugs. The same network also suggests that P. aeruginosa has developed a mechanism of trade-off in resistance across drugs by altering genetic interactions. Taken together, these results clearly demonstrate the usefulness of a genome-scale functional network to investigate pathogenic systems in P. aeruginosa.
Model-based redesign of global transcription regulation
Carrera, Javier; Rodrigo, Guillermo; Jaramillo, Alfonso
2009-01-01
Synthetic biology aims at the design or redesign of biological systems. In particular, one possible goal could be the rewiring of the transcription regulation network by exchanging the endogenous promoters. To achieve this objective, we have adapted current methods to the inference of a model based on ordinary differential equations that is able to predict the network response after a major change in its topology. Our procedure utilizes microarray data for training. We have experimentally validated our inferred global regulatory model in Escherichia coli by predicting transcriptomic profiles under new perturbations. We have also tested our methodology in silico by providing accurate predictions of the underlying networks from expression data generated with artificial genomes. In addition, we have shown the predictive power of our methodology by obtaining the gene profile in experimental redesigns of the E. coli genome, in which the transcriptional network was rewired by means of knockouts of master regulators or by upregulating transcription factors controlled by different promoters. Our approach is compatible with most network inference methods, allowing computational exploration of future genome-wide redesign experiments in synthetic biology. PMID:19188257
WholePathwayScope: a comprehensive pathway-based analysis tool for high-throughput data
Yi, Ming; Horton, Jay D; Cohen, Jonathan C; Hobbs, Helen H; Stephens, Robert M
2006-01-01
Background Analysis of High Throughput (HTP) Data such as microarray and proteomics data has provided a powerful methodology to study patterns of gene regulation at genome scale. A major unresolved problem in the post-genomic era is to assemble the large amounts of data generated into a meaningful biological context. We have developed a comprehensive software tool, WholePathwayScope (WPS), for deriving biological insights from analysis of HTP data. Result WPS extracts gene lists with shared biological themes through color cue templates. WPS statistically evaluates global functional category enrichment of gene lists and pathway-level pattern enrichment of data. WPS incorporates well-known biological pathways from KEGG (Kyoto Encyclopedia of Genes and Genomes) and Biocarta, GO (Gene Ontology) terms as well as user-defined pathways or relevant gene clusters or groups, and explores gene-term relationships within the derived gene-term association networks (GTANs). WPS simultaneously compares multiple datasets within biological contexts either as pathways or as association networks. WPS also integrates Genetic Association Database and Partial MedGene Database for disease-association information. We have used this program to analyze and compare microarray and proteomics datasets derived from a variety of biological systems. Application examples demonstrated the capacity of WPS to significantly facilitate the analysis of HTP data for integrative discovery. Conclusion This tool represents a pathway-based platform for discovery integration to maximize analysis power. The tool is freely available at . PMID:16423281
Guzzi, Pietro Hiram; Milenkovic, Tijana
2018-05-01
Analogous to genomic sequence alignment that allows for across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide the knowledge transfer between conserved regions of molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of a sequence-based homology to a new notion of network-based homology. Analogous to genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, as it was recently shown that the two approach types are complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile the two network alignment types and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in other domains besides computational biology.
2011-01-01
Background Green plant leaves have always fascinated biologists as hosts for photosynthesis and providers of basic energy to many food webs. Today, comprehensive databases of gene expression data enable us to apply increasingly advanced computational methods for reverse-engineering the regulatory network of leaves, and to begin to understand the gene interactions underlying complex emergent properties related to stress-response and development. These new systems biology methods are now also being applied to organisms such as Populus, a woody perennial tree, in order to understand the specific characteristics of these species. Results We present a systems biology model of the regulatory network of Populus leaves. The network is reverse-engineered from promoter information and expression profiles of leaf-specific genes measured over a large set of conditions related to stress and development. The network model incorporates interactions between regulators, such as synergistic and competitive relationships, by evaluating increasingly complex regulatory mechanisms, and is therefore able to identify new regulators of leaf development not found by traditional genomics methods based on pair-wise expression similarity. The approach is shown to explain available gene function information and to provide robust prediction of expression levels in new data. We also use the predictive capability of the model to identify condition-specific regulation as well as conserved regulation between Populus and Arabidopsis. Conclusions We outline a computationally inferred model of the regulatory network of Populus leaves, and show how treating genes as interacting, rather than individual, entities identifies new regulators compared to traditional genomics analysis. 
Although systems biology models should be used with care considering the complexity of regulatory programs and the limitations of current genomics data, methods describing interactions can provide hypotheses about the underlying cause of emergent properties and are needed if we are to identify target genes other than those constituting the "low hanging fruit" of genomic analysis. PMID:21232107
Genome composition and phylogeny of microbes predict their co-occurrence in the environment
2017-01-01
The genomic information of microbes is a major determinant of their phenotypic properties, yet it is largely unknown to what extent ecological associations between different species can be explained by their genome composition. To bridge this gap, this study introduces two new genome-wide pairwise measures of microbe-microbe interaction. The first (genome content similarity index) quantifies similarity in genome composition between two microbes, while the second (microbe-microbe functional association index) summarizes the topology of a protein functional association network built for a given pair of microbes and quantifies the fraction of network edges crossing organismal boundaries. These new indices are then used to predict co-occurrence between reference genomes from two 16S-based ecological datasets, accounting for phylogenetic relatedness of the taxa. Phylogenetic relatedness was found to be a strong predictor of ecological associations between microbes, explaining about 10% of the variance in co-occurrence data. Genome composition was found to be a strong predictor as well: it explains up to 4% of the variance in co-occurrence when all genomic-based indices are used in combination, even after accounting for evolutionary relationships between the species. On their own, the metrics proposed here explain a larger proportion of variance than previously reported more complex methods that rely on metabolic network comparisons. In summary, results of this study indicate that microbial genomes do indeed contain a detectable signal of organismal ecology, and the methods described in the paper can be used to improve mechanistic understanding of microbe-microbe interactions. PMID:28152007
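The quantity behind the second index, the fraction of network edges crossing organismal boundaries, can be shown with a short sketch. The abstract does not give the exact definition of the published index, so the function below is only the bare fraction; the protein identifiers and the `organism_of` mapping are invented for illustration.

```python
def cross_boundary_fraction(edges, organism_of):
    """Fraction of edges whose two endpoints belong to different organisms."""
    if not edges:
        return 0.0
    crossing = sum(1 for a, b in edges if organism_of[a] != organism_of[b])
    return crossing / len(edges)


# Hypothetical joint network for two microbes, m1 and m2.
organism_of = {"p1": "m1", "p2": "m1", "p3": "m2", "p4": "m2"}
edges = [("p1", "p2"), ("p3", "p4"), ("p2", "p3"), ("p1", "p2")]
print(cross_boundary_fraction(edges, organism_of))
```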
When is hub gene selection better than standard meta-analysis?
Langfelder, Peter; Mischel, Paul S; Horvath, Steve
2013-01-01
Since hub nodes have been found to play important roles in many networks, highly connected hub genes are expected to play an important role in biology as well. However, the empirical evidence remains ambiguous. An open question is whether (or when) hub gene selection leads to more meaningful gene lists than a standard statistical analysis based on significance testing when analyzing genomic data sets (e.g., gene expression or DNA methylation data). Here we address this question for the special case when multiple genomic data sets are available. This is of great practical importance since for many research questions multiple data sets are publicly available. In this case, the data analyst can decide between a standard statistical approach (e.g., based on meta-analysis) and a co-expression network analysis approach that selects intramodular hubs in consensus modules. We assess the performance of these two types of approaches according to two criteria. The first criterion evaluates the biological insights gained and is relevant in basic research. The second criterion evaluates the validation success (reproducibility) in independent data sets and often applies in clinical diagnostic or prognostic applications. We compare meta-analysis with consensus network analysis based on weighted correlation network analysis (WGCNA) in three comprehensive and unbiased empirical studies: (1) finding genes predictive of lung cancer survival, (2) finding methylation markers related to age, and (3) finding mouse genes related to total cholesterol. The results demonstrate that intramodular hub gene status with respect to consensus modules is more useful than a meta-analysis p-value when identifying biologically meaningful gene lists (reflecting criterion 1). However, standard meta-analysis methods perform as well as (if not better than) a consensus network approach in terms of validation success (criterion 2). 
The article also reports a comparison of meta-analysis techniques applied to gene expression data and presents novel R functions for carrying out consensus network analysis, network based screening, and meta analysis.
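Intramodular hub selection, as contrasted with meta-analysis in the abstract above, can be sketched in a few lines. The sketch assumes an unsigned weighted adjacency of the form |cor|^beta and a single data set; it is written in Python rather than with the R functions the article presents, it does not build consensus modules, and the expression values and gene names are made up.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length profiles (non-zero variance assumed)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def intramodular_connectivity(expression, module, beta=6):
    """Sum of weighted adjacencies |cor|**beta to the other genes in the module.

    Assumption: beta=6 is used here only as a common illustrative soft threshold.
    """
    return {
        g: sum(abs(pearson(expression[g], expression[h])) ** beta
               for h in module if h != g)
        for g in module
    }


def top_hub(expression, module, beta=6):
    """Gene with the highest intramodular connectivity."""
    k = intramodular_connectivity(expression, module, beta)
    return max(k, key=k.get)


expression = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [1.1, 2.0, 3.2, 3.9, 5.1],
    "g3": [2.0, 4.0, 6.0, 8.0, 10.5],
    "g4": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly correlated with the rest
}
module = ["g1", "g2", "g3", "g4"]
print(intramodular_connectivity(expression, module))
print("hub:", top_hub(expression, module))
```

With several data sets, one option would be to rank genes by connectivity in each set and combine the ranks, which is the point at which this approach and a meta-analysis p-value become directly comparable.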
An integrated approach to reconstructing genome-scale transcriptional regulatory networks
Imam, Saheed; Noguera, Daniel R.; Donohue, Timothy J.; ...
2015-02-27
Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making them highly complementary. We leveraged this new workflow and observations to build a large-scale TRN model for the α-Proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall was also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis. 
Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions.
Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks.
Oh, S June; Joung, Je-Gun; Chang, Jeong-Ho; Zhang, Byoung-Tak
2006-06-06
To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides information about the genetic relationships between different organisms. In contrast, comparative analysis of metabolic pathways in different organisms can yield insights into their functional relationships under different physiological conditions. However, evaluating the similarities or differences between metabolic networks is a computationally challenging problem, and systematic methods of doing this are desirable. Here we introduce a graph-kernel method for computing the similarity between metabolic networks in polynomial time, and use it to profile metabolic pathways and to construct phylogenetic trees. To compare the structures of metabolic networks in organisms, we adopted the exponential graph kernel, which is a kernel-based approach with a labeled graph that includes a label matrix and an adjacency matrix. To construct the phylogenetic trees, we used an unweighted pair-group method with arithmetic mean, i.e., a hierarchical clustering algorithm. We applied the kernel-based network profiling method in a comparative analysis of nine carbohydrate metabolic networks from 81 biological species encompassing Archaea, Eukaryota, and Eubacteria. The resulting phylogenetic hierarchies generally support the tripartite scheme of three domains rather than the two domains of prokaryotes and eukaryotes. By combining the kernel machines with metabolic information, the method infers the context of biosphere development that covers physiological events required for adaptation by genetic reconstruction. 
The results show that one may obtain a global view of the tree of life by comparing the metabolic pathway structures using meta-level information rather than sequence information. This method may yield further information about biological evolution, such as the history of horizontal transfer of each gene, by studying the detailed structure of the phylogenetic tree constructed by the kernel-based method.
Genome-Scale Reconstruction of the Human Astrocyte Metabolic Network
Martín-Jiménez, Cynthia A.; Salazar-Barreto, Diego; Barreto, George E.; González, Janneth
2017-01-01
Astrocytes are the most abundant cells of the central nervous system; they have a predominant role in maintaining brain metabolism. In this sense, abnormal metabolic states have been found in different neuropathological diseases. Determination of metabolic states of astrocytes is difficult to model using current experimental approaches given the high number of reactions and metabolites present. Thus, genome-scale metabolic networks derived from transcriptomic data can be used as a framework to elucidate how astrocytes modulate human brain metabolic states during normal conditions and in neurodegenerative diseases. We performed a Genome-Scale Reconstruction of the Human Astrocyte Metabolic Network with the purpose of elucidating a significant portion of the metabolic map of the astrocyte. This is the first global high-quality, manually curated metabolic reconstruction network of a human astrocyte. It includes 5,007 metabolites and 5,659 reactions distributed among 8 cell compartments (extracellular, cytoplasm, mitochondria, endoplasmic reticulum, Golgi apparatus, lysosome, peroxisome and nucleus). Using the reconstructed network, the metabolic capabilities of human astrocytes were calculated and compared both in normal and ischemic conditions. We identified reactions activated in these two states, which can be useful for understanding the astrocytic pathways that are affected during brain disease. We also showed that the flux distributions obtained in the model are in accordance with literature-based findings. To date, this is the most complete representation of the human astrocyte in terms of inclusion of genes, proteins, reactions and metabolic pathways, being a useful guide for in-silico analysis of several metabolic behaviors of the astrocyte during normal and pathologic states. PMID:28243200
A protocol for generating a high-quality genome-scale metabolic reconstruction.
Thiele, Ines; Palsson, Bernhard Ø
2010-01-01
Network reconstructions are a common denominator in systems biology. Bottom-up metabolic network reconstructions have been developed over the last 10 years. These reconstructions represent structured knowledge bases that abstract pertinent information on the biochemical transformations taking place within specific target organisms. The conversion of a reconstruction into a mathematical format facilitates a myriad of computational biological studies, including evaluation of network content, hypothesis testing and generation, analysis of phenotypic characteristics and metabolic engineering. To date, genome-scale metabolic reconstructions for more than 30 organisms have been published and this number is expected to increase rapidly. However, these reconstructions differ in quality and coverage that may minimize their predictive potential and use as knowledge bases. Here we present a comprehensive protocol describing each step necessary to build a high-quality genome-scale metabolic reconstruction, as well as the common trials and tribulations. Therefore, this protocol provides a helpful manual for all stages of the reconstruction process.
A protocol for generating a high-quality genome-scale metabolic reconstruction
Thiele, Ines; Palsson, Bernhard Ø.
2011-01-01
Network reconstructions are a common denominator in systems biology. Bottom-up metabolic network reconstructions have developed over the past 10 years. These reconstructions represent structured knowledge-bases that abstract pertinent information on the biochemical transformations taking place within specific target organisms. The conversion of a reconstruction into a mathematical format facilitates myriad computational biological studies including evaluation of network content, hypothesis testing and generation, analysis of phenotypic characteristics, and metabolic engineering. To date, genome-scale metabolic reconstructions for more than 30 organisms have been published and this number is expected to increase rapidly. However, these reconstructions differ in quality and coverage that may minimize their predictive potential and use as knowledge-bases. Here, we present a comprehensive protocol describing each step necessary to build a high-quality genome-scale metabolic reconstruction as well as common trials and tribulations. Therefore, this protocol provides a helpful manual for all stages of the reconstruction process. PMID:20057383
Becker, Kerstin; Siegert, Sabine; Toliat, Mohammad Reza; Du, Juanjiangmeng; Casper, Ramona; Dolmans, Guido H.; Werker, Paul M.; Tinschert, Sigrid; Franke, Andre; Gieger, Christian; Strauch, Konstantin; Nothnagel, Michael; Nürnberg, Peter; Hennies, Hans Christian
2016-01-01
Dupuytren's disease, a fibromatosis of the connective tissue in the palm, is a common complex disease with a strong genetic component. To date, nine genetic loci have been found to be associated with the disease. Six of these loci contain genes that code for Wnt signalling proteins. In spite of this striking first insight into the genetic factors in Dupuytren's disease, much of the inherited risk in Dupuytren's disease still needs to be discovered. The already identified loci jointly explain ~1% of the heritability in this disease. To further elucidate the genetic basis of Dupuytren's disease, we performed a genome-wide meta-analysis combining three genome-wide association study (GWAS) data sets, comprising 1,580 cases and 4,480 controls. We corroborated all nine previously identified loci, six of these with genome-wide significance (p-value < 5×10⁻⁸). In addition, we identified 14 new suggestive loci (p-value < 10⁻⁵). Intriguingly, several of these new loci contain genes associated with Wnt signalling and therefore represent excellent candidates for replication. Next, we compared whole-transcriptome data between patient- and control-derived tissue samples and found the Wnt/β-catenin pathway to be the top deregulated pathway in patient samples. We then conducted network and pathway analyses in order to identify protein networks that are enriched for genes highlighted in the GWAS meta-analysis and expression data sets. We found further evidence that the Wnt signalling pathways in conjunction with other pathways may play a critical role in Dupuytren's disease. PMID:27467239
Genome-scale reconstruction of the metabolic network in Yersinia pestis CO92
NASA Astrophysics Data System (ADS)
Navid, Ali; Almaas, Eivind
2007-03-01
The gram-negative bacterium Yersinia pestis is the causative agent of bubonic plague. Using publicly available genomic, biochemical and physiological data, we have developed a constraint-based flux balance model of metabolism in the CO92 strain (biovar Orientalis) of this organism. The metabolic reactions were appropriately compartmentalized, and the model accounts for the exchange of metabolites, as well as the import of nutrients and export of waste products. We have characterized the metabolic capabilities and phenotypes of this organism, after comparing the model predictions with available experimental observations to evaluate accuracy and completeness. We have also begun preliminary studies into how cellular metabolism affects virulence.
Kumar, Avishek; Butler, Brandon M.; Kumar, Sudhir; Ozkan, S. Banu
2016-01-01
Summary Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict if they are benign or not. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nsSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine. PMID:26684487
Multi-species Identification of Polymorphic Peptide Variants via Propagation in Spectral Networks
Bandeira, Nuno
2016-01-01
Peptide and protein identification remains challenging in organisms with poorly annotated or rapidly evolving genomes, as are commonly encountered in environmental or biofuels research. Such limitations render tandem mass spectrometry (MS/MS) database search algorithms ineffective as they lack corresponding sequences required for peptide-spectrum matching. We address this challenge with the spectral networks approach to (1) match spectra of orthologous peptides across multiple related species and then (2) propagate peptide annotations from identified to unidentified spectra. We here present algorithms to assess the statistical significance of spectral alignments (Align-GF), reduce the impurity in spectral networks, and accurately estimate the error rate in propagated identifications. Analyzing three related Cyanothece species, a model organism for biohydrogen production, spectral networks identified peptides from highly divergent sequences from networks with dozens of variant peptides, including thousands of peptides in species lacking a sequenced genome. Our analysis further detected the presence of many novel putative peptides even in genomically characterized species, thus suggesting the possibility of gaps in our understanding of their proteomic and genomic expression. A web-based pipeline for spectral networks analysis is available at http://proteomics.ucsd.edu/software. PMID:27609420
eHive: an artificial intelligence workflow system for genomic analysis.
Severin, Jessica; Beal, Kathryn; Vilella, Albert J; Fitzgerald, Stephen; Schuster, Michael; Gordon, Leo; Ureta-Vidal, Abel; Flicek, Paul; Herrero, Javier
2010-05-11
The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
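The blackboard pattern and the quadratic growth in pairwise jobs described in the abstract above can be illustrated with a toy sketch. It is not eHive's schema or code: it assumes an in-memory SQLite table in place of the MySQL blackboard, a Python loop in place of the Perl agent, and hypothetical names (`job`, `analysis`, `status`, `pairwise_alignment`).

```python
import itertools
import sqlite3


def build_blackboard(species):
    """Create a job table holding one pairwise-alignment job per species pair."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE job ("
        "id INTEGER PRIMARY KEY, analysis TEXT, input TEXT, "
        "status TEXT DEFAULT 'READY')"
    )
    for a, b in itertools.combinations(species, 2):
        db.execute(
            "INSERT INTO job (analysis, input) VALUES ('pairwise_alignment', ?)",
            (f"{a}:{b}",),
        )
    db.commit()
    return db


def run_worker(db, handlers):
    """Claim READY jobs one at a time until none are left; return the count run."""
    done = 0
    while True:
        row = db.execute(
            "SELECT id, analysis, input FROM job "
            "WHERE status = 'READY' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return done
        job_id, analysis, payload = row
        db.execute("UPDATE job SET status = 'RUNNING' WHERE id = ?", (job_id,))
        handlers[analysis](payload)  # the actual computation would happen here
        db.execute("UPDATE job SET status = 'DONE' WHERE id = ?", (job_id,))
        done += 1


species = [f"species_{i}" for i in range(50)]
db = build_blackboard(species)
processed = []
n_done = run_worker(db, {"pairwise_alignment": processed.append})
print(n_done, "pairwise jobs for", len(species), "species")
```

The number of pairwise jobs is n(n-1)/2, so 50 species give 1225 jobs and doubling the species count roughly quadruples the work. A production system would also need atomic job claiming across many concurrent workers and retry handling for failed jobs, both of which this sketch omits.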
Cannistraci, Carlo V; Ogorevc, Jernej; Zorc, Minja; Ravasi, Timothy; Dovc, Peter; Kunej, Tanja
2013-02-14
Cryptorchidism is the most frequent congenital disorder in male children; however the genetic causes of cryptorchidism remain poorly investigated. Comparative integratomics combined with systems biology approach was employed to elucidate genetic factors and molecular pathways underlying testis descent. Literature mining was performed to collect genomic loci associated with cryptorchidism in seven mammalian species. Information regarding the collected candidate genes was stored in MySQL relational database. Genomic view of the loci was presented using Flash GViewer web tool (http://gmod.org/wiki/Flashgviewer/). DAVID Bioinformatics Resources 6.7 was used for pathway enrichment analysis. Cytoscape plug-in PiNGO 1.11 was employed for protein-network-based prediction of novel candidate genes. Relevant protein-protein interactions were confirmed and visualized using the STRING database (version 9.0). The developed cryptorchidism gene atlas includes 217 candidate loci (genes, regions involved in chromosomal mutations, and copy number variations) identified at the genomic, transcriptomic, and proteomic level. Human orthologs of the collected candidate loci were presented using a genomic map viewer. The cryptorchidism gene atlas is freely available online: http://www.integratomics-time.com/cryptorchidism/. Pathway analysis suggested the presence of twelve enriched pathways associated with the list of 179 literature-derived candidate genes. Additionally, a list of 43 network-predicted novel candidate genes was significantly associated with four enriched pathways. Joint pathway analysis of the collected and predicted candidate genes revealed the pivotal importance of the muscle-contraction pathway in cryptorchidism and evidence for genomic associations with cardiomyopathy pathways in RASopathies. The developed gene atlas represents an important resource for the scientific community researching genetics of cryptorchidism. 
The collected data will further facilitate development of novel genetic markers and could be of interest for functional studies in animals and humans. The proposed network-based systems biology approach elucidates molecular mechanisms underlying the co-presence of cryptorchidism and cardiomyopathy in RASopathies. Such an approach could also aid in the molecular explanation of the co-presence of diverse and apparently unrelated clinical manifestations in other syndromes.
Measuring the Evolutionary Rewiring of Biological Networks
Shou, Chong; Bhardwaj, Nitin; Lam, Hugo Y. K.; Yan, Koon-Kiu; Kim, Philip M.; Snyder, Michael; Gerstein, Mark B.
2011-01-01
We have accumulated a large amount of biological network data and expect even more to come. Soon, we anticipate being able to compare many different biological networks as we commonly do for molecular sequences. It has long been believed that many of these networks change, or “rewire”, at different rates. It is therefore important to develop a framework to quantify the differences between networks in a unified fashion. We developed such a formalism based on analogy to simple models of sequence evolution, and used it to conduct a systematic study of network rewiring on all the currently available biological networks. We found that, similar to sequences, biological networks show a decreased rate of change at large time divergences, because of saturation in potential substitutions. However, different types of biological networks consistently rewire at different rates. Using comparative genomics and proteomics data, we found a consistent ordering of the rewiring rates: transcription regulatory, phosphorylation regulatory, genetic interaction, miRNA regulatory, protein interaction, and metabolic pathway network, from fast to slow. This ordering was found in all comparisons we did of matched networks between organisms. To gain further intuition on network rewiring, we compared our observed rewirings with those obtained from simulation. We also investigated how readily our formalism could be mapped to other network contexts; in particular, we showed how it could be applied to analyze changes in a range of “commonplace” networks such as family trees, co-authorships and linux-kernel function dependencies. PMID:21253555
2011-01-01
Background Genome-scale metabolic network models have contributed to elucidating biological phenomena, and predicting gene targets to engineer for biotechnological applications. With their increasing importance, their precise network characterization has also been crucial for better understanding of the cellular physiology. Results We herein introduce a framework for network modularization and Bayesian network analysis (FMB) to investigate organism’s metabolism under perturbation. FMB reveals direction of influences among metabolic modules, in which reactions with similar or positively correlated flux variation patterns are clustered, in response to specific perturbation using metabolic flux data. With metabolic flux data calculated by constraints-based flux analysis under both control and perturbation conditions, FMB, in essence, reveals the effects of specific perturbations on the biological system through network modularization and Bayesian network analysis at metabolic modular level. As a demonstration, this framework was applied to the genetically perturbed Escherichia coli metabolism, which is a lpdA gene knockout mutant, using its genome-scale metabolic network model. Conclusions After all, it provides alternative scenarios of metabolic flux distributions in response to the perturbation, which are complementary to the data obtained from conventionally available genome-wide high-throughput techniques or metabolic flux analysis. PMID:22784571
Kim, Hyun Uk; Kim, Tae Yong; Lee, Sang Yup
2011-01-01
Genome-scale metabolic network models have contributed to elucidating biological phenomena, and predicting gene targets to engineer for biotechnological applications. With their increasing importance, their precise network characterization has also been crucial for better understanding of the cellular physiology. We herein introduce a framework for network modularization and Bayesian network analysis (FMB) to investigate organism's metabolism under perturbation. FMB reveals direction of influences among metabolic modules, in which reactions with similar or positively correlated flux variation patterns are clustered, in response to specific perturbation using metabolic flux data. With metabolic flux data calculated by constraints-based flux analysis under both control and perturbation conditions, FMB, in essence, reveals the effects of specific perturbations on the biological system through network modularization and Bayesian network analysis at metabolic modular level. As a demonstration, this framework was applied to the genetically perturbed Escherichia coli metabolism, which is a lpdA gene knockout mutant, using its genome-scale metabolic network model. After all, it provides alternative scenarios of metabolic flux distributions in response to the perturbation, which are complementary to the data obtained from conventionally available genome-wide high-throughput techniques or metabolic flux analysis.
CHRR: coordinate hit-and-run with rounding for uniform sampling of constraint-based models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haraldsdóttir, Hulda S.; Cousins, Ben; Thiele, Ines
In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. However, reliable uniform sampling of genome-scale biochemical networks is challenging due to their high dimensionality and inherent anisotropy. Here, we present an implementation of a new sampling algorithm, coordinate hit-and-run with rounding (CHRR). This algorithm is based on the provably efficient hit-and-run random walk and crucially uses a preprocessing step to round the anisotropic flux set. CHRR provably converges to a uniform stationary sampling distribution. We apply it to metabolic networks of increasing dimensionality. We show that it converges several times faster than a popular artificial centering hit-and-run algorithm, enabling reliable and tractable sampling of genome-scale biochemical networks.
CHRR: coordinate hit-and-run with rounding for uniform sampling of constraint-based models
Haraldsdóttir, Hulda S.; Cousins, Ben; Thiele, Ines; ...
2017-01-31
In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. However, reliable uniform sampling of genome-scale biochemical networks is challenging due to their high dimensionality and inherent anisotropy. Here, we present an implementation of a new sampling algorithm, coordinate hit-and-run with rounding (CHRR). This algorithm is based on the provably efficient hit-and-run random walk and crucially uses a preprocessing step to round the anisotropic flux set. CHRR provably converges to a uniform stationary sampling distribution. We apply it to metabolic networks of increasing dimensionality. We show that it converges several times faster than a popular artificial centering hit-and-run algorithm, enabling reliable and tractable sampling of genome-scale biochemical networks.
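The coordinate hit-and-run walk named in this abstract can be illustrated on a small polytope. The following is a minimal sketch only, not the CHRR implementation: it omits the rounding preprocessing step entirely, assumes the feasible set is written as {x : A x <= b} and is bounded, and assumes the caller supplies an interior starting point. The function name and the triangle example are hypothetical.

```python
import random

def coordinate_hit_and_run(A, b, x0, n_samples, rng=None):
    """Sample from the bounded polytope {x : A x <= b} by coordinate hit-and-run.

    Assumptions (not from the source): x0 is feasible, the polytope is
    bounded, and no rounding/preconditioning is applied beforehand.
    """
    rng = rng or random.Random(0)
    x = list(x0)
    samples = []
    for _ in range(n_samples):
        i = rng.randrange(len(x))              # pick one coordinate direction
        lo, hi = float("-inf"), float("inf")   # feasible range of the step t along e_i
        for row, bk in zip(A, b):
            slack = bk - sum(a * v for a, v in zip(row, x))
            if row[i] > 0:
                hi = min(hi, slack / row[i])   # row . (x + t e_i) <= bk  =>  t <= slack / a_i
            elif row[i] < 0:
                lo = max(lo, slack / row[i])   # dividing by a negative flips the bound
        x[i] += rng.uniform(lo, hi)            # uniform point on the feasible chord
        samples.append(tuple(x))
    return samples

# Hypothetical example: the triangle x >= 0, y >= 0, x + y <= 1.
A = [[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
b = [0.0, 0.0, 1.0]
points = coordinate_hit_and_run(A, b, (0.25, 0.25), 1000)
```

Each step moves along a single coordinate axis, so the chord endpoints come from one pass over the constraints. On strongly anisotropic sets such a walk mixes slowly, which is the problem the abstract says the rounding step addresses.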
Integrating epigenomic data and 3D genomic structure with a new measure of chromatin assortativity.
Pancaldi, Vera; Carrillo-de-Santa-Pau, Enrique; Javierre, Biola Maria; Juan, David; Fraser, Peter; Spivakov, Mikhail; Valencia, Alfonso; Rico, Daniel
2016-07-08
Network analysis is a powerful way of modeling chromatin interactions. Assortativity is a network property used in social sciences to identify factors affecting how people establish social ties. We propose a new approach, using chromatin assortativity, to integrate the epigenomic landscape of a specific cell type with its chromatin interaction network and thus investigate which proteins or chromatin marks mediate genomic contacts. We use high-resolution promoter capture Hi-C and Hi-Cap data as well as ChIA-PET data from mouse embryonic stem cells to investigate promoter-centered chromatin interaction networks and calculate the presence of specific epigenomic features in the chromatin fragments constituting the nodes of the network. We estimate the association of these features with the topology of four chromatin interaction networks and identify features localized in connected areas of the network. Polycomb group proteins and associated histone marks are the features with the highest chromatin assortativity in promoter-centered networks. We then ask which features distinguish contacts amongst promoters from contacts between promoters and other genomic elements. We observe higher chromatin assortativity of the actively elongating form of RNA polymerase 2 (RNAPII) compared with inactive forms only in interactions between promoters and other elements. Contacts among promoters and between promoters and other elements have different characteristic epigenomic features. We identify a possible role for the elongating form of RNAPII in mediating interactions among promoters, enhancers, and transcribed gene bodies. Our approach facilitates the study of multiple genome-wide epigenomic profiles, considering network topology and allowing the comparison of chromatin interaction networks.
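As an illustration of the assortativity idea described above, the sketch below computes the standard network assortativity of a scalar node attribute, that is, the Pearson correlation of the attribute across the two ends of every edge. Whether this matches the authors' exact definition of chromatin assortativity is an assumption; the fragment names and feature values are hypothetical.

```python
def assortativity(edges, feature):
    """Pearson correlation of a node feature across the endpoints of each edge.

    Each undirected edge is counted in both orientations, so the two
    endpoint samples share the same mean and variance.
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [feature[u], feature[v]]
        ys += [feature[v], feature[u]]
    mean = sum(xs) / len(xs)
    cov = sum((a - mean) * (c - mean) for a, c in zip(xs, ys))
    var = sum((a - mean) ** 2 for a in xs)
    return cov / var  # undefined (division by zero) if the feature is constant

# Hypothetical chromatin fragments with a binary mark (1 = mark present).
mark = {"frag1": 1, "frag2": 1, "frag3": 0, "frag4": 0}
like_with_like = assortativity([("frag1", "frag2"), ("frag3", "frag4")], mark)
opposites = assortativity([("frag1", "frag3"), ("frag2", "frag4")], mark)
```

A value near 1 indicates that fragments carrying the feature tend to contact other fragments carrying it; a value near -1 indicates contacts mostly between fragments with and without it.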
Lee, Sandra Soo-Jin; Vernez, Simone L.; Ormond, K.E.; Granovetter, Mark
2013-01-01
Little is known about how consumers of direct-to-consumer personal genetic services share personal genetic risk information. In an age of ubiquitous online networking and rapid development of social networking tools, understanding how consumers share personal genetic risk assessments is critical in the development of appropriate and effective policies. This exploratory study investigates how consumers share personal genetic information and attitudes towards social networking behaviors. Methods: Adult participants aged 23 to 72 years old who purchased direct-to-consumer genetic testing from a personal genomics company were administered a web-based survey regarding their sharing activities and social networking behaviors related to their personal genetic test results. Results: 80 participants completed the survey; of those, 45% shared results on Facebook and 50.9% reported meeting or reconnecting with more than 10 other individuals through the sharing of their personal genetic information. For help interpreting test results, 70.4% turned to Internet websites and online sources, compared to 22.7% who consulted their healthcare providers. Amongst participants, 51.8% reported that they believe the privacy of their personal genetic information would be breached in the future. Conclusion: Consumers actively utilize online social networking tools to help them share and interpret their personal genetic information. These findings suggest a need for careful consideration of policy recommendations in light of the current ambiguity of regulation and oversight of consumer initiated sharing activities. PMID:25562728
Lee, Sandra Soo-Jin; Vernez, Simone L; Ormond, K E; Granovetter, Mark
2013-10-14
Little is known about how consumers of direct-to-consumer personal genetic services share personal genetic risk information. In an age of ubiquitous online networking and rapid development of social networking tools, understanding how consumers share personal genetic risk assessments is critical in the development of appropriate and effective policies. This exploratory study investigates how consumers share personal genetic information and attitudes towards social networking behaviors. Adult participants aged 23 to 72 years old who purchased direct-to-consumer genetic testing from a personal genomics company were administered a web-based survey regarding their sharing activities and social networking behaviors related to their personal genetic test results. 80 participants completed the survey; of those, 45% shared results on Facebook and 50.9% reported meeting or reconnecting with more than 10 other individuals through the sharing of their personal genetic information. For help interpreting test results, 70.4% turned to Internet websites and online sources, compared to 22.7% who consulted their healthcare providers. Amongst participants, 51.8% reported that they believe the privacy of their personal genetic information would be breached in the future. Consumers actively utilize online social networking tools to help them share and interpret their personal genetic information. These findings suggest a need for careful consideration of policy recommendations in light of the current ambiguity of regulation and oversight of consumer initiated sharing activities.
Dynamics and control of state-dependent networks for probing genomic organization
Rajapakse, Indika; Groudine, Mark; Mesbahi, Mehran
2011-01-01
A state-dependent dynamic network is a collection of elements that interact through a network, whose geometry evolves as the state of the elements changes over time. The genome is an intriguing example of a state-dependent network, where chromosomal geometry directly relates to genomic activity, which in turn strongly correlates with geometry. Here we examine various aspects of a genomic state-dependent dynamic network. In particular, we elaborate on one of the important ramifications of viewing genomic networks as being state-dependent, namely, their controllability during processes of genomic reorganization such as in cell differentiation. PMID:21911407
Jothi, Raja; Balaji, S; Wuster, Arthur; Grochow, Joshua A; Gsponer, Jörg; Przytycka, Teresa M; Aravind, L; Babu, M Madan
2009-01-01
Although several studies have provided important insights into the general principles of biological networks, the link between network organization and the genome-scale dynamics of the underlying entities (genes, mRNAs, and proteins) and its role in systems behavior remain unclear. Here we show that transcription factor (TF) dynamics and regulatory network organization are tightly linked. By classifying TFs in the yeast regulatory network into three hierarchical layers (top, core, and bottom) and integrating diverse genome-scale datasets, we find that the TFs have static and dynamic properties that are similar within a layer and different across layers. At the protein level, the top-layer TFs are relatively abundant, long-lived, and noisy compared with the core- and bottom-layer TFs. Although variability in expression of top-layer TFs might confer a selective advantage, as this permits at least some members in a clonal cell population to initiate a response to changing conditions, tight regulation of the core- and bottom-layer TFs may minimize noise propagation and ensure fidelity in regulation. We propose that the interplay between network organization and TF dynamics could permit differential utilization of the same underlying network by distinct members of a clonal cell population.
A Novel Framework for the Comparative Analysis of Biological Networks
Pache, Roland A.; Aloy, Patrick
2012-01-01
Genome sequencing projects provide nearly complete lists of the individual components present in an organism, but reveal little about how they work together. Follow-up initiatives have deciphered thousands of dynamic and context-dependent interrelationships between gene products that need to be analyzed with novel bioinformatics approaches able to capture their complex emerging properties. Here, we present a novel framework for the alignment and comparative analysis of biological networks of arbitrary topology. Our strategy includes the prediction of likely conserved interactions, based on evolutionary distances, to counter the high number of missing interactions in the current interactome networks, and a fast assessment of the statistical significance of individual alignment solutions, which vastly increases its performance with respect to existing tools. Finally, we illustrate the biological significance of the results through the identification of novel complex components and potential cases of cross-talk between pathways and alternative signaling routes. PMID:22363585
Exhaustive identification of steady state cycles in large stoichiometric networks
Wright, Jeremiah; Wagner, Andreas
2008-01-01
Background: Identifying cyclic pathways in chemical reaction networks is important, because such cycles may indicate in silico violation of energy conservation, or the existence of feedback in vivo. Unfortunately, our ability to identify cycles in stoichiometric networks, such as signal transduction and genome-scale metabolic networks, has been hampered by the computational complexity of the methods currently used. Results: We describe a new algorithm for the identification of cycles in stoichiometric networks, and we compare its performance to two others by exhaustively identifying the cycles contained in the genome-scale metabolic networks of H. pylori, M. barkeri, E. coli, and S. cerevisiae. Our algorithm can substantially decrease both the execution time and maximum memory usage in comparison to the two previous algorithms. Conclusion: The algorithm we describe improves our ability to study large, real-world, biochemical reaction networks, although additional methodological improvements are desirable. PMID:18616835
Two-Way Gene Interaction From Microarray Data Based on Correlation Methods.
Alavi Majd, Hamid; Talebi, Atefeh; Gilany, Kambiz; Khayyer, Nasibeh
2016-06-01
Gene networks have generated a massive explosion in the development of high-throughput techniques for monitoring various aspects of gene activity. Networks offer a natural way to model interactions between genes, and extracting gene network information from high-throughput genomic data is an important and difficult task. The purpose of this study is to construct a two-way gene network based on parametric and nonparametric correlation coefficients. The first step in constructing a Gene Co-expression Network is to score all pairs of gene vectors. The second step is to select a score threshold and connect all gene pairs whose scores exceed this value. In the foundation-application study, we constructed two-way gene networks using nonparametric methods, such as Spearman's rank correlation coefficient and Blomqvist's measure, and compared them with Pearson's correlation coefficient. We surveyed six genes of venous thrombosis disease, made a matrix entry representing the score for the corresponding gene pair, and obtained two-way interactions using Pearson's correlation, Spearman's rank correlation, and Blomqvist's coefficient. Finally, these methods were compared with Cytoscape, based on BIND, and Gene Ontology, based on molecular function visual methods; R software version 3.2 and Bioconductor were used to perform these methods. Based on the Pearson and Spearman correlations, the results were the same and were confirmed by Cytoscape and GO visual methods; however, Blomqvist's coefficient was not confirmed by visual methods. Some correlation results did not agree with the visualizations, possibly because of the small sample size.
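The two construction steps described in this abstract (score all gene pairs, then keep the pairs above a threshold) can be sketched as follows. This is an illustrative sketch and not the authors' R/Bioconductor code: the gene names and expression values are hypothetical, Blomqvist's coefficient is omitted, and thresholding on the absolute value of the score is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(x):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def coexpression_edges(expr, score, threshold):
    """Step 1: score every gene pair. Step 2: connect pairs above the threshold."""
    genes = sorted(expr)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            s = score(expr[g], expr[h])
            if abs(s) > threshold:  # assumption: strong negative correlation also counts
                edges.append((g, h, s))
    return edges

# Hypothetical expression vectors for three genes across five samples.
expr = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [5, 3, 4, 1, 2]}
edges = coexpression_edges(expr, pearson, threshold=0.9)
```

Passing `spearman` instead of `pearson` as the `score` argument swaps the parametric coefficient for the rank-based one without changing the thresholding step.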
Sybil--efficient constraint-based modelling in R.
Gelius-Dietrich, Gabriel; Desouki, Abdelmoneim Amer; Fritzemeier, Claus Jonathan; Lercher, Martin J
2013-11-13
Constraint-based analyses of metabolic networks are widely used to simulate the properties of genome-scale metabolic networks. Publicly available implementations tend to be slow, impeding large scale analyses such as the genome-wide computation of pairwise gene knock-outs, or the automated search for model improvements. Furthermore, available implementations cannot easily be extended or adapted by users. Here, we present sybil, an open source software library for constraint-based analyses in R; R is a free, platform-independent environment for statistical computing and graphics that is widely used in bioinformatics. Among other functions, sybil currently provides efficient methods for flux-balance analysis (FBA), MOMA, and ROOM that are about ten times faster than previous implementations when calculating the effect of whole-genome single gene deletions in silico on a complete E. coli metabolic model. Due to the object-oriented architecture of sybil, users can easily build analysis pipelines in R or even implement their own constraint-based algorithms. Based on its highly efficient communication with different mathematical optimisation programs, sybil facilitates the exploration of high-dimensional optimisation problems on small time scales. Sybil and all its dependencies are open source. Sybil and its documentation are available for download from the comprehensive R archive network (CRAN).
Kumar, Avishek; Butler, Brandon M; Kumar, Sudhir; Ozkan, S Banu
2015-12-01
Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict if they are benign or not. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nsSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine.
A Scalable Approach for Discovering Conserved Active Subnetworks across Species
Verfaillie, Catherine M.; Hu, Wei-Shou; Myers, Chad L.
2010-01-01
Overlaying differential changes in gene expression on protein interaction networks has proven to be a useful approach to interpreting the cell's dynamic response to a changing environment. Despite successes in finding active subnetworks in the context of a single species, the idea of overlaying lists of differentially expressed genes on networks has not yet been extended to support the analysis of multiple species' interaction networks. To address this problem, we designed a scalable, cross-species network search algorithm, neXus (Network - cross(X)-species - Search), that discovers conserved, active subnetworks based on parallel differential expression studies in multiple species. Our approach leverages functional linkage networks, which provide more comprehensive coverage of functional relationships than physical interaction networks by combining heterogeneous types of genomic data. We applied our cross-species approach to identify conserved modules that are differentially active in stem cells relative to differentiated cells based on parallel gene expression studies and functional linkage networks from mouse and human. We find hundreds of conserved active subnetworks enriched for stem cell-associated functions such as cell cycle, DNA repair, and chromatin modification processes. Using a variation of this approach, we also find a number of species-specific networks, which likely reflect mechanisms of stem cell function that have diverged between mouse and human. We assess the statistical significance of the subnetworks by comparing them with subnetworks discovered on random permutations of the differential expression data. We also describe several case examples that illustrate the utility of comparative analysis of active subnetworks. PMID:21170309
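The permutation-based significance assessment mentioned at the end of this abstract can be illustrated with a simplified empirical p-value. This sketch rescores one fixed subnetwork on shuffled data rather than rerunning the subnetwork search on each permutation as the abstract describes. The mean-score statistic, the function name, and the gene labels are all assumptions made for illustration.

```python
import random

def empirical_p_value(scores, subnetwork, n_perm=999, seed=0):
    """Empirical p-value for the mean differential-expression score of a subnetwork.

    Scores are shuffled across genes; the p-value is the fraction of
    permutations whose subnetwork mean is at least the observed mean,
    with the usual +1 correction so that p is never exactly zero.
    """
    rng = random.Random(seed)
    genes = list(scores)
    values = [scores[g] for g in genes]

    def mean_score(assignment):
        return sum(assignment[g] for g in subnetwork) / len(subnetwork)

    observed = mean_score(scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)                      # permute scores across gene labels
        if mean_score(dict(zip(genes, values))) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)

# Hypothetical data: 20 genes, three of them strongly differentially expressed.
scores = {f"g{i}": 0.0 for i in range(20)}
scores.update({"g0": 5.0, "g1": 5.0, "g2": 5.0})
p_active = empirical_p_value(scores, ["g0", "g1", "g2"])
p_inactive = empirical_p_value(scores, ["g3", "g4", "g5"])
```

A subnetwork made up of the three high-scoring genes is rarely matched by chance, whereas a subnetwork of unchanged genes is matched by every permutation.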
Cheng, Feixiong; Liu, Chuang; Shen, Bairong; Zhao, Zhongming
2016-08-26
Cancer is increasingly recognized as a cellular system phenomenon that is attributed to the accumulation of genetic or epigenetic alterations leading to the perturbation of the molecular network architecture. Elucidation of network properties that can characterize tumor initiation and progression, or pinpoint the molecular targets related to the drug sensitivity or resistance, is therefore of critical importance for providing systems-level insights into tumorigenesis and clinical outcome in the molecularly targeted cancer therapy. In this study, we developed a network-based framework to quantitatively examine cellular network heterogeneity and modularity in cancer. Specifically, we constructed gene co-expressed protein interaction networks derived from large-scale RNA-Seq data across 8 cancer types generated in The Cancer Genome Atlas (TCGA) project. We performed gene network entropy and balanced versus unbalanced motif analysis to investigate cellular network heterogeneity and modularity in tumor versus normal tissues, different stages of progression, and drug resistant versus sensitive cancer cell lines. We found that tumorigenesis could be characterized by a significant increase of gene network entropy in all of the 8 cancer types. The ratio of the balanced motifs in normal tissues is higher than that of tumors, while the ratio of unbalanced motifs in tumors is higher than that of normal tissues in all of the 8 cancer types. Furthermore, we showed that network entropy could be used to characterize tumor progression and anticancer drug responses. For example, we found that kinase inhibitor resistant cancer cell lines had higher entropy compared to that of sensitive cell lines using the integrative analysis of microarray gene expression and drug pharmacological data collected from the Genomics of Drug Sensitivity in Cancer database. 
In addition, we provided potential network-level evidence that smoking might increase cancer cellular network heterogeneity and further contribute to tyrosine kinase inhibitor (e.g., gefitinib) resistance. In summary, we demonstrated that network properties such as network entropy and unbalanced motifs are associated with tumor initiation, progression, and anticancer drug responses, suggesting new potential network-based prognostic and predictive measures in cancer.
MUFFINN: cancer gene discovery via network analysis of somatic mutation data.
Cho, Ara; Shim, Jung Eun; Kim, Eiru; Supek, Fran; Lehner, Ben; Lee, Insuk
2016-06-23
A major challenge for distinguishing cancer-causing driver mutations from inconsequential passenger mutations is the long-tail of infrequently mutated genes in cancer genomes. Here, we present and evaluate a method for prioritizing cancer genes accounting not only for mutations in individual genes but also in their neighbors in functional networks, MUFFINN (MUtations For Functional Impact on Network Neighbors). This pathway-centric method shows high sensitivity compared with gene-centric analyses of mutation data. Notably, only a marginal decrease in performance is observed when using 10 % of TCGA patient samples, suggesting the method may potentiate cancer genome projects with small patient populations.
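The idea of scoring a gene by mutations in its network neighbors as well as its own can be sketched as below. This is a hypothetical illustration, not the published MUFFINN scoring function: adding the gene's own mutation count to either the maximum or the sum of its neighbors' counts is an assumption, as are the gene names, counts, and network.

```python
def neighbor_aware_scores(mutation_counts, network, mode="max"):
    """Score each gene by its own mutation count plus a contribution from neighbors.

    mode="max" adds the count of the most frequently mutated neighbor;
    mode="sum" adds the counts of all neighbors.
    """
    scores = {}
    for gene, neighbors in network.items():
        own = mutation_counts.get(gene, 0)
        neighbor_counts = [mutation_counts.get(n, 0) for n in neighbors]
        if mode == "max":
            boost = max(neighbor_counts, default=0)
        else:
            boost = sum(neighbor_counts)
        scores[gene] = own + boost
    return scores

# Hypothetical data: G2 is never mutated itself but neighbors a frequently mutated gene.
counts = {"G1": 10, "G2": 0, "G3": 1}
network = {"G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2"]}
by_max = neighbor_aware_scores(counts, network, mode="max")
by_sum = neighbor_aware_scores(counts, network, mode="sum")
```

In this toy example G2 scores as high as G1 under the "max" mode, which illustrates how a rarely mutated gene in the long tail can be prioritized through its functional neighborhood.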
iAK692: A genome-scale metabolic model of Spirulina platensis C1
2012-01-01
Background: Spirulina (Arthrospira) platensis is a well-known filamentous cyanobacterium used in the production of many industrial products, including high value compounds, healthy food supplements, animal feeds, pharmaceuticals and cosmetics, for example. It has been increasingly studied around the world for scientific purposes, especially for its genome, biology, physiology, and also for the analysis of its small-scale metabolic network. However, the overall description of the metabolic and biotechnological capabilities of S. platensis requires the development of a whole cellular metabolism model. Recently, the S. platensis C1 (Arthrospira sp. PCC9438) genome sequence has become available, allowing systems-level studies of this commercial cyanobacterium. Results: In this work, we present the genome-scale metabolic network analysis of S. platensis C1, iAK692, its topological properties, and its metabolic capabilities and functions. The network was reconstructed from the S. platensis C1 annotated genomic sequence using Pathway Tools software to generate a preliminary network. Then, manual curation was performed based on a collective knowledge base and a combination of genomic, biochemical, and physiological information. The genome-scale metabolic model consists of 692 genes, 837 metabolites, and 875 reactions. We validated iAK692 by conducting fermentation experiments and simulating the model under autotrophic, heterotrophic, and mixotrophic growth conditions using COBRA toolbox. The model predictions under these growth conditions were consistent with the experimental results. The iAK692 model was further used to predict the unique active reactions and essential genes for each growth condition. Additionally, the metabolic states of iAK692 during autotrophic and mixotrophic growths were described by phenotypic phase plane (PhPP) analysis. Conclusions: This study proposes the first genome-scale model of S. platensis C1, iAK692, which is a predictive metabolic platform for a global understanding of physiological behaviors and metabolic engineering. This platform could accelerate the integrative analysis of various “-omics” data, leading to strain improvement towards a diverse range of desired industrial products from Spirulina. PMID:22703714
iAK692: a genome-scale metabolic model of Spirulina platensis C1.
Klanchui, Amornpan; Khannapho, Chiraphan; Phodee, Atchara; Cheevadhanarak, Supapon; Meechai, Asawin
2012-06-15
Spirulina (Arthrospira) platensis is a well-known filamentous cyanobacterium used in the production of many industrial products, including high value compounds, healthy food supplements, animal feeds, pharmaceuticals and cosmetics, for example. It has been increasingly studied around the world for scientific purposes, especially for its genome, biology, physiology, and also for the analysis of its small-scale metabolic network. However, the overall description of the metabolic and biotechnological capabilities of S. platensis requires the development of a whole cellular metabolism model. Recently, the S. platensis C1 (Arthrospira sp. PCC9438) genome sequence has become available, allowing systems-level studies of this commercial cyanobacterium. In this work, we present the genome-scale metabolic network analysis of S. platensis C1, iAK692, its topological properties, and its metabolic capabilities and functions. The network was reconstructed from the S. platensis C1 annotated genomic sequence using Pathway Tools software to generate a preliminary network. Then, manual curation was performed based on a collective knowledge base and a combination of genomic, biochemical, and physiological information. The genome-scale metabolic model consists of 692 genes, 837 metabolites, and 875 reactions. We validated iAK692 by conducting fermentation experiments and simulating the model under autotrophic, heterotrophic, and mixotrophic growth conditions using COBRA toolbox. The model predictions under these growth conditions were consistent with the experimental results. The iAK692 model was further used to predict the unique active reactions and essential genes for each growth condition. Additionally, the metabolic states of iAK692 during autotrophic and mixotrophic growths were described by phenotypic phase plane (PhPP) analysis. This study proposes the first genome-scale model of S. platensis C1, iAK692, which is a predictive metabolic platform for a global understanding of physiological behaviors and metabolic engineering. This platform could accelerate the integrative analysis of various "-omics" data, leading to strain improvement towards a diverse range of desired industrial products from Spirulina.
"Harnessing genomics to improve health in India" – an executive course to support genomics policy
Acharya, Tara; Kumar, Nandini K; Muthuswamy, Vasantha; Daar, Abdallah S; Singer, Peter A
2004-01-01
Background The benefits of scientific medicine have eluded millions in developing countries and the genomics revolution threatens to increase health inequities between North and South. India, as a developing yet also industrialized country, is uniquely positioned to pioneer science policy innovations to narrow the genomics divide. Recognizing this, the Indian Council of Medical Research and the University of Toronto Joint Centre for Bioethics conducted a Genomics Policy Executive Course in January 2003 in Kerala, India. The course provided a forum for stakeholders to discuss the relevance of genomics for health in India. This article presents the course findings and recommendations formulated by the participants for genomics policy in India. Methods The course goals were to familiarize participants with the implications of genomics for health in India; analyze and debate policy and ethical issues; and develop a multi-sectoral opinion leaders' network to share perspectives. To achieve these goals, the course brought together representatives of academic research centres, biotechnology companies, regulatory bodies, media, voluntary, and legal organizations to engage in discussion. Topics included scientific advances in genomics, followed by innovations in business models, public sector perspectives, ethics, legal issues and national innovation systems. 
Results Seven main recommendations emerged: increase funding for healthcare research with appropriate emphasis on genomics; leverage India's assets such as traditional knowledge and genomic diversity in consultation with knowledge-holders; prioritize strategic entry points for India; improve industry-academic interface with appropriate incentives to improve public health and the nation's wealth; develop independent, accountable, transparent regulatory systems to ensure that ethical, legal and social issues are addressed for a single entry, smart and effective system; engage the public and ensure broad-based input into policy setting; ensure equitable access of poor to genomics products and services; deliver knowledge, products and services for public health. A key outcome of the course was the internet-based opinion leaders' network – the Indian Genome Policy Forum – a multi-stakeholder forum to foster further discussion on policy. Conclusion We expect that the process that has led to this network will serve as a model to establish similar Science and Technology policy networks on regional levels and eventually on a global level. PMID:15151698
Enumeration of Smallest Intervention Strategies in Genome-Scale Metabolic Networks
von Kamp, Axel; Klamt, Steffen
2014-01-01
One ultimate goal of metabolic network modeling is the rational redesign of biochemical networks to optimize the production of certain compounds by cellular systems. Although several constraint-based optimization techniques have been developed for this purpose, methods for systematic enumeration of intervention strategies in genome-scale metabolic networks are still lacking. In principle, Minimal Cut Sets (MCSs; inclusion-minimal combinations of reaction or gene deletions that lead to the fulfilment of a given intervention goal) provide an exhaustive enumeration approach. However, their disadvantages are the combinatorial explosion in larger networks and the requirement to first compute the elementary modes (EMs), which is itself impractical in genome-scale networks. We present MCSEnumerator, a new method for effective enumeration of the smallest MCSs (with fewest interventions) in genome-scale metabolic network models. For this we combine two approaches, namely (i) the mapping of MCSs to EMs in a dual network, and (ii) a modified algorithm by which shortest EMs can be effectively determined in large networks. In this way, we can identify the smallest MCSs by calculating the shortest EMs in the dual network. Realistic application examples demonstrate that our algorithm is able to list thousands of the most efficient intervention strategies in genome-scale networks for various intervention problems. For instance, for the first time we could enumerate all synthetic lethals in E. coli with combinations of up to 5 reactions. We also applied the new algorithm to compute, as examples, strain designs for growth-coupled synthesis of different products (ethanol, fumarate, serine) by E. coli. We found numerous new engineering strategies, some of which require fewer knockouts and guarantee higher product yields (even without the assumption of optimal growth) than reported previously. 
The strength of the presented approach is that smallest intervention strategies can be quickly calculated and screened with neither network size nor the number of required interventions posing major challenges. PMID:24391481
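As a toy illustration of what a minimal cut set is, the sketch below treats each unwanted pathway (target elementary mode) as the set of reactions it uses and enumerates the inclusion-minimal reaction sets that hit every such mode, smallest first. This is a brute-force illustration only, not the dual-network algorithm of MCSEnumerator; the function name, reaction labels, and mode sets are invented for the example.

```python
from itertools import combinations

def smallest_cut_sets(target_modes, max_size):
    """Enumerate inclusion-minimal hitting sets of the target modes by increasing size."""
    reactions = sorted(set().union(*target_modes))
    found = []
    for k in range(1, max_size + 1):
        for combo in combinations(reactions, k):
            cut = frozenset(combo)
            # Skip supersets of smaller cut sets already found (not minimal).
            if any(prev <= cut for prev in found):
                continue
            # A cut set must disable every target mode, i.e. share a reaction with it.
            if all(cut & mode for mode in target_modes):
                found.append(cut)
    return found

# Hypothetical target modes: each is the set of reactions one unwanted pathway uses.
modes = [
    frozenset({"R1", "R2"}),
    frozenset({"R1", "R3"}),
    frozenset({"R2", "R3", "R4"}),
]
cuts = smallest_cut_sets(modes, max_size=2)
print([sorted(c) for c in cuts])
```

In this toy case no single deletion blocks all three modes, so the smallest cut sets are pairs. Exhaustive search like this scales combinatorially, which is the limitation the abstract says the dual-network approach is designed to avoid.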
SoyNet: a database of co-functional networks for soybean Glycine max.
Kim, Eiru; Hwang, Sohyun; Lee, Insuk
2017-01-04
Soybean (Glycine max) is a legume crop with substantial economic value, providing a source of oil and protein for humans and livestock. More than 50% of edible oils consumed globally are derived from this crop. Soybean plants are also important for soil fertility, as they fix atmospheric nitrogen by symbiosis with microorganisms. The latest soybean genome annotation (version 2.0) lists 56 044 coding genes, yet their functional contributions to crop traits remain mostly unknown. Co-functional networks have proven useful for identifying genes that are involved in a particular pathway or phenotype with various network algorithms. Here, we present SoyNet (available at www.inetbio.org/soynet), a database of co-functional networks for G. max and a companion web server for network-based functional predictions. SoyNet maps 1 940 284 co-functional links between 40 812 soybean genes (72.8% of the coding genome), which were inferred from 21 distinct types of genomics data including 734 microarrays and 290 RNA-seq samples from soybean. SoyNet provides a new route to functional investigation of the soybean genome, elucidating genes and pathways of agricultural importance. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Sun, Hokeun; Wang, Shuang
2013-05-30
The matched case-control designs are commonly used to control for potential confounding factors in genetic epidemiology studies especially epigenetic studies with DNA methylation. Compared with unmatched case-control studies with high-dimensional genomic or epigenetic data, there have been few variable selection methods for matched sets. In an earlier paper, we proposed the penalized logistic regression model for the analysis of unmatched DNA methylation data using a network-based penalty. However, for popularly applied matched designs in epigenetic studies that compare DNA methylation between tumor and adjacent non-tumor tissues or between pre-treatment and post-treatment conditions, applying ordinary logistic regression ignoring matching is known to bring serious bias in estimation. In this paper, we developed a penalized conditional logistic model using the network-based penalty that encourages a grouping effect of (1) linked Cytosine-phosphate-Guanine (CpG) sites within a gene or (2) linked genes within a genetic pathway for analysis of matched DNA methylation data. In our simulation studies, we demonstrated the superiority of using conditional logistic model over unconditional logistic model in high-dimensional variable selection problems for matched case-control data. We further investigated the benefits of utilizing biological group or graph information for matched case-control data. We applied the proposed method to a genome-wide DNA methylation study on hepatocellular carcinoma (HCC) where we investigated the DNA methylation levels of tumor and adjacent non-tumor tissues from HCC patients by using the Illumina Infinium HumanMethylation27 Beadchip. Several new CpG sites and genes known to be related to HCC were identified but were missed by the standard method in the original paper. Copyright © 2012 John Wiley & Sons, Ltd.
Funding Opportunity: Genomic Data Centers
Keywords: Center for Cancer Genomics (CCG); CCG funding opportunity (RFA); genomic data analysis network centers
Frietze, Seth; Leatherman, Judith
2014-03-01
New genes that arise from modification of the noncoding portion of a genome rather than being duplicated from parent genes are called de novo genes. These genes, identified by their brief evolution and lack of parent genes, provide an opportunity to study the timeframe in which emerging genes integrate into cellular networks, and how the characteristics of these genes change as they mature into bona fide genes. An article by G. Abrusán provides an opportunity to introduce students to fundamental concepts in evolutionary and comparative genetics and to provide a technical background by which to discuss systems biology approaches when studying the evolutionary process of gene birth. Basic background needed to understand the Abrusán study and details on comparative genomic concepts tailored for a classroom discussion are provided, including discussion questions and a supplemental exercise on navigating a genome database.
eHive: An Artificial Intelligence workflow system for genomic analysis
2010-01-01
Background The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. Results We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. Conclusions eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/. PMID:20459813
Systematic assignment of thermodynamic constraints in metabolic network models
Kümmel, Anne; Panke, Sven; Heinemann, Matthias
2006-01-01
Background The availability of genome sequences for many organisms enabled the reconstruction of several genome-scale metabolic network models. Currently, significant efforts are put into the automated reconstruction of such models. For this, several computational tools have been developed that particularly assist in identifying and compiling the organism-specific lists of metabolic reactions. In contrast, the last step of the model reconstruction process, which is the definition of the thermodynamic constraints in terms of reaction directionalities, still needs to be done manually. No computational method exists that allows for an automated and systematic assignment of reaction directions in genome-scale models. Results We present an algorithm that – based on thermodynamics, network topology and heuristic rules – automatically assigns reaction directions in metabolic models such that the reaction network is thermodynamically feasible with respect to the production of energy equivalents. It first exploits all available experimentally derived Gibbs energies of formation to identify irreversible reactions. As these thermodynamic data are not available for all metabolites, in a next step, further reaction directions are assigned on the basis of network topology considerations and thermodynamics-based heuristic rules. Briefly, the algorithm identifies reaction subsets from the metabolic network that are able to convert low-energy co-substrates into their high-energy counterparts and thus net produce energy. Our algorithm aims at disabling such thermodynamically infeasible cyclic operation of reaction subnetworks by assigning reaction directions based on a set of thermodynamics-derived heuristic rules. We demonstrate our algorithm on a genome-scale metabolic model of E. coli. 
The introduced systematic direction assignment yielded 130 irreversible reactions (out of 920 total reactions), which corresponds to about 70% of all irreversible reactions that are required to disable thermodynamically infeasible energy production. Conclusion Although not fully comprehensive, our algorithm for systematic reaction direction assignment could define a significant number of irreversible reactions automatically with low computational effort. We envision that the presented algorithm is a valuable part of a computational framework that assists the automated reconstruction of genome-scale metabolic models. PMID:17123434
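The first step described above, using Gibbs energies of formation to flag irreversible reactions, can be sketched with the standard relation ΔrG = ΔrG° + RT Σ s_i ln c_i evaluated over an assumed concentration range. The sketch covers only that thermodynamic check, not the topology-based or heuristic steps; the concentration bounds, temperature, metabolite names, and energy values are assumptions chosen for illustration and are not taken from the paper.

```python
from math import log

R = 8.314e-3   # gas constant in kJ/(mol*K)
T = 298.15     # temperature in K (assumed)

def gibbs_range(stoich, dfG0, c_min, c_max):
    """Lower and upper bound of the reaction Gibbs energy over a concentration range."""
    dG0 = sum(s * dfG0[m] for m, s in stoich.items())
    # Lowest value: products (s > 0) dilute, substrates (s < 0) concentrated.
    lo = dG0 + R * T * sum(s * log(c_min if s > 0 else c_max) for s in stoich.values())
    # Highest value: products concentrated, substrates dilute.
    hi = dG0 + R * T * sum(s * log(c_max if s > 0 else c_min) for s in stoich.values())
    return lo, hi

def direction(stoich, dfG0, c_min=1e-6, c_max=1e-2):
    """Classify a reaction from the sign of its Gibbs energy bounds."""
    lo, hi = gibbs_range(stoich, dfG0, c_min, c_max)
    if hi < 0:
        return "forward only"
    if lo > 0:
        return "backward only"
    return "reversible"

# Invented formation energies (kJ/mol) for hypothetical metabolites.
dfG0 = {"A": 0.0, "B": -40.0, "C": 0.0, "D": -10.0}
print(direction({"A": -1, "B": 1}, dfG0))   # strongly downhill
print(direction({"C": -1, "D": 1}, dfG0))   # sign depends on concentrations
```

A reaction is called irreversible here only when the sign of ΔrG is the same across the whole assumed concentration range; anything else stays reversible and would be left to the later steps.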
Network Medicine: A Network-based Approach to Human Disease
Barabási, Albert-László; Gulbahce, Natali; Loscalzo, Joseph
2011-01-01
Given the functional interdependencies between the molecular components in a human cell, a disease is rarely a consequence of an abnormality in a single gene, but reflects the perturbations of the complex intracellular network. The emerging tools of network medicine offer a platform to explore systematically not only the molecular complexity of a particular disease, leading to the identification of disease modules and pathways, but also the molecular relationships between apparently distinct (patho)phenotypes. Advances in this direction are essential to identify new disease genes, to uncover the biological significance of disease-associated mutations identified by genome-wide association studies and full genome sequencing, and to identify drug targets and biomarkers for complex diseases. PMID:21164525
Decoding the genome beyond sequencing: the new phase of genomic research.
Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J
2011-10-01
While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems; genes code for parts such as proteins and RNA, but the genome codes for the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory, which calls for a departure from gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.
Evolutionary Conservation and Divergence of Gene Coexpression Networks in Gossypium (Cotton) Seeds.
Hu, Guanjing; Hovav, Ran; Grover, Corrinne E; Faigenboim-Doron, Adi; Kadmon, Noa; Page, Justin T; Udall, Joshua A; Wendel, Jonathan F
2016-12-01
The cotton genus (Gossypium) provides a superior system for the study of diversification, genome evolution, polyploidization, and human-mediated selection. To gain insight into phenotypic diversification in cotton seeds, we conducted coexpression network analysis of developing seeds from diploid and allopolyploid cotton species and explored network properties. Key network modules and functional associations were identified related to seed oil content and seed weight. We compared species-specific networks to reveal topological changes, including rewired edges and differentially coexpressed genes, associated with speciation, polyploidy, and cotton domestication. Network comparisons among species indicate that topologies are altered in addition to gene expression profiles, indicating that changes in transcriptomic coexpression relationships play a role in the developmental architecture of cotton seed development. The global network topology of allopolyploids, especially for domesticated G. hirsutum, resembles the network of the A-genome diploid more than that of the D-genome parent, despite its D-like phenotype in oil content. Expression modifications associated with allopolyploidy include coexpression level dominance and transgressive expression, suggesting that the transcriptomic architecture in polyploids is to some extent a modular combination of that of its progenitor genomes. Among allopolyploids, intermodular relationships are more preserved between two different wild allopolyploid species than they are between wild and domesticated forms of a cultivated cotton, and regulatory connections of oil synthesis-related pathways are denser and more closely clustered in domesticated vs. wild G. hirsutum. These results demonstrate substantial modification of genic coexpression under domestication. 
Our work demonstrates how network inference informs our understanding of the transcriptomic architecture of phenotypic variation associated with temporal scales ranging from thousands (domestication) to millions (speciation) of years, and by polyploidy. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Takemoto, Kazuhiro; Aie, Kazuki
2017-05-25
Host-pathogen interactions are important in a wide range of research fields. Given the importance of metabolic crosstalk between hosts and pathogens, a metabolic network-based reverse ecology method was proposed to infer these interactions. However, the validity of this method remains unclear because of the various explanations presented and the influence of potentially confounding factors that have thus far been neglected. We re-evaluated the importance of the reverse ecology method for evaluating host-pathogen interactions while statistically controlling for confounding effects using oxygen requirement, genome, metabolic network, and phylogeny data. Our data analyses showed that host-pathogen interactions were more strongly influenced by genome size, primary network parameters (e.g., number of edges), oxygen requirement, and phylogeny than by the reverse ecology-based measures. These results indicate the limitations of the reverse ecology method; however, they do not discount the importance of adopting reverse ecology approaches altogether. Rather, we highlight the need for developing more suitable methods for inferring host-pathogen interactions and conducting more careful examinations of the relationships between metabolic networks and host-pathogen interactions.
Tabe-Bordbar, Shayan; Marashi, Sayed-Amir
2013-12-01
Elementary modes (EMs) are steady-state metabolic flux vectors with a minimal set of active reactions. Each EM corresponds to a metabolic pathway. Therefore, studying EMs is helpful for analyzing the production of biotechnologically important metabolites. However, memory requirements for computing EMs may hamper their applicability because, in most genome-scale metabolic models, no EM can be computed before memory runs out. In this study, we present a method for computing randomly sampled EMs. In this approach, a network reduction algorithm based on flux balance methods is used for EM computation. We show that this approach can be used to recover EMs in medium- and genome-scale metabolic network models, while the EMs are sampled in an unbiased way. The applicability of such results is shown by computing “estimated” control-effective flux values in the Escherichia coli metabolic network.
Vongsangnak, Wanwipa; Raethong, Nachon; Mujchariyakul, Warasinee; Nguyen, Nam Ninh; Leong, Hon Wai; Laoteng, Kobkul
2017-08-30
The first genome-scale metabolic network of Cordyceps militaris (iWV1170) was constructed to represent its whole metabolism. It consists of 894 metabolites and 1,267 metabolic reactions across five compartments: the plasma membrane, cytoplasm, mitochondria, peroxisome and extracellular space. iWV1170 could be exploited to explain the organism's growth phenotypes and its production of cordycepin and other metabolites on various substrates. A large number of genes encoding extracellular enzymes for degradation of complex carbohydrates, lipids and proteins were found in the C. militaris genome. By comparative genome-scale analysis, the adenine metabolic pathway towards putative cordycepin biosynthesis was reconstructed, indicating their evolutionary relationships across eleven species of entomopathogenic fungi. The overall metabolic routes involved in the putative cordycepin biosynthesis were also identified in C. militaris, including central carbon metabolism, amino acid metabolism (glycine, l-glutamine and l-aspartate) and nucleotide metabolism (adenosine and adenine). Interestingly, C. militaris was observed to lack the sequence coding for a ribonucleotide reductase inhibitor, which might contribute to its over-production of cordycepin. Copyright © 2017. Published by Elsevier B.V.
Ricebase - a resource for rice breeding
USDA-ARS's Scientific Manuscript database
Ricebase combines accessions, traits, markers, and genes with genome-scale datasets to empower rice breeders and geneticists to explore big-data resources. The underlying code and schema are shared with CassavaBase and the Sol Genomics Network (SGN) databases. Ricebase was launched specifically to m...
McNeil, Leslie Klis; Reich, Claudia; Aziz, Ramy K; Bartels, Daniela; Cohoon, Matthew; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Hwang, Kaitlyn; Kubal, Michael; Margaryan, Gohar Rem; Meyer, Folker; Mihalo, William; Olsen, Gary J; Olson, Robert; Osterman, Andrei; Paarmann, Daniel; Paczian, Tobias; Parrello, Bruce; Pusch, Gordon D; Rodionov, Dmitry A; Shi, Xinghua; Vassieva, Olga; Vonstein, Veronika; Zagnitko, Olga; Xia, Fangfang; Zinner, Jenifer; Overbeek, Ross; Stevens, Rick
2007-01-01
The National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org) is a National Institute of Allergy and Infectious Diseases (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the set of genes in common or that differentiates two groups of organisms. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development.
Bracken-Grissom, Heather; Collins, Allen G; Collins, Timothy; Crandall, Keith; Distel, Daniel; Dunn, Casey; Giribet, Gonzalo; Haddock, Steven; Knowlton, Nancy; Martindale, Mark; Medina, Mónica; Messing, Charles; O'Brien, Stephen J; Paulay, Gustav; Putnam, Nicolas; Ravasi, Timothy; Rouse, Greg W; Ryan, Joseph F; Schulze, Anja; Wörheide, Gert; Adamska, Maja; Bailly, Xavier; Breinholt, Jesse; Browne, William E; Diaz, M Christina; Evans, Nathaniel; Flot, Jean-François; Fogarty, Nicole; Johnston, Matthew; Kamel, Bishoy; Kawahara, Akito Y; Laberge, Tammy; Lavrov, Dennis; Michonneau, François; Moroz, Leonid L; Oakley, Todd; Osborne, Karen; Pomponi, Shirley A; Rhodes, Adelaide; Santos, Scott R; Satoh, Nori; Thacker, Robert W; Van de Peer, Yves; Voolstra, Christian R; Welch, David Mark; Winston, Judith; Zhou, Xin
2014-01-01
Over 95% of all metazoan (animal) species comprise the "invertebrates," but very few genomes from these organisms have been sequenced. We have, therefore, formed a "Global Invertebrate Genomics Alliance" (GIGA). Our intent is to build a collaborative network of diverse scientists to tackle major challenges (e.g., species selection, sample collection and storage, sequence assembly, annotation, analytical tools) associated with genome/transcriptome sequencing across a large taxonomic spectrum. We aim to promote standards that will facilitate comparative approaches to invertebrate genomics and collaborations across the international scientific community. Candidate study taxa include species from Porifera, Ctenophora, Cnidaria, Placozoa, Mollusca, Arthropoda, Echinodermata, Annelida, Bryozoa, and Platyhelminthes, among others. GIGA will target 7000 noninsect/nonnematode species, with an emphasis on marine taxa because of the unrivaled phyletic diversity in the oceans. Priorities for selecting invertebrates for sequencing will include, but are not restricted to, their phylogenetic placement; relevance to organismal, ecological, and conservation research; and their importance to fisheries and human health. We highlight benefits of sequencing both whole genomes (DNA) and transcriptomes and also suggest policies for genomic-level data access and sharing based on transparency and inclusiveness. The GIGA Web site (http://giga.nova.edu) has been launched to facilitate this collaborative venture. PMID:24336862
Two-Way Gene Interaction From Microarray Data Based on Correlation Methods
Alavi Majd, Hamid; Talebi, Atefeh; Gilany, Kambiz; Khayyer, Nasibeh
2016-01-01
Background Gene networks have generated a massive explosion in the development of high-throughput techniques for monitoring various aspects of gene activity. Networks offer a natural way to model interactions between genes, and extracting gene network information from high-throughput genomic data is an important and difficult task. Objectives The purpose of this study is to construct a two-way gene network based on parametric and nonparametric correlation coefficients. The first step in constructing a Gene Co-expression Network is to score all pairs of gene vectors. The second step is to select a score threshold and connect all gene pairs whose scores exceed this value. Materials and Methods In the foundation-application study, we constructed two-way gene networks using nonparametric methods, such as Spearman’s rank correlation coefficient and Blomqvist’s measure, and compared them with Pearson’s correlation coefficient. We surveyed six genes of venous thrombosis disease, made a matrix entry representing the score for the corresponding gene pair, and obtained two-way interactions using Pearson’s correlation, Spearman’s rank correlation, and Blomqvist’s coefficient. Finally, these methods were compared with Cytoscape, based on BIND, and Gene Ontology, based on molecular function visual methods; R software version 3.2 and Bioconductor were used to perform these methods. Results Based on the Pearson and Spearman correlations, the results were the same and were confirmed by Cytoscape and GO visual methods; however, Blomqvist’s coefficient was not confirmed by visual methods. Conclusions Some results of the correlation coefficients do not agree with the visualization methods, possibly because of the small amount of data. PMID:27621916
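The two-step construction described above (score every gene pair, then connect pairs whose score exceeds a threshold) can be sketched as follows for the three coefficients named in the abstract. The study itself used R and Bioconductor; this Python version is an illustrative stand-in, the gene names and expression values are invented, and ignoring points that fall on a median line in Blomqvist's coefficient is a simplifying assumption.

```python
from itertools import combinations
from statistics import mean, median

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(v):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's coefficient is Pearson's coefficient applied to ranks."""
    return pearson(ranks(x), ranks(y))

def blomqvist(x, y):
    """Medial correlation: quadrant agreement around the two medians."""
    mx, my = median(x), median(y)
    signs = [(a - mx) * (b - my) for a, b in zip(x, y)]
    same = sum(s > 0 for s in signs)
    opposite = sum(s < 0 for s in signs)  # points on a median line are ignored
    return (same - opposite) / (same + opposite)

def coexpression_edges(expr, score, threshold):
    """Score every gene pair and keep pairs whose |score| exceeds the threshold."""
    return {
        (g, h)
        for g, h in combinations(sorted(expr), 2)
        if abs(score(expr[g], expr[h])) > threshold
    }

# Invented expression profiles for three hypothetical genes across six samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "geneC": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
for name, fn in [("pearson", pearson), ("spearman", spearman), ("blomqvist", blomqvist)]:
    print(name, sorted(coexpression_edges(expr, fn, threshold=0.8)))
```

With only a handful of samples, Blomqvist's coefficient can take just a few distinct values, which is one way a small dataset could produce the kind of disagreement between coefficients that the abstract reports.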
Arar, Nedal; Knight, Sara J; Modell, Stephen M; Issa, Amalia M
2011-03-01
The main mission of the Genomic Applications in Practice and Prevention Network™ is to advance collaborative efforts involving partners from across the public health sector to realize the promise of genomics in healthcare and disease prevention. We introduce a new framework that supports the Genomic Applications in Practice and Prevention Network mission and leverages the characteristics of the complex adaptive systems approach. We call this framework the Genome-based Knowledge Management in Cycles model (G-KNOMIC). G-KNOMIC proposes that the collaborative work of multidisciplinary teams utilizing genome-based applications will enhance translating evidence-based genomic findings by creating ongoing knowledge management cycles. Each cycle consists of knowledge synthesis, knowledge evaluation, knowledge implementation and knowledge utilization. Our framework acknowledges that all the elements in the knowledge translation process are interconnected and continuously changing. It also recognizes the importance of feedback loops, and the ability of teams to self-organize within a dynamic system. We demonstrate how this framework can be used to improve the adoption of genomic technologies into practice using two case studies of genomic uptake.
fastBMA: scalable network inference and transitive reduction.
Hung, Ling-Hong; Shi, Kaiyuan; Wu, Migao; Young, William Chad; Raftery, Adrian E; Yeung, Ka Yee
2017-10-01
Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the one based on LASSO. The improved scalability allows it to calculate networks from genome-scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (MIT license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/). © The Authors 2017. Published by Oxford University Press.
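One way to read "mapping the transitive reduction to a shortest-path problem" is sketched below: each edge probability p becomes a distance -log(p), and a direct edge is dropped when some indirect path is at least as short, i.e. at least as probable. This is an assumption-laden illustration, not fastBMA's actual implementation; the function names, node labels, and probabilities are invented.

```python
import heapq
from math import log

def shortest_indirect(graph, src, dst):
    """Dijkstra distance from src to dst that skips the direct src->dst edge."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if u == src and v == dst:
                continue  # ignore the edge being tested
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def transitive_reduction(edge_probs):
    """Drop an edge when an indirect path is at least as probable as the edge itself."""
    graph = {}
    for (u, v), p in edge_probs.items():
        graph.setdefault(u, {})[v] = -log(p)  # high probability -> short distance
    return {
        (u, v): p
        for (u, v), p in edge_probs.items()
        if shortest_indirect(graph, u, v) > -log(p)
    }

# Hypothetical regulatory edges with posterior probabilities.
edges = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.5}
kept = transitive_reduction(edges)
print(sorted(kept))
```

Here the path A to B to C has probability 0.81, which beats the direct A to C edge at 0.5, so the direct edge is treated as redundant. Had the direct edge scored 0.9, it would have been kept.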
The connectivity structure, giant strong component and centrality of metabolic networks.
Ma, Hong-Wu; Zeng, An-Ping
2003-07-22
Structural and functional analysis of genome-based large-scale metabolic networks is important for understanding the design principles and regulation of the metabolism at a system level. The metabolic network is conventionally considered to be highly integrated and very complex. A rational reduction of the metabolic network to its core structure and a deeper understanding of its functional modules are important. In this work, we show that the metabolites in a metabolic network are far from fully connected. A connectivity structure consisting of four major subsets of metabolites and reactions, i.e. a fully connected sub-network, a substrate subset, a product subset and an isolated subset, is found to exist in metabolic networks of 65 fully sequenced organisms. The largest fully connected part of a metabolic network, called 'the giant strong component (GSC)', represents the most complicated part and the core of the network and has the feature of scale-free networks. The average path length of the whole network is primarily determined by that of the GSC. For most of the organisms, the GSC normally contains less than one-third of the nodes of the network. This connectivity structure is very similar to the 'bow-tie' structure of the World Wide Web. Our results indicate that the bow-tie structure may be common for large-scale directed networks. More importantly, the uncovered structural feature makes a structural and functional analysis of large-scale metabolic networks more amenable. As shown in this work, comparing the closeness centrality of the nodes in the GSC can identify the most central metabolites of a metabolic network. To quantitatively characterize the overall connection structure of the GSC we introduced the term 'overall closeness centralization index (OCCI)'. OCCI correlates well with the average path length of the GSC and is a useful parameter for a system-level comparison of metabolic networks of different organisms. http://genome.gbf.de/bioinformatics/
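The four-subset decomposition described above can be sketched for a small directed graph. This is an illustrative reading of the abstract, not the authors' code: the GSC is taken to be the largest strongly connected component, the substrate subset the nodes that can reach it, the product subset the nodes reachable from it, and everything else is labelled isolated. The function names and the toy edge list are hypothetical.

```python
from collections import defaultdict


def reachable(adj, start_nodes):
    """Return all nodes reachable from start_nodes by following adj."""
    seen = set(start_nodes)
    stack = list(start_nodes)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def bow_tie(edges):
    """Split a directed graph into GSC, substrate, product and isolated subsets."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    nodes = set()
    for u, v in edges:
        fwd[u].add(v)
        bwd[v].add(u)
        nodes.update((u, v))
    gsc, assigned = set(), set()
    for n in sorted(nodes):
        if n in assigned:
            continue
        # The strongly connected component of n is the set of nodes that
        # n can reach and that can also reach n.
        scc = reachable(fwd, [n]) & reachable(bwd, [n])
        assigned |= scc
        if len(scc) > len(gsc):
            gsc = scc
    product = reachable(fwd, gsc) - gsc
    substrate = reachable(bwd, gsc) - gsc
    isolated = nodes - gsc - substrate - product
    return {"gsc": gsc, "substrate": substrate,
            "product": product, "isolated": isolated}


# Toy metabolite graph: s feeds a cycle a -> b -> c -> a, which yields p.
toy_edges = [("s", "a"), ("a", "b"), ("b", "c"), ("c", "a"),
             ("c", "p"), ("x", "y")]
parts = bow_tie(toy_edges)
```

The component search here runs two traversals per unassigned node, which is fine for a toy graph; a linear-time algorithm such as Tarjan's would be the usual choice for genome-scale networks.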
Large Scale Proteomic Data and Network-Based Systems Biology Approaches to Explore the Plant World.
Di Silvestre, Dario; Bergamaschi, Andrea; Bellini, Edoardo; Mauri, PierLuigi
2018-06-03
The investigation of plant organisms by means of data-derived systems biology approaches based on network modeling is mainly characterized by genomic data, while the potential of proteomics is largely unexplored. This delay is mainly caused by the paucity of plant genomic/proteomic sequences and annotations which are fundamental to perform mass-spectrometry (MS) data interpretation. However, Next Generation Sequencing (NGS) techniques are contributing to filling this gap and an increasing number of studies are focusing on plant proteome profiling and protein-protein interactions (PPIs) identification. Interesting results were obtained by evaluating the topology of PPI networks in the context of organ-associated biological processes as well as plant-pathogen relationships. These examples foreshadow the benefits that these approaches may provide to plant research. Thus, in addition to providing an overview of the main omics technologies recently used on plant organisms, we will focus on studies that rely on concepts of module, hub and shortest path, and how they can contribute to the plant discovery processes. In this scenario, we will also consider gene co-expression networks, and some examples of integration with metabolomic data and genome-wide association studies (GWAS) to select candidate genes will be mentioned.
Chakraborty, Chiranjib; Sarkar, Bimal Kumar; Patel, Pratiksha; Agoramoorthy, Govindasamy
2012-01-01
In this paper, Shannon information theory is applied to describe cell signaling. It is proposed that four components are involved in the cellular network architecture: source (DNA), transmitter (mRNA), receiver (protein) and destination (another protein). The message passes from source (DNA) to transmitter (mRNA) and then through a noisy channel to reach the receiver (protein). The protein synthesis process is here considered the noisy channel. Ultimately, the signal is transmitted from receiver to destination (another protein). The genome network architecture elements were compared with the genetic alphabet L = {A, C, G, T} using a biophysical model based on Shannon information theory. The study found that channel capacity is maximal at zero error (sigma = 0), where the transition matrix becomes a unit matrix of rank 4. As sigma increases the transition matrix accumulates errors, and at sigma = 1 the channel capacity reaches a local maximum of 0.415. The minimum occurs at sigma = 0.75, where all transition probabilities equal 0.25 and uncertainty is maximal, giving a channel capacity of zero.
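The reported capacities can be reproduced with a short calculation, assuming the channel is a four-symbol symmetric channel in which the total error probability sigma is split evenly among the three incorrect bases. That assumption is not stated in the abstract, but it is consistent with the values 0.415 at sigma = 1 and zero at sigma = 0.75. The function name below is hypothetical.

```python
import math


def capacity(sigma, n=4):
    """Capacity in bits of an n-ary symmetric channel with error probability sigma.

    Assumed model: each row of the transition matrix has 1 - sigma on the
    diagonal and sigma / (n - 1) in every off-diagonal position.
    """
    def plogp(p):
        # Convention: 0 * log2(0) = 0.
        return p * math.log2(p) if p > 0 else 0.0

    row_entropy = -(plogp(1 - sigma) + (n - 1) * plogp(sigma / (n - 1)))
    # For a symmetric channel the uniform input distribution is optimal,
    # so capacity = log2(n) - H(row).
    return math.log2(n) - row_entropy


for s in (0.0, 0.75, 1.0):
    print(f"sigma = {s:.2f}  capacity = {capacity(s):.3f} bits")
```

Under this model the capacity at sigma = 1 equals log2(4) - log2(3): a certain error still tells the receiver which base was not sent, which is why the capacity rises again beyond sigma = 0.75.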
Systematic Evaluation of Molecular Networks for Discovery of Disease Genes.
Huang, Justin K; Carlin, Daniel E; Yu, Michael Ku; Zhang, Wei; Kreisberg, Jason F; Tamayo, Pablo; Ideker, Trey
2018-04-25
Gene networks are rapidly growing in size and number, raising the question of which networks are most appropriate for particular applications. Here, we evaluate 21 human genome-wide interaction networks for their ability to recover 446 disease gene sets identified through literature curation, gene expression profiling, or genome-wide association studies. While all networks have some ability to recover disease genes, we observe a wide range of performance with STRING, ConsensusPathDB, and GIANT networks having the best performance overall. A general tendency is that performance scales with network size, suggesting that new interaction discovery currently outweighs the detrimental effects of false positives. Correcting for size, we find that the DIP network provides the highest efficiency (value per interaction). Based on these results, we create a parsimonious composite network with both high efficiency and performance. This work provides a benchmark for selection of molecular networks in human disease research. Copyright © 2018 Elsevier Inc. All rights reserved.
Dumas, Marc-Emmanuel; Domange, Céline; Calderari, Sophie; Martínez, Andrea Rodríguez; Ayala, Rafael; Wilder, Steven P; Suárez-Zamorano, Nicolas; Collins, Stephan C; Wallis, Robert H; Gu, Quan; Wang, Yulan; Hue, Christophe; Otto, Georg W; Argoud, Karène; Navratil, Vincent; Mitchell, Steve C; Lindon, John C; Holmes, Elaine; Cazier, Jean-Baptiste; Nicholson, Jeremy K; Gauguier, Dominique
2016-09-30
The genetic regulation of metabolic phenotypes (i.e., metabotypes) in type 2 diabetes mellitus occurs through complex organ-specific cellular mechanisms and networks contributing to impaired insulin secretion and insulin resistance. Genome-wide gene expression profiling systems can dissect the genetic contributions to metabolome and transcriptome regulations. The integrative analysis of multiple gene expression traits and metabolic phenotypes (i.e., metabotypes) together with their underlying genetic regulation remains a challenge. Here, we introduce a systems genetics approach based on the topological analysis of a combined molecular network made of genes and metabolites identified through expression and metabotype quantitative trait locus mapping (i.e., eQTL and mQTL) to prioritise biological characterisation of candidate genes and traits. We used systematic metabotyping by 1H NMR spectroscopy and genome-wide gene expression in white adipose tissue to map molecular phenotypes to genomic blocks associated with obesity and insulin secretion in a series of rat congenic strains derived from spontaneously diabetic Goto-Kakizaki (GK) and normoglycemic Brown-Norway (BN) rats. We implemented a network biology approach to visualize the shortest paths between metabolites and genes significantly associated with each genomic block. Despite strong genomic similarities (95-99 %) among congenics, each strain exhibited specific patterns of gene expression and metabotypes, reflecting the metabolic consequences of series of linked genetic polymorphisms in the congenic intervals. We subsequently used the congenic panel to map quantitative trait loci underlying specific mQTLs and genome-wide eQTLs. Variation in key metabolites like glucose, succinate, lactate, or 3-hydroxybutyrate and second messenger precursors like inositol was associated with several independent genomic intervals, indicating functional redundancy in these regions.
To navigate through the complexity of these association networks we mapped candidate genes and metabolites onto metabolic pathways and implemented a shortest path strategy to highlight potential mechanistic links between metabolites and transcripts at colocalized mQTLs and eQTLs. Minimizing the shortest path length drove prioritization of biological validations by gene silencing. These results underline the importance of network-based integration of multilevel systems genetics datasets to improve understanding of the genetic architecture of metabotype and transcriptomic regulation and to characterize novel functional roles for genes determining tissue-specific metabolism.
Inference of gene regulatory networks from genome-wide knockout fitness data
Wang, Liming; Wang, Xiaodong; Arkin, Adam P.; Samoilov, Michael S.
2013-01-01
Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only structural, but also dynamical and non-linear nature of biomolecular systems involved. A state–space model with non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. Unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data as well as empirical measurements of GAL network in yeast Saccharomyces cerevisiae and TyrR–LiuR network in bacteria Shewanella oneidensis. 
Availability: MATLAB code and datasets are available to download at http://www.duke.edu/~lw174/Fitness.zip and http://genomics.lbl.gov/supplemental/fitness-bioinf/ Contact: wangx@ee.columbia.edu or mssamoilov@lbl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23271269
The Double-Stranded DNA Virosphere as a Modular Hierarchical Network of Gene Sharing
Iranzo, Jaime
2016-01-01
Virus genomes are prone to extensive gene loss, gain, and exchange and share no universal genes. Therefore, in a broad-scale study of virus evolution, gene and genome network analyses can complement traditional phylogenetics. We performed an exhaustive comparative analysis of the genomes of double-stranded DNA (dsDNA) viruses by using the bipartite network approach and found a robust hierarchical modularity in the dsDNA virosphere. Bipartite networks consist of two classes of nodes, with nodes in one class, in this case genomes, being connected via nodes of the second class, in this case genes. Such a network can be partitioned into modules that combine nodes from both classes. The bipartite network of dsDNA viruses includes 19 modules that form 5 major and 3 minor supermodules. Of these modules, 11 include tailed bacteriophages, reflecting the diversity of this largest group of viruses. The module analysis quantitatively validates and refines previously proposed nontrivial evolutionary relationships. An expansive supermodule combines the large and giant viruses of the putative order “Megavirales” with diverse moderate-sized viruses and related mobile elements. All viruses in this supermodule share a distinct morphogenetic tool kit with a double jelly roll major capsid protein. Herpesviruses and tailed bacteriophages comprise another supermodule, held together by a distinct set of morphogenetic proteins centered on the HK97-like major capsid protein. Together, these two supermodules cover the great majority of currently known dsDNA viruses. We formally identify a set of 14 viral hallmark genes that comprise the hubs of the network and account for most of the intermodule connections. PMID:27486193
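The bipartite representation described above, in which genomes are linked only through the genes they carry, can be sketched as follows. The example covers only the data structure, the genome-to-genome projection, and a simple degree-based notion of hub genes; it does not attempt the module detection used in the study, and the degree threshold, function names, and toy genomes are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_bipartite(genome_genes):
    """Invert {genome: set_of_genes} into {gene: set_of_genomes}."""
    gene_to_genomes = defaultdict(set)
    for genome, genes in genome_genes.items():
        for gene in genes:
            gene_to_genomes[gene].add(genome)
    return gene_to_genomes


def shared_gene_counts(gene_to_genomes):
    """Project onto genomes: count the genes shared by each genome pair."""
    counts = defaultdict(int)
    for genomes in gene_to_genomes.values():
        for a, b in combinations(sorted(genomes), 2):
            counts[(a, b)] += 1
    return dict(counts)


def hub_genes(gene_to_genomes, min_degree):
    """Genes present in at least min_degree genomes (an assumed hub criterion)."""
    return sorted(g for g, gs in gene_to_genomes.items() if len(gs) >= min_degree)


# Toy data with placeholder identifiers.
toy = {
    "v1": {"g1", "g2"},
    "v2": {"g1", "g3"},
    "v3": {"g1", "g2", "g4"},
}
network = build_bipartite(toy)
pairs = shared_gene_counts(network)
hubs = hub_genes(network, min_degree=3)
```

Keeping the gene layer explicit, rather than storing only the projected genome-to-genome weights, preserves the information needed to form modules that contain both genomes and genes.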
Diverse types of genetic variation converge on functional gene networks involved in schizophrenia.
Gilman, Sarah R; Chang, Jonathan; Xu, Bin; Bawa, Tejdeep S; Gogos, Joseph A; Karayiorgou, Maria; Vitkup, Dennis
2012-12-01
Despite the successful identification of several relevant genomic loci, the underlying molecular mechanisms of schizophrenia remain largely unclear. We developed a computational approach (NETBAG+) that allows an integrated analysis of diverse disease-related genetic data using a unified statistical framework. The application of this approach to schizophrenia-associated genetic variations, obtained using unbiased whole-genome methods, allowed us to identify several cohesive gene networks related to axon guidance, neuronal cell mobility, synaptic function and chromosomal remodeling. The genes forming the networks are highly expressed in the brain, with higher brain expression during prenatal development. The identified networks are functionally related to genes previously implicated in schizophrenia, autism and intellectual disability. A comparative analysis of copy number variants associated with autism and schizophrenia suggests that although the molecular networks implicated in these distinct disorders may be related, the mutations associated with each disease are likely to lead, at least on average, to different functional consequences.
Development of constraint-based system-level models of microbial metabolism.
Navid, Ali
2012-01-01
Genome-scale models of metabolism are valuable tools for using genomic information to predict microbial phenotypes. System-level mathematical models of metabolic networks have been developed for a number of microbes and have been used to gain new insights into the biochemical conversions that occur within organisms and permit their survival and proliferation. Utilizing these models, computational biologists can (1) examine network structures, (2) predict metabolic capabilities and resolve unexplained experimental observations, (3) generate and test new hypotheses, (4) assess the nutritional requirements of the organism and approximate its environmental niche, (5) identify missing enzymatic functions in the annotated genome, and (6) engineer desired metabolic capabilities in model organisms. This chapter details the protocol for developing genome-scale models of metabolism in microbes as well as tips for accelerating the model building process.
Rudolf, Jeffrey D.; Yan, Xiaohui; Shen, Ben
2015-01-01
The enediynes are one of the most fascinating families of bacterial natural products given their unprecedented molecular architecture and extraordinary cytotoxicity. Enediynes are rare with only 11 structurally characterized members and four additional members isolated in their cycloaromatized form. Recent advances in DNA sequencing have resulted in an explosion of microbial genomes. A virtual survey of the GenBank and JGI genome databases revealed 87 enediyne biosynthetic gene clusters from 78 bacteria strains, implying enediynes are more common than previously thought. Here we report the construction and analysis of an enediyne genome neighborhood network (GNN) as a high-throughput approach to analyze secondary metabolite gene clusters. Analysis of the enediyne GNN facilitated rapid gene cluster annotation, revealed genetic trends in enediyne biosynthetic gene clusters resulting in a simple prediction scheme to determine 9- vs 10-membered enediyne gene clusters, and supported a genomic-based strain prioritization method for enediyne discovery. PMID:26318027
Liu, Zhi-Ping; Wu, Canglin; Miao, Hongyu; Wu, Hulin
2015-01-01
Transcriptional and post-transcriptional regulation of gene expression is of fundamental importance to numerous biological processes. Nowadays, an increasing number of gene regulatory relationships have been documented in various databases and literature. However, to more efficiently exploit such knowledge for biomedical research and applications, it is necessary to construct a genome-wide regulatory network database to integrate the information on gene regulatory relationships that are widely scattered in many different places. Therefore, in this work, we build a knowledge-based database, named ‘RegNetwork’, of gene regulatory networks for human and mouse by collecting and integrating the documented regulatory interactions among transcription factors (TFs), microRNAs (miRNAs) and target genes from 25 selected databases. Moreover, we also inferred and incorporated potential regulatory relationships based on transcription factor binding site (TFBS) motifs into RegNetwork. As a result, RegNetwork contains a comprehensive set of experimentally observed or predicted transcriptional and post-transcriptional regulatory relationships, and the database framework is flexibly designed for potential extensions to include gene regulatory networks for other organisms in the future. Based on RegNetwork, we characterized the statistical and topological properties of genome-wide regulatory networks for human and mouse. We also extracted and interpreted simple yet important network motifs that involve the interplay between TFs, miRNAs and their targets. In summary, RegNetwork provides an integrated resource on the prior information for gene regulatory relationships, and it enables us to further investigate context-specific transcriptional and post-transcriptional regulatory interactions based on domain-specific experimental data. Database URL: http://www.regnetworkweb.org PMID:26424082
Yu, J; Blom, J; Glaeser, S P; Jaenicke, S; Juhre, T; Rupp, O; Schwengers, O; Spänig, S; Goesmann, A
2017-11-10
The rapid development of next generation sequencing technology has greatly increased the amount of available microbial genomes. As a result of this development, there is a rising demand for fast and automated approaches in analyzing these genomes in a comparative way. Whole genome sequencing also bears a huge potential for obtaining a higher resolution in phylogenetic and taxonomic classification. During the last decade, several software tools and platforms have been developed in the field of comparative genomics. In this manuscript, we review the most commonly used platforms and approaches for ortholog group analyses with a focus on their potential for phylogenetic and taxonomic research. Furthermore, we describe the latest improvements of the EDGAR platform for comparative genome analyses and present recent examples of its application for the phylogenomic analysis of different taxa. Finally, we illustrate the role of the EDGAR platform as part of the BiGi Center for Microbial Bioinformatics within the German network on Bioinformatics Infrastructure (de.NBI). Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Final Technical Report for Award # ER64999
DOE Office of Scientific and Technical Information (OSTI.GOV)
Metcalf, William W.
2014-10-08
This report provides a summary of activities for Award # ER64999, a Genomes to Life Project funded by the Office of Science, Basic Energy Research. The project was entitled "Methanogenic archaea and the global carbon cycle: a systems biology approach to the study of Methanosarcina species". The long-term goal of this multi-investigator project was the creation of integrated, multiscale models that accurately and quantitatively predict the role of Methanosarcina species in the global carbon cycle under dynamic environmental conditions. To achieve these goals we pursued four specific aims: (1) genome sequencing of numerous members of the Order Methanosarcinales, (2) identification of genomic sources of phenotypic variation through in silico comparative genomics, (3) elucidation of the transcriptional networks of two Methanosarcina species, and (4) development of comprehensive metabolic network models for characterized strains to address the question of how metabolic models scale with genetic distance.
Iso-Touru, T; Sahana, G; Guldbrandtsen, B; Lund, M S; Vilkki, J
2016-03-22
The Nordic Red Cattle consisting of three different populations from Finland, Sweden and Denmark are under a joint breeding value estimation system. The long history of recording of production and health traits offers a great opportunity to study production traits and identify causal variants behind them. In this study, we used whole genome sequence level data from 4280 progeny tested Nordic Red Cattle bulls to scan the genome for loci affecting milk, fat and protein yields. Using a genome-wise significance threshold, regions on Bos taurus chromosomes 5, 14, 23, 25 and 26 were associated with fat yield. Regions on chromosomes 5, 14, 16, 19, 20 and 25 were associated with milk yield and chromosomes 5, 14 and 25 had regions associated with protein yield. Significantly associated variations were found in 227 genes for fat yield, 72 genes for milk yield and 30 genes for protein yield. Ingenuity Pathway Analysis was used to identify networks connecting these genes displaying significant hits. When compared to previously mapped genomic regions associated with fertility, significantly associated variations were found in 5 genes common for fat yield and fertility, thus linking these two traits via biological networks. This is the first time when whole genome sequence data is utilized to study genomic regions affecting milk production in the Nordic Red Cattle population. Sequence level data offers the possibility to study quantitative traits in detail but still cannot unambiguously reveal which of the associated variations is causative. Linkage disequilibrium creates difficulties to pinpoint the causative genes and variations. One solution to overcome these difficulties is the identification of the functional gene networks and pathways to reveal important interacting genes as candidates for the observed effects. This information on target genomic regions may be exploited to improve genomic prediction.
Martinez-Morales, Juan R
2016-07-01
Vertebrates, as most animal phyla, originated >500 million years ago during the Cambrian explosion, and progressively radiated into the extant classes. Inferring the evolutionary history of the group requires understanding the architecture of the developmental programs that constrain the vertebrate anatomy. Here, I review recent comparative genomic and epigenomic studies, based on ChIP-seq and chromatin accessibility, which focus on the identification of functionally equivalent cis-regulatory modules among species. This pioneering work, primarily centered in the mammalian lineage, has set the groundwork for further studies in representative vertebrate and chordate species. Mapping of active regulatory regions across lineages will shed new light on the evolutionary forces stabilizing ancestral developmental programs, as well as allowing their variation to sustain morphological adaptations on the inherited vertebrate body plan. © The Author 2015. Published by Oxford University Press. All rights reserved.
Loots, Gabriela G
2008-01-01
Despite remarkable recent advances in genomics that have enabled us to identify most of the genes in the human genome, comparable efforts to define transcriptional cis-regulatory elements that control gene expression are lagging behind. The difficulty of this task stems from two equally important problems: our knowledge of how regulatory elements are encoded in genomes remains elementary, and there is a vast genomic search space for regulatory elements, since most of mammalian genomes are noncoding. Comparative genomic approaches are having a remarkable impact on the study of transcriptional regulation in eukaryotes and currently represent the most efficient and reliable methods of predicting noncoding sequences likely to control the patterns of gene expression. By subjecting eukaryotic genomic sequences to computational comparisons and subsequent experimentation, we are inching our way toward a more comprehensive catalog of common regulatory motifs that lie behind fundamental biological processes. We are still far from comprehending how the transcriptional regulatory code is encrypted in the human genome and providing an initial global view of regulatory gene networks, but collectively, the continued development of comparative and experimental approaches will rapidly expand our knowledge of the transcriptional regulome.
Ke, Tao; Yu, Jingyin; Dong, Caihua; Mao, Han; Hua, Wei; Liu, Shengyi
2015-01-21
Oil crop seeds are important sources of fatty acids (FAs) for human and animal nutrition. Despite their importance, there is a lack of an essential bioinformatics resource on gene transcription of oil crops from a comparative perspective. In this study, we developed ocsESTdb, the first database of expressed sequence tag (EST) information on seeds of four large-scale oil crops with an emphasis on global metabolic networks and oil accumulation metabolism that target the involved unigenes. A total of 248,522 ESTs and 106,835 unigenes were collected from the cDNA libraries of rapeseed (Brassica napus), soybean (Glycine max), sesame (Sesamum indicum) and peanut (Arachis hypogaea). These unigenes were annotated by a sequence similarity search against databases including TAIR, NR protein database, Gene Ontology, COG, Swiss-Prot, TrEMBL and Kyoto Encyclopedia of Genes and Genomes (KEGG). Five genome-scale metabolic networks that contain different numbers of metabolites and gene-enzyme reaction-association entries were constructed and analysed using Cytoscape and yEd programs. Details of unigene entries, deduced amino acid sequences and putative annotation are available from our database to browse, search and download. Intuitive and graphical representations of EST/unigene sequences, functional annotations, metabolic pathways and metabolic networks are also available. ocsESTdb will be updated regularly and can be freely accessed at http://ocri-genomics.org/ocsESTdb/. ocsESTdb may serve as a valuable and unique resource for comparative analysis of acyl lipid synthesis and metabolism in oilseed plants. It also may provide vital insights into improving oil content in seeds of oil crop species by transcriptional reconstruction of the metabolic network.
The Plant Genome Integrative Explorer Resource: PlantGenIE.org.
Sundell, David; Mannapperuma, Chanaka; Netotea, Sergiu; Delhomme, Nicolas; Lin, Yao-Cheng; Sjödin, Andreas; Van de Peer, Yves; Jansson, Stefan; Hvidsten, Torgeir R; Street, Nathaniel R
2015-12-01
Accessing and exploring large-scale genomics data sets remains a significant challenge to researchers without specialist bioinformatics training. We present the integrated PlantGenIE.org platform for exploration of Populus, conifer and Arabidopsis genomics data, which includes expression networks and associated visualization tools. Standard features of a model organism database are provided, including genome browsers, gene list annotation, Blast homology searches and gene information pages. Community annotation updating is supported via integration of WebApollo. We have produced an RNA-sequencing (RNA-Seq) expression atlas for Populus tremula and have integrated these data within the expression tools. An updated version of the ComPlEx resource for performing comparative plant expression analyses of gene coexpression network conservation between species has also been integrated. The PlantGenIE.org platform provides intuitive access to large-scale and genome-wide genomics data from model forest tree species, facilitating both community contributions to annotation improvement and tools supporting use of the included data resources to inform biological insight. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
Genome-wide network of regulatory genes for construction of a chordate embryo.
Shoguchi, Eiichi; Hamaguchi, Makoto; Satoh, Nori
2008-04-15
Animal development is controlled by gene regulation networks that are composed of sequence-specific transcription factors (TF) and cell signaling molecules (ST). Although housekeeping genes have been reported to show clustering in the animal genomes, whether the genes comprising a given regulatory network are physically clustered on a chromosome is uncertain. We examined this question in the present study. Ascidians are the closest living relatives of vertebrates, and their tadpole-type larva represents the basic body plan of chordates. The Ciona intestinalis genome contains 390 core TF genes and 119 major ST genes. Previous gene disruption assays led to the formulation of a basic chordate embryonic blueprint, based on over 3000 genetic interactions among 79 zygotic regulatory genes. Here, we mapped the regulatory genes, including all 79 regulatory genes, on the 14 pairs of Ciona chromosomes by fluorescent in situ hybridization (FISH). Chromosomal localization of upstream and downstream regulatory genes demonstrates that the components of coherent developmental gene networks are evenly distributed over the 14 chromosomes. Thus, this study provides the first comprehensive evidence that the physical clustering of regulatory genes, or their target genes, is not relevant for the genome-wide control of gene expression during development.
Sun, Eric I; Leyn, Semen A; Kazanov, Marat D; Saier, Milton H; Novichkov, Pavel S; Rodionov, Dmitry A
2013-09-02
In silico comparative genomics approaches have been efficiently used for functional prediction and reconstruction of metabolic and regulatory networks. Riboswitches are metabolite-sensing structures often found in bacterial mRNA leaders controlling gene expression on transcriptional or translational levels. An increasing number of riboswitches and other cis-regulatory RNAs have been recently classified into numerous RNA families in the Rfam database. High conservation of these RNA motifs provides a unique advantage for their genomic identification and comparative analysis. A comparative genomics approach implemented in the RegPredict tool was used for reconstruction and functional annotation of regulons controlled by RNAs from 43 Rfam families in diverse taxonomic groups of Bacteria. The inferred regulons include ~5200 cis-regulatory RNAs and more than 12000 target genes in 255 microbial genomes. All predicted RNA-regulated genes were classified into specific and overall functional categories. Analysis of taxonomic distribution of these categories allowed us to establish major functional preferences for each analyzed cis-regulatory RNA motif family. Overall, most RNA motif regulons showed predictable functional content in accordance with their experimentally established effector ligands. Our results suggest that some RNA motifs (including thiamin pyrophosphate and cobalamin riboswitches that control the cofactor metabolism) are widespread and likely originated from the last common ancestor of all bacteria. However, many more analyzed RNA motifs are restricted to a narrow taxonomic group of bacteria and likely represent more recent evolutionary innovations. The reconstructed regulatory networks for major known RNA motifs substantially expand the existing knowledge of transcriptional regulation in bacteria. The inferred regulons can be used for genetic experiments, functional annotations of genes, metabolic reconstruction and evolutionary analysis.
The obtained genome-wide collection of reference RNA motif regulons is available in the RegPrecise database (http://regprecise.lbl.gov/).
Reconciled rat and human metabolic networks for comparative toxicogenomics and biomarker predictions
Blais, Edik M.; Rawls, Kristopher D.; Dougherty, Bonnie V.; Li, Zhuo I.; Kolling, Glynis L.; Ye, Ping; Wallqvist, Anders; Papin, Jason A.
2017-01-01
The laboratory rat has been used as a surrogate to study human biology for more than a century. Here we present the first genome-scale network reconstruction of Rattus norvegicus metabolism, iRno, and a significantly improved reconstruction of human metabolism, iHsa. These curated models comprehensively capture metabolic features known to distinguish rats from humans including vitamin C and bile acid synthesis pathways. After reconciling network differences between iRno and iHsa, we integrate toxicogenomics data from rat and human hepatocytes, to generate biomarker predictions in response to 76 drugs. We validate comparative predictions for xanthine derivatives with new experimental data and literature-based evidence delineating metabolite biomarkers unique to humans. Our results provide mechanistic insights into species-specific metabolism and facilitate the selection of biomarkers consistent with rat and human biology. These models can serve as powerful computational platforms for contextualizing experimental data and making functional predictions for clinical and basic science applications. PMID:28176778
Meneco, a Topology-Based Gap-Filling Tool Applicable to Degraded Genome-Wide Metabolic Networks
Prigent, Sylvain; Frioux, Clémence; Dittami, Simon M.; Larhlimi, Abdelhalim; Collet, Guillaume; Gutknecht, Fabien; Got, Jeanne; Eveillard, Damien; Bourdon, Jérémie; Plewniak, Frédéric; Tonon, Thierry; Siegel, Anne
2017-01-01
Increasing amounts of sequence data are becoming available for a wide range of non-model organisms. Investigating and modelling the metabolic behaviour of those organisms is highly relevant to understand their biology and ecology. As sequences are often incomplete and poorly annotated, draft networks of their metabolism largely suffer from incompleteness. Appropriate gap-filling methods to identify and add missing reactions are therefore required to address this issue. However, current tools rely on phenotypic or taxonomic information, or are very sensitive to the stoichiometric balance of metabolic reactions, especially concerning the co-factors. This type of information is often not available or at least prone to errors for newly-explored organisms. Here we introduce Meneco, a tool dedicated to the topological gap-filling of genome-scale draft metabolic networks. Meneco reformulates gap-filling as a qualitative combinatorial optimization problem, omitting constraints raised by the stoichiometry of a metabolic network considered in other methods, and solves this problem using Answer Set Programming. Run on several artificial test sets gathering 10,800 degraded Escherichia coli networks Meneco was able to efficiently identify essential reactions missing in networks at high degradation rates, outperforming the stoichiometry-based tools in scalability. To demonstrate the utility of Meneco we applied it to two case studies. Its application to recent metabolic networks reconstructed for the brown algal model Ectocarpus siliculosus and an associated bacterium Candidatus Phaeomarinobacter ectocarpi revealed several candidate metabolic pathways for algal-bacterial interactions. Then Meneco was used to reconstruct, from transcriptomic and metabolomic data, the first metabolic network for the microalga Euglena mutabilis. 
These two case studies show that Meneco is a versatile tool to complete draft genome-scale metabolic networks produced from heterogeneous data, and to suggest relevant reactions that explain the metabolic capacity of a biological system. PMID:28129330
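As a rough illustration of what topological gap-filling means, the sketch below computes which metabolites are reachable from a set of seed compounds while ignoring stoichiometry, and then searches a repair database for the smallest sets of reactions that make every target reachable. The toy network, the reaction names, and the exhaustive search are assumptions made for illustration only; Meneco itself encodes the problem in Answer Set Programming, which is not reproduced here.

```python
from itertools import combinations


def scope(reactions, seeds):
    """Metabolites producible from the seeds, ignoring stoichiometry.

    A reaction fires as soon as all of its substrates are reachable.
    """
    reachable = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= reachable and not products <= reachable:
                reachable |= products
                changed = True
    return reachable


def minimal_completions(draft, repair_db, seeds, targets):
    """All smallest sets of repair reactions that make every target reachable.

    Brute force over subsets of increasing size (illustrative, not scalable).
    """
    names = sorted(repair_db)
    for size in range(len(names) + 1):
        found = []
        for subset in combinations(names, size):
            network = dict(draft)
            network.update({name: repair_db[name] for name in subset})
            if targets <= scope(network, seeds):
                found.append(set(subset))
        if found:
            return found
    return []


# Hypothetical toy data: reaction name -> (substrates, products)
draft = {
    "r1": ({"glc"}, {"g6p"}),
    "r3": ({"f6p"}, {"pyr"}),
}
repair_db = {
    "r2": ({"g6p"}, {"f6p"}),
    "r4": ({"xyl"}, {"pyr"}),
}
seeds = {"glc"}
targets = {"pyr"}

completions = minimal_completions(draft, repair_db, seeds, targets)
print(completions)
```

In this toy case the draft network cannot reach the target from the seed, and the search reports the repair reaction that bridges the gap. The exhaustive enumeration grows exponentially with the size of the repair database, which is one reason a dedicated combinatorial solver is used in practice.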
Zhang, Xiaoshuai; Xue, Fuzhong; Liu, Hong; Zhu, Dianwen; Peng, Bin; Wiemels, Joseph L; Yang, Xiaowei
2014-12-10
Genome-wide Association Studies (GWAS) are typically designed to identify phenotype-associated single nucleotide polymorphisms (SNPs) individually using univariate analysis methods. Though providing valuable insights into genetic risks of common diseases, the genetic variants identified by GWAS generally account for only a small proportion of the total heritability for complex diseases. To solve this "missing heritability" problem, we implemented a strategy called integrative Bayesian Variable Selection (iBVS), which is based on a hierarchical model that incorporates an informative prior by treating the gene interrelationships as a network. It was applied here to both simulated and real data sets. Simulation studies indicated that the iBVS method performed best, with the highest AUC in both variable selection and outcome prediction, when compared to Stepwise and LASSO based strategies. In an analysis of a leprosy case-control study, iBVS selected 94 SNPs as predictors, while LASSO selected 100 SNPs. The Stepwise regression yielded a more parsimonious model with only 3 SNPs. The prediction results demonstrated that the iBVS method had performance comparable to that of LASSO and better than that of the Stepwise strategy. The proposed iBVS strategy is a novel and valid method for Genome-wide Association Studies, with the additional advantage that, unlike LASSO and other penalized regression methods, it produces more interpretable posterior probabilities for each variable.
Coi, A L; Bigey, F; Mallet, S; Marsit, S; Zara, G; Gladieux, P; Galeote, V; Budroni, M; Dequin, S; Legras, J L
2017-04-01
The molecular and evolutionary processes underlying fungal domestication remain largely unknown despite the importance of fungi to bioindustry and for comparative adaptation genomics in eukaryotes. Wine fermentation and biological ageing are performed by strains of S. cerevisiae with, respectively, pelagic fermentative growth on glucose and biofilm aerobic growth utilizing ethanol. Here, we use environmental samples of wine and flor yeasts to investigate the genomic basis of yeast adaptation to contrasted anthropogenic environments. Phylogenetic inference and population structure analysis based on single nucleotide polymorphisms revealed a group of flor yeasts separated from wine yeasts. A combination of methods revealed several highly differentiated regions between wine and flor yeasts, and analyses using codon-substitution models for detecting molecular adaptation identified sites under positive selection in the high-affinity transporter gene ZRT1. The cross-population composite likelihood ratio revealed selective sweeps at three regions, including in the hexose transporter gene HXT7, the yapsin gene YPS6 and the membrane protein coding gene MTS27. Our analyses also revealed that the biological ageing environment has led to the accumulation of numerous mutations in proteins from several networks, including Flo11 regulation and divalent metal transport. Together, our findings suggest that the tuning of FLO11 expression and zinc transport networks are a distinctive feature of the genetic changes underlying the domestication of flor yeasts. Our study highlights the multiplicity of genomic changes underlying yeast adaptation to man-made habitats and reveals that flor/wine yeast lineage can serve as a useful model for studying the genomics of adaptive divergence. © 2017 John Wiley & Sons Ltd.
RRW: repeated random walks on genome-scale protein networks for local cluster discovery
Macropol, Kathy; Can, Tolga; Singh, Ambuj K
2009-01-01
Background: We propose an efficient and biologically sensitive algorithm based on repeated random walks (RRW) for discovering functional modules, e.g., complexes and pathways, within large-scale protein networks. Compared to existing cluster identification techniques, RRW implicitly makes use of network topology, edge weights, and long range interactions between proteins. Results: We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results using known complexes in the MIPS complex catalogue database and well-characterized biological processes. We find that 90% of the created clusters have the majority of their catalogued proteins belonging to the same MIPS complex, and about 80% have the majority of their proteins involved in the same biological process. We compare our method to various other clustering techniques, such as the Markov Clustering Algorithm (MCL), and find a significant improvement in the RRW clusters' precision and accuracy values. Conclusion: RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters. In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters. PMID:19740439
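To make the idea of a walk that uses topology and edge weights concrete, here is a minimal sketch of a single random walk with restart on a small weighted graph, computed by power iteration. The graph, the restart probability, and the function name are assumptions chosen for illustration; the abstract does not specify these details, and the cluster expansion and significance testing of RRW are not reproduced.

```python
def random_walk_with_restart(graph, start, restart=0.3, tol=1e-12, max_iter=1000):
    """Stationary visiting probabilities of a walker that jumps back to
    `start` with probability `restart` at every step.

    `graph` maps node -> {neighbour: edge weight}; every node is assumed
    to have at least one neighbour.
    """
    nodes = list(graph)
    prob = {n: 1.0 if n == start else 0.0 for n in nodes}
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            total = sum(graph[u].values())
            for v, w in graph[u].items():
                # Move along an edge with probability proportional to its weight.
                nxt[v] += (1.0 - restart) * prob[u] * w / total
        nxt[start] += restart
        delta = max(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Hypothetical toy network: two tight triangles joined by one weak edge.
graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 0.1},
    "d": {"c": 0.1, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}

scores = random_walk_with_restart(graph, "a")
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Nodes in the same triangle as the start node receive most of the probability mass, so ranking nodes by score gives a natural candidate for a local cluster around the start node.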
Ji, Boyang; Zhang, Sheng-Da; Zhang, Wei-Jia; Rouy, Zoe; Alberto, François; Santini, Claire-Lise; Mangenot, Sophie; Gagnot, Séverine; Philippe, Nadège; Pradel, Nathalie; Zhang, Lichen; Tempel, Sébastien; Li, Ying; Médigue, Claudine; Henrissat, Bernard; Coutinho, Pedro M; Barbe, Valérie; Talla, Emmanuel; Wu, Long-Fei
2017-03-01
Magnetotactic bacteria (MTB) are a group of phylogenetically and physiologically diverse Gram-negative bacteria that synthesize intracellular magnetic crystals named magnetosomes. MTB are affiliated with three classes of the phylum Proteobacteria, with the phyla Nitrospirae and Omnitrophica, and probably with the candidate phylum Latescibacteria. The evolutionary origin and physiological diversity of MTB compared with other bacterial taxonomic groups remain to be elucidated. Here, we analysed the genome of the marine magneto-ovoid strain MO-1 and found that it is closely related to Magnetococcus marinus MC-1. Detailed analyses of the ribosomal proteins and whole proteomes of 390 genomes reveal that, among the Proteobacteria analysed, only MO-1 and MC-1 have coding sequences (CDSs) with a similarly high proportion of origins from Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria and Gammaproteobacteria. Interestingly, a comparative metabolic network analysis with anoxic network enzymes from sequenced MTB and non-MTB allows the prediction of whether an organism has a metabolic profile compatible with magnetosome production. Altogether, our genomic analysis reveals multiple origins of MO-1 and M. marinus MC-1 genomes and suggests a metabolism-restriction model for explaining whether a bacterium could become an MTB upon acquisition of magnetosome encoding genes. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.
GDTN: Genome-Based Delay Tolerant Network Formation in Heterogeneous 5G Using Inter-UA Collaboration
You, Ilsun; Sharma, Vishal; Atiquzzaman, Mohammed; Choo, Kim-Kwang Raymond
2016-01-01
With a more Internet-savvy and sophisticated user base, there are more demands for interactive applications and services. However, it is a challenge for existing radio access networks (e.g. 3G and 4G) to cope with the increasingly demanding requirements such as higher data rates and wider coverage area. One potential solution is the inter-collaborative deployment of multiple radio devices in a 5G setting designed to meet exacting user demands, and facilitate the high data rate requirements in the underlying networks. These heterogeneous 5G networks can readily resolve the data rate and coverage challenges. Networks established using the hybridization of existing networks have diverse military and civilian applications. However, there are inherent limitations in such networks such as irregular breakdown, node failures, and halts during speed transmissions. In recent years, there have been attempts to integrate heterogeneous 5G networks with existing ad hoc networks to provide a robust solution for delay-tolerant transmissions in the form of packet switched networks. However, continuous connectivity is still required in these networks, in order to efficiently regulate the flow to allow the formation of a robust network. Therefore, in this paper, we present a novel network formation consisting of nodes from different networks maneuvered by Unmanned Aircraft (UA). The proposed model utilizes the features of a biological aspect of genomes and forms a delay tolerant network with existing network models. This allows us to provide continuous and robust connectivity. We then demonstrate that the proposed network model has efficient data delivery, lower overheads and lower delays with a high convergence rate in comparison to existing approaches, based on evaluations in both a real-time testbed and a simulation environment. PMID:27973618
Research 2.0: social networking and direct-to-consumer (DTC) genomics.
Lee, Sandra Soo-Jin; Crawley, LaVera
2009-01-01
The convergence of increasingly efficient high throughput sequencing technology and ubiquitous Internet use by the public has fueled the proliferation of companies that provide personal genetic information (PGI) directly to consumers. Companies such as 23andMe (Mountain View, CA) and Navigenics (Foster City, CA) are emblematic of a growing market for PGI that some argue represents a paradigm shift in how the public values this information and incorporates it into how they behave and plan for their futures. This new class of social networking business ventures that market the science of the personal genome illustrates the new trend in collaborative science. In addition to fostering a consumer empowerment movement, it promotes the trend of democratizing information, that is, openly sharing data with all interested parties and not just the biomedical researcher, for the purposes of pooling data (increasing statistical power) and escalating the innovation process. This target article discusses the need for new approaches to studying DTC genomics using social network analysis to identify the impact of obtaining, sharing, and using PGI. As a locus of biosociality, DTC personal genomics forges social relationships based on beliefs of common genetic susceptibility that link risk, disease, and group identity. Ethical issues related to the reframing of DTC personal genomic consumers as advocates and research subjects and the creation of new social formations around health research may be identified through social network analysis.
Yan, Koon-Kiu; Fang, Gang; Bhardwaj, Nitin; Alexander, Roger P.; Gerstein, Mark
2010-01-01
The genome has often been called the operating system (OS) for a living organism. A computer OS is described by a regulatory control network termed the call graph, which is analogous to the transcriptional regulatory network in a cell. To apply our firsthand knowledge of the architecture of software systems to understand cellular design principles, we present a comparison between the transcriptional regulatory network of a well-studied bacterium (Escherichia coli) and the call graph of a canonical OS (Linux) in terms of topology and evolution. We show that both networks have a fundamentally hierarchical layout, but there is a key difference: The transcriptional regulatory network possesses a few global regulators at the top and many targets at the bottom; conversely, the call graph has many regulators controlling a small set of generic functions. This top-heavy organization leads to highly overlapping functional modules in the call graph, in contrast to the relatively independent modules in the regulatory network. We further develop a way to measure evolutionary rates comparably between the two networks and explain this difference in terms of network evolution. The process of biological evolution via random mutation and subsequent selection tightly constrains the evolution of regulatory network hubs. The call graph, however, exhibits rapid evolution of its highly connected generic components, made possible by designers’ continual fine-tuning. These findings stem from the design principles of the two systems: robustness for biological systems and cost effectiveness (reuse) for software systems. PMID:20439753
A novel artificial neural network method for biomedical prediction based on matrix pseudo-inversion.
Cai, Binghuang; Jiang, Xia
2014-04-01
Biomedical prediction based on clinical and genome-wide data has become increasingly important in disease diagnosis and classification. To solve the prediction problem in an effective manner for the improvement of clinical care, we develop a novel Artificial Neural Network (ANN) method based on Matrix Pseudo-Inversion (MPI) for use in biomedical applications. The MPI-ANN is constructed as a three-layer (i.e., input, hidden, and output layers) feed-forward neural network, and the weights connecting the hidden and output layers are directly determined based on MPI without a lengthy learning iteration. The LASSO (Least Absolute Shrinkage and Selection Operator) method is also presented for comparative purposes. Single Nucleotide Polymorphism (SNP) simulated data and real breast cancer data are employed to validate the performance of the MPI-ANN method via 5-fold cross validation. Experimental results demonstrate the efficacy of the developed MPI-ANN for disease classification and prediction, in view of the significantly superior accuracy (i.e., the rate of correct predictions), as compared with LASSO. The results based on the real breast cancer data also show that the MPI-ANN has better performance than other machine learning methods (including support vector machine (SVM), logistic regression (LR), and an iterative ANN). In addition, experiments demonstrate that our MPI-ANN could be used for bio-marker selection as well. Copyright © 2013 Elsevier Inc. All rights reserved.
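The idea of determining the hidden-to-output weights in one step, rather than by iterative training, can be sketched as a least-squares fit on the hidden-layer outputs. In the sketch below the input-to-hidden weights are drawn at random and then frozen, which is an assumption (the abstract does not say how they are set), and the output weights are computed through the normal equations, which coincide with the Moore-Penrose pseudo-inverse solution only when the hidden-layer matrix has full column rank. The toy data and all function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def hidden_row(x, hidden_weights):
    """Sigmoid activation of each hidden unit, followed by a constant bias term."""
    row = []
    for w in hidden_weights:  # w = [bias, weight for x[0], weight for x[1], ...]
        z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
        row.append(1.0 / (1.0 + math.exp(-z)))
    return row + [1.0]


def fit_output_weights(xs, ys, hidden_weights):
    """Output weights beta = (H^T H)^{-1} H^T y, with H the hidden-layer matrix."""
    h = [hidden_row(x, hidden_weights) for x in xs]
    k = len(h[0])
    hth = [[sum(r[i] * r[j] for r in h) for j in range(k)] for i in range(k)]
    hty = [sum(r[i] * y for r, y in zip(h, ys)) for i in range(k)]
    return solve(hth, hty), h


random.seed(0)
# Hypothetical toy data: the label is 1 when the two features sum to more than 1.
xs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
      (0.2, 0.3), (0.9, 0.8), (0.1, 0.7), (0.8, 0.6)]
ys = [1.0 if a + b > 1.0 else 0.0 for a, b in xs]

# Assumption: three hidden units with fixed random weights (bias + two inputs).
hidden_weights = [[random.uniform(-3.0, 3.0) for _ in range(3)] for _ in range(3)]

beta, h = fit_output_weights(xs, ys, hidden_weights)
preds = [sum(b * v for b, v in zip(beta, row)) for row in h]
accuracy = sum((p > 0.5) == (y > 0.5) for p, y in zip(preds, ys)) / len(ys)
print("training accuracy:", accuracy)
```

The only fitted quantity is `beta`, obtained from a single linear solve, which is what removes the need for a lengthy learning iteration. A production version would more likely call a library pseudo-inverse routine, which also handles rank-deficient hidden-layer matrices.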
Underlying Principles of Natural Selection in Network Evolution: Systems Biology Approach
Chen, Bor-Sen; Wu, Wei-Sheng
2007-01-01
Systems biology is a rapidly expanding field that integrates diverse areas of science such as physics, engineering, computer science, mathematics, and biology toward the goal of elucidating the underlying principles of hierarchical metabolic and regulatory systems in the cell, and ultimately leading to predictive understanding of cellular response to perturbations. Because post-genomics research is taking place throughout the tree of life, comparative approaches offer a way for combining data from many organisms to shed light on the evolution and function of biological networks from the gene to the organismal level. Therefore, systems biology can build on decades of theoretical work in evolutionary biology, and at the same time evolutionary biology can use the systems biology approach to go in new uncharted directions. In this study, we present a review of how the post-genomics era is adopting comparative approaches and dynamic system methods to understand the underlying design principles of network evolution and to shape the nascent field of evolutionary systems biology. Finally, the application of evolutionary systems biology to robust biological network designs is also discussed from the synthetic biology perspective. PMID:19468310
Baumbach, Jan; Wittkop, Tobias; Rademacher, Katrin; Rahmann, Sven; Brinkrolf, Karina; Tauch, Andreas
2007-04-30
CoryneRegNet is an ontology-based data warehouse for the reconstruction and visualization of transcriptional regulatory interactions in prokaryotes. To extend the biological content of CoryneRegNet, we added comprehensive data on transcriptional regulations in the model organism Escherichia coli K-12, originally deposited in the international reference database RegulonDB. The enhanced web interface of CoryneRegNet offers several types of search options. The results of a search are displayed in a table-based style and include a visualization of the genetic organization of the respective gene region. Information on DNA binding sites of transcriptional regulators is depicted by sequence logos. The results can also be displayed by several layouters implemented in the graphical user interface GraphVis, allowing, for instance, the visualization of genome-wide network reconstructions and the homology-based inter-species comparison of reconstructed gene regulatory networks. In an application example, we compare the composition of the gene regulatory networks involved in the SOS response of E. coli and Corynebacterium glutamicum. CoryneRegNet is available at the following URL: http://www.cebitec.uni-bielefeld.de/groups/gi/software/coryneregnet/.
An integrated workflow for analysis of ChIP-chip data.
Weigelt, Karin; Moehle, Christoph; Stempfl, Thomas; Weber, Bernhard; Langmann, Thomas
2008-08-01
Although ChIP-chip is a powerful tool for genome-wide discovery of transcription factor target genes, the steps involving raw data analysis, identification of promoters, and correlation with binding sites are still laborious processes. Therefore, we report an integrated workflow for the analysis of promoter tiling arrays with the Genomatix ChipInspector system. We compare this tool with open-source software packages to identify PU.1 regulated genes in mouse macrophages. Our results suggest that ChipInspector data analysis, comparative genomics for binding site prediction, and pathway/network modeling significantly facilitate and enhance whole-genome promoter profiling to reveal in vivo sites of transcription factor-DNA interactions.
Beaudet, Denis; Terrat, Yves; Halary, Sébastien; de la Providencia, Ivan Enrique; Hijri, Mohamed
2013-01-01
Comparative mitochondrial genomics of arbuscular mycorrhizal fungi (AMF) provide new avenues to overcome long-lasting obstacles that have hampered studies aimed at understanding the community structure, diversity, and evolution of these multinucleated and genetically polymorphic organisms. AMF mitochondrial (mt) genomes are homogeneous within isolates, and their intergenic regions harbor numerous mobile elements that have rapidly diverged, including homing endonuclease genes, small inverted repeats, and plasmid-related DNA polymerase genes (dpo), making them suitable targets for the development of reliable strain-specific markers. However, these elements may also lead to genome rearrangements through homologous recombination, although this has never previously been reported in this group of obligate symbiotic fungi. To investigate whether such rearrangements are present and caused by mobile elements in AMF, the mitochondrial genomes from two Glomeraceae members (i.e., Glomus cerebriforme and Glomus sp.) with substantial mtDNA synteny divergence, were sequenced and compared with available glomeromycotan mitochondrial genomes. We used an extensive nucleotide/protein similarity network-based approach to investigate dpo diversity in AMF as well as in other organisms for which sequences are publicly available. We provide strong evidence of dpo-induced inter-haplotype recombination, leading to a reshuffled mitochondrial genome in Glomus sp. These findings raise questions as to whether AMF single spore cultivations artificially underestimate mtDNA genetic diversity. We assessed potential dpo dispersal mechanisms in AMF and inferred a robust phylogenetic relationship with plant mitochondrial plasmids. Along with other indirect evidence, our analyses indicate that members of the Glomeromycota phylum are potential donors of mitochondrial plasmids to plants. PMID:23925788
2014-01-01
Background: Genome-wide microarrays have been useful for predicting chemical-genetic interactions at the gene level. However, interpreting genome-wide microarray results can be overwhelming due to the vast output of gene expression data combined with the off-target transcriptional responses that are often induced by a drug treatment. This study demonstrates how experimental and computational methods can interact with each other to arrive at more accurate predictions of drug-induced perturbations. We present a two-stage strategy that links microarray experimental testing and network training conditions to predict gene perturbations for a drug with a known mechanism of action in a well-studied organism. Results: S. cerevisiae cells were treated with the antifungal fluconazole, and expression profiling was conducted under different biological conditions using Affymetrix genome-wide microarrays. Transcripts were filtered with a formal network-based method, sparse simultaneous equation models and Lasso regression (SSEM-Lasso), under different network training conditions. Gene expression results were evaluated using both gene set and single gene target analyses, and the drug’s transcriptional effects were narrowed first by pathway and then by individual genes. Variables included (i) testing conditions (exposure time and concentration) and (ii) network training conditions (training compendium modifications). Two analyses of SSEM-Lasso output (gene set and single gene) were conducted to gain a better understanding of how SSEM-Lasso predicts perturbation targets. Conclusions: This study demonstrates that genome-wide microarrays can be optimized using a two-stage strategy for a more in-depth understanding of how a cell manifests biological reactions to a drug treatment at the transcription level. Additionally, a more detailed understanding of how the statistical model, SSEM-Lasso, propagates perturbations through a network of gene regulatory interactions is achieved. PMID:24444313
HEDD: Human Enhancer Disease Database
Wang, Zhen; Zhang, Quanwei; Zhang, Wen; Lin, Jhih-Rong; Cai, Ying; Mitra, Joydeep
2018-01-01
Enhancers, as specialized genomic cis-regulatory elements, activate transcription of their target genes and play an important role in pathogenesis of many human complex diseases. Despite recent systematic identification of them in the human genome, currently there is an urgent need for comprehensive annotation databases of human enhancers with a focus on their disease connections. In response, we built the Human Enhancer Disease Database (HEDD) to facilitate studies of enhancers and their potential roles in human complex diseases. HEDD currently provides comprehensive genomic information for ∼2.8 million human enhancers identified by ENCODE, FANTOM5 and RoadMap with disease association scores based on enhancer–gene and gene–disease connections. It also provides Web-based analytical tools to visualize enhancer networks and score enhancers given a set of selected genes in a specific gene network. HEDD is freely accessible at http://zdzlab.einstein.yu.edu/1/hedd.php. PMID:29077884
Hierarchical Dirichlet process model for gene expression clustering
2013-01-01
Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet process (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447
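The Chinese restaurant metaphor mentioned above can be illustrated with a minimal sketch. This is only the single-level Chinese restaurant process prior, not the hierarchical (franchise) version or the Gibbs sampler the abstract describes; the function name `crp_assignments` and the parameter `alpha` are assumptions made for illustration.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Seat n_items customers one by one under a Chinese restaurant process.

    A customer joins existing table k with weight equal to the number of
    customers already there, or opens a new table with weight alpha.
    Returns (assignments, table_sizes).
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(n_items):
        # Last weight corresponds to opening a new table.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[k] += 1
        assignments.append(k)
    return assignments, table_sizes


if __name__ == "__main__":
    tables, sizes = crp_assignments(20, alpha=1.0, seed=1)
    print(tables)
    print(sizes)
```

In a clustering setting each "customer" would be a gene and each "table" a cluster; the number of clusters is not fixed in advance but grows with the data at a rate controlled by `alpha`.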
CHRR: coordinate hit-and-run with rounding for uniform sampling of constraint-based models.
Haraldsdóttir, Hulda S; Cousins, Ben; Thiele, Ines; Fleming, Ronan M T; Vempala, Santosh
2017-06-01
In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. However, reliable uniform sampling of genome-scale biochemical networks is challenging due to their high dimensionality and inherent anisotropy. Here, we present an implementation of a new sampling algorithm, coordinate hit-and-run with rounding (CHRR). This algorithm is based on the provably efficient hit-and-run random walk and crucially uses a preprocessing step to round the anisotropic flux set. CHRR provably converges to a uniform stationary sampling distribution. We apply it to metabolic networks of increasing dimensionality. We show that it converges several times faster than a popular artificial centering hit-and-run algorithm, enabling reliable and tractable sampling of genome-scale biochemical networks. Availability: https://github.com/opencobra/cobratoolbox. Contact: ronan.mt.fleming@gmail.com or vempala@cc.gatech.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
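The random walk at the core of this approach can be sketched in a few lines. The example below is a minimal coordinate hit-and-run sampler for a bounded polytope written as {x : A x <= b}; it omits the rounding preprocessing step entirely and is not the COBRA Toolbox implementation. The function name, argument layout, and the unit-square example are assumptions for illustration.

```python
import random


def coordinate_hit_and_run(A, b, x0, n_steps, seed=0):
    """Sample from the bounded polytope {x : A x <= b}, starting at interior point x0.

    Each step picks one coordinate at random, finds the feasible interval
    along that coordinate axis, and moves to a uniform point on that chord.
    """
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(n_steps):
        # Slack of every constraint at the current point (recomputed each
        # step for clarity; an efficient version would update it in place).
        slack = [b_r - sum(a * xi for a, xi in zip(row, x))
                 for row, b_r in zip(A, b)]
        i = rng.randrange(len(x))
        lo, hi = float("-inf"), float("inf")
        for row, s in zip(A, slack):
            a = row[i]
            if a > 0:
                hi = min(hi, s / a)   # constraint limits moves in +e_i
            elif a < 0:
                lo = max(lo, s / a)   # constraint limits moves in -e_i
        if lo == float("-inf") or hi == float("inf"):
            raise ValueError("polytope is unbounded along coordinate %d" % i)
        x[i] += rng.uniform(lo, hi)
        samples.append(list(x))
    return samples


if __name__ == "__main__":
    # Unit square 0 <= x, y <= 1 as a toy stand-in for a flux polytope.
    A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
    b = [1, 0, 1, 0]
    pts = coordinate_hit_and_run(A, b, [0.5, 0.5], 1000)
    print(pts[-1])
```

On a strongly anisotropic set the chords along coordinate axes become very short in some directions, which is the motivation the abstract gives for rounding the set before running the walk.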
Kar, Siddhartha P; Tyrer, Jonathan P; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K H; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V; Bean, Yukie T; Beckmann, Matthias W; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S; Cramer, Daniel; Cunningham, Julie M; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F; Edwards, Robert P; Ekici, Arif B; Fasching, Peter A; Fridley, Brooke L; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A T; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K; Hosono, Satoyo; Iversen, Edwin S; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K; Kelemen, Linda E; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D; Lee, Alice W; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R; McNeish, Iain A; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B; Narod, Steven A; Nedergaard, Lotte; Ness, Roberta B; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M; Permuth-Wey, Jennifer; Phelan, Catherine M; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H; Rudolph, Anja; 
Runnebaum, Ingo B; Rzepecka, Iwona K; Salvesen, Helga B; Schildkraut, Joellen M; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C; Sucheston-Campbell, Lara E; Tangen, Ingvild L; Teo, Soo-Hwang; Terry, Kathryn L; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S; van Altena, Anne M; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wicklund, Kristine G; Wilkens, Lynne R; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A; Monteiro, Alvaro N A; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P
2015-10-01
Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by coexpression may also be enriched for additional EOC risk associations. We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly coexpressed with each selected TF gene in the unified microarray dataset of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this dataset were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P < 0.05 and FDR < 0.05). These results were replicated (P < 0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Network analysis integrating large, context-specific datasets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. ©2015 American Association for Cancer Research.
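The coexpression step described above (linking each selected TF gene to strongly coexpressed genes by mutual information) can be sketched as follows. The abstract does not state which mutual information estimator or threshold was used, so the equal-width histogram estimator, the bin count, the threshold, and all gene names below are assumptions for illustration only.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    """Map continuous expression values to equal-width bin indices."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=4):
    """Plug-in estimate of mutual information (in nats) between two profiles."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # Sum of p(i,j) * log(p(i,j) / (p(i) p(j))) over observed bin pairs.
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def coexpression_neighbors(tf, expression, threshold, n_bins=4):
    """Genes whose mutual information with the TF profile meets the threshold."""
    tf_profile = expression[tf]
    return sorted(g for g, profile in expression.items()
                  if g != tf
                  and mutual_information(tf_profile, profile, n_bins) >= threshold)


if __name__ == "__main__":
    # Hypothetical toy data: 16 samples, three genes.
    ramp = [float(v) for v in range(16)]
    expression = {"TF_A": ramp, "gene1": ramp[::-1], "gene2": [1.0] * 16}
    print(coexpression_neighbors("TF_A", expression, threshold=1.0))
```

Unlike Pearson correlation, mutual information also picks up non-linear dependence, which is why `gene1` (a reversed copy of the TF profile) scores as highly as an identical profile would.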
Kar, Siddhartha P.; Tyrer, Jonathan P.; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K.H.; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S.; Cramer, Daniel; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goode, Ellen L.; Goodman, Marc T.; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K.; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K.; Kelemen, Linda E.; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A.; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Iain A.; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B.; Narod, Steven A.; Nedergaard, Lotte; Ness, Roberta B.; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Phelan, Catherine M.; Pike, Malcolm C.; Poole, Elizabeth M.; Ramus, Susan J.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary 
Anne; Rothstein, Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schildkraut, Joellen M.; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Sucheston-Campbell, Lara E.; Tangen, Ingvild L.; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S.; van Altena, Anne M.; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A.; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A.; Monteiro, Alvaro N. A.; Freedman, Matthew L.; Gayther, Simon A.; Pharoah, Paul D. P.
2015-01-01
Background Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by co-expression may also be enriched for additional EOC risk associations. Methods We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly co-expressed with each selected TF gene in the unified microarray data set of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this data set were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Results Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P<0.05 and FDR<0.05). These results were replicated (P<0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. Conclusion We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Impact Network analysis integrating large, context-specific data sets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. PMID:26209509
Ho, Hsiang; Milenković, Tijana; Memisević, Vesna; Aruri, Jayavani; Przulj, Natasa; Ganesan, Anand K
2010-06-15
RNA-mediated interference (RNAi)-based functional genomics is a systems-level approach to identify novel genes that control biological phenotypes. Existing computational approaches can identify individual genes from RNAi datasets that regulate a given biological process. However, currently available methods cannot identify which RNAi screen "hits" are novel components of well-characterized biological pathways known to regulate the interrogated phenotype. In this study, we describe a method to identify genes from RNAi datasets that are novel components of known biological pathways. We experimentally validate our approach in the context of a recently completed RNAi screen to identify novel regulators of melanogenesis. In this study, we utilize a PPI network topology-based approach to identify targets within our RNAi dataset that may be components of known melanogenesis regulatory pathways. Our computational approach identifies a set of screen targets that cluster topologically in a human PPI network with the known pigment regulator Endothelin receptor type B (EDNRB). Validation studies reveal that these genes impact pigment production and EDNRB signaling in pigmented melanoma cells (MNT-1) and normal melanocytes. We present an approach that identifies novel components of well-characterized biological pathways from functional genomics datasets that could not have been identified by existing statistical and computational approaches.
2010-01-01
Background RNA-mediated interference (RNAi)-based functional genomics is a systems-level approach to identify novel genes that control biological phenotypes. Existing computational approaches can identify individual genes from RNAi datasets that regulate a given biological process. However, currently available methods cannot identify which RNAi screen "hits" are novel components of well-characterized biological pathways known to regulate the interrogated phenotype. In this study, we describe a method to identify genes from RNAi datasets that are novel components of known biological pathways. We experimentally validate our approach in the context of a recently completed RNAi screen to identify novel regulators of melanogenesis. Results In this study, we utilize a PPI network topology-based approach to identify targets within our RNAi dataset that may be components of known melanogenesis regulatory pathways. Our computational approach identifies a set of screen targets that cluster topologically in a human PPI network with the known pigment regulator Endothelin receptor type B (EDNRB). Validation studies reveal that these genes impact pigment production and EDNRB signaling in pigmented melanoma cells (MNT-1) and normal melanocytes. Conclusions We present an approach that identifies novel components of well-characterized biological pathways from functional genomics datasets that could not have been identified by existing statistical and computational approaches. PMID:20550706
Integrative Analysis of Many Weighted Co-Expression Networks Using Tensor Computation
Li, Wenyuan; Liu, Chun-Chi; Zhang, Tong; Li, Haifeng; Waterman, Michael S.; Zhou, Xianghong Jasmine
2011-01-01
The rapid accumulation of biological networks poses new challenges and calls for powerful integrative analysis tools. Most existing methods capable of simultaneously analyzing a large number of networks were primarily designed for unweighted networks, and cannot easily be extended to weighted networks. However, it is known that transforming weighted into unweighted networks by dichotomizing the edges of weighted networks with a threshold generally leads to information loss. We have developed a novel, tensor-based computational framework for mining recurrent heavy subgraphs in a large set of massive weighted networks. Specifically, we formulate the recurrent heavy subgraph identification problem as a heavy 3D subtensor discovery problem with sparse constraints. We describe an effective approach to solving this problem by designing a multi-stage, convex relaxation protocol, and a non-uniform edge sampling technique. We applied our method to 130 co-expression networks, and identified 11,394 recurrent heavy subgraphs, grouped into 2,810 families. We demonstrated that the identified subgraphs represent meaningful biological modules by validating against a large set of compiled biological knowledge bases. We also showed that the likelihood for a heavy subgraph to be meaningful increases significantly with its recurrence in multiple networks, highlighting the importance of the integrative approach to biological network analysis. Moreover, our approach based on weighted graphs detects many patterns that would be overlooked using unweighted graphs. In addition, we identified a large number of modules that occur predominately under specific phenotypes. This analysis resulted in a genome-wide mapping of gene network modules onto the phenome. Finally, by comparing module activities across many datasets, we discovered high-order dynamic cooperativeness in protein complex networks and transcriptional regulatory networks. PMID:21698123
Insights into the Ecology and Evolution of Polyploid Plants through Network Analysis.
Gallagher, Joseph P; Grover, Corrinne E; Hu, Guanjing; Wendel, Jonathan F
2016-06-01
Polyploidy is a widespread phenomenon throughout eukaryotes, with important ecological and evolutionary consequences. Although genes operate as components of complex pathways and networks, polyploid changes in genes and gene expression have typically been evaluated as either individual genes or as a part of broad-scale analyses. Network analysis has been fruitful in associating genomic and other 'omic'-based changes with phenotype for many systems. In polyploid species, network analysis has the potential not only to facilitate a better understanding of the complex 'omic' underpinnings of phenotypic and ecological traits common to polyploidy, but also to provide novel insight into the interaction among duplicated genes and genomes. This adds perspective to the global patterns of expression (and other 'omic') change that accompany polyploidy and to the patterns of recruitment and/or loss of genes following polyploidization. While network analysis in polyploid species faces challenges common to other analyses of duplicated genomes, present technologies combined with thoughtful experimental design provide a powerful system to explore polyploid evolution. Here, we demonstrate the utility and potential of network analysis to questions pertaining to polyploidy with an example involving evolution of the transgressively superior cotton fibres found in polyploid Gossypium hirsutum. By combining network analysis with prior knowledge, we provide further insights into the role of profilins in fibre domestication and exemplify the potential for network analysis in polyploid species. © 2016 John Wiley & Sons Ltd.
Suh, Sooyeon; Kim, Hosung; Dang-Vu, Thien Thanh; Joo, Eunyeon; Shin, Chol
2016-01-01
Recent studies have suggested that structural abnormalities in insomnia may be linked with alterations in the default-mode network (DMN). This study compared cortical thickness and structural connectivity linked to the DMN in patients with persistent insomnia (PI) and good sleepers (GS). The current study used a clinical subsample from the longitudinal community-based Korean Genome and Epidemiology Study (KoGES). Cortical thickness and structural connectivity linked to the DMN in patients with persistent insomnia symptoms (PIS; n = 57) were compared to good sleepers (GS; n = 40). All participants underwent MRI acquisition. Based on literature review, we selected cortical regions corresponding to the DMN. A seed-based structural covariance analysis measured cortical thickness correlation between each seed region of the DMN and other cortical areas. Associations of cortical thickness and covariance with sleep quality and neuropsychological assessments were further assessed. Compared to GS, cortical thinning was found in PIS in the anterior cingulate cortex, precentral cortex, and lateral prefrontal cortex. Decreased structural connectivity between anterior and posterior regions of the DMN was observed in the PIS group. Decreased structural covariance within the DMN was associated with higher PSQI scores. Cortical thinning in the lateral frontal lobe was related to poor performance in executive function in PIS. Disrupted structural covariance network in PIS might reflect malfunctioning of antero-posterior disconnection of the DMN during the wake to sleep transition that is commonly found during normal sleep. The observed structural network alteration may further implicate commonly observed sustained sleep difficulties and cognitive impairment in insomnia. © 2016 Associated Professional Sleep Societies, LLC.
Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks
Kaltenbacher, Barbara; Hasenauer, Jan
2017-01-01
Mechanistic mathematical modeling of biochemical reaction networks using ordinary differential equation (ODE) models has improved our understanding of small- and medium-scale biological processes. While the same should in principle hold for large- and genome-scale processes, the computational methods for the analysis of ODE models which describe hundreds or thousands of biochemical species and reactions are missing so far. While individual simulations are feasible, the inference of the model parameters from experimental data is computationally too intensive. In this manuscript, we evaluate adjoint sensitivity analysis for parameter estimation in large scale biochemical reaction networks. We present the approach for time-discrete measurement and compare it to state-of-the-art methods used in systems and computational biology. Our comparison reveals a significantly improved computational efficiency and a superior scalability of adjoint sensitivity analysis. The computational complexity is effectively independent of the number of parameters, enabling the analysis of large- and genome-scale models. Our study of a comprehensive kinetic model of ErbB signaling shows that parameter estimation using adjoint sensitivity analysis requires a fraction of the computation time of established methods. The proposed method will facilitate mechanistic modeling of genome-scale cellular processes, as required in the age of omics. PMID:28114351
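The claim that the cost is effectively independent of the number of parameters follows from the structure of the adjoint equations. The following is a sketch in generic notation, not the paper's own: the symbols $x$, $\theta$, $f$, $h$, $\bar{y}_j$, $W_j$, $r_j$ and $p$ are assumptions introduced here, with measurements taken at discrete times $t_1 < \dots < t_N$.

```latex
% Model, observable, and least-squares objective (notation assumed)
\dot{x}(t) = f(x(t), \theta), \qquad x(t_0) = x_0(\theta), \qquad y(t) = h(x(t), \theta)

J(\theta) = \frac{1}{2} \sum_{j=1}^{N}
  \bigl(y(t_j) - \bar{y}_j\bigr)^{\top} W_j \bigl(y(t_j) - \bar{y}_j\bigr),
\qquad r_j := W_j \bigl(y(t_j) - \bar{y}_j\bigr)

% Adjoint state: solved backward in time, with a jump at each measurement
\dot{p}(t) = -\left(\frac{\partial f}{\partial x}\right)^{\!\top} p(t)
  \quad \text{for } t \in (t_{j-1}, t_j), \qquad p(t_N^{+}) = 0

p(t_j^{-}) = p(t_j^{+}) + \left(\frac{\partial h}{\partial x}\right)^{\!\top} r_j

% Gradient with respect to all parameters from one forward and one backward solve
\frac{dJ}{d\theta} =
  \sum_{j=1}^{N} \left(\frac{\partial h}{\partial \theta}\right)^{\!\top} r_j
  + \int_{t_0}^{t_N} \left(\frac{\partial f}{\partial \theta}\right)^{\!\top} p(t)\, dt
  + \left(\frac{\partial x_0}{\partial \theta}\right)^{\!\top} p(t_0)
```

Both $x$ and $p$ have the dimension of the state, so the two ODE solves do not grow with the number of parameters; only the final quadrature does. Forward sensitivity analysis, by comparison, integrates one additional state-sized system per parameter.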
Bueno, Anibal; Rodríguez-López, Rocío; Reyes-Palomares, Armando; Rojano, Elena; Corpas, Manuel; Nevado, Julián; Lapunzina, Pablo; Sánchez-Jiménez, Francisca; Ranea, Juan A G
2018-06-26
Copy number variations (CNVs) are genomic structural variations (deletions, duplications, or translocations) that represent 4.8-9.5% of human genome variation in healthy individuals. In some cases, CNVs can also lead to disease, being the etiology of many known rare genetic/genomic disorders. Despite recent advances in genomic sequencing and diagnosis, the pathological effects of many rare genetic variations remain unresolved, largely due to the low number of patients available for these cases, making it difficult to identify consistent patterns of genotype-phenotype relationships. We aimed to improve the identification of statistically consistent genotype-phenotype relationships by integrating all the genetic and clinical data of thousands of patients with rare genomic disorders (obtained from the DECIPHER database) into a phenotype-patient-genotype tripartite network. Then we assessed how our network approach could help in the characterization and diagnosis of novel cases in clinical genetics. The systematic approach implemented in this work is able to better define the relationships between phenotypes and specific loci, by exploiting large-scale association networks of phenotypes and genotypes in thousands of rare disease patients. The application of the described methodology facilitated the diagnosis of novel clinical cases, ranking phenotypes by locus specificity and reporting putative new clinical features that may suggest additional clinical follow-ups. In this work, the proof of concept developed over a set of novel clinical cases demonstrates that this network-based methodology might help improve the precision of patient clinical records and the characterization of rare syndromes.
Multi-species Identification of Polymorphic Peptide Variants via Propagation in Spectral Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Na, Seungjin; Payne, Samuel H.; Bandeira, Nuno
The spectral networks approach enables the detection of pairs of spectra from related peptides and thus allows for the propagation of annotations from identified peptides to unidentified spectra. Beyond allowing for unbiased discovery of unexpected post-translational modifications, spectral networks are also applicable to multi-species comparative proteomics or metaproteomics to identify numerous orthologous versions of a protein. We present algorithmic and statistical advances in spectral networks that have made it possible to rigorously assess the statistical significance of spectral pairs and accurately estimate the error rate of identifications via propagation. In the analysis of three related Cyanothece species, a model organism for biohydrogen production, spectral networks identified peptides with highly divergent sequences with up to dozens of variants per peptide, including many novel peptides in species that lack a sequenced genome. Furthermore, spectral networks strongly suggested the presence of novel peptides even in genomically characterized species (i.e. missing from databases) in that a significant portion of unidentified multi-species networks included at least two polymorphic peptide variants.
The EMBL nucleotide sequence database
Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Lombard, Vincent; Lopez, Rodrigo; Parkinson, Helen; Redaschi, Nicole; Sterk, Peter; Stoehr, Peter; Tuli, Mary Ann
2001-01-01
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. PMID:11125039
Gene networks are rapidly growing in size and number, raising the question of which networks are most appropriate for particular applications. Here, we evaluate 21 human genome-wide interaction networks for their ability to recover 446 disease gene sets identified through literature curation, gene expression profiling, or genome-wide association studies. While all networks have some ability to recover disease genes, we observe a wide range of performance with STRING, ConsensusPathDB, and GIANT networks having the best performance overall.
The evolution of metabolic networks of E. coli
2011-01-01
Background Despite the availability of numerous complete genome sequences from E. coli strains, published genome-scale metabolic models exist only for two commensal E. coli strains. These models have proven useful for many applications, such as engineering strains for desired product formation, and we sought to explore how constructing and evaluating additional metabolic models for E. coli strains could enhance these efforts. Results We used the genomic information from 16 E. coli strains to generate an E. coli pangenome metabolic network by evaluating their collective 76,990 ORFs. Each of these ORFs was assigned to one of 17,647 ortholog groups including ORFs associated with reactions in the most recent metabolic model for E. coli K-12. For orthologous groups that contain an ORF already represented in the MG1655 model, the gene to protein to reaction associations represented in this model could then be easily propagated to other E. coli strain models. All remaining orthologous groups were evaluated to see if new metabolic reactions could be added to generate a pangenome-scale metabolic model (iEco1712_pan). The pangenome model included reactions from a metabolic model update for E. coli K-12 MG1655 (iEco1339_MG1655) and enabled development of five additional strain-specific genome-scale metabolic models. These additional models include a second K-12 strain (iEco1335_W3110) and four pathogenic strains (two enterohemorrhagic E. coli O157:H7 and two uropathogens). When compared to the E. coli K-12 models, the metabolic models for the enterohemorrhagic (iEco1344_EDL933 and iEco1345_Sakai) and uropathogenic strains (iEco1288_CFT073 and iEco1301_UTI89) contained numerous lineage-specific gene and reaction differences. All six E. coli models were evaluated by comparing model predictions to carbon source utilization measurements under aerobic and anaerobic conditions, and to batch growth profiles in minimal media with 0.2% (w/v) glucose. 
An ancestral genome-scale metabolic model based on conserved ortholog groups in all 16 E. coli genomes was also constructed, reflecting the conserved ancestral core of E. coli metabolism (iEco1053_core). Comparative analysis of all six strain-specific E. coli models revealed that some of the pathogenic E. coli strains possess reactions in their metabolic networks enabling higher biomass yields on glucose. Finally the lineage-specific metabolic traits were compared to the ancestral core model predictions to derive new insight into the evolution of metabolism within this species. Conclusion Our findings demonstrate that a pangenome-scale metabolic model can be used to rapidly construct additional E. coli strain-specific models, and that quantitative models of different strains of E. coli can accurately predict strain-specific phenotypes. Such pangenome and strain-specific models can be further used to engineer metabolic phenotypes of interest, such as designing new industrial E. coli strains. PMID:22044664
Context-based retrieval of functional modules in protein-protein interaction networks.
Dobay, Maria Pamela; Stertz, Silke; Delorenzi, Mauro
2017-03-27
Various techniques have been developed for identifying the most probable interactants of a protein under a given biological context. In this article, we dissect the effects of the choice of the protein-protein interaction network (PPI) and the manipulation of PPI settings on the network neighborhood of the influenza A virus (IAV) network, as well as hits in genome-wide small interfering RNA screen results for IAV host factors. We investigate the potential of context filtering, which uses text mining evidence linked to PPI edges, as a complement to the edge confidence scores typically provided in PPIs for filtering, for obtaining more biologically relevant network neighborhoods. Here, we estimate the maximum performance of context filtering to isolate a Kyoto Encyclopedia of Genes and Genomes (KEGG) network Ki from a union of KEGG networks and its network neighborhood. The work gives insights on the use of human PPIs in network neighborhood approaches for functional inference. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
The Reference Genome of the Halophytic Plant Eutrema salsugineum
Yang, Ruolin; Jarvis, David E.; Chen, Hao; Beilstein, Mark A.; Grimwood, Jane; Jenkins, Jerry; Shu, ShengQiang; Prochnik, Simon; Xin, Mingming; Ma, Chuang; Schmutz, Jeremy; Wing, Rod A.; Mitchell-Olds, Thomas; Schumaker, Karen S.; Wang, Xiangfeng
2013-01-01
Halophytes are plants that can naturally tolerate high concentrations of salt in the soil, and their tolerance to salt stress may occur through various evolutionary and molecular mechanisms. Eutrema salsugineum is a halophytic species in the Brassicaceae that can naturally tolerate multiple types of abiotic stresses that typically limit crop productivity, including extreme salinity and cold. It has been widely used as a laboratory model for stress biology research in plants. Here, we present the reference genome sequence (241 Mb) of E. salsugineum at 8× coverage sequenced using the traditional Sanger sequencing-based approach with comparison to its close relative Arabidopsis thaliana. The E. salsugineum genome contains 26,531 protein-coding genes and 51.4% of its genome is composed of repetitive sequences that mostly reside in pericentromeric regions. Comparative analyses of the genome structures, protein-coding genes, microRNAs, stress-related pathways, and estimated translation efficiency of proteins between E. salsugineum and A. thaliana suggest that halophyte adaptation to environmental stresses may occur via a global network adjustment of multiple regulatory mechanisms. The E. salsugineum genome provides a resource to identify naturally occurring genetic alterations contributing to the adaptation of halophytic plants to salinity and that might be bioengineered in related crop species. PMID:23518688
Shanley, Thomas P; Cvijanovich, Natalie; Lin, Richard; Allen, Geoffrey L; Thomas, Neal J; Doctor, Allan; Kalyanaraman, Meena; Tofil, Nancy M; Penfil, Scott; Monaco, Marie; Odoms, Kelli; Barnes, Michael; Sakthivel, Bhuvaneswari; Aronow, Bruce J; Wong, Hector R
2007-01-01
We have conducted longitudinal studies focused on the expression profiles of signaling pathways and gene networks in children with septic shock. Genome-level expression profiles were generated from whole blood-derived RNA of children with septic shock (n = 30) corresponding to day one and day three of septic shock, respectively. Based on sequential statistical and expression filters, day one and day three of septic shock were characterized by differential regulation of 2,142 and 2,504 gene probes, respectively, relative to controls (n = 15). Venn analysis demonstrated 239 unique genes in the day one dataset, 598 unique genes in the day three dataset, and 1,906 genes common to both datasets. Functional analyses demonstrated time-dependent, differential regulation of genes involved in multiple signaling pathways and gene networks primarily related to immunity and inflammation. Notably, multiple and distinct gene networks involving T cell- and MHC antigen-related biology were persistently downregulated on both day one and day three. Further analyses demonstrated large scale, persistent downregulation of genes corresponding to functional annotations related to zinc homeostasis. These data represent the largest reported cohort of patients with septic shock subjected to longitudinal genome-level expression profiling. The data further advance our genome-level understanding of pediatric septic shock and support novel hypotheses. PMID:17932561
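A Venn analysis of two gene lists of this kind reduces to set differences and an intersection. The sketch below is only an illustration with made-up gene identifiers; it does not reproduce the study's statistical and expression filters or its reported counts.

```python
def venn_counts(day_one, day_three):
    """Count genes unique to each list and genes shared by both."""
    day_one, day_three = set(day_one), set(day_three)
    return {
        "day_one_only": len(day_one - day_three),
        "day_three_only": len(day_three - day_one),
        "common": len(day_one & day_three),
    }


# Hypothetical gene identifiers, not taken from the study.
toy_day_one = {"G1", "G2", "G3", "G4"}
toy_day_three = {"G3", "G4", "G5", "G6", "G7"}

counts = venn_counts(toy_day_one, toy_day_three)
```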
Lalor, Maeve K; Casali, Nicola; Walker, Timothy M; Anderson, Laura F; Davidson, Jennifer A; Ratna, Natasha; Mullarkey, Cathy; Gent, Mike; Foster, Kirsty; Brown, Tim; Magee, John; Barrett, Anne; Crook, Derrick W; Drobniewski, Francis; Thomas, H Lucy; Abubakar, Ibrahim
2018-06-01
We used whole-genome sequencing (WGS) to delineate transmission networks and investigate the benefits of WGS during cluster investigation. We included clustered cases of multidrug-resistant (MDR) tuberculosis (TB)/extensively drug-resistant (XDR) TB linked by mycobacterial interspersed repetitive unit variable tandem repeat (MIRU-VNTR) strain typing or epidemiological information in the national cluster B1006, notified between 2007 and 2013 in the UK. We excluded from further investigation cases whose isolates differed by greater than 12 single nucleotide polymorphisms (SNPs). Data relating to patients' social networks were collected. 27 cases were investigated and 22 had WGS, eight of which (36%) were excluded as their isolates differed by more than 12 SNPs to other cases. 18 cases were ruled into the transmission network based on genomic and epidemiological information. Evidence of transmission was inconclusive in seven out of 18 cases (39%) in the transmission network following WGS and epidemiological investigation. This investigation of a drug-resistant TB cluster illustrates the opportunities and limitations of WGS in understanding transmission in a setting with a high proportion of migrant cases. The use of WGS should be combined with classical epidemiological methods. However, not every cluster will be solvable, regardless of the quality of genomic data. Copyright ©ERS 2018.
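The 12-SNP exclusion rule can be sketched as a simple distance filter. The abstract does not state exactly how distances were compared, so the code below assumes that an isolate is excluded when it differs by more than 12 SNPs from every other isolate; the isolate names and distances are hypothetical. The last two lines simply recompute the percentages quoted in the abstract.

```python
SNP_THRESHOLD = 12  # cutoff stated in the abstract


def split_by_snp_distance(distances, threshold=SNP_THRESHOLD):
    """Split isolates into retained and excluded sets.

    distances maps frozenset({a, b}) to the pairwise SNP count.
    Assumption: an isolate is excluded when its nearest neighbour
    is more than `threshold` SNPs away.
    """
    isolates = set()
    for pair in distances:
        isolates |= pair
    retained, excluded = set(), set()
    for iso in isolates:
        nearest = min(d for pair, d in distances.items() if iso in pair)
        if nearest <= threshold:
            retained.add(iso)
        else:
            excluded.add(iso)
    return retained, excluded


def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Hypothetical pairwise SNP distances between four isolates.
toy_distances = {
    frozenset({"A", "B"}): 3,
    frozenset({"A", "C"}): 40,
    frozenset({"B", "C"}): 38,
    frozenset({"A", "D"}): 10,
    frozenset({"B", "D"}): 11,
    frozenset({"C", "D"}): 45,
}

retained, excluded = split_by_snp_distance(toy_distances)

excluded_pct = percent(8, 22)       # eight of 22 sequenced cases excluded
inconclusive_pct = percent(7, 18)   # seven of 18 network cases inconclusive
```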
Net Venn - An integrated network analysis web platform for gene lists
USDA-ARS's Scientific Manuscript database
Many lists containing biological identifiers such as gene lists have been generated in various genomics projects. Identifying the overlap among gene lists can enable us to understand the similarities and differences between the datasets. Here, we present an interactome network-based web application...
Mandal, Sudip; Saha, Goutam; Pal, Rajat Kumar
2017-08-01
Correct inference of genetic regulation inside a cell from biological data such as time series microarray data is one of the greatest challenges of the post-genomic era for biologists and researchers. The Recurrent Neural Network (RNN) is one of the most popular and simplest approaches for modeling the dynamics as well as inferring correct dependencies among genes. Inspired by the behavior of social elephants, we propose a new metaheuristic, namely the Elephant Swarm Water Search Algorithm (ESWSA), to infer Gene Regulatory Networks (GRNs). This algorithm is mainly based on the water search strategy of intelligent and social elephants during drought, utilizing different types of communication techniques. Initially, the algorithm is tested against benchmark small- and medium-scale artificial genetic networks with and without different noise levels, and its efficiency is observed in terms of parametric error, minimum fitness value, execution time, accuracy of prediction of true regulation, etc. Next, the proposed algorithm is tested against the real time gene expression data of the Escherichia coli SOS network, and the results are compared with other state-of-the-art optimization methods. The experimental results suggest that ESWSA is very efficient for the GRN inference problem and performs better than other methods in many ways.
O'Brien, M.A.; Costin, B.N.; Miles, M.F.
2014-01-01
Postgenomic studies of the function of genes and their role in disease have now become an area of intense study since efforts to define the raw sequence material of the genome have largely been completed. The use of whole-genome approaches such as microarray expression profiling and, more recently, RNA-sequence analysis of transcript abundance has allowed an unprecedented look at the workings of the genome. However, the accurate derivation of such high-throughput data and their analysis in terms of biological function has been critical to truly leveraging the postgenomic revolution. This chapter will describe an approach that focuses on the use of gene networks to both organize and interpret genomic expression data. Such networks, derived from statistical analysis of large genomic datasets and the application of multiple bioinformatics data resources, potentially allow the identification of key control elements for networks associated with human disease, and thus may lead to derivation of novel therapeutic approaches. However, as discussed in this chapter, the leveraging of such networks cannot occur without a thorough understanding of the technical and statistical factors influencing the derivation of genomic expression data. Thus, while the catch phrase may be “it's the network … stupid,” the understanding of factors extending from RNA isolation to genomic profiling technique, multivariate statistics, and bioinformatics are all critical to defining fully useful gene networks for study of complex biology. PMID:23195313
Eguchi, Asuka; Lee, Garrett O.; Wan, Fang; Erwin, Graham S.; Ansari, Aseem Z.
2014-01-01
Transcription factors control the fate of a cell by regulating the expression of genes and regulatory networks. Recent successes in inducing pluripotency in terminally differentiated cells as well as directing differentiation with natural transcription factors have lent credence to the efforts that aim to direct cell fate with rationally designed transcription factors. Because DNA-binding factors are modular in design, they can be engineered to target specific genomic sequences and perform pre-programmed regulatory functions upon binding. Such precision-tailored factors can serve as molecular tools to reprogramme or differentiate cells in a targeted manner. Using different types of engineered DNA binders, both regulatory transcriptional controls of gene networks, as well as permanent alteration of genomic content, can be implemented to study cell fate decisions. In the present review, we describe the current state of the art in artificial transcription factor design and the exciting prospect of employing artificial DNA-binding factors to manipulate the transcriptional networks as well as epigenetic landscapes that govern cell fate. PMID:25145439
Large Scale Comparative Visualisation of Regulatory Networks with TRNDiff
Chua, Xin-Yi; Buckingham, Lawrence; Hogan, James M.; ...
2015-06-01
The advent of Next Generation Sequencing (NGS) technologies has seen explosive growth in genomic datasets, and dense coverage of related organisms, supporting study of subtle, strain-specific variations as a determinant of function. Such data collections present fresh and complex challenges for bioinformatics, those of comparing models of complex relationships across hundreds and even thousands of sequences. Transcriptional Regulatory Network (TRN) structures document the influence of regulatory proteins called Transcription Factors (TFs) on associated Target Genes (TGs). TRNs are routinely inferred from model systems or iterative search, and analysis at these scales requires simultaneous displays of multiple networks well beyond those of existing network visualisation tools [1]. In this paper we describe TRNDiff, an open source system supporting the comparative analysis and visualization of TRNs (and similarly structured data) from many genomes, allowing rapid identification of functional variations within species. The approach is demonstrated through a small scale multiple TRN analysis of the Fur iron-uptake system of Yersinia, suggesting a number of candidate virulence factors; and through a larger study exploiting integration with the RegPrecise database (http://regprecise.lbl.gov; [2]) - a collection of hundreds of manually curated and predicted transcription factor regulons drawn from across the entire spectrum of prokaryotic organisms.
A negative genetic interaction map in isogenic cancer cell lines reveals cancer cell vulnerabilities
Vizeacoumar, Franco J; Arnold, Roland; Vizeacoumar, Frederick S; Chandrashekhar, Megha; Buzina, Alla; Young, Jordan T F; Kwan, Julian H M; Sayad, Azin; Mero, Patricia; Lawo, Steffen; Tanaka, Hiromasa; Brown, Kevin R; Baryshnikova, Anastasia; Mak, Anthony B; Fedyshyn, Yaroslav; Wang, Yadong; Brito, Glauber C; Kasimer, Dahlia; Makhnevych, Taras; Ketela, Troy; Datti, Alessandro; Babu, Mohan; Emili, Andrew; Pelletier, Laurence; Wrana, Jeff; Wainberg, Zev; Kim, Philip M; Rottapel, Robert; O'Brien, Catherine A; Andrews, Brenda; Boone, Charles; Moffat, Jason
2013-01-01
Improved efforts are necessary to define the functional product of cancer mutations currently being revealed through large-scale sequencing efforts. Using genome-scale pooled shRNA screening technology, we mapped negative genetic interactions across a set of isogenic cancer cell lines and confirmed hundreds of these interactions in orthogonal co-culture competition assays to generate a high-confidence genetic interaction network of differentially essential or differential essentiality (DiE) genes. The network uncovered examples of conserved genetic interactions, densely connected functional modules derived from comparative genomics with model systems data, functions for uncharacterized genes in the human genome and targetable vulnerabilities. Finally, we demonstrate a general applicability of DiE gene signatures in determining genetic dependencies of other non-isogenic cancer cell lines. For example, the PTEN−/− DiE genes reveal a signature that can preferentially classify PTEN-dependent genotypes across a series of non-isogenic cell lines derived from the breast, pancreas and ovarian cancers. Our reference network suggests that many cancer vulnerabilities remain to be discovered through systematic derivation of a network of differentially essential genes in an isogenic cancer cell model. PMID:24104479
Yao, Yao; Marchal, Kathleen; Van de Peer, Yves
2014-01-01
One of the important challenges in the field of evolutionary robotics is the development of systems that can adapt to a changing environment. However, the ability to adapt to unknown and fluctuating environments is not straightforward. Here, we explore the adaptive potential of simulated swarm robots that contain a genomic encoding of a bio-inspired gene regulatory network (GRN). An artificial genome is combined with a flexible agent-based system, representing the activated part of the regulatory network that transduces environmental cues into phenotypic behaviour. Using an artificial life simulation framework that mimics a dynamically changing environment, we show that separating the static from the conditionally active part of the network contributes to a better adaptive behaviour. Furthermore, in contrast with most hitherto developed ANN-based systems that need to re-optimize their complete controller network from scratch each time they are subjected to novel conditions, our system uses its genome to store GRNs whose performance was optimized under a particular environmental condition for a sufficiently long time. When subjected to a new environment, the previous condition-specific GRN might become inactivated, but remains present. This ability to store ‘good behaviour’ and to disconnect it from the novel rewiring that is essential under a new condition allows faster re-adaptation if any of the previously observed environmental conditions is reencountered. As we show here, applying these evolutionary-based principles leads to accelerated and improved adaptive evolution in a non-stable environment. PMID:24599485
Shin, Junha; Lee, Insuk
2015-01-01
Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life—Archaea, Bacteria, and Eukaryota—suggesting that pathways inherit primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. 
Taken together, we propose that co-inheritance analysis within the domains of life will greatly potentiate the use of the expected onslaught of sequenced genomes in the study of molecular pathways in higher eukaryotes. PMID:26394049
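The contrast between co-inheritance analysis across all reference species and within a single domain can be shown with a small sketch. The abstract does not specify the similarity measure, so Jaccard similarity is swapped in here purely for illustration; the presence/absence profiles and the eight reference species are hypothetical.

```python
def jaccard(p, q):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(p, q) if x and y)
    either = sum(1 for x, y in zip(p, q) if x or y)
    return both / either if either else 0.0


def within_domain_score(profile_a, profile_b, domains, domain):
    """Compare two profiles using only reference species from one domain."""
    idx = [i for i, d in enumerate(domains) if d == domain]
    return jaccard([profile_a[i] for i in idx], [profile_b[i] for i in idx])


# Hypothetical reference species: four bacteria, two archaea, two eukaryotes.
domains = ["Bacteria"] * 4 + ["Archaea"] * 2 + ["Eukaryota"] * 2

# Hypothetical presence (1) / absence (0) of two genes in those species.
gene_x = [1, 1, 0, 0, 1, 0, 1, 1]
gene_y = [1, 1, 0, 0, 0, 1, 0, 0]

all_species_score = jaccard(gene_x, gene_y)
bacteria_score = within_domain_score(gene_x, gene_y, domains, "Bacteria")
```

In this toy case the two genes are perfectly co-inherited within Bacteria, but the signal is diluted when the archaeal and eukaryotic columns are included, which is the kind of confounding the abstract describes.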
Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world
Koonin, Eugene V.; Wolf, Yuri I.
2008-01-01
The first bacterial genome was sequenced in 1995, and the first archaeal genome in 1996. Soon after these breakthroughs, an exponential rate of genome sequencing was established, with a doubling time of approximately 20 months for bacteria and approximately 34 months for archaea. Comparative analysis of the hundreds of sequenced bacterial and dozens of archaeal genomes leads to several generalizations on the principles of genome organization and evolution. A crucial finding that enables functional characterization of the sequenced genomes and evolutionary reconstruction is that the majority of archaeal and bacterial genes have conserved orthologs in other, often, distant organisms. However, comparative genomics also shows that horizontal gene transfer (HGT) is a dominant force of prokaryotic evolution, along with the loss of genetic material resulting in genome contraction. A crucial component of the prokaryotic world is the mobilome, the enormous collection of viruses, plasmids and other selfish elements, which are in constant exchange with more stable chromosomes and serve as HGT vehicles. Thus, the prokaryotic genome space is a tightly connected, although compartmentalized, network, a novel notion that undermines the ‘Tree of Life’ model of evolution and requires a new conceptual framework and tools for the study of prokaryotic evolution. PMID:18948295
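The doubling times quoted above translate into growth factors through the usual exponential relation, count(t) = count(0) × 2^(t / doubling time). The sketch below assumes idealized exponential growth and uses a ten-year window chosen only for illustration.

```python
def growth_factor(months_elapsed, doubling_time_months):
    """Fold increase after a given time under idealized exponential growth."""
    return 2 ** (months_elapsed / doubling_time_months)


BACTERIA_DOUBLING_MONTHS = 20   # approximate value from the abstract
ARCHAEA_DOUBLING_MONTHS = 34    # approximate value from the abstract

# Fold increase in sequenced genomes over an illustrative ten-year window.
bacteria_fold = growth_factor(120, BACTERIA_DOUBLING_MONTHS)
archaea_fold = growth_factor(120, ARCHAEA_DOUBLING_MONTHS)
```

Under these assumptions the bacterial count grows 64-fold in ten years, while the archaeal count grows roughly 11- to 12-fold.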
YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia.
Tan, Shi Yang; Dutta, Avirup; Jakubovics, Nicholas S; Ang, Mia Yang; Siow, Cheuk Chuen; Mutha, Naresh Vr; Heydari, Hamed; Wee, Wei Yee; Wong, Guat Jah; Choo, Siew Woh
2015-01-16
Yersinia is a genus of Gram-negative bacteria that includes serious pathogens such as Yersinia pestis, which causes plague, Yersinia pseudotuberculosis, and Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using distinct mechanisms from the more pathogenic species. With the advances in sequencing technologies, many genomes of Yersinia have been sequenced. However, there is currently no specialized platform to hold the rapidly-growing Yersinia genomic data and to provide analysis tools particularly for comparative analyses, which are required to provide improved insights into their biology, evolution and pathogenicity. To facilitate the ongoing and future research of Yersinia, especially those generally considered non-pathogenic species, a well-defined repository and analysis platform is needed to hold the Yersinia genomic data and analysis tools for the Yersinia research community. Hence, we have developed the YersiniaBase, a robust and user-friendly Yersinia resource and analysis platform for the analysis of Yersinia genomic data. YersiniaBase has a total of twelve species and 232 genome sequences, of which the majority are Yersinia pestis. In order to smooth the process of searching genomic data in a large database, we implemented an Asynchronous JavaScript and XML (AJAX)-based real-time searching system in YersiniaBase. Besides incorporating existing tools, which include JavaScript-based genome browser (JBrowse) and Basic Local Alignment Search Tool (BLAST), YersiniaBase also has in-house developed tools: (1) Pairwise Genome Comparison tool (PGC) for comparing two user-selected genomes; (2) Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomics analysis of Yersinia genomes; (3) YersiniaTree for constructing phylogenetic tree of Yersinia. 
We ran analyses based on the tools and genomic data in YersiniaBase and the preliminary results showed differences in virulence genes found in Yersinia pestis and Yersinia pseudotuberculosis compared to other Yersinia species, and differences between Yersinia enterocolitica subsp. enterocolitica and Yersinia enterocolitica subsp. palearctica. YersiniaBase offers free access to a wide range of genomic data and analysis tools for the analysis of Yersinia. YersiniaBase can be accessed at http://yersinia.um.edu.my.
High-confidence prediction of global interactomes based on genome-wide coevolutionary networks
Juan, David; Pazos, Florencio; Valencia, Alfonso
2008-01-01
Interacting or functionally related protein families tend to have similar phylogenetic trees. Based on this observation, techniques have been developed to predict interaction partners. The observed degree of similarity between the phylogenetic trees of two proteins is the result of many different factors besides the actual interaction or functional relationship between them. Such factors influence the performance of interaction predictions. One aspect that can influence this similarity is related to the fact that a given protein interacts with many others, and hence it must adapt to all of them. Accordingly, the interaction or coadaptation signal within its tree is a composite of the influence of all of the interactors. Here, we introduce a new estimator of coevolution to overcome this and other problems. Instead of relying on the individual value of tree similarity between two proteins, we use the whole network of similarities between all of the pairs of proteins within a genome to reassess the similarity of that pair, thereby taking into account its coevolutionary context. We show that this approach offers a substantial improvement in interaction prediction performance, providing a degree of accuracy/coverage comparable with, or in some cases better than, that of experimental techniques. Moreover, important information on the structure, function, and evolution of macromolecular complexes can be inferred with this methodology. PMID:18199838
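One way to picture re-scoring a pair by its coevolutionary context is to compare the two proteins' similarity profiles against all other proteins rather than relying on their single pairwise value. The abstract does not give the estimator, so the sketch below is an assumption: it uses a Pearson correlation of profiles over a small, made-up tree-similarity matrix.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def context_score(sim, i, j):
    """Correlate the similarity profiles of proteins i and j over all
    other proteins (a hypothetical context-aware re-scoring)."""
    others = [k for k in range(len(sim)) if k not in (i, j)]
    return pearson([sim[i][k] for k in others], [sim[j][k] for k in others])


# Hypothetical symmetric matrix of tree similarities for five proteins.
sim = [
    [1.0, 0.4, 0.9, 0.8, 0.1],
    [0.4, 1.0, 0.8, 0.9, 0.2],
    [0.9, 0.8, 1.0, 0.7, 0.1],
    [0.8, 0.9, 0.7, 1.0, 0.2],
    [0.1, 0.2, 0.1, 0.2, 1.0],
]

raw_similarity = sim[0][1]
rescored = context_score(sim, 0, 1)
```

Here proteins 0 and 1 have a modest direct similarity, yet they relate to the remaining proteins in nearly the same way, so the context-based score is much higher than the raw value.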
High-confidence prediction of global interactomes based on genome-wide coevolutionary networks.
Juan, David; Pazos, Florencio; Valencia, Alfonso
2008-01-22
Interacting or functionally related protein families tend to have similar phylogenetic trees. Based on this observation, techniques have been developed to predict interaction partners. The observed degree of similarity between the phylogenetic trees of two proteins is the result of many different factors besides the actual interaction or functional relationship between them. Such factors influence the performance of interaction predictions. One aspect that can influence this similarity is related to the fact that a given protein interacts with many others, and hence it must adapt to all of them. Accordingly, the interaction or coadaptation signal within its tree is a composite of the influence of all of the interactors. Here, we introduce a new estimator of coevolution to overcome this and other problems. Instead of relying on the individual value of tree similarity between two proteins, we use the whole network of similarities between all of the pairs of proteins within a genome to reassess the similarity of that pair, thereby taking into account its coevolutionary context. We show that this approach offers a substantial improvement in interaction prediction performance, providing a degree of accuracy/coverage comparable with, or in some cases better than, that of experimental techniques. Moreover, important information on the structure, function, and evolution of macromolecular complexes can be inferred with this methodology.
PGTandMe: social networking-based genetic testing and the evolving research model.
Koch, Valerie Gutmann
2012-01-01
The opportunity to use extensive genetic data, personal information, and family medical history for research purposes may be naturally appealing to the personal genetic testing (PGT) industry, which is already coupling direct-to-consumer (DTC) products with social networking technologies, as well as to potential industry or institutional partners. This article evaluates the transformation in research that the hybrid of PGT and social networking will bring about, and--highlighting the challenges associated with a new paradigm of "patient-driven" genomic research--focuses on the consequences of shifting the structure, locus, timing, and scope of research through genetic crowd-sourcing. This article also explores potential ethical, legal, and regulatory issues that arise from the hybrid between personal genomic research and online social networking, particularly regarding informed consent, institutional review board (IRB) oversight, and ownership/intellectual property (IP) considerations.
Comparative Phylogenomics Uncovers the Impact of Symbiotic Associations on Host Genome Evolution
Delaux, Pierre-Marc; Varala, Kranthi; Edger, Patrick P.; Coruzzi, Gloria M.; Pires, J. Chris; Ané, Jean-Michel
2014-01-01
Mutualistic symbioses between eukaryotes and beneficial microorganisms of their microbiome play an essential role in nutrition, protection against disease, and development of the host. However, the impact of beneficial symbionts on the evolution of host genomes remains poorly characterized. Here we used the independent loss of the most widespread plant–microbe symbiosis, arbuscular mycorrhization (AM), as a model to address this question. Using a large phenotypic approach and phylogenetic analyses, we present evidence that loss of AM symbiosis correlates with the loss of many symbiotic genes in the Arabidopsis lineage (Brassicales). Then, by analyzing the genome and/or transcriptomes of nine other phylogenetically divergent non-host plants, we show that this correlation occurred in a convergent manner in four additional plant lineages, demonstrating the existence of an evolutionary pattern specific to symbiotic genes. Finally, we use a global comparative phylogenomic approach to track this evolutionary pattern among land plants. Based on this approach, we identify a set of 174 highly conserved genes and demonstrate enrichment in symbiosis-related genes. Our findings are consistent with the hypothesis that beneficial symbionts maintain purifying selection on host gene networks during the evolution of entire lineages. PMID:25032823
Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weighill, Deborah; Jones, Piet; Shah, Manesh
Biological organisms are complex systems that are composed of functional networks of interacting molecules and macro-molecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants. However, the effects of these variants are the result of historic selective pressure and current environmental and epigenetic signals, and, as such, their co-occurrence can be seen as genome-wide correlations in a number of different manners. Biomass recalcitrance (i.e., the resistance of plants to degradation or deconstruction, which ultimately enables access to a plant's sugars) is a complex polygenic phenotype of high importance to biofuels initiatives. This study makes use of data derived from the re-sequenced genomes from over 800 different Populus trichocarpa genotypes in combination with metabolomic and pyMBMS data across this population, as well as co-expression and co-methylation networks in order to better understand the molecular interactions involved in recalcitrance, and identify target genes involved in lignin biosynthesis/degradation. A Lines Of Evidence (LOE) scoring system is developed to integrate the information in the different layers and quantify the number of lines of evidence linking genes to target functions. This new scoring system was applied to quantify the lines of evidence linking genes to lignin-related genes and phenotypes across the network layers, and allowed for the generation of new hypotheses surrounding potential new candidate genes involved in lignin biosynthesis in P. trichocarpa, including various AGAMOUS-LIKE genes. 
Lastly, the resulting Genome Wide Association Study networks, integrated with Single Nucleotide Polymorphism (SNP) correlation, co-methylation, and co-expression networks through the LOE scores are proving to be a powerful approach to determine the pleiotropic and epistatic relationships underlying cellular functions and, as such, the molecular basis for complex phenotypes, such as recalcitrance.
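A Lines Of Evidence score of the kind described can be pictured as a count of network layers that connect a gene to any target gene. The abstract does not define the scoring rule in detail, so the counting rule, the layer names, and the gene identifiers below are assumptions made for illustration only.

```python
def loe_score(gene, targets, layers):
    """Count the layers in which `gene` has an edge to at least one target.

    layers maps a layer name to a set of undirected edges, each stored
    as frozenset({gene_a, gene_b}).
    """
    score = 0
    for edges in layers.values():
        if any(frozenset((gene, t)) in edges for t in targets):
            score += 1
    return score


# Hypothetical layers and gene names.
toy_layers = {
    "gwas": {frozenset(("geneA", "lignin1"))},
    "snp_correlation": {frozenset(("geneA", "lignin2"))},
    "co_methylation": {frozenset(("geneB", "lignin1"))},
    "co_expression": {
        frozenset(("geneA", "lignin1")),
        frozenset(("geneB", "geneC")),
    },
}
lignin_targets = {"lignin1", "lignin2"}

score_a = loe_score("geneA", lignin_targets, toy_layers)
score_b = loe_score("geneB", lignin_targets, toy_layers)
```

In this toy example geneA is linked to a lignin-related target in three layers and geneB in one, so geneA would rank higher as a candidate.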
Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery
Weighill, Deborah; Jones, Piet; Shah, Manesh; ...
2018-05-11
Biological organisms are complex systems that are composed of functional networks of interacting molecules and macro-molecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants. However, the effects of these variants are the result of historic selective pressure and current environmental and epigenetic signals, and, as such, their co-occurrence can be seen as genome-wide correlations in a number of different manners. Biomass recalcitrance (i.e., the resistance of plants to degradation or deconstruction, which ultimately enables access to a plant's sugars) is a complex polygenic phenotype of high importance to biofuels initiatives. This study makes use of data derived from the re-sequenced genomes from over 800 different Populus trichocarpa genotypes in combination with metabolomic and pyMBMS data across this population, as well as co-expression and co-methylation networks in order to better understand the molecular interactions involved in recalcitrance, and identify target genes involved in lignin biosynthesis/degradation. A Lines Of Evidence (LOE) scoring system is developed to integrate the information in the different layers and quantify the number of lines of evidence linking genes to target functions. This new scoring system was applied to quantify the lines of evidence linking genes to lignin-related genes and phenotypes across the network layers, and allowed for the generation of new hypotheses surrounding potential new candidate genes involved in lignin biosynthesis in P. trichocarpa, including various AGAMOUS-LIKE genes. 
Lastly, the resulting Genome Wide Association Study networks, integrated with Single Nucleotide Polymorphism (SNP) correlation, co-methylation, and co-expression networks through the LOE scores are proving to be a powerful approach to determine the pleiotropic and epistatic relationships underlying cellular functions and, as such, the molecular basis for complex phenotypes, such as recalcitrance.
Comparative multi-omics systems analysis of Escherichia coli strains B and K-12.
Yoon, Sung Ho; Han, Mee-Jung; Jeong, Haeyoung; Lee, Choong Hoon; Xia, Xiao-Xia; Lee, Dae-Hee; Shim, Ji Hoon; Lee, Sang Yup; Oh, Tae Kwang; Kim, Jihyun F
2012-05-25
Elucidation of a genotype-phenotype relationship is critical to understand an organism at the whole-system level. Here, we demonstrate that comparative analyses of multi-omics data combined with a computational modeling approach provide a framework for elucidating the phenotypic characteristics of organisms whose genomes are sequenced. We present a comprehensive analysis of genome-wide measurements incorporating multifaceted holistic data - genome, transcriptome, proteome, and phenome - to determine the differences between Escherichia coli B and K-12 strains. A genome-scale metabolic network of E. coli B was reconstructed and used to identify genetic bases of the phenotypes unique to B compared with K-12 through in silico complementation testing. This systems analysis revealed that E. coli B is well-suited for production of recombinant proteins due to a greater capacity for amino acid biosynthesis, fewer proteases, and lack of flagella. Furthermore, E. coli B has an additional type II secretion system and a different cell wall and outer membrane composition predicted to be more favorable for protein secretion. In contrast, E. coli K-12 showed a higher expression of heat shock genes and was less susceptible to certain stress conditions. This integrative systems approach provides a high-resolution system-wide view and insights into why two closely related strains of E. coli, B and K-12, manifest distinct phenotypes. Therefore, systematic understanding of cellular physiology and metabolism of the strains is essential not only to determine culture conditions but also to design recombinant hosts.
Comparative multi-omics systems analysis of Escherichia coli strains B and K-12
2012-01-01
Background Elucidation of a genotype-phenotype relationship is critical to understand an organism at the whole-system level. Here, we demonstrate that comparative analyses of multi-omics data combined with a computational modeling approach provide a framework for elucidating the phenotypic characteristics of organisms whose genomes are sequenced. Results We present a comprehensive analysis of genome-wide measurements incorporating multifaceted holistic data - genome, transcriptome, proteome, and phenome - to determine the differences between Escherichia coli B and K-12 strains. A genome-scale metabolic network of E. coli B was reconstructed and used to identify genetic bases of the phenotypes unique to B compared with K-12 through in silico complementation testing. This systems analysis revealed that E. coli B is well-suited for production of recombinant proteins due to a greater capacity for amino acid biosynthesis, fewer proteases, and lack of flagella. Furthermore, E. coli B has an additional type II secretion system and a different cell wall and outer membrane composition predicted to be more favorable for protein secretion. In contrast, E. coli K-12 showed a higher expression of heat shock genes and was less susceptible to certain stress conditions. Conclusions This integrative systems approach provides a high-resolution system-wide view and insights into why two closely related strains of E. coli, B and K-12, manifest distinct phenotypes. Therefore, systematic understanding of cellular physiology and metabolism of the strains is essential not only to determine culture conditions but also to design recombinant hosts. PMID:22632713
Functional equivalency inferred from "authoritative sources" in networks of homologous proteins.
Natarajan, Shreedhar; Jakobsson, Eric
2009-06-12
A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is utilization of the user's knowledge to assign high confidence to selected functional identification. We show use of the algorithm to retrieve functional equivalents for 7 membrane proteins, from an exploration of almost 40 genomes from multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mapping between proteins from different species. Based on that comparison, we believe that incorporation of user's knowledge as a key aspect of the technique adds value to purely statistical formal methods.
Functional Equivalency Inferred from “Authoritative Sources” in Networks of Homologous Proteins
Natarajan, Shreedhar; Jakobsson, Eric
2009-01-01
A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is utilization of the user's knowledge to assign high confidence to selected functional identification. We show use of the algorithm to retrieve functional equivalents for 7 membrane proteins, from an exploration of almost 40 genomes from multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mapping between proteins from different species. Based on that comparison, we believe that incorporation of user's knowledge as a key aspect of the technique adds value to purely statistical formal methods. PMID:19521530
Metabolic Network Modeling of Microbial Communities
Biggs, Matthew B.; Medlock, Gregory L.; Kolling, Glynis L.
2015-01-01
Genome-scale metabolic network reconstructions and constraint-based analysis are powerful methods that have the potential to make functional predictions about microbial communities. Current use of genome-scale metabolic networks to characterize the metabolic functions of microbial communities includes species compartmentalization, separating species-level and community-level objectives, dynamic analysis, the “enzyme-soup” approach, multi-scale modeling, and others. There are many challenges inherent to the field, including a need for tools that accurately assign high-level omics signals to individual community members, new automated reconstruction methods that rival manual curation, and novel algorithms for integrating omics data and engineering communities. As technologies and modeling frameworks improve, we expect that there will be proportional advances in the fields of ecology, health science, and microbial community engineering. PMID:26109480
The Cancer Genome Atlas Pan-Cancer analysis project.
Weinstein, John N; Collisson, Eric A; Mills, Gordon B; Shaw, Kenna R Mills; Ozenberger, Brad A; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M
2013-10-01
The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.
Metavir 2: new tools for viral metagenome comparison and assembled virome analysis
2014-01-01
Background Metagenomics, based on culture-independent sequencing, is a well-fitted approach to provide insights into the composition, structure and dynamics of environmental viral communities. Following recent advances in sequencing technologies, new challenges arise for existing bioinformatic tools dedicated to viral metagenome (i.e. virome) analysis as (i) the number of viromes is rapidly growing and (ii) large genomic fragments can now be obtained by assembling the huge amount of sequence data generated for each metagenome. Results To face these challenges, a new version of Metavir was developed. First, all Metavir tools have been adapted to support comparative analysis of viromes in order to improve the analysis of multiple datasets. In addition to the sequence comparison previously provided, viromes can now be compared through their k-mer frequencies, their taxonomic compositions, recruitment plots and phylogenetic trees containing sequences from different datasets. Second, a new section has been specifically designed to handle assembled viromes made of thousands of large genomic fragments (i.e. contigs). This section includes an annotation pipeline for uploaded viral contigs (gene prediction, similarity search against reference viral genomes and protein domains) and an extensive comparison between contigs and reference genomes. Contigs and their annotations can be explored on the website through specifically developed dynamic genomic maps and interactive networks. Conclusions The new features of Metavir 2 allow users to explore and analyze viromes composed of raw reads or assembled fragments through a set of adapted tools and a user-friendly interface. PMID:24646187
Dong, Zhanshan; Danilevskaya, Olga; Abadie, Tabare; Messina, Carlos; Coles, Nathan; Cooper, Mark
2012-01-01
The transition from the vegetative to reproductive development is a critical event in the plant life cycle. The accurate prediction of flowering time in elite germplasm is important for decisions in maize breeding programs and best agronomic practices. The understanding of the genetic control of flowering time in maize has significantly advanced in the past decade. Through comparative genomics, mutant analysis, genetic analysis and QTL cloning, and transgenic approaches, more than 30 flowering time candidate genes in maize have been revealed and the relationships among these genes have been partially uncovered. Based on the knowledge of the flowering time candidate genes, a conceptual gene regulatory network model for the genetic control of flowering time in maize is proposed. To demonstrate the potential of the proposed gene regulatory network model, a first attempt was made to develop a dynamic gene network model to predict flowering time of maize genotypes varying for specific genes. The dynamic gene network model is composed of four genes and was built on the basis of gene expression dynamics of the two late flowering id1 and dlf1 mutants, the early flowering landrace Gaspe Flint and the temperate inbred B73. The model was evaluated against the phenotypic data of the id1 dlf1 double mutant and the ZMM4 overexpressed transgenic lines. The model provides a working example that leverages knowledge from model organisms for the utilization of maize genomic information to predict a whole plant trait phenotype, flowering time, of maize genotypes.
Comparative analysis of genomics and proteomics in Bacillus thuringiensis 4.0718.
Rang, Jie; He, Hao; Wang, Ting; Ding, Xuezhi; Zuo, Mingxing; Quan, Meifang; Sun, Yunjun; Yu, Ziquan; Hu, Shengbiao; Xia, Liqiu
2015-01-01
Bacillus thuringiensis is a widely used biopesticide that produces various insecticidal active substances during its life cycle. Separation and purification of numerous insecticidal active substances have been difficult because of the relatively short half-life of such substances. Moreover, substances can be synthesized at different times during development, so samples at different stages have to be studied, further complicating the analysis. A dual genomic and proteomic approach, particularly one using mass spectrometry-based proteomic methods, would enhance our ability to identify such substances. The comparative analysis of genomic and proteomic data showed that not all of the products deduced from the annotated genome could be identified among the proteomic data. For instance, genome annotation results showed that 39 coding sequences in the whole genome were related to insect pathogenicity, including five cry genes. However, Cry2Ab, Cry1Ia, Cytotoxin K, Bacteriocin, Exoenzyme C3 and Alveolysin could not be detected in the proteomic data obtained. The sporulation-related proteins were also compared, and the results showed that the great majority of sporulation-related proteins can be detected by mass spectrometry. This analysis revealed that Spo0A~P, SigF, SigE(+), SigK(+) and SigG(+), all known to play an important role in the regulatory network of spore formation, were also present in the proteomic data. Through the comparison of the two data sets, it was possible to infer that some genes were silenced or were expressed at very low levels. For instance, cry2Ab seems to lack a functional promoter, while cry1Ia may not be expressed due to the presence of transposons. With this comparative study a relatively complete database can be constructed and used to transform hereditary material, thereby prompting the high expression of toxic proteins.
A theoretical basis is provided for constructing highly virulent engineered bacteria and for promoting the application of proteogenomics in the life sciences.
Increased entropy of signal transduction in the cancer metastasis phenotype.
Teschendorff, Andrew E; Severini, Simone
2010-07-30
The statistical study of biological networks has led to important novel biological insights, such as the presence of hubs and hierarchical modularity. There is also a growing interest in studying the statistical properties of networks in the context of cancer genomics. However, relatively little is known as to what network features differ between the cancer and normal cell physiologies, or between different cancer cell phenotypes. Based on the observation that frequent genomic alterations underlie a more aggressive cancer phenotype, we asked if such an effect could be detectable as an increase in the randomness of local gene expression patterns. Using a breast cancer gene expression data set and a model network of protein interactions we derive constrained weighted networks defined by a stochastic information flux matrix reflecting expression correlations between interacting proteins. Based on this stochastic matrix we propose and compute an entropy measure that quantifies the degree of randomness in the local pattern of information flux around single genes. By comparing the local entropies in the non-metastatic versus metastatic breast cancer networks, we here show that breast cancers that metastasize are characterised by a small yet significant increase in the degree of randomness of local expression patterns. We validate this result in three additional breast cancer expression data sets and demonstrate that local entropy better characterises the metastatic phenotype than other non-entropy based measures. We show that increases in entropy can be used to identify genes and signalling pathways implicated in breast cancer metastasis and provide examples of de-novo discoveries of gene modules with known roles in apoptosis, immune-mediated tumour suppression, cell-cycle and tumour invasion. Importantly, we also identify a novel gene module within the insulin growth factor signalling pathway, alteration of which may predispose the tumour to metastasize. 
These results demonstrate that a metastatic cancer phenotype is characterised by an increase in the randomness of the local information flux patterns. Measures of local randomness in integrated protein interaction mRNA expression networks may therefore be useful for identifying genes and signalling pathways disrupted in one phenotype relative to another. Further exploration of the statistical properties of such integrated cancer expression and protein interaction networks will be a fruitful endeavour.
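A local entropy of the kind described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' exact definition: it assumes that a gene's outgoing weights to its interaction partners are nonnegative (for example, derived from expression correlations), that they are normalized into a row of a stochastic matrix, and that the Shannon entropy is divided by the logarithm of the number of neighbours so that values lie between 0 and 1.

```python
import math

def local_entropy(weights):
    """Normalized Shannon entropy of one gene's outgoing flux distribution.

    weights: nonnegative edge weights from the gene to each neighbour
    (assumed here to be derived from expression correlations).
    """
    k = len(weights)
    total = sum(weights)
    if k < 2 or total == 0:
        return 0.0  # entropy is not informative for fewer than two neighbours
    probs = [w / total for w in weights]          # one row of the stochastic matrix
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(k)                        # scale to the range [0, 1]

uniform = local_entropy([0.5, 0.5, 0.5, 0.5])     # flux spread evenly over neighbours
skewed = local_entropy([0.9, 0.05, 0.05])         # flux concentrated on one neighbour
print(uniform, skewed)
```

Under these assumptions, a gene whose flux is spread evenly over its neighbours scores close to 1, while a gene dominated by a single interaction scores much lower; comparing such values between two phenotypes is the kind of contrast the abstract describes.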
Ruppin, Eytan; Papin, Jason A; de Figueiredo, Luis F; Schuster, Stefan
2010-08-01
With the advent of modern omics technologies, it has become feasible to reconstruct (quasi-) whole-cell metabolic networks and characterize them in more and more detail. Computer simulations of the dynamic behavior of such networks are difficult due to a lack of kinetic data and to computational limitations. In contrast, network analysis based on appropriate constraints such as the steady-state condition (constraint-based analysis) is feasible and allows one to derive conclusions about the system's metabolic capabilities. Here, we review methods for the reconstruction of metabolic networks, modeling techniques such as flux balance analysis and elementary flux modes and current progress in their development and applications. Game-theoretical methods for studying metabolic networks are discussed as well. Copyright © 2010 Elsevier Ltd. All rights reserved.
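For orientation, the steady-state condition and flux balance analysis mentioned above are commonly written as the linear program below. The notation follows the usual textbook convention and is an assumption here, not taken from the review: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c the weights of the objective (for example, a biomass reaction), and v^min, v^max the flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}
```

The equality constraint expresses that no internal metabolite accumulates or is depleted, which is why no kinetic parameters are needed.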
Causal gene identification using combinatorial V-structure search.
Cai, Ruichu; Zhang, Zhenjie; Hao, Zhifeng
2013-07-01
With the advances of biomedical techniques in the last decade, the costs of human genomic sequencing and genomic activity monitoring are coming down rapidly. To support the huge genome-based business in the near future, researchers are eager to find killer applications based on human genome information. Causal gene identification is one of the most promising applications, which may help potential patients to estimate the risk of certain genetic diseases and locate the target gene for further genetic therapy. Unfortunately, existing pattern recognition techniques, such as Bayesian networks, cannot be directly applied to find the accurate causal relationship between genes and diseases. This is mainly due to the insufficient number of samples and the extremely high dimensionality of the gene space. In this paper, we present the first practical solution to causal gene identification, utilizing a new combinatorial formulation over V-Structures commonly used in conventional Bayesian networks, by exploring the combinations of significant V-Structures. We prove the NP-hardness of the combinatorial search problem under general settings of the significance measure on the V-Structures, and present a greedy algorithm to find sub-optimal results. Extensive experiments show that our proposal is both scalable and effective, particularly with interesting findings on the causal genes over real human genome data. Copyright © 2013 Elsevier Ltd. All rights reserved.
A multi-objective constraint-based approach for modeling genome-scale microbial ecosystems.
Budinich, Marko; Bourdon, Jérémie; Larhlimi, Abdelhalim; Eveillard, Damien
2017-01-01
Interplay within microbial communities impacts ecosystems on several scales, and elucidation of the consequent effects is a difficult task in ecology. In particular, the integration of genome-scale data within quantitative models of microbial ecosystems remains elusive. This study advocates the use of constraint-based modeling to build predictive models from recent high-resolution -omics datasets. Following recent studies that have demonstrated the accuracy of constraint-based models (CBMs) for simulating single-strain metabolic networks, we sought to study microbial ecosystems as a combination of single-strain metabolic networks that exchange nutrients. This study presents two multi-objective extensions of CBMs for modeling communities: multi-objective flux balance analysis (MO-FBA) and multi-objective flux variability analysis (MO-FVA). Both methods were applied to a hot spring mat model ecosystem. As a result, multiple trade-offs between nutrients and growth rates, as well as thermodynamically favorable relative abundances at community level, were emphasized. We expect this approach to be used for integrating genomic information in microbial ecosystems. Following models will provide insights about behaviors (including diversity) that take place at the ecosystem scale.
Systems Biology Knowledgebase (GSC8 Meeting)
Cottingham, Robert W.
2018-01-04
The Genomic Standards Consortium was formed in September 2005. It is an international, open-membership working body which promotes standardization in the description of genomes and the exchange and integration of genomic data. The 2009 meeting was an activity of a five-year Research Coordination Network funded by the National Science Foundation and was held at the DOE Joint Genome Institute with organizational support provided by the JGI and by the University of California - San Diego. Robert W. Cottingham of Oak Ridge National Laboratory discusses the DOE Knowledge Base at the Genomic Standards Consortium's 8th meeting at the DOE JGI in Walnut Creek, CA on Sept. 9, 2009.
Saxena, Pratik; Heng, Boon Chin; Bai, Peng; Folcher, Marc; Zulewski, Henryk; Fussenegger, Martin
2016-01-01
Synthetic biology has advanced the design of standardized transcription control devices that programme cellular behaviour. By coupling synthetic signalling cascade- and transcription factor-based gene switches with reverse and differential sensitivity to the licensed food additive vanillic acid, we designed a synthetic lineage-control network combining vanillic acid-triggered mutually exclusive expression switches for the transcription factors Ngn3 (neurogenin 3; OFF-ON-OFF) and Pdx1 (pancreatic and duodenal homeobox 1; ON-OFF-ON) with the concomitant induction of MafA (V-maf musculoaponeurotic fibrosarcoma oncogene homologue A; OFF-ON). This designer network consisting of different network topologies orchestrating the timely control of transgenic and genomic Ngn3, Pdx1 and MafA variants is able to programme human induced pluripotent stem cells (hIPSCs)-derived pancreatic progenitor cells into glucose-sensitive insulin-secreting beta-like cells, whose glucose-stimulated insulin-release dynamics are comparable to human pancreatic islets. Synthetic lineage-control networks may provide the missing link to genetically programme somatic cells into autologous cell phenotypes for regenerative medicine. PMID:27063289
Inouye, Michael; Ripatti, Samuli; Kettunen, Johannes; Lyytikäinen, Leo-Pekka; Oksala, Niku; Laurila, Pirkka-Pekka; Kangas, Antti J.; Soininen, Pasi; Savolainen, Markku J.; Viikari, Jorma; Kähönen, Mika; Perola, Markus; Salomaa, Veikko; Raitakari, Olli; Lehtimäki, Terho; Taskinen, Marja-Riitta; Järvelin, Marjo-Riitta; Ala-Korpela, Mika; Palotie, Aarno; de Bakker, Paul I. W.
2012-01-01
Association testing of multiple correlated phenotypes offers better power than univariate analysis of single traits. We analyzed 6,600 individuals from two population-based cohorts with both genome-wide SNP data and serum metabolomic profiles. From the observed correlation structure of 130 metabolites measured by nuclear magnetic resonance, we identified 11 metabolic networks and performed a multivariate genome-wide association analysis. We identified 34 genomic loci at genome-wide significance, of which 7 are novel. In comparison to univariate tests, multivariate association analysis identified nearly twice as many significant associations in total. Multi-tissue gene expression studies identified variants in our top loci, SERPINA1 and AQP9, as eQTLs and showed that SERPINA1 and AQP9 expression in human blood was associated with metabolites from their corresponding metabolic networks. Finally, liver expression of AQP9 was associated with atherosclerotic lesion area in mice, and in human arterial tissue both SERPINA1 and AQP9 were shown to be upregulated (6.3-fold and 4.6-fold, respectively) in atherosclerotic plaques. Our study illustrates the power of multi-phenotype GWAS and highlights candidate genes for atherosclerosis. PMID:22916037
Suo, Chen; Hrydziuszko, Olga; Lee, Donghwan; Pramana, Setia; Saputra, Dhany; Joshi, Himanshu; Calza, Stefano; Pawitan, Yudi
2015-08-15
Genome and transcriptome analyses can be used to explore cancers comprehensively, and it is increasingly common to have multiple omics data measured from each individual. Furthermore, there are rich functional data such as predicted impact of mutations on protein coding and gene/protein networks. However, integration of the complex information across the different omics and functional data is still challenging. Clinical validation, particularly based on patient outcomes such as survival, is important for assessing the relevance of the integrated information and for comparing different procedures. An analysis pipeline is built for integrating genomic and transcriptomic alterations from whole-exome and RNA sequence data and functional data from protein function prediction and gene interaction networks. The method accumulates evidence for the functional implications of mutated potential driver genes found within and across patients. A driver-gene score (DGscore) is developed to capture the cumulative effect of such genes. To contribute to the score, a gene has to be frequently mutated, with high or moderate mutational impact at protein level, exhibiting an extreme expression and functionally linked to many differentially expressed neighbors in the functional gene network. The pipeline is applied to 60 matched tumor and normal samples of the same patient from The Cancer Genome Atlas breast-cancer project. In clinical validation, patients with high DGscores have worse survival than those with low scores (P = 0.001). Furthermore, the DGscore outperforms the established expression-based signatures MammaPrint and PAM50 in predicting patient survival. In conclusion, integration of mutation, expression and functional data allows identification of clinically relevant potential driver genes in cancer. The documented pipeline including annotated sample scripts can be found in http://fafner.meb.ki.se/biostatwiki/driver-genes/. 
Contact: yudi.pawitan@ki.se. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
D'Arrigo, Stefano; Gavazzi, Francesco; Alfei, Enrico; Zuffardi, Orsetta; Montomoli, Cristina; Corso, Barbara; Buzzi, Erika; Sciacca, Francesca L; Bulgheroni, Sara; Riva, Daria; Pantaleoni, Chiara
2016-05-01
Microarray-based comparative genomic hybridization (aCGH) is a method of molecular analysis that identifies chromosomal anomalies (or copy number variants) that correlate with clinical phenotypes. The aim of the present study was to apply a clinical score previously designated by de Vries to 329 patients with intellectual disability/developmental delay referred to our tertiary center and to see whether the clinical factors are associated with a positive outcome of aCGH analyses. Another goal was to test the association between a positive microarray-based comparative genomic hybridization result and the severity of intellectual disability/developmental delay. Microarray-based comparative genomic hybridization identified structural chromosomal alterations responsible for the intellectual disability/developmental delay phenotype in 16% of our sample. Our study showed that causative copy number variants are frequently found even in cases of mild intellectual disability (30.77%). We want to emphasize the need to conduct microarray-based comparative genomic hybridization on all individuals with intellectual disability/developmental delay, regardless of the severity, because the degree of intellectual disability/developmental delay does not predict the diagnostic yield of microarray-based comparative genomic hybridization. © The Author(s) 2015.
mySyntenyPortal: an application package to construct websites for synteny block analysis.
Lee, Jongin; Lee, Daehwan; Sim, Mikang; Kwon, Daehong; Kim, Juyeon; Ko, Younhee; Kim, Jaebum
2018-06-05
Advances in sequencing technologies have facilitated large-scale comparative genomics based on whole genome sequencing. Constructing and investigating conserved genomic regions among multiple species (called synteny blocks) are essential in comparative genomics. However, they require significant amounts of computational resources and time in addition to bioinformatics skills. Many web interfaces have been developed to make such tasks easier. However, these web interfaces cannot be customized for users who want to use their own set of genome sequences or definition of synteny blocks. To resolve this limitation, we present mySyntenyPortal, a stand-alone application package to construct websites for synteny block analyses by using users' own genome data. mySyntenyPortal provides both command line and web-based interfaces to build and manage websites for large-scale comparative genomic analyses. The websites can also be easily published and accessed by other users. To demonstrate the usability of mySyntenyPortal, we present an example study for building websites to compare genomes of three mammalian species (human, mouse, and cow) and show how they can be easily utilized to identify potential genes affected by genome rearrangements. mySyntenyPortal will contribute to extended comparative genomic analyses based on large-scale whole genome sequences by providing unique functionality to support the easy creation of interactive websites for synteny block analyses from users' own genome data.
Dufour, Yann S.; Donohue, Timothy J.
2015-01-01
Transcriptional regulation plays a significant role in the biological response of bacteria to changing environmental conditions. Therefore, mapping transcriptional regulatory networks is an important step not only in understanding how bacteria sense and interpret their environment but also in identifying the functions involved in biological responses to specific conditions. Recent experimental and computational developments have facilitated the characterization of regulatory networks on a genome-wide scale in model organisms. In addition, the multiplication of complete genome sequences has encouraged comparative analyses to detect conserved regulatory elements and infer regulatory networks in other less well-studied organisms. However, transcription regulation appears to evolve rapidly, thus creating challenges for the transfer of knowledge to nonmodel organisms. Nevertheless, the mechanisms and constraints driving the evolution of regulatory networks have been the subjects of numerous analyses, and several models have been proposed. Overall, the contributions of mutations, recombination, and horizontal gene transfer are complex. Finally, the rapid evolution of regulatory networks plays a significant role in the remarkable capacity of bacteria to adapt to new or changing environments. Conversely, the characteristics of environmental niches determine the selective pressures and can shape the structure of regulatory networks accordingly. PMID:23046950
Integrative Genomics Reveals Novel Molecular Pathways and Gene Networks for Coronary Artery Disease
Mäkinen, Ville-Petteri; Civelek, Mete; Meng, Qingying; Zhang, Bin; Zhu, Jun; Levian, Candace; Huan, Tianxiao; Segrè, Ayellet V.; Ghosh, Sujoy; Vivar, Juan; Nikpay, Majid; Stewart, Alexandre F. R.; Nelson, Christopher P.; Willenborg, Christina; Erdmann, Jeanette; Blakenberg, Stefan; O'Donnell, Christopher J.; März, Winfried; Laaksonen, Reijo; Epstein, Stephen E.; Kathiresan, Sekar; Shah, Svati H.; Hazen, Stanley L.; Reilly, Muredach P.; Lusis, Aldons J.; Samani, Nilesh J.; Schunkert, Heribert; Quertermous, Thomas; McPherson, Ruth; Yang, Xia; Assimes, Themistocles L.
2014-01-01
The majority of the heritability of coronary artery disease (CAD) remains unexplained, despite recent successes of genome-wide association studies (GWAS) in identifying novel susceptibility loci. Integrating functional genomic data from a variety of sources with a large-scale meta-analysis of CAD GWAS may facilitate the identification of novel biological processes and genes involved in CAD, as well as clarify the causal relationships of established processes. Towards this end, we integrated 14 GWAS from the CARDIoGRAM Consortium and two additional GWAS from the Ottawa Heart Institute (25,491 cases and 66,819 controls) with 1) genetics of gene expression studies of CAD-relevant tissues in humans, 2) metabolic and signaling pathways from public databases, and 3) data-driven, tissue-specific gene networks from a multitude of human and mouse experiments. We not only detected CAD-associated gene networks of lipid metabolism, coagulation, immunity, and additional networks with no clear functional annotation, but also revealed key driver genes for each CAD network based on the topology of the gene regulatory networks. In particular, we found a gene network involved in antigen processing to be strongly associated with CAD. The key driver genes of this network included glyoxalase I (GLO1) and peptidylprolyl isomerase I (PPIL1), which we verified as regulatory by siRNA experiments in human aortic endothelial cells. Our results suggest genetic influences on a diverse set of both known and novel biological processes that contribute to CAD risk. The key driver genes for these networks highlight potential novel targets for further mechanistic studies and therapeutic interventions. PMID:25033284
BIOREL: the benchmark resource to estimate the relevance of the gene networks.
Antonov, Alexey V; Mewes, Hans W
2006-02-06
The progress of high-throughput methodologies in functional genomics has led to the development of statistical procedures to infer gene networks from various types of high-throughput data. However, due to the lack of common standards, the biological significance of the results of the different studies is hard to compare. To overcome this problem we propose a benchmark procedure and have developed a web resource (BIOREL), which is useful for estimating the biological relevance of any genetic network by integrating different sources of biological information. The associations of each gene from the network are classified as biologically relevant or not. The proportion of genes in the network classified as "relevant" is used as the overall network relevance score. Employing synthetic data we demonstrated that such a score ranks the networks fairly with respect to the relevance level. Using BIOREL as the benchmark resource we compared the quality of experimental and theoretically predicted protein interaction data.
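The relevance score described in this abstract is a simple proportion, which can be sketched as follows. This is a minimal illustration only: the function name and the toy labels are hypothetical, and how BIOREL classifies each gene as relevant is not covered here.

```python
def network_relevance_score(gene_is_relevant):
    """Return the fraction of genes flagged as biologically relevant.

    gene_is_relevant: dict mapping gene name -> bool (hypothetical input;
    the per-gene classification step is assumed to have been done already).
    """
    if not gene_is_relevant:
        raise ValueError("network contains no genes")
    relevant = sum(1 for flag in gene_is_relevant.values() if flag)
    return relevant / len(gene_is_relevant)


# Toy classification for a four-gene network (made-up labels).
toy_network = {"geneA": True, "geneB": False, "geneC": True, "geneD": True}
score = network_relevance_score(toy_network)
```

Two networks can then be compared by computing the score for each under the same classification procedure.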
Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases.
Berger, Seth I; Posner, Jeremy M; Ma'ayan, Avi
2007-10-04
In recent years, mammalian protein-protein interaction network databases have been developed. The interactions in these databases are either extracted manually from low-throughput experimental biomedical research literature, extracted automatically from literature using techniques such as natural language processing (NLP), generated experimentally using high-throughput methods such as yeast-2-hybrid screens, or interactions are predicted using an assortment of computational approaches. Genes or proteins identified as significantly changing in proteomic experiments, or identified as susceptibility disease genes in genomic studies, can be placed in the context of protein interaction networks in order to assign these genes and proteins to pathways and protein complexes. Genes2Networks is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic linkable three color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Genes2Networks is powerful web-based software that can help experimental biologists to interpret lists of genes and proteins such as those commonly produced through genomic and proteomic experiments, as well as lists of genes and proteins associated with disease processes. This system can be used to find relationships between genes and proteins from seed lists, and predict additional genes or proteins that may play key roles in common pathways or protein complexes.
BiologicalNetworks 2.0 - an integrative view of genome biology data
2010-01-01
Background A significant problem in the study of mechanisms of an organism's development is the elucidation of interrelated factors which are making an impact on the different levels of the organism, such as genes, biological molecules, cells, and cell systems. Numerous sources of heterogeneous data which exist for these subsystems are still not integrated sufficiently enough to give researchers a straightforward opportunity to analyze them together in the same frame of study. Systematic application of data integration methods is also hampered by a multitude of such factors as the orthogonal nature of the integrated data and naming problems. Results Here we report on a new version of BiologicalNetworks, a research environment for the integral visualization and analysis of heterogeneous biological data. BiologicalNetworks can be queried for properties of thousands of different types of biological entities (genes/proteins, promoters, COGs, pathways, binding sites, and other) and their relations (interactions, co-expression, co-citations, and other). The system includes the build-pathways infrastructure for molecular interactions/relations and module discovery in high-throughput experiments. Also implemented in BiologicalNetworks are the Integrated Genome Viewer and Comparative Genomics Browser applications, which allow for the search and analysis of gene regulatory regions and their conservation in multiple species in conjunction with molecular pathways/networks, experimental data and functional annotations. Conclusions The new release of BiologicalNetworks together with its back-end database introduces extensive functionality for a more efficient integrated multi-level analysis of microarray, sequence, regulatory, and other data. BiologicalNetworks is freely available at http://www.biologicalnetworks.org. PMID:21190573
KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation.
Wang, Dapeng; Xu, Jiayue; Yu, Jun
2015-09-16
The K-mer approach, treating genomic sequences as simple characters and counting the relative abundance of each string upon a fixed K, has been extensively applied to phylogeny inference for genome assembly, annotation, and comparison. To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we develop a versatile database, KGCAK ( http://kgcak.big.ac.cn/KGCAK/ ), containing ~8,000 genomes that include genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) and cellular organelles of eukaryotic lineages. It builds phylogeny based on genomic elements in an alignment-free fashion and provides in-depth data processing enabling users to compare the complexity of genome sequences based on K-mer distribution. We hope that KGCAK becomes a powerful tool for exploring relationships within and among groups of species in a tree of life based on genomic data.
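The counting step behind the K-mer approach can be sketched in a few lines. The code below is an illustrative sketch, not KGCAK's implementation: the function names are hypothetical, and the Manhattan distance between profiles is an assumed, deliberately simple choice of alignment-free comparison.

```python
from collections import Counter


def kmer_frequencies(sequence, k):
    """Relative abundance of every overlapping k-mer in a sequence."""
    if k <= 0 or k > len(sequence):
        raise ValueError("k must be between 1 and the sequence length")
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(freq_a, freq_b):
    """Manhattan distance between two k-mer profiles (assumed metric)."""
    kmers = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(m, 0.0) - freq_b.get(m, 0.0)) for m in kmers)


# "ACGTACGT" contains seven overlapping 2-mers: AC, CG, GT, TA, AC, CG, GT.
freqs = kmer_frequencies("ACGTACGT", 2)
```

A matrix of pairwise profile distances between genomes could then feed a standard tree-building method.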
Cowper-Sal lari, Richard; Cole, Michael D; Karagas, Margaret R; Lupien, Mathieu; Moore, Jason H
2011-01-01
The conceptual foundation of the genome-wide association study (GWAS) has advanced unchecked since its conception. A revision might seem premature as the potential of GWAS has not been fully realized. Multiple technical and practical limitations need to be overcome before GWAS can be fairly criticized. But with the completion of hundreds of studies and a deeper understanding of the genetic architecture of disease, warnings are being raised. The results compiled to date indicate that risk-associated variants lie predominantly in noncoding regions of the genome. Additionally, alternative methodologies are uncovering large and heterogeneous sets of rare variants underlying disease. The fear is that, even in its fulfillment, the current GWAS paradigm might be incapable of dissecting all kinds of phenotypes. In the following text, we review several initiatives that aim to overcome these limitations. The overarching theme of these studies is the inclusion of biological knowledge to both the analysis and interpretation of genotyping data. GWAS is uninformed of biology by design and although there is some virtue in its simplicity, it is also its most conspicuous deficiency. We propose a framework in which to integrate these novel approaches, both empirical and theoretical, in the form of a genome-wide regulatory network (GWRN). By processing experimental data into networks, emerging data types based on chromatin immunoprecipitation are made computationally tractable. This will give GWAS re-analysis efforts the most current and relevant substrates, and root them firmly on our knowledge of human disease. Copyright © 2010 John Wiley & Sons, Inc.
Xiao, Xiaolin; Moreno-Moral, Aida; Rotival, Maxime; Bottolo, Leonardo; Petretto, Enrico
2014-01-01
Recent high-throughput efforts such as ENCODE have generated a large body of genome-scale transcriptional data in multiple conditions (e.g., cell-types and disease states). Leveraging these data is especially important for network-based approaches to human disease, for instance to identify coherent transcriptional modules (subnetworks) that can inform functional disease mechanisms and pathological pathways. Yet, genome-scale network analysis across conditions is significantly hampered by the paucity of robust and computationally-efficient methods. Building on the Higher-Order Generalized Singular Value Decomposition, we introduce a new algorithmic approach for efficient, parameter-free and reproducible identification of network-modules simultaneously across multiple conditions. Our method can accommodate weighted (and unweighted) networks of any size and can similarly use co-expression or raw gene expression input data, without hinging upon the definition and stability of the correlation used to assess gene co-expression. In simulation studies, we demonstrated distinctive advantages of our method over existing methods, which was able to recover accurately both common and condition-specific network-modules without entailing ad-hoc input parameters as required by other approaches. We applied our method to genome-scale and multi-tissue transcriptomic datasets from rats (microarray-based) and humans (mRNA-sequencing-based) and identified several common and tissue-specific subnetworks with functional significance, which were not detected by other methods. In humans we recapitulated the crosstalk between cell-cycle progression and cell-extracellular matrix interactions processes in ventricular zones during neocortex expansion and further, we uncovered pathways related to development of later cognitive functions in the cortical plate of the developing brain which were previously unappreciated. 
Analyses of seven rat tissues identified a multi-tissue subnetwork of co-expressed heat shock protein (Hsp) and cardiomyopathy genes (Bag3, Cryab, Kras, Emd, Plec), which was significantly replicated using separate failing heart and liver gene expression datasets in humans, thus revealing a conserved functional role for Hsp genes in cardiovascular disease.
Essential RNA-Based Technologies and Their Applications in Plant Functional Genomics.
Teotia, Sachin; Singh, Deepali; Tang, Xiaoqing; Tang, Guiliang
2016-02-01
Genome sequencing has not only extended our understanding of the blueprints of many plant species but has also revealed the secrets of coding and non-coding genes. We present here a brief introduction to and personal account of key RNA-based technologies, as well as their development and applications for functional genomics of plant coding and non-coding genes, with a focus on short tandem target mimics (STTMs), artificial microRNAs (amiRNAs), and CRISPR/Cas9. In addition, their use in multiplex technologies for the functional dissection of gene networks is discussed. Copyright © 2015 Elsevier Ltd. All rights reserved.
Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong
2016-01-01
Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448
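For context, the conventional over-representation test that this abstract contrasts LEGO with can be sketched as a one-sided hypergeometric tail, which is equivalent to a one-sided Fisher's exact test. This is the unweighted baseline only, not LEGO's network-weighted method; the function name and the toy numbers are hypothetical.

```python
from math import comb


def ora_p_value(universe_size, set_size, hits_size, overlap):
    """P(X >= overlap) when hits_size genes are drawn without replacement
    from a universe in which set_size genes belong to the gene set."""
    upper = min(set_size, hits_size)
    total = comb(universe_size, hits_size)
    tail = sum(
        comb(set_size, x) * comb(universe_size - set_size, hits_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


# Toy case: 10-gene universe, 5-gene pathway, 5 hits, all 5 in the pathway.
p = ora_p_value(universe_size=10, set_size=5, hits_size=5, overlap=5)
```

Every gene in the set counts equally in this test, which is the limitation the abstract points to.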
MSD-MAP: A Network-Based Systems Biology Platform for Predicting Disease-Metabolite Links.
Wathieu, Henri; Issa, Naiem T; Mohandoss, Manisha; Byers, Stephen W; Dakshanamurthy, Sivanesan
2017-01-01
Cancer-associated metabolites result from cell-wide mechanisms of dysregulation. The field of metabolomics has sought to identify these aberrant metabolites as disease biomarkers, clues to understanding disease mechanisms, or even as therapeutic agents. This study was undertaken to reliably predict metabolites associated with colorectal, esophageal, and prostate cancers. Metabolite and disease biological action networks were compared in a computational platform called MSD-MAP (Multi Scale Disease-Metabolite Association Platform). Using differential gene expression analysis with patient-based RNAseq data from The Cancer Genome Atlas, genes up- or down-regulated in cancer compared to normal tissue were identified. Relational databases were used to map biological entities including pathways, functions, and interacting proteins, to those differential disease genes. Similar relational maps were built for metabolites, stemming from known and in silico predicted metabolite-protein associations. The hypergeometric test was used to find statistically significant relationships between disease and metabolite biological signatures at each tier, and metabolites were assessed for multi-scale association with each cancer. Metabolite networks were also directly associated with various other diseases using a disease functional perturbation database. Our platform recapitulated metabolite-disease links that have been empirically verified in the scientific literature, with network-based mapping of jointly-associated biological activity also matching known disease mechanisms. This was true for colorectal, esophageal, and prostate cancers, using metabolite action networks stemming from both predicted and known functional protein associations. By employing systems biology concepts, MSD-MAP reliably predicted known cancer-metabolite links, and may serve as a predictive tool to streamline conventional metabolomic profiling methodologies.
Copyright © Bentham Science Publishers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Hyun-Seob; Goldberg, Noam; Mahajan, Ashutosh
Elementary (flux) modes (EMs) have served as a valuable tool for investigating structural and functional properties of metabolic networks. Identification of the full set of EMs in genome-scale networks remains challenging due to combinatorial explosion of EMs in complex networks. Often, however, only a small subset of relevant EMs needs to be known, for which optimization-based sequential computation is a useful alternative. Most of the currently available methods along this line are based on the iterative use of mixed integer linear programming (MILP), the effectiveness of which significantly deteriorates as the number of iterations builds up. To alleviate the computational burden associated with the MILP implementation, we here present a novel optimization algorithm termed alternate integer linear programming (AILP). Results: Our algorithm was designed to iteratively solve a pair of integer programming (IP) and linear programming (LP) problems to compute EMs in a sequential manner. In each step, the IP identifies a minimal subset of reactions, the deletion of which disables all previously identified EMs. Thus, a subsequent LP solution subject to this reaction deletion constraint becomes a distinct EM. In cases where no feasible LP solution is available, IP-derived reaction deletion sets represent minimal cut sets (MCSs). Despite the additional computation of MCSs, AILP achieved significant time reduction in computing EMs by orders of magnitude. The proposed AILP algorithm not only offers a computational advantage in the EM analysis of genome-scale networks, but also improves the understanding of the linkage between EMs and MCSs.
The Global Genome Biodiversity Network (GGBN) Data Standard specification
Droege, G.; Barker, K.; Seberg, O.; Coddington, J.; Benson, E.; Berendsohn, W. G.; Bunk, B.; Butler, C.; Cawsey, E. M.; Deck, J.; Döring, M.; Flemons, P.; Gemeinholzer, B.; Güntsch, A.; Hollowell, T.; Kelbert, P.; Kostadinov, I.; Kottmann, R.; Lawlor, R. T.; Lyal, C.; Mackenzie-Dodds, J.; Meyer, C.; Mulcahy, D.; Nussbeck, S. Y.; O'Tuama, É.; Orrell, T.; Petersen, G.; Robertson, T.; Söhngen, C.; Whitacre, J.; Wieczorek, J.; Yilmaz, P.; Zetzsche, H.; Zhang, Y.; Zhou, X.
2016-01-01
Genomic samples of non-model organisms are becoming increasingly important in a broad range of studies from developmental biology, biodiversity analyses, to conservation. Genomic sample definition, description, quality, voucher information and metadata all need to be digitized and disseminated across scientific communities. This information needs to be concise and consistent in today’s ever-increasing bioinformatic era, for complementary data aggregators to easily map databases to one another. In order to facilitate exchange of information on genomic samples and their derived data, the Global Genome Biodiversity Network (GGBN) Data Standard is intended to provide a platform based on a documented agreement to promote the efficient sharing and usage of genomic sample material and associated specimen information in a consistent way. The new data standard presented here builds upon existing standards commonly used within the community, extending them with the capability to exchange data on tissue, environmental and DNA samples as well as sequences. The GGBN Data Standard will reveal and democratize the hidden contents of biodiversity biobanks, for the convenience of everyone in the wider biobanking community. Technical tools exist for data providers to easily map their databases to the standard. Database URL: http://terms.tdwg.org/wiki/GGBN_Data_Standard PMID:27694206
MultiMetEval: Comparative and Multi-Objective Analysis of Genome-Scale Metabolic Models
Gevorgyan, Albert; Kierzek, Andrzej M.; Breitling, Rainer; Takano, Eriko
2012-01-01
Comparative metabolic modelling is emerging as a novel field, supported by the development of reliable and standardized approaches for constructing genome-scale metabolic models in high throughput. New software solutions are needed to allow efficient comparative analysis of multiple models in the context of multiple cellular objectives. Here, we present the user-friendly software framework Multi-Metabolic Evaluator (MultiMetEval), built upon SurreyFBA, which allows the user to compose collections of metabolic models that together can be subjected to flux balance analysis. Additionally, MultiMetEval implements functionalities for multi-objective analysis by calculating the Pareto front between two cellular objectives. Using a previously generated dataset of 38 actinobacterial genome-scale metabolic models, we show how these approaches can lead to exciting novel insights. Firstly, after incorporating several pathways for the biosynthesis of natural products into each of these models, comparative flux balance analysis predicted that species like Streptomyces that harbour the highest diversity of secondary metabolite biosynthetic gene clusters in their genomes do not necessarily have the metabolic network topology most suitable for compound overproduction. Secondly, multi-objective analysis of biomass production and natural product biosynthesis in these actinobacteria shows that the well-studied occurrence of discrete metabolic switches during the change of cellular objectives is inherent to their metabolic network architecture. Comparative and multi-objective modelling can lead to insights that could not be obtained by normal flux balance analyses. MultiMetEval provides a powerful platform that makes these analyses straightforward for biologists. Sources and binaries of MultiMetEval are freely available from https://github.com/PiotrZakrzewski/MetEval/downloads. PMID:23272111
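The notion of a Pareto front between two cellular objectives can be illustrated with a small filter over candidate flux solutions. This sketch is not MultiMetEval's procedure, which derives the front from flux balance analysis; it only shows which points count as non-dominated, and the sample values are made up.

```python
def pareto_front(points):
    """Return the points not dominated by any other point, assuming both
    coordinates (e.g. biomass flux, product flux) are to be maximized."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Hypothetical (biomass, product) flux pairs for one model.
samples = [(1.0, 0.0), (0.8, 0.5), (0.5, 0.4), (0.2, 0.9), (0.0, 1.0), (0.6, 0.6)]
front = pareto_front(samples)
```

Here (0.5, 0.4) drops out because (0.8, 0.5) is at least as good on both objectives; the remaining points trace the trade-off curve.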
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Xiaohan; Ye, Chuyu; Bisaria, Anjali
2011-01-01
Populus is an important bioenergy crop for bioethanol production. A greater understanding of cell wall biosynthesis processes is critical in reducing biomass recalcitrance, a major hindrance in efficient generation of ethanol from lignocellulosic biomass. Here, we report the identification of candidate cell wall biosynthesis genes through the development and application of a novel bioinformatics pipeline. As a first step, via text-mining of PubMed publications, we obtained 121 Arabidopsis genes that had experimental evidence supporting their involvement in cell wall biosynthesis or remodeling. The 121 genes were then used as bait genes to query an Arabidopsis co-expression database and additional genes were identified as neighbors of the bait genes in the network, increasing the number of genes to 548. The 548 Arabidopsis genes were then used to re-query the Arabidopsis co-expression database and re-construct a network that captured additional network neighbors, expanding to a total of 694 genes. The 694 Arabidopsis genes were computationally divided into 22 clusters. Queries of the Populus genome using the Arabidopsis genes revealed 817 Populus orthologs. Functional analysis of gene ontology and tissue-specific gene expression indicated that these Arabidopsis and Populus genes are high-likelihood candidates for functional genomics in relation to cell wall biosynthesis.
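The two rounds of bait-gene expansion described in this abstract (121 to 548 to 694 genes) follow a simple pattern that can be sketched as below. The adjacency dictionary, gene names, and function name are hypothetical stand-ins; the actual pipeline queried an Arabidopsis co-expression database whose interface is not described here.

```python
def expand_with_neighbors(seed_genes, coexpression):
    """Return the seeds plus every direct co-expression neighbor of a seed.

    coexpression: dict mapping gene -> set of co-expressed genes
    (a toy in-memory stand-in for a co-expression database query).
    """
    expanded = set(seed_genes)
    for gene in seed_genes:
        expanded.update(coexpression.get(gene, ()))
    return expanded


# Toy co-expression network (made-up gene names).
toy = {
    "bait1": {"g2", "g3"},
    "g2": {"bait1", "g4"},
    "g3": {"bait1"},
    "g4": {"g2", "g5"},
    "g5": {"g4"},
}
round_one = expand_with_neighbors({"bait1"}, toy)   # bait genes + neighbors
round_two = expand_with_neighbors(round_one, toy)   # re-query with the larger set
```

Each re-query can only add genes, so the set grows until the chosen number of rounds is reached.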
Integration of Plant Metabolomics Data with Metabolic Networks: Progresses and Challenges.
Töpfer, Nadine; Seaver, Samuel M D; Aharoni, Asaph
2018-01-01
In the last decade, plant genome-scale modeling has developed rapidly and modeling efforts have advanced from representing metabolic behavior of plant heterotrophic cell suspensions to studying the complex interplay of cell types, tissues, and organs. A crucial driving force for such developments is the availability and integration of "omics" data (e.g., transcriptomics, proteomics, and metabolomics) which enable the reconstruction, extraction, and application of context-specific metabolic networks. In this chapter, we demonstrate a workflow to integrate gas chromatography coupled to mass spectrometry (GC-MS)-based metabolomics data of tomato fruit pericarp (flesh) tissue, at five developmental stages, with a genome-scale reconstruction of tomato metabolism. This method allows for the extraction of context-specific networks reflecting changing activities of metabolic pathways throughout fruit development and maturation.
Orlando, Lori A; Sperber, Nina R; Voils, Corrine; Nichols, Marshall; Myers, Rachel A; Wu, R Ryanne; Rakhra-Burris, Tejinder; Levy, Kenneth D; Levy, Mia; Pollin, Toni I; Guan, Yue; Horowitz, Carol R; Ramos, Michelle; Kimmel, Stephen E; McDonough, Caitrin W; Madden, Ebony B; Damschroder, Laura J
2018-06-01
Purpose: Implementation research provides a structure for evaluating the clinical integration of genomic medicine interventions. This paper describes the Implementing Genomics in Practice (IGNITE) Network's efforts to promote (i) a broader understanding of genomic medicine implementation research and (ii) the sharing of knowledge generated in the network. Methods: To facilitate this goal, the IGNITE Network Common Measures Working Group (CMG) members adopted the Consolidated Framework for Implementation Research (CFIR) to guide its approach to identifying constructs and measures relevant to evaluating genomic medicine as a whole, standardizing data collection across projects, and combining data in a centralized resource for cross-network analyses. Results: CMG identified 10 high-priority CFIR constructs as important for genomic medicine. Of those, eight did not have standardized measurement instruments. Therefore, we developed four survey tools to address this gap. In addition, we identified seven high-priority constructs related to patients, families, and communities that did not map to CFIR constructs. Both sets of constructs were combined to create a draft genomic medicine implementation model. Conclusion: We developed processes to identify constructs deemed valuable for genomic medicine implementation and codified them in a model. These resources are freely available to facilitate knowledge generation and sharing across the field.
Discovering time-lagged rules from microarray data using gene profile classifiers
2011-01-01
Background Gene regulatory networks have an essential role in every process of life. In this regard, genome-wide time series data are becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes. Results This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. To this end, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes have shown that GRNCOP2 outperforms the contrasted methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experimentation has exhibited the soundness and scalability of the new method, which inferred highly-related statistically-significant gene associations. Conclusions A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results have demonstrated that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation. PMID:21524308
Meta genome-wide network from functional linkages of genes in human gut microbial ecosystems.
Ji, Yan; Shi, Yixiang; Wang, Chuan; Dai, Jianliang; Li, Yixue
2013-03-01
The human gut microbial ecosystem (HGME) exerts an important influence on human health. In recent research, meta-genomics has provided deep insights into the HGME in terms of gene contents, metabolic processes and genome constitutions of the meta-genome. Here we present a novel methodology to investigate the HGME on the basis of a set of functionally coupled genes regardless of their genome origins when considering the co-evolution properties of genes. By analyzing these coupled genes, we showed that some basic properties of the HGME are significantly associated with each other, and further constructed a protein interaction map of the human gut meta-genome to discover some functional modules that may relate to essential metabolic processes. Compared with other studies, our method provides a new idea to extract basic function elements from meta-genome systems and investigate complex microbial environments by associating their biological traits with the co-evolutionary fingerprints encoded in them.
Yang, Liulin; Li, Yun; Wei, Zhi; Chang, Xiao
2018-06-01
Neuroblastoma is a highly complex and heterogeneous cancer in children. Acquired genomic alterations including MYCN amplification, 1p deletion and 11q deletion are important risk factors and biomarkers in neuroblastoma. Here, we performed a co-expression-based gene network analysis to study the intrinsic association between specific genomic changes and transcriptome organization. We identified multiple gene coexpression modules which are recurrent in two independent datasets and associated with functional pathways including nervous system development, cell cycle, immune system process and extracellular matrix/space. Our results also indicated that modules involved in nervous system development and cell cycle are highly associated with MYCN amplification and 1p deletion, while modules responding to immune system process are associated with MYCN amplification only. In summary, this integrated analysis provides novel insights into molecular heterogeneity and pathogenesis of neuroblastoma. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang. Copyright © 2017. Published by Elsevier B.V.
Prangishvili, David
2016-01-01
ABSTRACT Archaea and particularly hyperthermophilic crenarchaea are hosts to many unusual viruses with diverse virion shapes and distinct gene compositions. As is typical of viruses in general, there are no universal genes in the archaeal virosphere. Therefore, to obtain a comprehensive picture of the evolutionary relationships between viruses, network analysis methods are more productive than traditional phylogenetic approaches. Here we present a comprehensive comparative analysis of genomes and proteomes from all currently known taxonomically classified and unclassified, cultivated and uncultivated archaeal viruses. We constructed a bipartite network of archaeal viruses that includes two classes of nodes, the genomes and gene families that connect them. Dissection of this network using formal community detection methods reveals strong modularity, with 10 distinct modules and 3 putative supermodules. However, compared to similar previously analyzed networks of eukaryotic and bacterial viruses, the archaeal virus network is sparsely connected. With the exception of the tailed viruses related to bacteriophages of the order Caudovirales and the families Turriviridae and Sphaerolipoviridae that are linked to a distinct supermodule of eukaryotic and bacterial viruses, there are few connector genes shared by different archaeal virus modules. In contrast, most of these modules include, in addition to viruses, capsidless mobile elements, emphasizing tight evolutionary connections between the two types of entities in archaea. The relative contributions of distinct evolutionary origins, in particular from nonviral elements, and insufficient sampling to the sparsity of the archaeal virus network remain to be determined by further exploration of the archaeal virosphere. IMPORTANCE Viruses infecting archaea are among the most mysterious denizens of the virosphere. 
Many of these viruses display no genetic or even morphological relationship to viruses of bacteria and eukaryotes, raising questions regarding their origins and position in the global virosphere. Analysis of 5,740 protein sequences from 116 genomes allowed dissection of the archaeal virus network and showed that most groups of archaeal viruses are evolutionarily connected to capsidless mobile genetic elements, including various plasmids and transposons. This finding could reflect actual independent origins of the distinct groups of archaeal viruses from different nonviral elements, providing important insights into the emergence and evolution of the archaeal virome. PMID:27681128
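The bipartite representation described in this abstract (genomes on one side, gene families on the other) can be sketched in a few lines. This is only an illustration: the genome and family names are invented placeholders rather than data from the study, and counting shared families is a simple stand-in for the formal community detection the authors report.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (not from the paper): genome -> gene families it encodes.
genome_families = {
    "virus_A": {"fam1", "fam2"},
    "virus_B": {"fam2", "fam3"},
    "virus_C": {"fam4"},
    "plasmid_D": {"fam3", "fam4"},
}

def family_to_genomes(genome_families):
    """Invert the mapping to obtain the second node class of the bipartite network."""
    index = defaultdict(set)
    for genome, families in genome_families.items():
        for family in families:
            index[family].add(genome)
    return dict(index)

def shared_family_counts(genome_families):
    """Project the bipartite network onto genomes; edge weight = number of shared families."""
    weights = {}
    for g1, g2 in combinations(sorted(genome_families), 2):
        shared = genome_families[g1] & genome_families[g2]
        if shared:
            weights[(g1, g2)] = len(shared)
    return weights

def connector_families(genome_families):
    """Families found in more than one genome, i.e. the ones that link genomes together."""
    index = family_to_genomes(genome_families)
    return {family for family, genomes in index.items() if len(genomes) > 1}
```

In a sparsely connected network such as the one the abstract describes, most genome pairs would have no entry in the projected edge dictionary, and the set of connector families would be small relative to the total.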
Fang, Hai; Knezevic, Bogdan; Burnham, Katie L; Knight, Julian C
2016-12-13
Biological interpretation of genomic summary data such as those resulting from genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is one of the major bottlenecks in medical genomics research, calling for efficient and integrative tools to resolve this problem. We introduce eXploring Genomic Relations (XGR), an open source tool designed for enhanced interpretation of genomic summary data enabling downstream knowledge discovery. Targeting users of varying computational skills, XGR utilises prior biological knowledge and relationships in a highly integrated but easily accessible way to make user-input genomic summary datasets more interpretable. We show how by incorporating ontology, annotation, and systems biology network-driven approaches, XGR generates more informative results than conventional analyses. We apply XGR to GWAS and eQTL summary data to explore the genomic landscape of the activated innate immune response and common immunological diseases. We provide genomic evidence for a disease taxonomy supporting the concept of a disease spectrum from autoimmune to autoinflammatory disorders. We also show how XGR can define SNP-modulated gene networks and pathways that are shared and distinct between diseases, how it achieves functional, phenotypic and epigenomic annotations of genes and variants, and how it enables exploring annotation-based relationships between genetic variants. XGR provides a single integrated solution to enhance interpretation of genomic summary data for downstream biological discovery. XGR is released as both an R package and a web-app, freely available at http://galahad.well.ox.ac.uk/XGR .
Wang, Edwin; Zaman, Naif; Mcgee, Shauna; Milanese, Jean-Sébastien; Masoudi-Nejad, Ali; O'Connor-McCourt, Maureen
2015-02-01
Tumor genome sequencing leads to documenting thousands of DNA mutations and other genomic alterations. At present, these data cannot be analyzed adequately to aid in the understanding of tumorigenesis and its evolution. Moreover, we have little insight into how to use these data to predict clinical phenotypes and tumor progression to better design patient treatment. To meet these challenges, we discuss a cancer hallmark network framework for modeling genome sequencing data to predict cancer clonal evolution and associated clinical phenotypes. The framework includes: (1) cancer hallmarks that can be represented by a few molecular/signaling networks. 'Network operational signatures', which represent gene regulatory logics/strengths, make it possible to quantify state transitions and measures of hallmark traits. Thus, sets of genomic alterations which are associated with network operational signatures could be linked to the state/measure of hallmark traits. The network operational signature transforms genotypic data (i.e., genomic alterations) to regulatory phenotypic profiles (i.e., regulatory logics/strengths), to cellular phenotypic profiles (i.e., hallmark traits) which lead to clinical phenotypic profiles (i.e., a collection of hallmark traits). Furthermore, the framework considers regulatory logics of the hallmark networks under tumor evolutionary dynamics and therefore also includes: (2) a self-promoting positive feedback loop, dominated by a genomic instability network and a cell survival/proliferation network, that is the main driver of tumor clonal evolution. 
Surrounding tumor stroma and its host immune systems shape the evolutionary paths; (3) cell motility initiating metastasis is a byproduct of the above self-promoting loop activity during tumorigenesis; (4) an emerging hallmark network which triggers genome duplication dominates a feed-forward loop which in turn could act as a rate-limiting step for tumor formation; (5) mutations and other genomic alterations have specific patterns and tissue-specificity, which are driven by aging and other cancer-inducing agents. This framework represents the logics of complex cancer biology as a myriad of phenotypic complexities governed by a limited set of underlying organizing principles. It therefore adds to our understanding of tumor evolution and tumorigenesis and is potentially useful for predicting tumors' evolutionary paths and clinical phenotypes. Strategies of using this framework in conjunction with genome sequencing data in an attempt to predict personalized drug targets, drug resistance, and metastasis for cancer patients, as well as cancer risks for healthy individuals are discussed. Accurate prediction of cancer clonal evolution and clinical phenotypes will have substantial impact on timely diagnosis, personalized treatment and personalized prevention of cancer. Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
Aronson, Samuel; Babb, Lawrence; Ames, Darren; Gibbs, Richard A; Venner, Eric; Connelly, John J; Marsolo, Keith; Weng, Chunhua; Williams, Marc S; Hartzler, Andrea L; Liang, Wayne H; Ralston, James D; Devine, Emily Beth; Murphy, Shawn; Chute, Christopher G; Caraballo, Pedro J; Kullo, Iftikhar J; Freimuth, Robert R; Rasmussen, Luke V; Wehbe, Firas H; Peterson, Josh F; Robinson, Jamie R; Wiley, Ken; Overby Taylor, Casey
2018-05-31
The eMERGE Network is establishing methods for electronic transmittal of patient genetic test results from laboratories to healthcare providers across organizational boundaries. We surveyed the capabilities and needs of different network participants, established a common transfer format, and implemented transfer mechanisms based on this format. The interfaces we created are examples of the connectivity that must be instantiated before electronic genetic and genomic clinical decision support can be effectively built at the point of care. This work serves as a case example for both standards bodies and other organizations working to build the infrastructure required to provide better electronic clinical decision support for clinicians.
Ruths, Troy; Nakhleh, Luay
2013-05-07
Cis-regulatory networks (CRNs) play a central role in cellular decision making. Like every other biological system, CRNs undergo evolution, which shapes their properties by a combination of adaptive and nonadaptive evolutionary forces. Teasing apart these forces is an important step toward functional analyses of the different components of CRNs, designing regulatory perturbation experiments, and constructing synthetic networks. Although tests of neutrality and selection based on molecular sequence data exist, no such tests are currently available based on CRNs. In this work, we present a unique genotype model of CRNs that is grounded in a genomic context and demonstrate its use in identifying portions of the CRN with properties explainable by neutral evolutionary forces at the system, subsystem, and operon levels. We leverage our model against experimentally derived data from Escherichia coli. The results of this analysis show statistically significant and substantial neutral trends in properties previously identified as adaptive in origin--degree distribution, clustering coefficient, and motifs--within the E. coli CRN. Our model captures the tightly coupled genome-interactome of an organism and enables analyses of how evolutionary events acting at the genome level, such as mutation, and at the population level, such as genetic drift, give rise to neutral patterns that we can quantify in CRNs.
Suh, Sooyeon; Kim, Hosung; Dang-Vu, Thien Thanh; Joo, Eunyeon; Shin, Chol
2016-01-01
Study Objectives: Recent studies have suggested that structural abnormalities in insomnia may be linked with alterations in the default-mode network (DMN). This study compared cortical thickness and structural connectivity linked to the DMN in patients with persistent insomnia (PI) and good sleepers (GS). Methods: The current study used a clinical subsample from the longitudinal community-based Korean Genome and Epidemiology Study (KoGES). Cortical thickness and structural connectivity linked to the DMN in patients with persistent insomnia symptoms (PIS; n = 57) were compared to good sleepers (GS; n = 40). All participants underwent MRI acquisition. Based on literature review, we selected cortical regions corresponding to the DMN. A seed-based structural covariance analysis measured cortical thickness correlation between each seed region of the DMN and other cortical areas. Association of cortical thickness and covariance with sleep quality and neuropsychological assessments were further assessed. Results: Compared to GS, cortical thinning was found in PIS in the anterior cingulate cortex, precentral cortex, and lateral prefrontal cortex. Decreased structural connectivity between anterior and posterior regions of the DMN was observed in the PIS group. Decreased structural covariance within the DMN was associated with higher PSQI scores. Cortical thinning in the lateral frontal lobe was related to poor performance in executive function in PIS. Conclusion: Disrupted structural covariance network in PIS might reflect malfunctioning of antero-posterior disconnection of the DMN during the wake to sleep transition that is commonly found during normal sleep. The observed structural network alteration may further implicate commonly observed sustained sleep difficulties and cognitive impairment in insomnia. Citation: Suh S, Kim H, Dang-Vu TT, Joo E, Shin C. 
Cortical thinning and altered cortico-cortical structural covariance of the default mode network in patients with persistent insomnia symptoms. SLEEP 2016;39(1):161–171. PMID:26414892
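The seed-based structural covariance analysis mentioned in this abstract correlates cortical thickness in a seed region with thickness in other regions across subjects. A minimal sketch of that idea follows. The region names and thickness values are invented, and the plain Pearson correlation is an assumption: the abstract does not describe the statistical model or covariates the study actually used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def seed_covariance(thickness, seed):
    """Correlate the seed region with every other region across subjects.

    `thickness` maps region name -> list of per-subject thickness values,
    with subjects in the same order for every region.
    """
    seed_values = thickness[seed]
    return {
        region: pearson(seed_values, values)
        for region, values in thickness.items()
        if region != seed
    }

# Hypothetical toy data: four subjects, thickness in mm.
toy_thickness = {
    "seed": [2.0, 2.5, 3.0, 3.5],
    "region_up": [1.0, 1.5, 2.0, 2.5],
    "region_down": [3.0, 2.5, 2.0, 1.5],
}
toy_result = seed_covariance(toy_thickness, "seed")
```

Comparing such seed-to-region correlations between two groups is one way to express "decreased structural connectivity" in numerical terms.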
Ghosh, Sujoy; Vivar, Juan; Nelson, Christopher P; Willenborg, Christina; Segrè, Ayellet V; Mäkinen, Ville-Petteri; Nikpay, Majid; Erdmann, Jeannette; Blankenberg, Stefan; O'Donnell, Christopher; März, Winfried; Laaksonen, Reijo; Stewart, Alexandre F R; Epstein, Stephen E; Shah, Svati H; Granger, Christopher B; Hazen, Stanley L; Kathiresan, Sekar; Reilly, Muredach P; Yang, Xia; Quertermous, Thomas; Samani, Nilesh J; Schunkert, Heribert; Assimes, Themistocles L; McPherson, Ruth
2015-07-01
Genome-wide association studies have identified multiple genetic variants affecting the risk of coronary artery disease (CAD). However, individually these explain only a small fraction of the heritability of CAD and for most, the causal biological mechanisms remain unclear. We sought to obtain further insights into potential causal processes of CAD by integrating large-scale GWA data with expertly curated databases of core human pathways and functional networks. Using pathways (gene sets) from Reactome, we carried out a 2-stage gene set enrichment analysis strategy. From a meta-analyzed discovery cohort of 7 CAD genome-wide association study data sets (9889 cases/11 089 controls), nominally significant gene sets were tested for replication in a meta-analysis of 9 additional studies (15 502 cases/55 730 controls) from the Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) Consortium. A total of 32 of 639 Reactome pathways tested showed convincing association with CAD (replication P<0.05). These pathways resided in 9 of 21 core biological processes represented in Reactome, and included pathways relevant to extracellular matrix (ECM) integrity, innate immunity, axon guidance, and signaling by PDGF (platelet-derived growth factor), NOTCH, and the transforming growth factor-β/SMAD receptor complex. Many of these pathways had strengths of association comparable to those observed in lipid transport pathways. Network analysis of unique genes within the replicated pathways further revealed several interconnected functional and topologically interacting modules representing novel associations (eg, semaphorin-regulated axonal guidance pathway) besides confirming known processes (lipid metabolism). The connectivity in the observed networks was statistically significant compared with random networks (P<0.001). 
Network centrality analysis (degree and betweenness) further identified genes (eg, NCAM1, FYN, FURIN, etc) likely to play critical roles in the maintenance and functioning of several of the replicated pathways. These findings provide novel insights into how genetic variation, interpreted in the context of biological processes and functional interactions among genes, may help define the genetic architecture of CAD. © 2015 American Heart Association, Inc.
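The two centrality measures named in this abstract, degree and betweenness, can be computed for a small undirected network as sketched below. The node names and edges are placeholders rather than interactions from the study, and the betweenness routine is a plain, unnormalized implementation of Brandes' algorithm for unweighted graphs, offered as an illustration and not as the authors' software.

```python
from collections import deque

def degree_centrality(adj):
    """Number of neighbours per node (raw degree, not normalized)."""
    return {node: len(neighbours) for node, neighbours in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes' algorithm).

    `adj` maps each node to an iterable of its neighbours and must be symmetric.
    """
    betweenness = {v: 0.0 for v in adj}
    for source in adj:
        order = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths from source
        sigma = {v: 0 for v in adj}      # number of shortest paths from source
        dist = {v: -1 for v in adj}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                betweenness[w] += delta[w]
    # Each undirected shortest path was counted once from each endpoint.
    return {v: b / 2 for v, b in betweenness.items()}

# Hypothetical toy network: a simple path g1 - g2 - g3 - g4.
toy_network = {
    "g1": {"g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4"},
    "g4": {"g3"},
}
toy_degree = degree_centrality(toy_network)
toy_betweenness = betweenness_centrality(toy_network)
```

On the path graph, the two interior nodes lie on every shortest path between the ends, which is why they score highest on both measures.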
Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L
2016-01-04
The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Moskvin, Oleg V; Bolotin, Dmitry; Wang, Andrew; Ivanov, Pavel S; Gomelsky, Mark
2011-02-01
We present Rhodobase, a web-based meta-analytical tool for analysis of transcriptional regulation in a model anoxygenic photosynthetic bacterium, Rhodobacter sphaeroides. The gene association meta-analysis is based on the pooled data from 100 R. sphaeroides whole-genome DNA microarrays. Gene-centric regulatory networks were visualized using the StarNet approach (Jupiter, D.C., VanBuren, V., 2008. A visual data mining tool that facilitates reconstruction of transcription regulatory networks. PLoS ONE 3, e1717) with several modifications. We developed a means to identify and visualize operons and superoperons. We designed a framework for the cross-genome search for transcription factor binding sites that takes into account high GC-content and oligonucleotide usage profile characteristic of the R. sphaeroides genome. To facilitate reconstruction of directional relationships between co-regulated genes, we screened upstream sequences (-400 to +20bp from start codons) of all genes for putative binding sites of bacterial transcription factors using a self-optimizing search method developed here. To test performance of the meta-analysis tools and transcription factor site predictions, we reconstructed selected nodes of the R. sphaeroides transcription factor-centric regulatory matrix. The test revealed regulatory relationships that correlate well with the experimentally derived data. The database of transcriptional profile correlations, the network visualization engine and the optimized search engine for transcription factor binding sites analysis are available at http://rhodobase.org. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
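The upstream window used in this abstract (-400 to +20 bp from start codons) can be extracted as sketched below. Several details are assumptions not stated in the abstract: position +1 is taken to be the first base of the start codon with no position 0, `start` is a 0-based index on the forward strand, and the replicon is treated as linear, so windows are clipped at the sequence ends. The function names are illustrative only.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def upstream_window(genome, start, strand, upstream=400, downstream=20):
    """Return the region from -upstream to +downstream around a start codon.

    `start` is the 0-based forward-strand index of the first base of the
    start codon. The result is always oriented 5'->3' on the gene's strand.
    """
    if strand == "+":
        lo = max(0, start - upstream)
        hi = min(len(genome), start + downstream)
        return genome[lo:hi]
    if strand == "-":
        # On the reverse strand, upstream bases lie at higher forward coordinates.
        lo = max(0, start - downstream + 1)
        hi = min(len(genome), start + upstream + 1)
        return reverse_complement(genome[lo:hi])
    raise ValueError("strand must be '+' or '-'")

# Toy check with a short window: 3 bases upstream plus the 3-base start codon.
forward_example = upstream_window("TTTTTATGCCC", 5, "+", upstream=3, downstream=3)
reverse_example = upstream_window("GGGCATAAAAA", 5, "-", upstream=3, downstream=3)
```

With the default parameters each window is 420 bases long when it is not clipped, and the resulting sequences could then be passed to whatever binding-site scanner is in use.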
The founding charter of the Genomic Observatories Network.
Davies, Neil; Field, Dawn; Amaral-Zettler, Linda; Clark, Melody S; Deck, John; Drummond, Alexei; Faith, Daniel P; Geller, Jonathan; Gilbert, Jack; Glöckner, Frank Oliver; Hirsch, Penny R; Leong, Jo-Ann; Meyer, Chris; Obst, Matthias; Planes, Serge; Scholin, Chris; Vogler, Alfried P; Gates, Ruth D; Toonen, Rob; Berteaux-Lecellier, Véronique; Barbier, Michèle; Barker, Katherine; Bertilsson, Stefan; Bicak, Mesude; Bietz, Matthew J; Bobe, Jason; Bodrossy, Levente; Borja, Angel; Coddington, Jonathan; Fuhrman, Jed; Gerdts, Gunnar; Gillespie, Rosemary; Goodwin, Kelly; Hanson, Paul C; Hero, Jean-Marc; Hoekman, David; Jansson, Janet; Jeanthon, Christian; Kao, Rebecca; Klindworth, Anna; Knight, Rob; Kottmann, Renzo; Koo, Michelle S; Kotoulas, Georgios; Lowe, Andrew J; Marteinsson, Viggó Thór; Meyer, Folker; Morrison, Norman; Myrold, David D; Pafilis, Evangelos; Parker, Stephanie; Parnell, John Jacob; Polymenakou, Paraskevi N; Ratnasingham, Sujeevan; Roderick, George K; Rodriguez-Ezpeleta, Naiara; Schonrogge, Karsten; Simon, Nathalie; Valette-Silver, Nathalie J; Springer, Yuri P; Stone, Graham N; Stones-Havas, Steve; Sansone, Susanna-Assunta; Thibault, Kate M; Wecker, Patricia; Wichels, Antje; Wooley, John C; Yahara, Tetsukazu; Zingone, Adriana
2014-03-07
The co-authors of this paper hereby state their intention to work together to launch the Genomic Observatories Network (GOs Network) for which this document will serve as its Founding Charter. We define a Genomic Observatory as an ecosystem and/or site subject to long-term scientific research, including (but not limited to) the sustained study of genomic biodiversity from single-celled microbes to multicellular organisms. An international group of 64 scientists first published the call for a global network of Genomic Observatories in January 2012. The vision for such a network was expanded in a subsequent paper and developed over a series of meetings in Bremen (Germany), Shenzhen (China), Moorea (French Polynesia), Oxford (UK), Pacific Grove (California, USA), Washington (DC, USA), and London (UK). While this community-building process continues, here we express our mutual intent to establish the GOs Network formally, and to describe our shared vision for its future. The views expressed here are ours alone as individual scientists, and do not necessarily represent those of the institutions with which we are affiliated.
The founding charter of the Genomic Observatories Network
2014-01-01
The co-authors of this paper hereby state their intention to work together to launch the Genomic Observatories Network (GOs Network) for which this document will serve as its Founding Charter. We define a Genomic Observatory as an ecosystem and/or site subject to long-term scientific research, including (but not limited to) the sustained study of genomic biodiversity from single-celled microbes to multicellular organisms. An international group of 64 scientists first published the call for a global network of Genomic Observatories in January 2012. The vision for such a network was expanded in a subsequent paper and developed over a series of meetings in Bremen (Germany), Shenzhen (China), Moorea (French Polynesia), Oxford (UK), Pacific Grove (California, USA), Washington (DC, USA), and London (UK). While this community-building process continues, here we express our mutual intent to establish the GOs Network formally, and to describe our shared vision for its future. The views expressed here are ours alone as individual scientists, and do not necessarily represent those of the institutions with which we are affiliated. PMID:24606731
Prediction of individualized therapeutic vulnerabilities in cancer from genomic profiles
Aksoy, Bülent Arman; Demir, Emek; Babur, Özgün; Wang, Weiqing; Jing, Xiaohong; Schultz, Nikolaus; Sander, Chris
2014-01-01
Motivation: Somatic homozygous deletions of chromosomal regions in cancer, while not necessarily oncogenic, may lead to therapeutic vulnerabilities specific to cancer cells compared with normal cells. A recently reported example is the loss of one of the two isoenzymes in glioblastoma cancer cells such that the use of a specific inhibitor selectively inhibited growth of the cancer cells, which had become fully dependent on the second isoenzyme. We have now made use of the unprecedented conjunction of large-scale cancer genomics profiling of tumor samples in The Cancer Genome Atlas (TCGA) and of tumor-derived cell lines in the Cancer Cell Line Encyclopedia, as well as the availability of integrated pathway information systems, such as Pathway Commons, to systematically search for a comprehensive set of such epistatic vulnerabilities. Results: Based on homozygous deletions affecting metabolic enzymes in 16 TCGA cancer studies and 972 cancer cell lines, we identified 4104 candidate metabolic vulnerabilities present in 1019 tumor samples and 482 cell lines. Up to 44% of these vulnerabilities can be targeted with at least one Food and Drug Administration-approved drug. We suggest focused experiments to test these vulnerabilities and clinical trials based on personalized genomic profiles of those that pass preclinical filters. We conclude that genomic profiling will in the future provide a promising basis for network pharmacology of epistatic vulnerabilities as a promising therapeutic strategy. Availability and implementation: A web-based tool for exploring all vulnerabilities and their details is available at http://cbio.mskcc.org/cancergenomics/statius/ along with supplemental data files. Contact: statius@cbio.mskcc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24665131
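The core logic of this abstract, that a cell which has lost one isoenzyme becomes dependent on the remaining one, can be sketched as a simple set comparison. The gene and sample names below are invented placeholders, and the function covers only the deletion-matching step: the pathway lookups, drug annotations and preclinical filters mentioned in the abstract are not modeled.

```python
def candidate_vulnerabilities(isoenzyme_pairs, deleted_by_sample):
    """List (sample, lost_gene, target_gene) where exactly one isoenzyme of a pair is deleted.

    `isoenzyme_pairs` is an iterable of (gene_a, gene_b) tuples that catalyze
    the same reaction; `deleted_by_sample` maps sample id -> set of genes with
    homozygous deletions in that sample.
    """
    hits = []
    for sample in sorted(deleted_by_sample):
        deleted = deleted_by_sample[sample]
        for gene_a, gene_b in isoenzyme_pairs:
            a_lost = gene_a in deleted
            b_lost = gene_b in deleted
            if a_lost != b_lost:  # exactly one partner is gone
                lost, target = (gene_a, gene_b) if a_lost else (gene_b, gene_a)
                hits.append((sample, lost, target))
    return hits

# Hypothetical toy data (placeholder names, not from TCGA or CCLE).
toy_pairs = [("ENZ1A", "ENZ1B"), ("ENZ2A", "ENZ2B")]
toy_deletions = {
    "sample1": {"ENZ1A"},            # one isoenzyme lost -> depends on ENZ1B
    "sample2": {"ENZ2A", "ENZ2B"},   # both lost -> no remaining target
    "sample3": set(),                # nothing lost -> no vulnerability
}
toy_hits = candidate_vulnerabilities(toy_pairs, toy_deletions)
```

Only the first toy sample yields a candidate, because a vulnerability of this kind needs one partner deleted and the other still present to inhibit.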
Independent evolution of genomic characters during major metazoan transitions.
Simakov, Oleg; Kawashima, Takeshi
2017-07-15
Metazoan evolution encompasses a vast evolutionary time scale spanning over 600 million years. Our ability to infer ancestral metazoan characters, both morphological and functional, is limited by our understanding of the nature and evolutionary dynamics of the underlying regulatory networks. Increasing coverage of metazoan genomes enables us to identify the evolutionary changes of the relevant genomic characters such as the loss or gain of coding sequences, gene duplications, micro- and macro-synteny, and non-coding element evolution in different lineages. In this review we describe recent advances in our understanding of ancestral metazoan coding and non-coding features, as deduced from genomic comparisons. Some genomic changes such as innovations in gene and linkage content occur at different rates across metazoan clades, suggesting some level of independence among genomic characters. While their contribution to biological innovation remains largely unclear, we review recent literature about certain genomic changes that do correlate with changes to specific developmental pathways and metazoan innovations. In particular, we discuss the origins of the recently described pharyngeal cluster which is conserved across deuterostome genomes, and highlight different genomic features that have contributed to the evolution of this group. We also assess our current capacity to infer ancestral metazoan states from gene models and comparative genomics tools and elaborate on the future directions of metazoan comparative genomics relevant to evo-devo studies. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Zanotto, Paolo Marinho de Andrade; Krakauer, David C.
2008-01-01
We consider the concerted evolution of viral genomes in four families of DNA viruses. Given the high rate of horizontal gene transfer among viruses and their hosts, it is an open question as to how representative particular genes are of the evolutionary history of the complete genome. To address the concerted evolution of viral genes, we compared genomic evolution across four distinct, extant viral families. For all four viral families we constructed DNA-dependent DNA polymerase-based (DdDp) phylogenies and in addition, whole genome sequence, as quantitative descriptions of inter-genome relationships. We found that the history of the polymerase gene was highly predictive of the history of the genome as a whole, which we explain in terms of repeated, co-divergence events of the core DdDp gene accompanied by a number of satellite, accessory genetic loci. We also found that the rate of gene gain in baculoviruses and poxviruses proceeds significantly more quickly than the rate of gene loss and that there is convergent acquisition of satellite functions promoting contextual adaptation when distinct viral families infect related hosts. The congruence of the genome and polymerase trees suggests that a large set of viral genes, including polymerase, derive from a phylogenetically conserved core of genes of host origin, secondarily reinforced by gene acquisition from common hosts or co-infecting viruses within the host. A single viral genome can be thought of as a mutualistic network, with the core genes acting as an effective host and the satellite genes as effective symbionts. Larger virus genomes show a greater departure from linkage equilibrium between core and satellite functions. PMID:18941535
Puzzles in modern biology. V. Why are genomes overwired?
Frank, Steven A
2017-01-01
Many factors affect eukaryotic gene expression. Transcription factors, histone codes, DNA folding, and noncoding RNA modulate expression. Those factors interact in large, broadly connected regulatory control networks. An engineer following classical principles of control theory would design a simpler regulatory network. Why are genomes overwired? Neutrality or enhanced robustness may lead to the accumulation of additional factors that complicate network architecture. Dynamics progresses like a ratchet. New factors get added. Genomes adapt to the additional complexity. The newly added factors can no longer be removed without significant loss of fitness. Alternatively, highly wired genomes may be more malleable. In large networks, most genomic variants tend to have a relatively small effect on gene expression and trait values. Many small effects lead to a smooth gradient, in which traits may change steadily with respect to underlying regulatory changes. A smooth gradient may provide a continuous path from a starting point up to the highest peak of performance. A potential path of increasing performance promotes adaptability and learning. Genomes gain by the inductive process of natural selection, a trial and error learning algorithm that discovers general solutions for adapting to environmental challenge. Similarly, deeply and densely connected computational networks gain by various inductive trial and error learning procedures, in which the networks learn to reduce the errors in sequential trials. Overwiring alters the geometry of induction by smoothing the gradient along the inductive pathways of improving performance. Those overwiring benefits for induction apply to both natural biological networks and artificial deep learning networks.
Wolff, Sara M; Ellison, Melinda J; Hao, Yue; Cockrum, Rebecca R; Austin, Kathy J; Baraboo, Michael; Burch, Katherine; Lee, Hyuk Jin; Maurer, Taylor; Patil, Rocky; Ravelo, Andrea; Taxis, Tasia M; Truong, Huan; Lamberson, William R; Cammack, Kristi M; Conant, Gavin C
2017-06-08
Grazing mammals rely on their ruminal microbial symbionts to convert plant structural biomass into metabolites they can assimilate. To explore how this complex metabolic system adapts to the host animal's diet, we inferred a microbiome-level metabolic network from shotgun metagenomic data. Using comparative genomics, we then linked this microbial network to that of the host animal using a set of interface metabolites likely to be transferred to the host. When the host sheep were fed a grain-based diet, the induced microbial metabolic network showed several critical differences from those seen on the evolved forage-based diet. Grain-based (e.g., concentrate) diets tend to be dominated by a smaller set of reactions that employ metabolites that are nearer in network space to the host's metabolism. In addition, these reactions are more central in the network and employ substrates with shorter carbon backbones. Despite this apparent lower complexity, the concentrate-associated metabolic networks are actually more dissimilar from each other than are those of forage-fed animals. Because both groups of animals were initially fed on a forage diet, we propose that the diet switch drove the appearance of a number of different microbial networks, including a degenerate network characterized by an inefficient use of dietary nutrients. We used network simulations to show that such disparate networks are not an unexpected result of a diet shift. We argue that network approaches, particularly those that link the microbial network with that of the host, illuminate aspects of the structure of the microbiome not seen from a strictly taxonomic perspective. In particular, different diets induce predictable and significant differences in the enzymes used by the microbiome. Nonetheless, there are clearly a number of microbiomes of differing structure that show similar functional properties. Changes such as a diet shift uncover more of this type of diversity.
Ensembl comparative genomics resources.
Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul
2016-01-01
Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press.
Ensembl comparative genomics resources
Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul
2016-01-01
Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847
CycADS: an annotation database system to ease the development and update of BioCyc databases
Vellozo, Augusto F.; Véron, Amélie S.; Baa-Puyoulet, Patrice; Huerta-Cepas, Jaime; Cottret, Ludovic; Febvay, Gérard; Calevro, Federica; Rahbé, Yvan; Douglas, Angela E.; Gabaldón, Toni; Sagot, Marie-France; Charles, Hubert; Colella, Stefano
2011-01-01
In recent years, genomes from an increasing number of organisms have been sequenced, but their annotation remains a time-consuming process. The BioCyc databases offer a framework for the integrated analysis of metabolic networks. The Pathway Tools software suite allows the automated construction of a database starting from an annotated genome, but it requires prior integration of all annotations into a specific summary file or into a GenBank file. To allow the easy creation and update of a BioCyc database starting from the multiple genome annotation resources available over time, we have developed an ad hoc data management system that we called Cyc Annotation Database System (CycADS). CycADS is centred on a specific database model and on a set of Java programs to import, filter and export relevant information. Data from GenBank and other annotation sources (including for example: KAAS, PRIAM, Blast2GO and PhylomeDB) are collected into a database to be subsequently filtered and extracted to generate a complete annotation file. This file is then used to build an enriched BioCyc database using the PathoLogic program of Pathway Tools. The CycADS pipeline for annotation management was used to build the AcypiCyc database for the pea aphid (Acyrthosiphon pisum) whose genome was recently sequenced. The AcypiCyc database webpage also includes, for comparative analyses, two other metabolic reconstruction BioCyc databases generated using CycADS: TricaCyc for Tribolium castaneum and DromeCyc for Drosophila melanogaster. Owing to its flexible design, CycADS offers a powerful software tool for the generation and regular updating of enriched BioCyc databases. The CycADS system is particularly suited for metabolic gene annotation and network reconstruction in newly sequenced genomes. Because of the uniform annotation used for metabolic network reconstruction, CycADS is particularly useful for comparative analysis of the metabolism of different organisms. Database URL: http://www.cycadsys.org PMID:21474551
Wang, Jingxue; Singh, Sanjay K; Du, Chunfang; Li, Chen; Fan, Jianchun; Pattanaik, Sitakanta; Yuan, Ling
2016-01-01
Rapeseed (Brassica napus) is an important oil seed crop, providing more than 13% of the world's supply of edible oils. An in-depth knowledge of the gene network involved in biosynthesis and accumulation of seed oil is critical for the improvement of B. napus. Using available genomic and transcriptomic resources, we identified 1,750 acyl-lipid metabolism (ALM) genes that are distributed over 19 chromosomes in the B. napus genome. B. rapa and B. oleracea, two diploid progenitors of B. napus, contributed almost equally to the ALM genes. Genome collinearity analysis demonstrated that the majority of the ALM genes have arisen due to genome duplication or segmental duplication events. In addition, we profiled the expression patterns of the ALM genes in four different developmental stages. Furthermore, we developed two B. napus near isogenic lines (NILs). The high oil NIL, YC13-559, accumulates significantly more (∼10%) seed oil than the other, YC13-554. Comparative gene expression analysis revealed upregulation of lipid biosynthesis-related regulatory genes in YC13-559, including SHOOTMERISTEMLESS, LEAFY COTYLEDON 1 (LEC1), LEC2, FUSCA3, ABSCISIC ACID INSENSITIVE 3 (ABI3), ABI4, ABI5, and WRINKLED1, as well as structural genes, such as ACETYL-CoA CARBOXYLASE, ACYL-CoA DIACYLGLYCEROL ACYLTRANSFERASE, and LONG-CHAIN ACYL-CoA SYNTHETASES. We observed that several genes related to the phytohormones gibberellins, jasmonate, and indole acetic acid were differentially expressed in the NILs. Our findings provide a broad account of the numbers, distribution, and expression profiles of acyl-lipid metabolism genes, as well as gene networks that potentially control oil accumulation in B. napus seeds. The upregulation of key regulatory and structural genes related to lipid biosynthesis likely plays a major role in the increased seed oil in YC13-559.
From integrative genomics to systems genetics in the rat to link genotypes to phenotypes
Moreno-Moral, Aida
2016-01-01
Complementary to traditional gene mapping approaches used to identify the hereditary components of complex diseases, integrative genomics and systems genetics have emerged as powerful strategies to decipher the key genetic drivers of molecular pathways that underlie disease. Broadly speaking, integrative genomics aims to link cellular-level traits (such as mRNA expression) to the genome to identify their genetic determinants. With the characterization of several cellular-level traits within the same system, the integrative genomics approach evolved into a more comprehensive study design, called systems genetics, which aims to unravel the complex biological networks and pathways involved in disease, and in turn map their genetic control points. The first fully integrated systems genetics study was carried out in rats, and the results, which revealed conserved trans-acting genetic regulation of a pro-inflammatory network relevant to type 1 diabetes, were translated to humans. Many studies using different organisms subsequently stemmed from this example. The aim of this Review is to describe the most recent advances in the fields of integrative genomics and systems genetics applied in the rat, with a focus on studies of complex diseases ranging from inflammatory to cardiometabolic disorders. We aim to provide the genetics community with a comprehensive insight into how the systems genetics approach came to life, starting from the first integrative genomics strategies [such as expression quantitative trait loci (eQTLs) mapping] and concluding with the most sophisticated gene network-based analyses in multiple systems and disease states. Although not limited to studies that have been directly translated to humans, we will focus particularly on the successful investigations in the rat that have led to primary discoveries of genes and pathways relevant to human disease. PMID:27736746
From integrative genomics to systems genetics in the rat to link genotypes to phenotypes.
Moreno-Moral, Aida; Petretto, Enrico
2016-10-01
Complementary to traditional gene mapping approaches used to identify the hereditary components of complex diseases, integrative genomics and systems genetics have emerged as powerful strategies to decipher the key genetic drivers of molecular pathways that underlie disease. Broadly speaking, integrative genomics aims to link cellular-level traits (such as mRNA expression) to the genome to identify their genetic determinants. With the characterization of several cellular-level traits within the same system, the integrative genomics approach evolved into a more comprehensive study design, called systems genetics, which aims to unravel the complex biological networks and pathways involved in disease, and in turn map their genetic control points. The first fully integrated systems genetics study was carried out in rats, and the results, which revealed conserved trans-acting genetic regulation of a pro-inflammatory network relevant to type 1 diabetes, were translated to humans. Many studies using different organisms subsequently stemmed from this example. The aim of this Review is to describe the most recent advances in the fields of integrative genomics and systems genetics applied in the rat, with a focus on studies of complex diseases ranging from inflammatory to cardiometabolic disorders. We aim to provide the genetics community with a comprehensive insight into how the systems genetics approach came to life, starting from the first integrative genomics strategies [such as expression quantitative trait loci (eQTLs) mapping] and concluding with the most sophisticated gene network-based analyses in multiple systems and disease states. Although not limited to studies that have been directly translated to humans, we will focus particularly on the successful investigations in the rat that have led to primary discoveries of genes and pathways relevant to human disease. © 2016. Published by The Company of Biologists Ltd.
Modos, Dezso; Brooks, Johanne; Fazekas, David; Ari, Eszter; Vellai, Tibor; Csermely, Peter; Korcsmaros, Tamas; Lenti, Katalin
2016-01-01
Extensive cross-talk between signaling pathways is required to integrate the myriad of extracellular signal combinations at the cellular level. Gene duplication events may lead to the emergence of novel functions, leaving groups of similar genes - termed paralogs - in the genome. To distinguish critical paralog groups (CPGs) from other paralogs in human signaling networks, we developed a signaling network-based method using cross-talk annotation and tissue-specific signaling flow analysis. 75 CPGs were found with higher degree, betweenness centrality, closeness, and ‘bowtieness’ when compared to other paralogs or other proteins in the signaling network. CPGs had higher diversity in all these measures, with more varied biological functions and more specific post-transcriptional regulation than non-critical paralog groups (non-CPG). Using TGF-beta, Notch and MAPK pathways as examples, SMAD2/3, NOTCH1/2/3 and MEK3/6-p38 CPGs were found to regulate the signaling flow of their respective pathways. Additionally, CPGs showed a higher mutation rate in both inherited diseases and cancer, and were enriched in drug targets. In conclusion, the results revealed two distinct types of paralog groups in the signaling network, CPGs and non-CPGs, and highlight the importance of CPGs as compared to non-CPGs in drug discovery and disease pathogenesis. PMID:27922122
Malashchuk, Igor; Lajoie, Brian R.; Mardaryev, Andrei N.; Gdula, Michal R.; Sharov, Andrey A.; Kohwi-Shigematsu, Terumi; Fessing, Michael Y.
2017-01-01
Mammalian genomes contain several dozens of large (>0.5 Mbp) lineage-specific gene loci harbouring functionally related genes. However, spatial chromatin folding, organization of the enhancer-promoter networks and their relevance to Topologically Associating Domains (TADs) in these loci remain poorly understood. TADs are principal units of genome folding and represent DNA regions within which DNA interacts more frequently than it does across the TAD boundary. Here, we used Chromatin Conformation Capture Carbon Copy (5C) technology to characterize the spatial chromatin interaction network in the 3.1 Mb Epidermal Differentiation Complex (EDC) locus harbouring 61 functionally related genes that show lineage-specific activation during terminal keratinocyte differentiation in the epidermis. 5C data validated by 3D-FISH demonstrate that the EDC locus is organized into several TADs showing distinct lineage-specific chromatin interaction networks based on their transcription activity and the gene-rich or gene-poor status. Correlation of the 5C results with genome-wide studies for enhancer-specific histone modifications (H3K4me1 and H3K27ac) revealed that the majority of spatial chromatin interactions that involve the gene-rich TADs at the EDC locus in keratinocytes include both intra- and inter-TAD interaction networks, connecting gene promoters and enhancers. Compared to thymocytes in which the EDC locus is mostly transcriptionally inactive, these interactions were found to be keratinocyte-specific. In keratinocytes, the promoter-enhancer anchoring regions in the gene-rich transcriptionally active TADs are enriched for the binding of chromatin architectural proteins CTCF, Rad21 and chromatin remodeler Brg1. In contrast to gene-rich TADs, gene-poor TADs show preferential spatial contacts with each other, do not contain active enhancers and show decreased binding of CTCF, Rad21 and Brg1 in keratinocytes. Thus, spatial interactions between gene promoters and enhancers at the multi-TAD EDC locus in skin epithelial cells are cell type-specific and involve extensive contacts within TADs as well as between different gene-rich TADs, forming the framework for lineage-specific transcription. PMID:28863138
BμG@Sbase—a microbial gene expression and comparative genomic database
Witney, Adam A.; Waldron, Denise E.; Brooks, Lucy A.; Tyler, Richard H.; Withers, Michael; Stoker, Neil G.; Wren, Brendan W.; Butcher, Philip D.; Hinds, Jason
2012-01-01
The falling cost of high-throughput functional genomic technologies is creating a deluge of high-volume, complex data, placing a burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and, while serving as a hub of a global network of microbial research groups, has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface that integrates intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore, the modular design of the database will provide a robust platform for integrating data types beyond microarrays as the field moves towards systems-level analysis. PMID:21948792
BμG@Sbase--a microbial gene expression and comparative genomic database.
Witney, Adam A; Waldron, Denise E; Brooks, Lucy A; Tyler, Richard H; Withers, Michael; Stoker, Neil G; Wren, Brendan W; Butcher, Philip D; Hinds, Jason
2012-01-01
The falling cost of high-throughput functional genomic technologies is creating a deluge of high-volume, complex data, placing a burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and, while serving as a hub of a global network of microbial research groups, has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface that integrates intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore, the modular design of the database will provide a robust platform for integrating data types beyond microarrays as the field moves towards systems-level analysis.
Levering, Jennifer; Fiedler, Tomas; Sieg, Antje; van Grinsven, Koen W A; Hering, Silvio; Veith, Nadine; Olivier, Brett G; Klett, Lara; Hugenholtz, Jeroen; Teusink, Bas; Kreikemeyer, Bernd; Kummer, Ursula
2016-08-20
Genome-scale metabolic models comprise stoichiometric relations between metabolites, as well as associations between genes and metabolic reactions and facilitate the analysis of metabolism. We computationally reconstructed the metabolic network of the lactic acid bacterium Streptococcus pyogenes M49. Initially, we based the reconstruction on genome annotations and already existing and curated metabolic networks of Bacillus subtilis, Escherichia coli, Lactobacillus plantarum and Lactococcus lactis. This initial draft was manually curated with the final reconstruction accounting for 480 genes associated with 576 reactions and 558 metabolites. In order to constrain the model further, we performed growth experiments of wild type and arcA deletion strains of S. pyogenes M49 in a chemically defined medium and calculated nutrient uptake and production fluxes. We additionally performed amino acid auxotrophy experiments to test the consistency of the model. The established genome-scale model can be used to understand the growth requirements of the human pathogen S. pyogenes and define optimal and suboptimal conditions, but also to describe differences and similarities between S. pyogenes and related lactic acid bacteria such as L. lactis in order to find strategies to reduce the growth of the pathogen and propose drug targets. Copyright © 2016 Elsevier B.V. All rights reserved.
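The two ingredients named in the abstract above, stoichiometric relations and gene-reaction associations, can be illustrated with a minimal Python sketch. The three-reaction network, the reaction names and the gene name `geneX` are hypothetical and are not taken from the S. pyogenes M49 reconstruction; the sketch only checks whether a flux vector balances every internal metabolite and which reactions survive a gene deletion.

```python
# Hypothetical toy network: uptake -> A, A -> B (convert), B -> secreted.
# Each metabolite maps to the reactions that produce (+) or consume (-) it.
STOICHIOMETRY = {
    "A": {"uptake": 1, "convert": -1},
    "B": {"convert": 1, "secrete": -1},
}

# Hypothetical gene-reaction association: a reaction listed here needs
# at least one of its genes to be present.
GENE_REACTION = {"convert": {"geneX"}}


def is_steady_state(fluxes, tol=1e-9):
    """Return True if production equals consumption for every metabolite."""
    return all(
        abs(sum(coef * fluxes[rxn] for rxn, coef in row.items())) <= tol
        for row in STOICHIOMETRY.values()
    )


def available_reactions(reactions, deleted_genes):
    """Return the reactions that still have a supporting gene after deletion."""
    deleted = set(deleted_genes)
    return [
        rxn for rxn in reactions
        if rxn not in GENE_REACTION or GENE_REACTION[rxn] - deleted
    ]


balanced = is_steady_state({"uptake": 2.0, "convert": 2.0, "secrete": 2.0})
unbalanced = is_steady_state({"uptake": 2.0, "convert": 1.0, "secrete": 1.0})
remaining = available_reactions(["uptake", "convert", "secrete"], {"geneX"})
```

A full genome-scale analysis would add flux bounds and an optimisation objective on top of this mass-balance constraint; those steps are omitted here.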
A Review of the Accomplishments of the CTD² Network | Office of Cancer Genomics
The Office of Cancer Genomics (OCG) Cancer Target Discovery and Development or CTD2 initiative was established by the National Cancer Institute (NCI) to accelerate the “translation” of high-throughput, high-content genomic data to the bedside through functional genomics. The CTD2 initiative is a collaborative network of 13 different research teams, or Centers.
Genome-Wide Networks of Amino Acid Covariances Are Common among Viruses
Donlin, Maureen J.; Szeto, Brandon; Gohara, David W.; Aurora, Rajeev
2012-01-01
Coordinated variation among positions in amino acid sequence alignments can reveal genetic dependencies at noncontiguous positions, but methods to assess these interactions are incompletely developed. Previously, we found genome-wide networks of covarying residue positions in the hepatitis C virus genome (R. Aurora, M. J. Donlin, N. A. Cannon, and J. E. Tavis, J. Clin. Invest. 119:225–236, 2009). Here, we asked whether such networks are present in a diverse set of viruses and, if so, what they may imply about viral biology. Viral sequences were obtained for 16 viruses in 13 species from 9 families. The entire viral coding potential for each virus was aligned, all possible amino acid covariances were identified using the observed-minus-expected-squared algorithm at a false-discovery rate of ≤1%, and networks of covariances were assessed using standard methods. Covariances that spanned the viral coding potential were common in all viruses. In all cases, the covariances formed a single network that contained essentially all of the covariances. The hepatitis C virus networks had hub-and-spoke topologies, but all other networks had random topologies with an unusually large number of highly connected nodes. These results indicate that genome-wide networks of genetic associations and the coordinated evolution they imply are very common in viral genomes, that the networks rarely have the hub-and-spoke topology that dominates other biological networks, and that network topologies can vary substantially even within a given viral group. Five examples with hepatitis B virus and poliovirus are presented to illustrate how covariance network analysis can lead to inferences about viral biology. PMID:22238298
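The covariance scoring step mentioned in the abstract above can be sketched as follows. This is a minimal illustration of one common formulation of the observed-minus-expected-squared (OMES) score, assuming gaps are written as `-` and that the score is normalised by the number of gap-free sequence pairs; the authors' exact implementation, and the false-discovery-rate filtering they applied afterwards, are not reproduced here. The function names and the toy alignments are hypothetical.

```python
from collections import Counter
from itertools import combinations


def omes_score(col_i, col_j):
    """OMES score for two alignment columns of equal length."""
    # Keep only sequences with a residue (no gap) at both positions.
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_i = Counter(a for a, _ in pairs)
    count_j = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    score = 0.0
    for a in count_i:
        for b in count_j:
            # Expected pair count if the two positions varied independently.
            expected = count_i[a] * count_j[b] / n
            score += (observed[(a, b)] - expected) ** 2 / n
    return score


def covariance_scores(alignment):
    """Score every pair of columns in a list of equal-length sequences."""
    columns = list(zip(*alignment))
    return {
        (i, j): omes_score(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


coupled = covariance_scores(["AC", "AC", "GT", "GT"])
independent = covariance_scores(["AC", "AT", "GC", "GT"])
```

In the first toy alignment the two columns always change together, so the score is high; in the second every residue combination occurs equally often and the score is zero. Position pairs passing a significance threshold would then become the edges of a covariance network.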
Prediction of lipoprotein signal peptides in Gram-negative bacteria.
Juncker, Agnieszka S; Willenbrock, Hanni; Von Heijne, Gunnar; Brunak, Søren; Nielsen, Henrik; Krogh, Anders
2003-08-01
A method to predict lipoprotein signal peptides in Gram-negative Eubacteria, LipoP, has been developed. The hidden Markov model (HMM) was able to distinguish between lipoproteins (SPaseII-cleaved proteins), SPaseI-cleaved proteins, cytoplasmic proteins, and transmembrane proteins. This predictor was able to predict 96.8% of the lipoproteins correctly with only 0.3% false positives in a set of SPaseI-cleaved, cytoplasmic, and transmembrane proteins. The results obtained were significantly better than those of previously developed methods. Even though Gram-positive lipoprotein signal peptides differ from Gram-negatives, the HMM was able to identify 92.9% of the lipoproteins included in a Gram-positive test set. A genome search was carried out for 12 Gram-negative genomes and one Gram-positive genome. The results for Escherichia coli K12 were compared with new experimental data, and the predictions by the HMM agree well with the experimentally verified lipoproteins. A neural network-based predictor was developed for comparison, and it gave very similar results. LipoP is available as a Web server at www.cbs.dtu.dk/services/LipoP/.
Prediction of lipoprotein signal peptides in Gram-negative bacteria
Juncker, Agnieszka S.; Willenbrock, Hanni; von Heijne, Gunnar; Brunak, Søren; Nielsen, Henrik; Krogh, Anders
2003-01-01
A method to predict lipoprotein signal peptides in Gram-negative Eubacteria, LipoP, has been developed. The hidden Markov model (HMM) was able to distinguish between lipoproteins (SPaseII-cleaved proteins), SPaseI-cleaved proteins, cytoplasmic proteins, and transmembrane proteins. This predictor was able to predict 96.8% of the lipoproteins correctly with only 0.3% false positives in a set of SPaseI-cleaved, cytoplasmic, and transmembrane proteins. The results obtained were significantly better than those of previously developed methods. Even though Gram-positive lipoprotein signal peptides differ from Gram-negatives, the HMM was able to identify 92.9% of the lipoproteins included in a Gram-positive test set. A genome search was carried out for 12 Gram-negative genomes and one Gram-positive genome. The results for Escherichia coli K12 were compared with new experimental data, and the predictions by the HMM agree well with the experimentally verified lipoproteins. A neural network-based predictor was developed for comparison, and it gave very similar results. LipoP is available as a Web server at www.cbs.dtu.dk/services/LipoP/. PMID:12876315
Pérez-Palma, Eduardo; Bustos, Bernabé I; Villamán, Camilo F; Alarcón, Marcelo A; Avila, Miguel E; Ugarte, Giorgia D; Reyes, Ariel E; Opazo, Carlos; De Ferrari, Giancarlo V
2014-01-01
Genome-wide association studies (GWAS) have successfully identified several risk loci for Alzheimer's disease (AD). Nonetheless, these loci do not explain the entire susceptibility of the disease, suggesting that other genetic contributions remain to be identified. Here, we performed a meta-analysis combining data of 4,569 individuals (2,540 cases and 2,029 healthy controls) derived from three publicly available GWAS in AD and replicated a broad genomic region (>248,000 bp) associated with the disease near the APOE/TOMM40 locus in chromosome 19. To detect minor effect size contributions that could help to explain the remaining genetic risk, we conducted network-based pathway analyses either by extracting gene-wise p-values (GW), defined as the single strongest association signal within a gene, or calculated a more stringent gene-based association p-value using the extended Simes (GATES) procedure. Comparison of these strategies revealed that ontological sub-networks (SNs) involved in glutamate signaling were significantly overrepresented in AD (p<2.7×10^-11, p<1.9×10^-11; GW and GATES, respectively). Notably, glutamate signaling SNs were also found to be significantly overrepresented (p<5.1×10^-8) in the Alzheimer's disease Neuroimaging Initiative (ADNI) study, which was used as a targeted replication sample. Interestingly, components of the glutamate signaling SNs are coordinately expressed in disease-related tissues, which are tightly related to known pathological hallmarks of AD. Our findings suggest that genetic variation within glutamate signaling contributes to the remaining genetic risk of AD and support the notion that functional biological networks should be targeted in future therapies aimed to prevent or treat this devastating neurological disorder.
Villamán, Camilo F.; Alarcón, Marcelo A.; Avila, Miguel E.; Ugarte, Giorgia D.; Reyes, Ariel E.; Opazo, Carlos; De Ferrari, Giancarlo V.
2014-01-01
Genome-wide association studies (GWAS) have successfully identified several risk loci for Alzheimer's disease (AD). Nonetheless, these loci do not explain the entire susceptibility of the disease, suggesting that other genetic contributions remain to be identified. Here, we performed a meta-analysis combining data of 4,569 individuals (2,540 cases and 2,029 healthy controls) derived from three publicly available GWAS in AD and replicated a broad genomic region (>248,000 bp) associated with the disease near the APOE/TOMM40 locus in chromosome 19. To detect minor effect size contributions that could help to explain the remaining genetic risk, we conducted network-based pathway analyses either by extracting gene-wise p-values (GW), defined as the single strongest association signal within a gene, or calculated a more stringent gene-based association p-value using the extended Simes (GATES) procedure. Comparison of these strategies revealed that ontological sub-networks (SNs) involved in glutamate signaling were significantly overrepresented in AD (p<2.7×10^-11, p<1.9×10^-11; GW and GATES, respectively). Notably, glutamate signaling SNs were also found to be significantly overrepresented (p<5.1×10^-8) in the Alzheimer's disease Neuroimaging Initiative (ADNI) study, which was used as a targeted replication sample. Interestingly, components of the glutamate signaling SNs are coordinately expressed in disease-related tissues, which are tightly related to known pathological hallmarks of AD. Our findings suggest that genetic variation within glutamate signaling contributes to the remaining genetic risk of AD and support the notion that functional biological networks should be targeted in future therapies aimed to prevent or treat this devastating neurological disorder. PMID:24755620
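The two gene-level summaries contrasted in the abstract above can be sketched in Python. The gene-wise value is simply the smallest SNP p-value in a gene. The second function is the plain Simes combination, shown here as a simplified stand-in: the GATES procedure additionally replaces the raw test counts with effective numbers of independent tests estimated from linkage disequilibrium, which this sketch does not attempt. The function names and the example p-values are hypothetical.

```python
def gene_wise_p_value(snp_p_values):
    """Strongest single association signal within a gene (GW)."""
    return min(snp_p_values)


def simes_p_value(snp_p_values):
    """Plain Simes combination of SNP p-values within a gene.

    Assumes the SNPs are independent; GATES would substitute effective
    test counts derived from linkage disequilibrium for m and rank.
    """
    ordered = sorted(snp_p_values)
    m = len(ordered)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


example = [0.01, 0.04, 0.03]
gw = gene_wise_p_value(example)
simes = simes_p_value(example)
```

For this example the gene-wise value is 0.01, while the Simes value is 0.03, which illustrates why the abstract describes the gene-based p-value as the more stringent of the two: it penalises genes that carry many tested SNPs.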
Huang, You-Jun; Liu, Li-Li; Huang, Jian-Qin; Wang, Zheng-Jia; Chen, Fang-Fang; Zhang, Qi-Xiang; Zheng, Bing-Song; Chen, Ming
2013-10-10
Unlike herbaceous plants, woody plants undergo a long vegetative stage to achieve floral transition. They then turn into seasonal plants, flowering annually. In this study, a preliminary model of gene regulations for seasonal pistillate flowering in hickory (Carya cathayensis) was proposed. The genome-wide dynamic transcriptome was characterized via the joint-approach of RNA sequencing and microarray analysis. Differential transcript abundance analysis uncovered the dynamic transcript abundance patterns of flowering correlated genes and their major functions based on Gene Ontology (GO) analysis. To explore pistillate flowering mechanism in hickory, a comprehensive flowering gene regulatory network based on Arabidopsis thaliana was constructed by additional literature mining. A total of 114 putative flowering or floral genes including 31 with differential transcript abundance were identified in hickory. The locations, functions and dynamic transcript abundances were analyzed in the gene regulatory networks. A genome-wide co-expression network for the putative flowering or floral genes shows three flowering regulatory modules corresponding to response to light abiotic stimulus, cold stress, and reproductive development process, respectively. A total of 27 potential flowering or floral genes were recruited, which help to better understand the hickory-specific seasonal flowering mechanism. Flowering event of pistillate flower bud in hickory is triggered by several pathways synchronously including the photoperiod, autonomous, vernalization, gibberellin, and sucrose pathway. A total of 27 potential flowering or floral genes were recruited from the genome-wide co-expression network function module analysis. Moreover, the analysis provides a potential FLC-like gene based vernalization pathway and an 'AC' model for pistillate flower development in hickory. This work provides a framework for pistillate flower development in hickory, which is significant for insight into regulation of flowering and floral development of woody plants.
2013-01-01
Background: Unlike herbaceous plants, woody plants undergo a long vegetative stage to achieve floral transition. They then turn into seasonal plants, flowering annually. In this study, a preliminary model of gene regulations for seasonal pistillate flowering in hickory (Carya cathayensis) was proposed. The genome-wide dynamic transcriptome was characterized via the joint-approach of RNA sequencing and microarray analysis. Results: Differential transcript abundance analysis uncovered the dynamic transcript abundance patterns of flowering correlated genes and their major functions based on Gene Ontology (GO) analysis. To explore pistillate flowering mechanism in hickory, a comprehensive flowering gene regulatory network based on Arabidopsis thaliana was constructed by additional literature mining. A total of 114 putative flowering or floral genes including 31 with differential transcript abundance were identified in hickory. The locations, functions and dynamic transcript abundances were analyzed in the gene regulatory networks. A genome-wide co-expression network for the putative flowering or floral genes shows three flowering regulatory modules corresponding to response to light abiotic stimulus, cold stress, and reproductive development process, respectively. A total of 27 potential flowering or floral genes were recruited, which help to better understand the hickory-specific seasonal flowering mechanism. Conclusions: Flowering event of pistillate flower bud in hickory is triggered by several pathways synchronously including the photoperiod, autonomous, vernalization, gibberellin, and sucrose pathway. A total of 27 potential flowering or floral genes were recruited from the genome-wide co-expression network function module analysis. Moreover, the analysis provides a potential FLC-like gene based vernalization pathway and an 'AC' model for pistillate flower development in hickory. This work provides a framework for pistillate flower development in hickory, which is significant for insight into regulation of flowering and floral development of woody plants. PMID:24106755
Sharma, Rita; Cao, Peijian; Jung, Ki-Hong; Sharma, Manoj K.; Ronald, Pamela C.
2013-01-01
Glycoside hydrolases (GH) catalyze the hydrolysis of glycosidic bonds in cell wall polymers and can have major effects on cell wall architecture. Taking advantage of the massive datasets available in public databases, we have constructed a rice phylogenomic database of GHs (http://ricephylogenomics.ucdavis.edu/cellwalls/gh/). This database integrates multiple data types including the structural features, orthologous relationships, mutant availability, and gene expression patterns for each GH family in a phylogenomic context. The rice genome encodes 437 GH genes classified into 34 families. Based on pairwise comparison with eight dicot and four monocot genomes, we identified 138 GH genes that are highly diverged between monocots and dicots, 57 of which have diverged further in rice as compared with four monocot genomes scanned in this study. Chromosomal localization and expression analysis suggest a role for both whole-genome and localized gene duplications in expansion and diversification of GH families in rice. We examined the meta-profiles of expression patterns of GH genes in twenty different anatomical tissues of rice. Transcripts of 51 genes exhibit tissue or developmental stage-preferential expression, whereas, seventeen other genes preferentially accumulate in actively growing tissues. When queried in RiceNet, a probabilistic functional gene network that facilitates functional gene predictions, nine out of seventeen genes form a regulatory network with the well-characterized genes involved in biosynthesis of cell wall polymers including cellulose synthase and cellulose synthase-like genes of rice. Two-thirds of the GH genes in rice are up regulated in response to biotic and abiotic stress treatments indicating a role in stress adaptation. Our analyses identify potential GH targets for cell wall modification. PMID:23986771
Ficklin, Stephen P.; Feltus, F. Alex
2011-01-01
One major objective for plant biology is the discovery of molecular subsystems underlying complex traits. The use of genetic and genomic resources combined in a systems genetics approach offers a means for approaching this goal. This study describes a maize (Zea mays) gene coexpression network built from publicly available expression arrays. The maize network consisted of 2,071 loci that were divided into 34 distinct modules that contained 1,928 enriched functional annotation terms and 35 cofunctional gene clusters. Of note, 391 maize genes of unknown function were found to be coexpressed within modules along with genes of known function. A global network alignment was made between this maize network and a previously described rice (Oryza sativa) coexpression network. The IsoRankN tool was used, which incorporates both gene homology and network topology for the alignment. A total of 1,173 aligned loci were detected between the two grass networks, which condensed into 154 conserved subgraphs that preserved 4,758 coexpression edges in rice and 6,105 coexpression edges in maize. This study provides an early view into maize coexpression space and provides an initial network-based framework for the translation of functional genomic and genetic information between these two vital agricultural species. PMID:21606319
Ficklin, Stephen P; Feltus, F Alex
2011-07-01
One major objective for plant biology is the discovery of molecular subsystems underlying complex traits. The use of genetic and genomic resources combined in a systems genetics approach offers a means for approaching this goal. This study describes a maize (Zea mays) gene coexpression network built from publicly available expression arrays. The maize network consisted of 2,071 loci that were divided into 34 distinct modules that contained 1,928 enriched functional annotation terms and 35 cofunctional gene clusters. Of note, 391 maize genes of unknown function were found to be coexpressed within modules along with genes of known function. A global network alignment was made between this maize network and a previously described rice (Oryza sativa) coexpression network. The IsoRankN tool was used, which incorporates both gene homology and network topology for the alignment. A total of 1,173 aligned loci were detected between the two grass networks, which condensed into 154 conserved subgraphs that preserved 4,758 coexpression edges in rice and 6,105 coexpression edges in maize. This study provides an early view into maize coexpression space and provides an initial network-based framework for the translation of functional genomic and genetic information between these two vital agricultural species.
2012-01-01
Background The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's genbank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from other mammalian genomes. Conclusions The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
A comparative study of disease genes and drug targets in the human protein interactome
2015-01-01
Background Disease genes cause or contribute genetically to the development of the most complex diseases. Drugs are the major approaches to treat the complex disease through interacting with their targets. Thus, drug targets are critical for treatment efficacy. However, the interrelationship between the disease genes and drug targets is not clear. Results In this study, we comprehensively compared the network properties of disease genes and drug targets for five major disease categories (cancer, cardiovascular disease, immune system disease, metabolic disease, and nervous system disease). We first collected disease genes from genome-wide association studies (GWAS) for five disease categories and collected their corresponding drugs based on drugs' Anatomical Therapeutic Chemical (ATC) classification. Then, we obtained the drug targets for these five different disease categories. We found that, though the intersections between disease genes and drug targets were small, disease genes were significantly enriched in targets compared to their enrichment in human protein-coding genes. We further compared network properties of the proteins encoded by disease genes and drug targets in human protein-protein interaction networks (interactome). The results showed that the drug targets tended to have higher degree, higher betweenness, and lower clustering coefficient in cancer. Furthermore, we observed a clear fraction increase of disease proteins or drug targets in the near neighborhood compared with the randomized genes. Conclusions The study presents the first comprehensive comparison of the disease genes and drug targets in the context of interactome. The results provide some foundational network characteristics for further designing computational strategies to predict novel drug targets and drug repurposing. PMID:25861037
A comparative study of disease genes and drug targets in the human protein interactome.
Sun, Jingchun; Zhu, Kevin; Zheng, W; Xu, Hua
2015-01-01
Disease genes cause or contribute genetically to the development of the most complex diseases. Drugs are the major approaches to treat the complex disease through interacting with their targets. Thus, drug targets are critical for treatment efficacy. However, the interrelationship between the disease genes and drug targets is not clear. In this study, we comprehensively compared the network properties of disease genes and drug targets for five major disease categories (cancer, cardiovascular disease, immune system disease, metabolic disease, and nervous system disease). We first collected disease genes from genome-wide association studies (GWAS) for five disease categories and collected their corresponding drugs based on drugs' Anatomical Therapeutic Chemical (ATC) classification. Then, we obtained the drug targets for these five different disease categories. We found that, though the intersections between disease genes and drug targets were small, disease genes were significantly enriched in targets compared to their enrichment in human protein-coding genes. We further compared network properties of the proteins encoded by disease genes and drug targets in human protein-protein interaction networks (interactome). The results showed that the drug targets tended to have higher degree, higher betweenness, and lower clustering coefficient in cancer. Furthermore, we observed a clear fraction increase of disease proteins or drug targets in the near neighborhood compared with the randomized genes. The study presents the first comprehensive comparison of the disease genes and drug targets in the context of interactome. The results provide some foundational network characteristics for further designing computational strategies to predict novel drug targets and drug repurposing.
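Two of the network properties compared in the abstract above, degree and clustering coefficient, can be illustrated on a toy graph. The following is a minimal sketch in Python using only the standard library. The edge list, the node names, and the helper functions `build_adjacency`, `degree`, and `clustering_coefficient` are hypothetical and are not taken from the study; betweenness is omitted for brevity.

```python
from itertools import combinations

# Hypothetical toy interactome: undirected edges between made-up protein names.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("D", "E")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    """Number of direct interaction partners of a node."""
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of neighbour pairs that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: undefined cases are reported as 0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


adj = build_adjacency(EDGES)
for node in sorted(adj):
    print(node, degree(adj, node), round(clustering_coefficient(adj, node), 3))
```

In this toy graph, node A has the highest degree but a low clustering coefficient because only one pair of its three neighbours is connected, which is the kind of contrast the abstract reports for drug targets in cancer.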
Scholz, Birger; Doidge, Amie N.; Barnes, Philip; Hall, Jeremy; Wilkinson, Lawrence S.; Thomas, Kerrie L.
2016-01-01
We investigated the distinctiveness of gene regulatory networks in CA1 associated with the extinction of contextual fear memory (CFM) after recall using Affymetrix GeneChip Rat Genome 230 2.0 Arrays. These data were compared to previously published retrieval and reconsolidation-attributed, and consolidation datasets. A stringent dual normalization and pareto-scaled orthogonal partial least-square discriminant multivariate analysis together with a jack-knifing-based cross-validation approach was used on all datasets to reduce false positives. Consolidation, retrieval and extinction were correlated with distinct patterns of gene expression 2 hours later. Extinction-related gene expression was most distinct from the profile accompanying consolidation. A highly specific feature was the discrete regulation of neuroimmunological gene expression associated with retrieval and extinction. Immunity-associated genes of the tyrosine kinase receptor TGFβ and PDGF, and TNF families characterized extinction. Cytokines and proinflammatory interleukins of the IL-1 and IL-6 families were enriched with the no-extinction retrieval condition. We used comparative genomics to predict transcription factor binding sites in proximal promoter regions of the retrieval-regulated genes. Retrieval that does not lead to extinction was associated with NF-κB-mediated gene expression. We confirmed differential NF-κBp65 expression, and activity in all of a representative sample of our candidate genes in the no-extinction condition. The differential regulation of cytokine networks after the acquisition and retrieval of CFM identifies the important contribution that neuroimmune signalling plays in normal hippocampal function. Further, targeting cytokine signalling upon retrieval offers a therapeutic strategy to promote extinction mechanisms in human disorders characterised by dysregulation of associative memory. PMID:27224427
Yu, Bowen; Doraiswamy, Harish; Chen, Xi; Miraldi, Emily; Arrieta-Ortiz, Mario Luis; Hafemeister, Christoph; Madar, Aviv; Bonneau, Richard; Silva, Cláudio T
2014-12-01
Elucidation of transcriptional regulatory networks (TRNs) is a fundamental goal in biology, and one of the most important components of TRNs are transcription factors (TFs), proteins that specifically bind to gene promoter and enhancer regions to alter target gene expression patterns. Advances in genomic technologies as well as advances in computational biology have led to multiple large regulatory network models (directed networks) each with a large corpus of supporting data and gene-annotation. There are multiple possible biological motivations for exploring large regulatory network models, including: validating TF-target gene relationships, figuring out co-regulation patterns, and exploring the coordination of cell processes in response to changes in cell state or environment. Here we focus on queries aimed at validating regulatory network models, and on coordinating visualization of primary data and directed weighted gene regulatory networks. The large size of both the network models and the primary data can make such coordinated queries cumbersome with existing tools and, in particular, inhibits the sharing of results between collaborators. In this work, we develop and demonstrate a web-based framework for coordinating visualization and exploration of expression data (RNA-seq, microarray), network models and gene-binding data (ChIP-seq). Using specialized data structures and multiple coordinated views, we design an efficient querying model to support interactive analysis of the data. Finally, we show the effectiveness of our framework through case studies for the mouse immune system (a dataset focused on a subset of key cellular functions) and a model bacteria (a small genome with high data-completeness).
Leao, Tiago; Castelão, Guilherme; Monroe, Emily A.; Podell, Sheila; Glukhov, Evgenia; Allen, Eric E.; Gerwick, William H.; Gerwick, Lena
2017-01-01
Cyanobacteria are major sources of oxygen, nitrogen, and carbon in nature. In addition to the importance of their primary metabolism, some cyanobacteria are prolific producers of unique and bioactive secondary metabolites. Chemical investigations of the cyanobacterial genus Moorea have resulted in the isolation of over 190 compounds in the last two decades. However, preliminary genomic analysis has suggested that genome-guided approaches can enable the discovery of novel compounds from even well-studied Moorea strains, highlighting the importance of obtaining complete genomes. We report a complete genome of a filamentous tropical marine cyanobacterium, Moorea producens PAL, which reveals that about one-fifth of its genome is devoted to production of secondary metabolites, an impressive four times the cyanobacterial average. Moreover, possession of the complete PAL genome has allowed improvement to the assembly of three other Moorea draft genomes. Comparative genomics revealed that they are remarkably similar to one another, despite their differences in geography, morphology, and secondary metabolite profiles. Gene cluster networking highlights that this genus is distinctive among cyanobacteria, not only in the number of secondary metabolite pathways but also in the content of many pathways, which are potentially distinct from all other bacterial gene clusters to date. These findings portend that future genome-guided secondary metabolite discovery and isolation efforts should be highly productive. PMID:28265051
Rapid molecular evolution across amniotes of the IIS/TOR network
McGaugh, Suzanne E.; Bronikowski, Anne M.; Kuo, Chih-Horng; Reding, Dawn M.; Addis, Elizabeth A.; Flagel, Lex E.; Janzen, Fredric J.
2015-01-01
The insulin/insulin-like signaling and target of rapamycin (IIS/TOR) network regulates lifespan and reproduction, as well as metabolic diseases, cancer, and aging. Despite its vital role in health, comparative analyses of IIS/TOR have been limited to invertebrates and mammals. We conducted an extensive evolutionary analysis of the IIS/TOR network across 66 amniotes with 18 newly generated transcriptomes from nonavian reptiles and additional available genomes/transcriptomes. We uncovered rapid and extensive molecular evolution between reptiles (including birds) and mammals: (i) the IIS/TOR network, including the critical nodes insulin receptor substrate (IRS) and phosphatidylinositol 3-kinase (PI3K), exhibit divergent evolutionary rates between reptiles and mammals; (ii) compared with a proxy for the rest of the genome, genes of the IIS/TOR extracellular network exhibit exceptionally fast evolutionary rates; and (iii) signatures of positive selection and coevolution of the extracellular network suggest reptile- and mammal-specific interactions between members of the network. In reptiles, positively selected sites cluster on the binding surfaces of insulin-like growth factor 1 (IGF1), IGF1 receptor (IGF1R), and insulin receptor (INSR); whereas in mammals, positively selected sites clustered on the IGF2 binding surface, suggesting that these hormone-receptor binding affinities are targets of positive selection. Further, contrary to reports that IGF2R binds IGF2 only in marsupial and placental mammals, we found positively selected sites clustered on the hormone binding surface of reptile IGF2R that suggest that IGF2R binds to IGF hormones in diverse taxa and may have evolved in reptiles. These data suggest that key IIS/TOR paralogs have sub- or neofunctionalized between mammals and reptiles and that this network may underlie fundamental life history and physiological differences between these amniote sister clades. PMID:25991861
Rapid molecular evolution across amniotes of the IIS/TOR network.
McGaugh, Suzanne E; Bronikowski, Anne M; Kuo, Chih-Horng; Reding, Dawn M; Addis, Elizabeth A; Flagel, Lex E; Janzen, Fredric J; Schwartz, Tonia S
2015-06-02
The insulin/insulin-like signaling and target of rapamycin (IIS/TOR) network regulates lifespan and reproduction, as well as metabolic diseases, cancer, and aging. Despite its vital role in health, comparative analyses of IIS/TOR have been limited to invertebrates and mammals. We conducted an extensive evolutionary analysis of the IIS/TOR network across 66 amniotes with 18 newly generated transcriptomes from nonavian reptiles and additional available genomes/transcriptomes. We uncovered rapid and extensive molecular evolution between reptiles (including birds) and mammals: (i) the IIS/TOR network, including the critical nodes insulin receptor substrate (IRS) and phosphatidylinositol 3-kinase (PI3K), exhibit divergent evolutionary rates between reptiles and mammals; (ii) compared with a proxy for the rest of the genome, genes of the IIS/TOR extracellular network exhibit exceptionally fast evolutionary rates; and (iii) signatures of positive selection and coevolution of the extracellular network suggest reptile- and mammal-specific interactions between members of the network. In reptiles, positively selected sites cluster on the binding surfaces of insulin-like growth factor 1 (IGF1), IGF1 receptor (IGF1R), and insulin receptor (INSR); whereas in mammals, positively selected sites clustered on the IGF2 binding surface, suggesting that these hormone-receptor binding affinities are targets of positive selection. Further, contrary to reports that IGF2R binds IGF2 only in marsupial and placental mammals, we found positively selected sites clustered on the hormone binding surface of reptile IGF2R that suggest that IGF2R binds to IGF hormones in diverse taxa and may have evolved in reptiles. These data suggest that key IIS/TOR paralogs have sub- or neofunctionalized between mammals and reptiles and that this network may underlie fundamental life history and physiological differences between these amniote sister clades.
Wang, Liwei; Liu, Hongfang; Chute, Christopher G; Zhu, Qian
2015-01-01
Pharmacogenomics (PGx), as an emerging field, is poised to change the way we practice medicine and deliver health care by customizing drug therapies on the basis of each patient's genetic makeup. A large volume of PGx data including information among drugs, genes, and single nucleotide polymorphisms (SNPs) has been accumulated. Normalized and integrated PGx information could facilitate revelation of hidden relationships among drug treatments, genomic variations, and phenotype traits to better support drug discovery and the next generation of treatment. In this study, we generated a normalized and scientific evidence supported cancer based PGx network (CPN) by integrating cancer related PGx information from multiple well-known PGx resources including the Pharmacogenomics Knowledge Base (PharmGKB), the FDA PGx Biomarkers in Drug Labeling, and the Catalog of Published Genome-Wide Association Studies (GWAS). We successfully demonstrated the capability of the CPN for drug repurposing by conducting two case studies. The CPN established in this study offers comprehensive cancer based PGx information to support cancer orientated research, especially for drug repurposing.
RNA regulatory networks in animals and plants: a long noncoding RNA perspective.
Bai, Youhuang; Dai, Xiaozhuan; Harrison, Andrew P; Chen, Ming
2015-03-01
A recent highlight of genomics research has been the discovery of many families of transcripts which have function but do not code for proteins. An important group is long noncoding RNAs (lncRNAs), which are typically longer than 200 nt, and whose members originate from thousands of loci across genomes. We review progress in understanding the biogenesis and regulatory mechanisms of lncRNAs. We describe diverse computational and high throughput technologies for identifying and studying lncRNAs. We discuss the current knowledge of functional elements embedded in lncRNAs as well as insights into the lncRNA-based regulatory network in animals. We also describe genome-wide studies of large numbers of lncRNAs in plants, as well as knowledge of selected plant lncRNAs with a focus on biotic/abiotic stress-responsive lncRNAs. © The Author 2014. Published by Oxford University Press.
A whole-genome, radiation hybrid map of wheat
USDA-ARS's Scientific Manuscript database
Generating a reference sequence of bread wheat (Triticum aestivum L.) is a challenging task because of its large, highly repetitive and allopolyploid genome. Ordering of BAC- and NGS-based contigs in ongoing wheat genome-sequencing projects primarily uses recombination and comparative genomics-base...
Evolutionary Dynamics of Small RNAs in 27 Escherichia coli and Shigella Genomes
Skippington, Elizabeth; Ragan, Mark A.
2012-01-01
Small RNAs (sRNAs) are widespread in bacteria and play critical roles in regulating physiological processes. They are best characterized in Escherichia coli K-12 MG1655, where 83 sRNAs constitute nearly 2% of the gene complement. Most sRNAs act by base pairing with a target mRNA, modulating its translation and/or stability; many of these RNAs share only limited complementarity to their mRNA target, and require the chaperone Hfq to facilitate base pairing. Little is known about the evolutionary dynamics of bacterial sRNAs. Here, we apply phylogenetic and network analyses to investigate the evolutionary processes and principles that govern sRNA gene distribution in 27 E. coli and Shigella genomes. We identify core (encoded in all 27 genomes) and variable sRNAs; more than two-thirds of the E. coli K-12 MG1655 sRNAs are core, whereas the others show patterns of presence and absence that are principally due to genetic loss, not duplication or lateral genetic transfer. We present evidence that variable sRNAs are less tightly integrated into cellular genetic regulatory networks than are the core sRNAs, and that Hfq facilitates posttranscriptional cross talk between the E. coli–Shigella core and variable genomes. Finally, we present evidence that more than 80% of genes targeted by Hfq-associated core sRNAs have been transferred within the E. coli–Shigella clade, and that most of these genes have been transferred intact. These results suggest that Hfq and sRNAs help integrate laterally acquired genes into established regulatory networks. PMID:22223756
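The core/variable distinction defined in the abstract above (core sRNAs are those encoded in all genomes examined) amounts to a set intersection over per-genome gene lists. The sketch below illustrates that definition only; the genome names, gene names, and the function `split_core_variable` are hypothetical, and real analyses would first need an ortholog-calling step that is not shown here.

```python
# Hypothetical presence/absence data: genome name -> set of sRNA genes found in it.
GENOMES = {
    "genome_1": {"srnaA", "srnaB", "srnaC"},
    "genome_2": {"srnaA", "srnaB"},
    "genome_3": {"srnaA", "srnaC"},
}


def split_core_variable(genomes):
    """Split genes into core (present in every genome) and variable (the rest).

    Assumes `genomes` contains at least one genome.
    """
    gene_sets = list(genomes.values())
    all_genes = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return core, all_genes - core


core, variable = split_core_variable(GENOMES)
print("core:", sorted(core))
print("variable:", sorted(variable))
```

Here only `srnaA` is present in all three toy genomes, so it is the single core gene, while `srnaB` and `srnaC` are variable.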
Hamilton, Joshua J.; Dwivedi, Vivek; Reed, Jennifer L.
2013-01-01
Constraint-based methods provide powerful computational techniques to allow understanding and prediction of cellular behavior. These methods rely on physiochemical constraints to eliminate infeasible behaviors from the space of available behaviors. One such constraint is thermodynamic feasibility, the requirement that intracellular flux distributions obey the laws of thermodynamics. The past decade has seen several constraint-based methods that interpret this constraint in different ways, including those that are limited to small networks, rely on predefined reaction directions, and/or neglect the relationship between reaction free energies and metabolite concentrations. In this work, we utilize one such approach, thermodynamics-based metabolic flux analysis (TMFA), to make genome-scale, quantitative predictions about metabolite concentrations and reaction free energies in the absence of prior knowledge of reaction directions, while accounting for uncertainties in thermodynamic estimates. We applied TMFA to a genome-scale network reconstruction of Escherichia coli and examined the effect of thermodynamic constraints on the flux space. We also assessed the predictive performance of TMFA against gene essentiality and quantitative metabolomics data, under both aerobic and anaerobic, and optimal and suboptimal growth conditions. Based on these results, we propose that TMFA is a useful tool for validating phenotypes and generating hypotheses, and that additional types of data and constraints can improve predictions of metabolite concentrations. PMID:23870272
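The thermodynamic feasibility constraint mentioned in the abstract above links a reaction's free energy to metabolite concentrations, and requires that flux only flows in the direction of negative free energy change. The following is a minimal per-reaction sketch of that idea using the standard relation ΔG = ΔG° + RT Σ nᵢ ln(xᵢ). It is not the TMFA formulation itself, which is an optimization over the whole network and accounts for uncertainty in the estimates; the function names, the example reaction, and all numerical values are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def reaction_free_energy(dg0, stoich, conc, temperature=298.15):
    """Return the reaction free energy in kJ/mol.

    dg0    -- standard free energy of reaction in kJ/mol (assumed value)
    stoich -- metabolite -> stoichiometric coefficient (negative = consumed)
    conc   -- metabolite -> concentration in mol/L
    """
    log_q = sum(n * math.log(conc[m]) for m, n in stoich.items())
    return dg0 + R * temperature * log_q


def direction_is_feasible(flux, dg):
    """A non-zero flux must run 'downhill': forward needs dg < 0, reverse dg > 0."""
    if flux > 0:
        return dg < 0
    if flux < 0:
        return dg > 0
    return True  # zero flux is always thermodynamically allowed


# Hypothetical reaction A -> B with an unfavourable standard free energy.
stoich = {"A": -1, "B": 1}
conc = {"A": 1e-2, "B": 1e-5}
dg = reaction_free_energy(5.0, stoich, conc)
print(dg < 0, direction_is_feasible(1.0, dg), direction_is_feasible(-1.0, dg))
```

The example shows why concentrations matter: a reaction with a positive standard free energy can still carry forward flux when the product concentration is low enough relative to the substrate.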
CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.
Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian
2017-04-27
The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.
Rutter, William B; Salcedo, Andres; Akhunova, Alina; He, Fei; Wang, Shichen; Liang, Hanquan; Bowden, Robert L; Akhunov, Eduard
2017-04-12
Two opposing evolutionary constraints exert pressure on plant pathogens: one to diversify virulence factors in order to evade plant defenses, and the other to retain virulence factors critical for maintaining a compatible interaction with the plant host. To better understand how the diversified arsenals of fungal genes promote interaction with the same compatible wheat line, we performed a comparative genomic analysis of two North American isolates of Puccinia graminis f. sp. tritici (Pgt). The patterns of inter-isolate divergence in the secreted candidate effector genes were compared with the levels of conservation and divergence of plant-pathogen gene co-expression networks (GCN) developed for each isolate. Comparative genomic analyses revealed a substantial level of inter-isolate divergence in effector gene complement and sequence divergence. Gene Ontology (GO) analyses of the conserved and unique parts of the isolate-specific GCNs identified a number of conserved host pathways targeted by both isolates. Interestingly, the degree of inter-isolate sub-network conservation varied widely for the different host pathways and was positively associated with the proportion of conserved effector candidates associated with each sub-network. While different Pgt isolates tended to exploit similar wheat pathways for infection, the mode of plant-pathogen interaction varied for different pathways with some pathways being associated with the conserved set of effectors and others being linked with the diverged or isolate-specific effectors. Our data suggest that at the intra-species level pathogen populations likely maintain divergent sets of effectors capable of targeting the same plant host pathways. 
This functional redundancy may play an important role in the dynamic of the "arms-race" between host and pathogen serving as the basis for diverse virulence strategies and creating conditions where mutations in certain effector groups will not have a major effect on the pathogen's ability to infect the host.
The Global Genome Biodiversity Network (GGBN) Data Standard specification.
Droege, G; Barker, K; Seberg, O; Coddington, J; Benson, E; Berendsohn, W G; Bunk, B; Butler, C; Cawsey, E M; Deck, J; Döring, M; Flemons, P; Gemeinholzer, B; Güntsch, A; Hollowell, T; Kelbert, P; Kostadinov, I; Kottmann, R; Lawlor, R T; Lyal, C; Mackenzie-Dodds, J; Meyer, C; Mulcahy, D; Nussbeck, S Y; O'Tuama, É; Orrell, T; Petersen, G; Robertson, T; Söhngen, C; Whitacre, J; Wieczorek, J; Yilmaz, P; Zetzsche, H; Zhang, Y; Zhou, X
2016-01-01
Genomic samples of non-model organisms are becoming increasingly important in a broad range of studies from developmental biology, biodiversity analyses, to conservation. Genomic sample definition, description, quality, voucher information and metadata all need to be digitized and disseminated across scientific communities. This information needs to be concise and consistent in today's ever-increasing bioinformatic era, for complementary data aggregators to easily map databases to one another. In order to facilitate exchange of information on genomic samples and their derived data, the Global Genome Biodiversity Network (GGBN) Data Standard is intended to provide a platform based on a documented agreement to promote the efficient sharing and usage of genomic sample material and associated specimen information in a consistent way. The new data standard presented here builds upon existing standards commonly used within the community, extending them with the capability to exchange data on tissue, environmental and DNA samples as well as sequences. The GGBN Data Standard will reveal and democratize the hidden contents of biodiversity biobanks, for the convenience of everyone in the wider biobanking community. Technical tools exist for data providers to easily map their databases to the standard. Database URL: http://terms.tdwg.org/wiki/GGBN_Data_Standard. © The Author(s) 2016. Published by Oxford University Press.
Freytag, Saskia; Manitz, Juliane; Schlather, Martin; Kneib, Thomas; Amos, Christopher I.; Risch, Angela; Chang-Claude, Jenny; Heinrich, Joachim; Bickeböller, Heike
2014-01-01
Biological pathways provide rich information and biological context on the genetic causes of complex diseases. The logistic kernel machine test integrates prior knowledge on pathways in order to analyze data from genome-wide association studies (GWAS). Here, the kernel converts genomic information of two individuals to a quantitative value reflecting their genetic similarity. With the selection of the kernel one implicitly chooses a genetic effect model. Like many other pathway methods, none of the available kernels accounts for topological structure of the pathway or gene-gene interaction types. However, evidence indicates that connectivity and neighborhood of genes are crucial in the context of GWAS, because genes associated with a disease often interact. Thus, we propose a novel kernel that incorporates the topology of pathways and information on interactions. Using simulation studies, we demonstrate that the proposed method maintains the type I error correctly and can be more effective in the identification of pathways associated with a disease than non-network-based methods. We apply our approach to genome-wide association case control data on lung cancer and rheumatoid arthritis. We identify some promising new pathways associated with these diseases, which may improve our current understanding of the genetic mechanisms. PMID:24434848
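The idea of a kernel that maps the genotypes of two individuals to a similarity value, and of letting pathway topology enter that value, can be sketched as follows. This is a simplified illustration and not the kernel proposed in the abstract: the function names, the toy genotypes and the weight matrix (identity plus one SNP-SNP edge) are all assumptions made for the example.

```python
# Toy comparison of a plain linear kernel with a network-weighted kernel.
# Assumption: genotypes are coded 0/1/2 (minor-allele counts) and the
# weight matrix is symmetric and positive semidefinite.

def linear_kernel(genotypes):
    """K[i][j] = sum over SNPs of g_i[s] * g_j[s]."""
    n = len(genotypes)
    return [[sum(a * b for a, b in zip(genotypes[i], genotypes[j]))
             for j in range(n)] for i in range(n)]

def network_kernel(genotypes, weights):
    """K = G W G^T, where W encodes which SNPs are linked in the pathway."""
    n, p = len(genotypes), len(weights)
    gw = [[sum(g[s] * weights[s][t] for s in range(p)) for t in range(p)]
          for g in genotypes]
    return [[sum(gw[i][t] * genotypes[j][t] for t in range(p))
             for j in range(n)] for i in range(n)]

# Three individuals, three SNPs (hypothetical data).
genotypes = [[2, 0, 1],
             [0, 1, 1],
             [1, 1, 0]]
# Identity plus an edge between SNP 0 and SNP 1 (their genes interact).
weights = [[1, 1, 0],
           [1, 1, 0],
           [0, 0, 1]]

k_lin = linear_kernel(genotypes)
k_net = network_kernel(genotypes, weights)
print(k_lin[0][1], k_net[0][1])
```

In this toy case individuals 0 and 1 carry variants at different but interacting SNPs, so the network-weighted similarity is larger than the linear one.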
Genome-scale model reveals metabolic basis of biomass partitioning in a model diatom
Levering, Jennifer; Broddrick, Jared; Dupont, Christopher L.; ...
2016-05-06
Diatoms are eukaryotic microalgae that contain genes from various sources, including bacteria and the secondary endosymbiotic host. Due to this unique combination of genes, diatoms are taxonomically and functionally distinct from other algae and vascular plants and confer novel metabolic capabilities. Based on the genome annotation, we performed a genome-scale metabolic network reconstruction for the marine diatom Phaeodactylum tricornutum. Due to their endosymbiotic origin, diatoms possess a complex chloroplast structure which complicates the prediction of subcellular protein localization. Based on previous work we implemented a pipeline that exploits a series of bioinformatics tools to predict protein localization. The manually curated reconstructed metabolic network iLB1027_lipid accounts for 1,027 genes associated with 4,456 reactions and 2,172 metabolites distributed across six compartments. To constrain the genome-scale model, we determined the organism specific biomass composition in terms of lipids, carbohydrates, and proteins using Fourier transform infrared spectrometry. Our simulations indicate the presence of a yet unknown glutamine-ornithine shunt that could be used to transfer reducing equivalents generated by photosynthesis to the mitochondria. Furthermore, the model reflects the known biochemical composition of P. tricornutum in defined culture conditions and enables metabolic engineering strategies to improve the use of P. tricornutum for biotechnological applications.
2013-01-01
Background We describe the genome of the western painted turtle, Chrysemys picta bellii, one of the most widespread, abundant, and well-studied turtles. We place the genome into a comparative evolutionary context, and focus on genomic features associated with tooth loss, immune function, longevity, sex differentiation and determination, and the species' physiological capacities to withstand extreme anoxia and tissue freezing. Results Our phylogenetic analyses confirm that turtles are the sister group to living archosaurs, and demonstrate an extraordinarily slow rate of sequence evolution in the painted turtle. The ability of the painted turtle to withstand complete anoxia and partial freezing appears to be associated with common vertebrate gene networks, and we identify candidate genes for future functional analyses. Tooth loss shares a common pattern of pseudogenization and degradation of tooth-specific genes with birds, although the rate of accumulation of mutations is much slower in the painted turtle. Genes associated with sex differentiation generally reflect phylogeny rather than convergence in sex determination functionality. Among gene families that demonstrate exceptional expansions or show signatures of strong natural selection, immune function and musculoskeletal patterning genes are consistently over-represented. Conclusions Our comparative genomic analyses indicate that common vertebrate regulatory networks, some of which have analogs in human diseases, are often involved in the western painted turtle's extraordinary physiological capacities. As these regulatory pathways are analyzed at the functional level, the painted turtle may offer important insights into the management of a number of human health disorders. PMID:23537068
Functional Interaction Network Construction and Analysis for Disease Discovery.
Wu, Guanming; Haw, Robin
2017-01-01
Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, thereby providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of the data by using network modules, and increasing statistical power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60% of all human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures on how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example on how to use ReactomeFIViz for performing network-based data analysis for a list of genes.
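The train-then-predict step described above can be illustrated with a minimal Bernoulli naive Bayes classifier. This is a sketch under stated assumptions, not the classifier built for the Reactome functional interaction network: the three binary evidence features (physical interaction, coexpression, shared annotation), the training pairs and the function names are hypothetical.

```python
import math

def train_naive_bayes(features, labels, alpha=1.0):
    """Estimate class priors and P(feature = 1 | class) with Laplace smoothing."""
    model = {}
    n_features = len(features[0])
    for cls in (0, 1):
        rows = [f for f, y in zip(features, labels) if y == cls]
        prior = len(rows) / len(labels)
        probs = [(sum(r[k] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for k in range(n_features)]
        model[cls] = (prior, probs)
    return model

def predict_proba(model, x):
    """Posterior probability that a protein pair is functionally interacting."""
    log_scores = {}
    for cls, (prior, probs) in model.items():
        score = math.log(prior)
        for xk, pk in zip(x, probs):
            score += math.log(pk if xk else 1.0 - pk)
        log_scores[cls] = score
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])

# Hypothetical evidence per protein pair:
# [physical interaction, coexpression, shared annotation]
features = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 1, 1],   # curated interactions
            [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]   # non-interacting pairs
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = train_naive_bayes(features, labels)
p_supported = predict_proba(model, [1, 1, 1])
p_unsupported = predict_proba(model, [0, 0, 0])
print(round(p_supported, 3), round(p_unsupported, 3))
```

A pair supported by all three evidence types receives a high posterior probability, whereas a pair with no supporting evidence receives a low one; a cutoff on this probability would then decide which predicted pairs enter the network.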
Next-generation mammalian genetics toward organism-level systems biology.
Susaki, Etsuo A; Ukai, Hideki; Ueda, Hiroki R
2017-01-01
Organism-level systems biology in mammals aims to identify, analyze, control, and design molecular and cellular networks executing various biological functions in mammals. In particular, system-level identification and analysis of molecular and cellular networks can be accelerated by next-generation mammalian genetics. Mammalian genetics without crossing, in which all production and phenotyping studies of genome-edited animals are completed within a single generation, drastically reduces the time, space, and effort of conducting systems research. Next-generation mammalian genetics is based on recent technological advancements in genome editing and developmental engineering. The process begins with the introduction of double-strand breaks into genomic DNA by using site-specific endonucleases, which results in highly efficient genome editing in mammalian zygotes or embryonic stem cells. By using nuclease-mediated genome editing in zygotes, or ~100% embryonic stem cell-derived mouse technology, whole-body knock-out and knock-in mice can be produced within a single generation. These emerging technologies allow us to produce multiple knock-out or knock-in strains in a high-throughput manner. In this review, we discuss the basic concepts and related technologies as well as current challenges and future opportunities for next-generation mammalian genetics in organism-level systems biology.
Identification of functional elements and regulatory circuits by Drosophila modENCODE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roy, Sushmita; Ernst, Jason; Kharchenko, Peter V.
2010-12-22
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation. Several years after the complete genetic sequencing of many species, it is still unclear how to translate genomic information into a functional map of cellular and developmental programs. The Encyclopedia of DNA Elements (ENCODE) (1) and model organism ENCODE (modENCODE) (2) projects use diverse genomic assays to comprehensively annotate the Homo sapiens (human), Drosophila melanogaster (fruit fly), and Caenorhabditis elegans (worm) genomes, through systematic generation and computational integration of functional genomic data sets. Previous genomic studies in flies have made seminal contributions to our understanding of basic biological mechanisms and genome functions, facilitated by genetic, experimental, computational, and manual annotation of the euchromatic and heterochromatic genome (3), small genome size, short life cycle, and a deep knowledge of development, gene function, and chromosome biology.
The functions of ~40% of the protein and nonprotein-coding genes [FlyBase 5.12 (4)] have been determined from cDNA collections (5, 6), manual curation of gene models (7), gene mutations and comprehensive genome-wide RNA interference screens (8-10), and comparative genomic analyses (11, 12). The Drosophila modENCODE project has generated more than 700 data sets that profile transcripts, histone modifications and physical nucleosome properties, general and specific transcription factors (TFs), and replication programs in cell lines, isolated tissues, and whole organisms across several developmental stages (Fig. 1). Here, we computationally integrate these data sets and report (i) improved and additional genome annotations, including full-length protein-coding genes and peptides as short as 21 amino acids; (ii) noncoding transcripts, including 132 candidate structural RNAs and 1608 nonstructural transcripts; (iii) additional Argonaute (Ago)-associated small RNA genes and pathways, including new microRNAs (miRNAs) encoded within protein-coding exons and endogenous small interfering RNAs (siRNAs) from 3′ untranslated regions; (iv) chromatin 'states' defined by combinatorial patterns of 18 chromatin marks that are associated with distinct functions and properties; (v) regions of high TF occupancy and replication activity with likely epigenetic regulation; (vi) mixed TF and miRNA regulatory networks with hierarchical structure and enriched feed-forward loops; (vii) coexpression- and co-regulation-based functional annotations for nearly 3000 genes; (viii) stage- and tissue-specific regulators; and (ix) predictive models of gene expression levels and regulator function.
Nadon, Celine; Van Walle, Ivo; Gerner-Smidt, Peter; Campos, Josefina; Chinen, Isabel; Concepcion-Acevedo, Jeniffer; Gilpin, Brent; Smith, Anthony M.; Kam, Kai Man; Perez, Enrique; Trees, Eija; Kubota, Kristy; Takkinen, Johanna; Nielsen, Eva Møller; Carleton, Heather
2017-01-01
PulseNet International is a global network dedicated to laboratory-based surveillance for food-borne diseases. The network comprises the national and regional laboratory networks of Africa, Asia Pacific, Canada, Europe, Latin America and the Caribbean, the Middle East, and the United States. The PulseNet International vision is the standardised use of whole genome sequencing (WGS) to identify and subtype food-borne bacterial pathogens worldwide, replacing traditional methods to strengthen preparedness and response, reduce global social and economic disease burden, and save lives. To meet the needs of real-time surveillance, the PulseNet International network will standardise subtyping via WGS using whole genome multilocus sequence typing (wgMLST), which delivers sufficiently high resolution and epidemiological concordance, plus unambiguous nomenclature for the purposes of surveillance. Standardised protocols, validation studies, quality control programmes, database and nomenclature development, and training should support the implementation and decentralisation of WGS. Ideally, WGS data collected for surveillance purposes should be publicly available, in real time where possible, respecting data protection policies. WGS data are suitable for surveillance and outbreak purposes and for answering scientific questions pertaining to source attribution, antimicrobial resistance, transmission patterns, and virulence, which will further enable the protection and improvement of public health with respect to food-borne disease. PMID:28662764
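Subtyping by wgMLST amounts to comparing allele numbers locus by locus between isolates. The sketch below shows one common way to turn two allele profiles into a pairwise distance; it is an illustration only, and the locus names, allele numbers, the handling of missing loci and the function name are assumptions rather than part of any PulseNet protocol.

```python
def allele_distance(profile_a, profile_b):
    """Count loci with different allele numbers, ignoring loci missing in either isolate."""
    shared = [locus for locus in profile_a
              if locus in profile_b
              and profile_a[locus] is not None
              and profile_b[locus] is not None]
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])

# Hypothetical allele profiles (locus -> allele number, None = locus not called).
isolate_1 = {"locus_1": 4, "locus_2": 7, "locus_3": 1, "locus_4": None}
isolate_2 = {"locus_1": 4, "locus_2": 9, "locus_3": 1, "locus_4": 2}
isolate_3 = {"locus_1": 5, "locus_2": 9, "locus_3": 3, "locus_4": 2}

d12 = allele_distance(isolate_1, isolate_2)
d13 = allele_distance(isolate_1, isolate_3)
d23 = allele_distance(isolate_2, isolate_3)
print(d12, d13, d23)
```

Isolates separated by few allele differences would be candidates for the same cluster; the threshold used in practice depends on the organism and the scheme and is not given here.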
NASA Astrophysics Data System (ADS)
Oh, Jung Hun; Kerns, Sarah; Ostrer, Harry; Powell, Simon N.; Rosenstein, Barry; Deasy, Joseph O.
2017-02-01
The biological cause of clinically observed variability of normal tissue damage following radiotherapy is poorly understood. We hypothesized that machine/statistical learning methods using single nucleotide polymorphism (SNP)-based genome-wide association studies (GWAS) would identify groups of patients of differing complication risk, and furthermore could be used to identify key biological sources of variability. We developed a novel learning algorithm, called pre-conditioned random forest regression (PRFR), to construct polygenic risk models using hundreds of SNPs, thereby capturing genomic features that confer small differential risk. Predictive models were trained and validated on a cohort of 368 prostate cancer patients for two post-radiotherapy clinical endpoints: late rectal bleeding and erectile dysfunction. The proposed method results in better predictive performance compared with existing computational methods. Gene ontology enrichment analysis and protein-protein interaction network analysis are used to identify key biological processes and proteins that were plausible based on other published studies. In conclusion, we confirm that novel machine learning methods can produce large predictive models (hundreds of SNPs), yielding clinically useful risk stratification models, as well as identifying important underlying biological processes in the radiation damage and tissue repair process. The methods are generally applicable to GWAS data and are not specific to radiotherapy endpoints.
Krämer, Andreas; Shah, Sohela; Rebres, Robert Anthony; Tang, Susan; Richards, Daniel Rene
2017-08-11
Next-generation sequencing is widely used to identify disease-causing variants in patients with rare genetic disorders. Identifying those variants from whole-genome or exome data can be both scientifically challenging and time consuming. A significant amount of time is spent on variant annotation, and interpretation. Fully or partly automated solutions are therefore needed to streamline and scale this process. We describe Phenotype Driven Ranking (PDR), an algorithm integrated into Ingenuity Variant Analysis, that uses observed patient phenotypes to prioritize diseases and genes in order to expedite causal-variant discovery. Our method is based on a network of phenotype-disease-gene relationships derived from the QIAGEN Knowledge Base, which allows for efficient computational association of phenotypes to implicated diseases, and also enables scoring and ranking. We have demonstrated the utility and performance of PDR by applying it to a number of clinical rare-disease cases, where the true causal gene was known beforehand. It is also shown that PDR compares favorably to a representative alternative tool.
2010-01-01
Background Global profiling of in vivo protein-DNA interactions using ChIP-based technologies has evolved rapidly in recent years. Although many genome-wide studies have identified thousands of ERα binding sites and have revealed the associated transcription factor (TF) partners, such as AP1, FOXA1 and CEBP, little is known about ERα associated hierarchical transcriptional regulatory networks. Results In this study, we applied computational approaches to analyze three publicly available ChIP-based datasets: ChIP-seq, ChIP-PET and ChIP-chip, and to investigate the hierarchical regulatory network for ERα and ERα partner TFs regulation in estrogen-dependent breast cancer MCF7 cells. 16 common TFs and two common new TF partners (RORA and PITX2) were found among ChIP-seq, ChIP-chip and ChIP-PET datasets. The regulatory networks were constructed by scanning the ChIP-peak region with TF specific position weight matrix (PWM). A permutation test was performed to test the reliability of each connection of the network. We then used DREM software to perform gene ontology function analysis on the common genes. We found that FOS, PITX2, RORA and FOXA1 were involved in the up-regulated genes. We also conducted the ERα and Pol-II ChIP-seq experiments in tamoxifen-resistant MCF7 cells (denoted as MCF7-T in this study) and compared the difference between MCF7 and MCF7-T cells. The result showed very little overlap between these two cells in terms of targeted genes (21.2% of common genes) and targeted TFs (25% of common TFs). The significant dissimilarity may indicate totally different transcriptional regulatory mechanisms between these two cancer cells. Conclusions Our study uncovers new estrogen-mediated regulatory networks by mining three ChIP-based data in MCF7 cells and ChIP-seq data in MCF7-T cells. We compared the different ChIP-based technologies as well as different breast cancer cells. 
Our computational analytical approach may guide biologists to further study the underlying mechanisms in breast cancer cells or other human diseases. PMID:21167036
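Scanning a ChIP-peak region with a position weight matrix, as mentioned in the Results, can be sketched as a sliding-window log-odds score. This is a minimal illustration under assumptions: the motif counts, the uniform background, the pseudocount and the function names are hypothetical, and the permutation test used in the study is not reproduced.

```python
import math

def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
                    for b in "ACGT"})
    return pwm

def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [(i, sum(pwm[k][sequence[i + k]] for k in range(width)))
            for i in range(len(sequence) - width + 1)]

# Hypothetical motif with consensus TGAC, built from 8 aligned sites.
counts = [{"T": 8}, {"G": 8}, {"A": 8}, {"C": 8}]
pwm = pwm_from_counts(counts)

peak_sequence = "AATGACGG"  # toy peak region
hits = scan(peak_sequence, pwm)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

Windows scoring above a chosen cutoff would be treated as putative binding sites and contribute TF-to-target edges to the network; how the cutoff was chosen in the study is not stated in the abstract.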
[Advance on genome research of Yersinia pestis bacteriophage].
Tan, H L; Wang, P; Li, W
2017-04-10
Completion of the genome sequences of Yersinia pestis bacteriophages has offered researchers an unprecedented opportunity to carry out related genomic studies. This review, based on those genome sequences, describes the essential genomic features of Yersinia pestis bacteriophages from a genomic perspective. Genetic evolutionary relationships are discussed on the basis of comparative genomics. Functional descriptions derived from gene prediction and protein annotation provide evidence for further related studies.
The Cancer Target Discovery and Development (CTD2) Network aims to use functional genomics to accelerate the translation of high-throughput and high-content genomic and small-molecule data towards use in precision oncology.
2014-01-01
Background At the beginning of the transcription process, the RNA polymerase (RNAP) core enzyme requires a σ-factor to recognize the genomic location at which the process initiates. Although the crucial role of σ-factors has long been appreciated and characterized for many individual promoters, we do not yet have a genome-scale assessment of their function. Results Using multiple genome-scale measurements, we elucidated the network of σ-factor and promoter interactions in Escherichia coli. The reconstructed network includes 4,724 σ-factor-specific promoters corresponding to transcription units (TUs), representing an increase of more than 300% over what has been previously reported. The reconstructed network was used to investigate competition between alternative σ-factors (the σ70 and σ38 regulons), confirming the competition model of σ substitution and negative regulation by alternative σ-factors. Comparison with σ-factor binding in Klebsiella pneumoniae showed that transcriptional regulation of conserved genes in closely related species is unexpectedly divergent. Conclusions The reconstructed network reveals the regulatory complexity of the promoter architecture in prokaryotic genomes, and opens a path to the direct determination of the systems biology of their transcriptional regulatory networks. PMID:24461193
Automation on the generation of genome-scale metabolic models.
Reyes, R; Gamermann, D; Montagud, A; Fuente, D; Triana, J; Urchueguía, J F; de Córdoba, P Fernández
2012-12-01
Nowadays, the reconstruction of genome-scale metabolic models is a nonautomatized and interactive process based on decision making. This lengthy process usually requires a full year of one person's work in order to satisfactorily collect, analyze, and validate the list of all metabolic reactions present in a specific organism. In order to write this list, one manually has to go through a huge amount of genomic, metabolomic, and physiological information. Currently, there is no optimal algorithm that allows one to automatically go through all this information and generate the models taking into account probabilistic criteria of unicity and completeness that a biologist would consider. This work presents the automation of a methodology for the reconstruction of genome-scale metabolic models for any organism. The methodology is the automatized version of the steps implemented manually for the reconstruction of the genome-scale metabolic model of a photosynthetic organism, Synechocystis sp. PCC6803. The steps for the reconstruction are implemented in a computational platform (COPABI) that generates the models from the probabilistic algorithms that have been developed. To validate the robustness of the developed algorithm, the metabolic models of several organisms generated by the platform have been studied together with published models that have been manually curated. Network properties of the models, such as connectivity and average shortest path length, have been compared and analyzed.
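The two network properties named at the end of the abstract, connectivity and average shortest path length, can be computed for a small undirected graph as sketched below. This is only an illustration of the metrics: the four-metabolite toy network and the function names are assumptions, and nothing here reflects how COPABI itself computes them.

```python
from collections import deque

def mean_degree(adjacency):
    """Average number of neighbours per node (connectivity)."""
    return sum(len(neighbours) for neighbours in adjacency.values()) / len(adjacency)

def average_shortest_path(adjacency):
    """Mean breadth-first-search distance over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        for target, d in distance.items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical metabolite graph: a linear chain A - B - C - D.
toy_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}

k_mean = mean_degree(toy_network)
l_mean = average_shortest_path(toy_network)
print(k_mean, round(l_mean, 4))
```

Computing the same two numbers for an automatically generated model and for a manually curated one gives a quick, coarse check of whether their topologies are comparable.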
Exploring metabolic pathways in genome-scale networks via generating flux modes.
Rezola, A; de Figueiredo, L F; Brock, M; Pey, J; Podhorski, A; Wittmann, C; Schuster, S; Bockmayr, A; Planes, F J
2011-02-15
The reconstruction of metabolic networks at the genome scale has allowed the analysis of metabolic pathways at an unprecedented level of complexity. Elementary flux modes (EFMs) are an appropriate concept for such analysis. However, their number grows in a combinatorial fashion as the size of the metabolic network increases, which makes it difficult to apply the EFM approach to large metabolic networks. Novel methods are needed to deal with such complexity. In this article, we present a novel optimization-based method for determining a minimal generating set of EFMs, i.e. a convex basis. We show that a subset of elements of this convex basis can be effectively computed even in large metabolic networks. Our method was applied to examine the structure of pathways producing lysine in Escherichia coli. We obtained a more varied and informative set of pathways in comparison with existing methods. In addition, an alternative pathway to produce lysine was identified using a detour via propionyl-CoA, which shows the predictive power of our novel approach. The source code in C++ is available upon request.
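The definitions behind flux modes can be made concrete with a toy network. The sketch below only checks the steady-state condition S·v = 0 and compares the supports (sets of active reactions) of candidate flux vectors; it is not the optimization-based method of the article, and the four-reaction network, the candidate vectors and the function names are assumptions.

```python
def is_steady_state(stoichiometry, flux, tol=1e-9):
    """True if every internal metabolite is balanced, i.e. S . v = 0."""
    return all(abs(sum(c * v for c, v in zip(row, flux))) <= tol
               for row in stoichiometry)

def support(flux, tol=1e-9):
    """Indices of the reactions that carry flux."""
    return {i for i, v in enumerate(flux) if abs(v) > tol}

# Toy network (all reactions irreversible):
#   R1: -> A    R2: A -> B    R3: B ->    R4: A ->
# Rows are metabolites A and B, columns are reactions R1..R4.
stoichiometry = [
    [1, -1, 0, -1],   # A
    [0, 1, -1, 0],    # B
]

mode_1 = [1, 1, 1, 0]      # uptake, conversion to B, export of B
mode_2 = [1, 0, 0, 1]      # uptake, direct export of A
combined = [2, 1, 1, 1]    # mode_1 + mode_2
unbalanced = [1, 1, 0, 0]  # B accumulates

print(is_steady_state(stoichiometry, mode_1),
      is_steady_state(stoichiometry, combined),
      is_steady_state(stoichiometry, unbalanced))
# A steady-state vector is not elementary if its support strictly contains
# the support of another steady-state vector.
print(support(mode_1) < support(combined), support(mode_2) < support(combined))
```

Here mode_1 and mode_2 are the two elementary modes of the toy network, while their sum is a valid steady state that is not elementary; in genome-scale networks the number of such modes grows combinatorially, which is the difficulty the abstract describes.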
A prior-based integrative framework for functional transcriptional regulatory network inference
Siahpirani, Alireza F.
2017-01-01
Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks; however, networks inferred from such methods have low overlap with experimentally derived (e.g. ChIP-chip and transcription factor (TF) knockouts) networks. Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference and identifies core TFs whose targets are predictable from expression. Multiple reasons make the identification of targets of other TFs difficult, including network architecture and insufficient variation of TF mRNA level. Finally, we demonstrate the utility of our inference algorithm to infer stress-specific regulatory networks and for regulator prioritization. PMID:27794550
Moreira-Filho, Carlos Alberto; Bando, Silvia Yumi; Bertonha, Fernanda Bernardi; Silva, Filipi Nascimento; da Fontoura Costa, Luciano; Ferreira, Leandro Rodrigues; Furlanetto, Glaucio; Chacur, Paulo; Zerbini, Maria Claudia Nogueira; Carneiro-Sampaio, Magda
2016-01-01
Trisomy 21-driven transcriptional alterations in human thymus were characterized through gene coexpression network (GCN) and miRNA-target analyses. We used whole thymic tissue, obtained at heart surgery from Down syndrome (DS) and karyotypically normal subjects (CT), and a network-based approach for GCN analysis that allows the identification of modular transcriptional repertoires (communities) and the interactions between all the system's constituents through community detection. Changes in the degree of connections observed for hierarchically important hubs/genes in CT and DS networks corresponded to community changes. Distinct communities of highly interconnected genes were topologically identified in these networks. The role of miRNAs in modulating the expression of highly connected genes in CT and DS was revealed through miRNA-target analysis. Trisomy 21 gene dysregulation in thymus may be depicted as the breakdown and altered reorganization of transcriptional modules. Leading networks acting in normal or disease states were identified. CT networks would depict the “canonical” way of thymus functioning. Conversely, DS networks represent a “non-canonical” way, i.e., thymic tissue adaptation under trisomy 21 genomic dysregulation. This adaptation is probably driven by epigenetic mechanisms acting at chromatin level and through the miRNA control of transcriptional programs involving the networks' high-hierarchy genes. PMID:26848775
Kim, Yun Hak; Jeong, Dae Cheon; Pak, Kyoungjune; Goh, Tae Sik; Lee, Chi-Seung; Han, Myoung-Eun; Kim, Ji-Young; Liangwen, Liu; Kim, Chi Dae; Jang, Jeon Yeob; Cha, Wonjae; Oh, Sae-Ock
2017-09-29
Accurate prediction of prognosis is critical for therapeutic decisions regarding cancer patients. Many previously developed prognostic scoring systems have limitations in reflecting recent progress in the field of cancer biology such as microarray, next-generation sequencing, and signaling pathways. To develop a new prognostic scoring system for cancer patients, we used mRNA expression and clinical data in various independent breast cancer cohorts (n=1214) from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and Gene Expression Omnibus (GEO). A new prognostic score that reflects gene network inherent in genomic big data was calculated using Network-Regularized high-dimensional Cox-regression (Net-score). We compared its discriminatory power with those of two previously used statistical methods: stepwise variable selection via univariate Cox regression (Uni-score) and Cox regression via Elastic net (Enet-score). The Net scoring system showed better discriminatory power in prediction of disease-specific survival (DSS) than other statistical methods (p=0 in METABRIC training cohort, p=0.000331, 4.58e-06 in two METABRIC validation cohorts) when accuracy was examined by log-rank test. Notably, comparison of C-index and AUC values in receiver operating characteristic analysis at 5 years showed fewer differences between training and validation cohorts with the Net scoring system than other statistical methods, suggesting minimal overfitting. The Net-based scoring system also successfully predicted prognosis in various independent GEO cohorts with high discriminatory power. In conclusion, the Net-based scoring system showed better discriminative power than previous statistical methods in prognostic prediction for breast cancer patients. This new system will mark a new era in prognosis prediction for cancer patients. PMID:29100405
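The C-index used above to compare scoring systems measures how often the patient with the higher risk score is also the one who experiences the event earlier. A minimal sketch of Harrell's concordance index follows; the survival times, event indicators and risk scores are hypothetical, and the sketch does not reproduce the Net-score model or the study's evaluation pipeline.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i had an observed event
    before patient j's follow-up time. Tied risk scores count as 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical cohort: follow-up in years, 1 = death from disease, 0 = censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.5, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; a small gap between the C-index in training and validation cohorts is what the abstract interprets as minimal overfitting.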
Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology
Paley, Suzanne M.; Krummenacker, Markus; Latendresse, Mario; Dale, Joseph M.; Lee, Thomas J.; Kaipa, Pallavi; Gilham, Fred; Spaulding, Aaron; Popescu, Liviu; Altman, Tomer; Paulsen, Ian; Keseler, Ingrid M.; Caspi, Ron
2010-01-01
Pathway Tools is a production-quality software environment for creating a type of model-organism database called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc integrates the evolving understanding of the genes, proteins, metabolic network and regulatory network of an organism. This article provides an overview of Pathway Tools capabilities. The software performs multiple computational inferences including prediction of metabolic pathways, prediction of metabolic pathway hole fillers and prediction of operons. It enables interactive editing of PGDBs by DB curators. It supports web publishing of PGDBs, and provides a large number of query and visualization tools. The software also supports comparative analyses of PGDBs, and provides several systems biology analyses of PGDBs including reachability analysis of metabolic networks, and interactive tracing of metabolites through a metabolic network. More than 800 PGDBs have been created using Pathway Tools by scientists around the world, many of which are curated DBs for important model organisms. Those PGDBs can be exchanged using a peer-to-peer DB sharing system called the PGDB Registry. PMID:19955237
Wu, Chao; Xiong, Wei; Dai, Junbiao; ...
2014-12-15
We report integrated, genome-based flux balance analysis, metabolomics, and 13C-label profiling of phototrophic and heterotrophic metabolism in Chlorella protothecoides, an oleaginous green alga for biofuel. The green alga Chlorella protothecoides, capable of autotrophic and heterotrophic growth with rapid lipid synthesis, is a promising candidate for biofuel production. Based on the newly available genome knowledge of the alga, we reconstructed the compartmentalized metabolic network consisting of 272 metabolic reactions, 270 enzymes, and 461 encoding genes and simulated the growth in different cultivation conditions with flux balance analysis. Phenotype-phase plane analysis shows conditions achieving the theoretical maximum of the biomass and corresponding fatty acid-producing rate for phototrophic cells (the ratio of photon uptake rate to CO2 uptake rate equals 8.4) and heterotrophic ones (the ratio of glucose uptake rate to O2 consumption rate reaches 2.4), respectively. Isotope-assisted liquid chromatography-mass spectrometry/mass spectrometry reveals higher metabolite concentrations in the glycolytic pathway and the tricarboxylic acid cycle in heterotrophic cells compared with autotrophic cells. We also observed enhanced levels of ATP, reduced nicotinamide adenine dinucleotide (phosphate), acetyl-Coenzyme A, and malonyl-Coenzyme A in heterotrophic cells, consistent with a strong activity of lipid synthesis. To profile the flux map in experimental conditions, we applied nonstationary 13C metabolic flux analysis as a complementing strategy to flux balance analysis. The result reveals negligible photorespiratory fluxes and a tricarboxylic acid cycle with low metabolic activity in phototrophic C. protothecoides. In comparison, high throughput of amphibolic reactions and the tricarboxylic acid cycle with no glyoxylate shunt activities were measured for heterotrophic cells.
Lastly, taken together, the metabolic network modeling assisted by experimental metabolomics and 13C labeling improves our understanding of the global metabolism of the oleaginous alga, paving the way to the systematic engineering of the microalga for biofuel production.
Xu, Yu; Wang, Hong; Nussinov, Ruth; Ma, Buyong
2013-01-01
We constructed and simulated a ‘minimal proteome’ model using Langevin dynamics. It contains 206 essential protein types which were compiled from the literature. For comparison, we generated six proteomes with randomized concentrations. We found that the net charges and molecular weights of the proteins in the minimal genome are not random. The net charge of a protein decreases linearly with molecular weight, with small proteins being mostly positively charged and large proteins negatively charged. The protein copy numbers in the minimal genome have the tendency to maximize the number of protein-protein interactions in the network. Negatively charged proteins, which tend to have larger sizes, can provide a large collision cross-section allowing them to interact with other proteins; on the other hand, the smaller positively charged proteins could have higher diffusion speed and are more likely to collide with other proteins. Proteomes with random charge/mass populations form less stable clusters than those with experimental protein copy numbers. Our study suggests that ‘proper’ populations of negatively and positively charged proteins are important for maintaining a protein-protein interaction network in a proteome. It is interesting to note that the minimal genome model based on the charge and mass of E. coli may have a larger protein-protein interaction network than that based on the lower organism M. pneumoniae. PMID:23420643
The Verrucomicrobia LexA-Binding Motif: Insights into the Evolutionary Dynamics of the SOS Response.
Erill, Ivan; Campoy, Susana; Kılıç, Sefa; Barbé, Jordi
2016-01-01
The SOS response is the primary bacterial mechanism to address DNA damage, coordinating multiple cellular processes that include DNA repair, cell division, and translesion synthesis. In contrast to other regulatory systems, the composition of the SOS genetic network and the binding motif of its transcriptional repressor, LexA, have been shown to vary greatly across bacterial clades, making it an ideal system to study the co-evolution of transcription factors and their regulons. Leveraging comparative genomics approaches and prior knowledge on the core SOS regulon, here we define the binding motif of the Verrucomicrobia, a recently described phylum of emerging interest due to its association with eukaryotic hosts. Site directed mutagenesis of the Verrucomicrobium spinosum recA promoter confirms that LexA binds a 14 bp palindromic motif with consensus sequence TGTTC-N4-GAACA. Computational analyses suggest that recognition of this novel motif is determined primarily by changes in base-contacting residues of the third alpha helix of the LexA helix-turn-helix DNA binding motif. In conjunction with comparative genomics analysis of the LexA regulon in the Verrucomicrobia phylum, electrophoretic shift assays reveal that LexA binds to operators in the promoter region of DNA repair genes and a mutagenesis cassette in this organism, and identify previously unreported components of the SOS response. The identification of tandem LexA-binding sites generating instances of other LexA-binding motifs in the lexA gene promoter of Verrucomicrobia species leads us to postulate a novel mechanism for LexA-binding motif evolution. This model, based on gene duplication, successfully addresses outstanding questions in the intricate co-evolution of the LexA protein, its binding motif and the regulatory network it controls.
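The consensus reported above, TGTTC-N4-GAACA, can be scanned for with a short script. The following is a minimal sketch that matches only exact copies of the consensus; real motif searches normally use position weight matrices, and the function names and the toy promoter sequence are hypothetical.

```python
import re

# Consensus from the abstract: TGTTC, four arbitrary bases, GAACA (14 bp).
# The lookahead allows overlapping matches to be reported.
LEXA_MOTIF = re.compile(r"(?=(TGTTC[ACGT]{4}GAACA))")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_lexa_sites(seq):
    """Return (0-based start, site) for every exact consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in LEXA_MOTIF.finditer(seq)]


promoter = "ccatTGTTCacgtGAACAttg"  # hypothetical toy sequence
sites = find_lexa_sites(promoter)

# The two half-sites are reverse complements of each other,
# which is what makes the motif palindromic.
half_sites_palindromic = reverse_complement("TGTTC") == "GAACA"
```

Because the half-sites are reverse complements, scanning one strand is enough for exact consensus matches; a degenerate or weighted motif would require scanning both strands.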
Regulatory variation: an emerging vantage point for cancer biology.
Li, Luolan; Lorzadeh, Alireza; Hirst, Martin
2014-01-01
Transcriptional regulation involves complex and interdependent interactions of noncoding and coding regions of the genome with proteins that interact and modify them. Genetic variation/mutation in coding and noncoding regions of the genome can drive aberrant transcription and disease. In spite of accounting for nearly 98% of the genome comparatively little is known about the contribution of noncoding DNA elements to disease. Genome-wide association studies of complex human diseases including cancer have revealed enrichment for variants in the noncoding genome. A striking finding of recent cancer genome re-sequencing efforts has been the previously underappreciated frequency of mutations in epigenetic modifiers across a wide range of cancer types. Taken together these results point to the importance of dysregulation in transcriptional regulatory control in genesis of cancer. Powered by recent technological advancements in functional genomic profiling, exploration of normal and transformed regulatory networks will provide novel insight into the initiation and progression of cancer and open new windows to future prognostic and diagnostic tools. © 2013 Wiley Periodicals, Inc.
TRACTOR_DB: a database of regulatory networks in gamma-proteobacterial genomes
González, Abel D.; Espinosa, Vladimir; Vasconcelos, Ana T.; Pérez-Rueda, Ernesto; Collado-Vides, Julio
2005-01-01
Experimental data on the Escherichia coli transcriptional regulatory system has been used in the past years to predict new regulatory elements (promoters, transcription factors (TFs), TFs' binding sites and operons) within its genome. As more genomes of gamma-proteobacteria are being sequenced, the prediction of these elements in a growing number of organisms has become more feasible, as a step towards the study of how different bacteria respond to environmental changes at the level of transcriptional regulation. In this work, we present TRACTOR_DB (TRAnscription FaCTORs' predicted binding sites in prokaryotic genomes), a relational database that contains computational predictions of new members of 74 regulons in 17 gamma-proteobacterial genomes. For these predictions we used a comparative genomics approach regarding which several proof-of-principle articles for large regulons have been published. TRACTOR_DB may be currently accessed at http://www.bioinfo.cu/Tractor_DB, http://www.tractor.lncc.br/ or at http://www.cifn.unam.mx/Computational_Genomics/tractorDB. Contact Email id is tractor@cifn.unam.mx. PMID:15608293
Kim, Hyun Soo
2018-01-01
The aged population is increasing worldwide. Accordingly, longevity and healthy aging have been spotlighted as a way to promote the social contribution of the aged population. Many studies in the past few decades have reported on the process of aging and longevity, emphasizing the importance of maintaining genomic stability in exceptionally long-lived populations. The underlying reason for longevity remains unclear because it is complex and involves multiple factors. With advances in sequencing technology and human genome-associated approaches, population-based genomic studies are increasing. In this review, we summarize recent longevity and healthy aging studies of human populations, focusing on DNA repair as a major factor in maintaining genome integrity. To keep pace with recent growth in genomic research, aging- and longevity-associated genomic databases are also briefly introduced. To suggest novel approaches to investigate longevity-associated genetic variants related to DNA repair using genomic databases, gene set analysis was conducted, focusing on DNA repair- and longevity-associated genes. Their biological networks were additionally analyzed to identify major factors containing genetic variants of human longevity and healthy aging in DNA repair mechanisms. In summary, this review emphasizes DNA repair activity in human longevity and suggests an approach for conducting DNA repair-associated genomic studies on human healthy aging.
Ponce-de-León, Miguel; Montero, Francisco; Peretó, Juli
2013-10-31
Metabolic reconstruction is the computational-based process that aims to elucidate the network of metabolites interconnected through reactions catalyzed by activities assigned to one or more genes. Reconstructed models may contain inconsistencies that appear as gap metabolites and blocked reactions. Although automatic methods for solving this problem have been previously developed, there are many situations where manual curation is still needed. We introduce a general definition of gap metabolite that allows its detection in a straightforward manner. Moreover, a method for the detection of Unconnected Modules, defined as isolated sets of blocked reactions connected through gap metabolites, is proposed. The method has been successfully applied to the curation of iCG238, the genome-scale metabolic model for the bacterium Blattabacterium cuenoti, obligate endosymbiont of cockroaches. We found the proposed approach to be a valuable tool for the curation of genome-scale metabolic models. The outcome of its application to the genome-scale model B. cuenoti iCG238 is a more accurate model version named as B. cuenoti iMP240.
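To make the notions of gap metabolite and blocked reaction concrete, here is a minimal topological sketch. It is not the general definition or the Unconnected Modules method introduced in the paper: it assumes a simplified rule (a metabolite is a gap if it lacks a producer, lacks a consumer, or is touched by a single reaction) and prunes iteratively. The function name and the toy network are hypothetical.

```python
def find_gaps(reactions):
    """reactions: {name: (stoichiometry dict, reversible flag)}.

    Negative coefficients are substrates, positive ones are products.
    Returns (gap metabolites, blocked reactions) found by repeatedly
    removing reactions that touch a dead-end metabolite.
    """
    active = dict(reactions)
    gaps, blocked = set(), set()
    while True:
        producers, consumers = {}, {}
        for name, (stoich, reversible) in active.items():
            for met, coef in stoich.items():
                if coef > 0 or reversible:
                    producers.setdefault(met, set()).add(name)
                if coef < 0 or reversible:
                    consumers.setdefault(met, set()).add(name)
        new_gaps = set()
        for met in set(producers) | set(consumers):
            p = producers.get(met, set())
            c = consumers.get(met, set())
            if not p or not c or len(p | c) < 2:
                new_gaps.add(met)
        newly_blocked = {
            name for name, (stoich, _) in active.items()
            if new_gaps & set(stoich)
        }
        if not newly_blocked:
            return gaps, blocked
        gaps |= new_gaps
        blocked |= newly_blocked
        for name in newly_blocked:
            del active[name]


# Hypothetical toy network: glucose uptake, conversion to pyruvate,
# pyruvate secretion, and a side branch ending in the dead-end metabolite z.
toy = {
    "EX_glc": ({"glc": 1}, False),
    "R1": ({"glc": -1, "pyr": 1}, False),
    "EX_pyr": ({"pyr": -1}, False),
    "R2": ({"pyr": -1, "x": 1}, False),
    "R3": ({"x": -1, "z": 1}, False),
}
gap_mets, blocked_rxns = find_gaps(toy)
```

In the toy network the dead end at z blocks R3, which in turn leaves x without a consumer and blocks R2. A purely topological check like this can miss reactions that are blocked only for stoichiometric reasons; detecting those requires a flux-based test.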
NASA Astrophysics Data System (ADS)
di Bernardo, Diego
2016-07-01
The review by Martin et al. deals with a long-standing problem at the interface of complex systems and molecular biology, that is, the relationship between the topology of a complex network and its function. In biological terms the problem translates to relating the topology of gene regulatory networks (GRNs) to specific cellular functions. GRNs control the spatial and temporal activity of the genes encoded in the cell's genome by means of specialised proteins called Transcription Factors (TFs). A TF is able to recognise and bind specifically to a sequence (TF binding site) of variable length (order of magnitude of 10) found upstream of the sequence encoding one or more genes (at least in prokaryotes), thus activating or repressing their transcription. TFs can thus be classified as activators or repressors. The picture can become more complex since some classes of TFs can form heterodimers consisting of a protein complex whose subunits are the individual TFs. Heterodimers can have completely different binding sites and activity compared to their individual parts. In this review the authors limit their attention to prokaryotes, where the complexity of GRNs is somewhat reduced. Moreover, they exploit a unique feature of living systems, i.e. evolution, to understand whether function can shape network topology. Indeed, prokaryotes such as bacteria are among the oldest living systems that have become perfectly adapted to their environment over geological scales and thus have reached an evolutionary steady-state where the fitness of the population has reached a plateau. By integrating in silico analysis and comparative evolution, the authors show that function does tend to shape the structure of a GRN; however, this trend is not always present and depends on the properties of the network being examined. Interestingly, the trend is more apparent for sparse networks, i.e. where the density of edges is very low.
Sparsity is indeed one of the most prominent features of naturally occurring GRNs, and more specifically GRNs have been found to approximate a power-law "scale-free" degree distribution by Barabasi and Albert [2]. Why sparsity arises is still under debate, but Price in 1976 proposed a model [1], later renamed "preferential attachment" by Barabasi and Albert [2], able to give rise to sparse scale-free networks. In this model, a network grows over time (such as a GRN during evolution) by sequential addition of new nodes (caused by genome duplications) that attach with higher probability to nodes with higher degree. In this review, Martin et al. propose that sparsity could also be caused by phenotypic constraints even in the absence of genome duplications, in order for the network to be robust against random mutations in the genome sequence, which in turn affect the specificity of TF binding sites. The authors also found that network motifs, i.e. subnetworks consisting of 3 or 4 nodes with a specific topology that are over-represented in the network, are also shaped by phenotypic constraints. Theoretical and computational approaches to understand the forces that shape network topology are of extreme interest in biology, although at this stage their impact has been limited. Nevertheless, these approaches may soon have important practical applications. The era of synthetic biology is upon us: novel organisms with "minimal genomes" are being built with the dual aim of simplifying engineering of new functions useful to humans and of understanding which is the minimal set of genes needed to support life [3]. The first minimal organism has just been created [3] by randomly deleting genes and genomic regions until a minimal set supporting cell growth and replication was found. The GRN of this minimal organism has not been investigated yet, but it will be of limited complexity. What is the GRN structure in this organism? Will the cell phenotypes be robust to mutations?
Is it possible to re-engineer the GRN in order to find an optimal structure that confers phenotypic robustness to the cell? All of these questions can be tackled only by understanding the guiding principles linking network topology to network function.
Goddard, Katrina A.B.; Knaus, William A.; Whitlock, Evelyn; Lyman, Gary H.; Feigelson, Heather Spencer; Schully, Sheri D.; Ramsey, Scott; Tunis, Sean; Freedman, Andrew N.; Khoury, Muin J.; Veenstra, David L.
2013-01-01
Background The clinical utility is uncertain for many cancer genomic applications. Comparative effectiveness research (CER) can provide evidence to clarify this uncertainty. Objectives To identify approaches to help stakeholders make evidence-based decisions, and to describe potential challenges and opportunities using CER to produce evidence-based guidance. Methods We identified general CER approaches for genomic applications through literature review, the authors’ experiences, and lessons learned from a recent, seven-site CER initiative in cancer genomic medicine. Case studies illustrate the use of CER approaches. Results Evidence generation and synthesis approaches include comparative observational and randomized trials, patient reported outcomes, decision modeling, and economic analysis. We identified significant challenges to conducting CER in cancer genomics: the rapid pace of innovation, the lack of regulation, the limited evidence for clinical utility, and the beliefs that genomic tests could have personal utility without having clinical utility. Opportunities to capitalize on CER methods in cancer genomics include improvements in the conduct of evidence synthesis, stakeholder engagement, increasing the number of comparative studies, and developing approaches to inform clinical guidelines and research prioritization. Conclusions CER offers a variety of methodological approaches to address stakeholders’ needs. Innovative approaches are needed to ensure an effective translation of genomic discoveries. PMID:22516979
Heydt, C; Kostenko, A; Merkelbach-Bruse, S; Wolf, J; Büttner, R
2016-09-01
Comprehensive molecular genotyping of lung cancers has become a key requirement for guiding therapeutic decisions. As a paradigm model of implementing next-generation comprehensive diagnostics, Network Genomic Medicine (NGM) has established central diagnostic and clinical trial platforms for centralised testing and decentralised personalised treatment in clinical practice. Here, we describe the structures of the NGM network and give a summary of technologies to identify patients with anaplastic lymphoma kinase (ALK) fusion-positive lung adenocarcinomas. As unifying test platforms will become increasingly important for delivering reliable, quick and affordable tests, the NGM diagnostic platform is currently implementing a comprehensive hybrid capture-based parallel sequencing pan-cancer assay. © The Author 2016. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Laniau, Julie; Frioux, Clémence; Nicolas, Jacques; Baroukh, Caroline; Cortes, Maria-Paz; Got, Jeanne; Trottier, Camille; Eveillard, Damien; Siegel, Anne
2017-01-01
The emergence of functions in biological systems is a long-standing issue that can now be addressed at the cell level with the emergence of high throughput technologies for genome sequencing and phenotyping. The reconstruction of complete metabolic networks for various organisms is a key outcome of the analysis of these data, giving access to a global view of cell functioning. The analysis of metabolic networks may be carried out by simply considering the architecture of the reaction network or by taking into account the stoichiometry of reactions. In both approaches, this analysis is generally centered on the outcome of the network and considers all metabolic compounds to be equivalent in this respect. As in the case of genes and reactions, about which the concept of essentiality has been developed, it seems, however, that some metabolites play crucial roles in system responses, due to the cell structure or the internal wiring of the metabolic network. We propose a classification of metabolic compounds according to their capacity to influence the activation of targeted functions (generally the growth phenotype) in a cell. We generalize the concept of essentiality to metabolites and introduce the concept of the phenotypic essential metabolite (PEM) which influences the growth phenotype according to sustainability, producibility or optimal-efficiency criteria. We have developed and made available a tool, Conquests, which implements a method combining graph-based and flux-based analysis, two approaches that are usually considered separately. The identification of PEMs is made effective by using a logical programming approach. The exhaustive study of phenotypic essential metabolites in six genome-scale metabolic models suggests that the combination and the comparison of graph, stoichiometry and optimal flux-based criteria allows some features of the metabolic network functionality to be deciphered by focusing on a small number of compounds.
By considering the best combination of both graph-based and flux-based techniques, the Conquests python package advocates for a broader use of these compounds both to facilitate network curation and to promote a precise understanding of metabolic phenotype.
Frioux, Clémence; Nicolas, Jacques; Baroukh, Caroline; Cortes, Maria-Paz; Got, Jeanne; Trottier, Camille; Eveillard, Damien
2017-01-01
Background The emergence of functions in biological systems is a long-standing issue that can now be addressed at the cell level with the emergence of high throughput technologies for genome sequencing and phenotyping. The reconstruction of complete metabolic networks for various organisms is a key outcome of the analysis of these data, giving access to a global view of cell functioning. The analysis of metabolic networks may be carried out by simply considering the architecture of the reaction network or by taking into account the stoichiometry of reactions. In both approaches, this analysis is generally centered on the outcome of the network and considers all metabolic compounds to be equivalent in this respect. As in the case of genes and reactions, about which the concept of essentiality has been developed, it seems, however, that some metabolites play crucial roles in system responses, due to the cell structure or the internal wiring of the metabolic network. Results We propose a classification of metabolic compounds according to their capacity to influence the activation of targeted functions (generally the growth phenotype) in a cell. We generalize the concept of essentiality to metabolites and introduce the concept of the phenotypic essential metabolite (PEM) which influences the growth phenotype according to sustainability, producibility or optimal-efficiency criteria. We have developed and made available a tool, Conquests, which implements a method combining graph-based and flux-based analysis, two approaches that are usually considered separately. The identification of PEMs is made effective by using a logical programming approach. 
Conclusion The exhaustive study of phenotypic essential metabolites in six genome-scale metabolic models suggests that the combination and the comparison of graph, stoichiometry and optimal flux-based criteria allows some features of the metabolic network functionality to be deciphered by focusing on a small number of compounds. By considering the best combination of both graph-based and flux-based techniques, the Conquests python package advocates for a broader use of these compounds both to facilitate network curation and to promote a precise understanding of metabolic phenotype. PMID:29038751
Molecular networks and the evolution of human cognitive specializations.
Fontenot, Miles; Konopka, Genevieve
2014-12-01
Inroads into elucidating the origins of human cognitive specializations have taken many forms, including genetic, genomic, anatomical, and behavioral assays that typically compare humans to non-human primates. While the integration of all of these approaches is essential for ultimately understanding human cognition, here, we review the usefulness of coexpression network analysis for specifically addressing this question. An increasing number of studies have incorporated coexpression networks into brain expression studies comparing species, disease versus control tissue, brain regions, or developmental time periods. A clearer picture has emerged of the key genes driving brain evolution, as well as the developmental and regional contributions of gene expression patterns important for normal brain development and those misregulated in cognitive diseases. Copyright © 2014 Elsevier Ltd. All rights reserved.
GenColors-based comparative genome databases for small eukaryotic genomes.
Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot
2013-01-01
Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.
Novel integrative genomic tool for interrogating lithium response in bipolar disorder
Hunsberger, J G; Chibane, F L; Elkahloun, A G; Henderson, R; Singh, R; Lawson, J; Cruceanu, C; Nagarajan, V; Turecki, G; Squassina, A; Medeiros, C D; Del Zompo, M; Rouleau, G A; Alda, M; Chuang, D-M
2015-01-01
We developed a novel integrative genomic tool called GRANITE (Genetic Regulatory Analysis of Networks Investigational Tool Environment) that can effectively analyze large complex data sets to generate interactive networks. GRANITE is an open-source tool and invaluable resource for a variety of genomic fields. Although our analysis is confined to static expression data, GRANITE has the capability of evaluating time-course data and generating interactive networks that may shed light on acute versus chronic treatment, as well as evaluating dose response and providing insight into mechanisms that underlie therapeutic versus sub-therapeutic doses or toxic doses. As a proof-of-concept study, we investigated lithium (Li) response in bipolar disorder (BD). BD is a severe mood disorder marked by cycles of mania and depression. Li is one of the most commonly prescribed and decidedly effective treatments for many patients (responders), although its mode of action is not yet fully understood, nor is it effective in every patient (non-responders). In an in vitro study, we compared vehicle versus chronic Li treatment in patient-derived lymphoblastoid cells (LCLs) (derived from either responders or non-responders) using both microRNA (miRNA) and messenger RNA gene expression profiling. We present both Li responder and non-responder network visualizations created by our GRANITE analysis in BD. We identified by network visualization that the Let-7 family is consistently downregulated by Li in both groups where this miRNA family has been implicated in neurodegeneration, cell survival and synaptic development. We discuss the potential of this analysis for investigating treatment response and even providing clinicians with a tool for predicting treatment response in their patients, as well as for providing the industry with a tool for identifying network nodes as targets for novel drug discovery. PMID:25646593
Novel integrative genomic tool for interrogating lithium response in bipolar disorder.
Hunsberger, J G; Chibane, F L; Elkahloun, A G; Henderson, R; Singh, R; Lawson, J; Cruceanu, C; Nagarajan, V; Turecki, G; Squassina, A; Medeiros, C D; Del Zompo, M; Rouleau, G A; Alda, M; Chuang, D-M
2015-02-03
We developed a novel integrative genomic tool called GRANITE (Genetic Regulatory Analysis of Networks Investigational Tool Environment) that can effectively analyze large complex data sets to generate interactive networks. GRANITE is an open-source tool and invaluable resource for a variety of genomic fields. Although our analysis is confined to static expression data, GRANITE has the capability of evaluating time-course data and generating interactive networks that may shed light on acute versus chronic treatment, as well as evaluating dose response and providing insight into mechanisms that underlie therapeutic versus sub-therapeutic doses or toxic doses. As a proof-of-concept study, we investigated lithium (Li) response in bipolar disorder (BD). BD is a severe mood disorder marked by cycles of mania and depression. Li is one of the most commonly prescribed and decidedly effective treatments for many patients (responders), although its mode of action is not yet fully understood, nor is it effective in every patient (non-responders). In an in vitro study, we compared vehicle versus chronic Li treatment in patient-derived lymphoblastoid cells (LCLs) (derived from either responders or non-responders) using both microRNA (miRNA) and messenger RNA gene expression profiling. We present both Li responder and non-responder network visualizations created by our GRANITE analysis in BD. We identified by network visualization that the Let-7 family is consistently downregulated by Li in both groups where this miRNA family has been implicated in neurodegeneration, cell survival and synaptic development. We discuss the potential of this analysis for investigating treatment response and even providing clinicians with a tool for predicting treatment response in their patients, as well as for providing the industry with a tool for identifying network nodes as targets for novel drug discovery.
A candidate multimodal functional genetic network for thermal adaptation
Pathak, Rachana; Prajapati, Indira; Bankston, Shannon; Thompson, Aprylle; Usher, Jaytriece; Isokpehi, Raphael D.
2014-01-01
Vertebrate ectotherms such as reptiles provide ideal organisms for the study of adaptation to environmental thermal change. Comparative genomic and exomic studies can recover markers that diverge between warm and cold adapted lineages, but the genes that are functionally related to thermal adaptation may be difficult to identify. We here used a bioinformatics genome-mining approach to predict and identify functions for suitable candidate markers for thermal adaptation in the chicken. We first established a framework of candidate functions for such markers, and then compiled the literature on genes known to adapt to the thermal environment in different lineages of vertebrates. We then identified them in the genomes of human, chicken, and the lizard Anolis carolinensis, and established a functional genetic interaction network in the chicken. Surprisingly, markers initially identified from diverse lineages of vertebrates such as human and fish were all in close functional relationship with each other and more associated than expected by chance. This indicates that the general genetic functional network for thermoregulation and/or thermal adaptation to the environment might be regulated via similar evolutionarily conserved pathways in different vertebrate lineages. We were able to identify seven functions that were statistically overrepresented in this network, corresponding to four of our originally predicted functions plus three unpredicted functions. We describe this network as multimodal: central regulator genes with the function of relaying thermal signal (1), affect genes with different cellular functions, namely (2) lipoprotein metabolism, (3) membrane channels, (4) stress response, (5) response to oxidative stress, (6) muscle contraction and relaxation, and (7) vasodilation, vasoconstriction and regulation of blood pressure. 
This network constitutes a novel resource for the study of thermal adaptation in the closely related nonavian reptiles and other vertebrate ectotherms. PMID:25289178
Parker, Brian J; Moltke, Ida; Roth, Adam; Washietl, Stefan; Wen, Jiayu; Kellis, Manolis; Breaker, Ronald; Pedersen, Jakob Skou
2011-11-01
Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.
Genomics Portals: integrative web-platform for mining genomics data.
Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario
2010-01-13
A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.
Genomics Portals: integrative web-platform for mining genomics data
2010-01-01
Background A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org. PMID:20070909
PINTA: a web server for network-based gene prioritization from expression data
Nitsch, Daniela; Tranchevent, Léon-Charles; Gonçalves, Joana P.; Vogt, Josef Korbinian; Madeira, Sara C.; Moreau, Yves
2011-01-01
PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user. PMID:21602267
Levering, Jennifer; Dupont, Christopher L.; Allen, Andrew E.; ...
2017-02-14
Diatoms are eukaryotic microalgae that are responsible for up to 40% of the ocean’s primary productivity. How diatoms respond to environmental perturbations such as elevated carbon concentrations in the atmosphere is currently poorly understood. We developed a transcriptional regulatory network based on various transcriptome sequencing expression libraries for different environmental responses to gain insight into the marine diatom’s metabolic and regulatory interactions and provide a comprehensive framework of responses to increasing atmospheric carbon levels. This transcriptional regulatory network was integrated with a recently published genome-scale metabolic model of Phaeodactylum tricornutum to explore the connectivity of the regulatory network and shared metabolites. The integrated regulatory and metabolic model revealed highly connected modules within carbon and nitrogen metabolism. P. tricornutum’s response to rising carbon levels was analyzed by using the recent genome-scale metabolic model with cross comparison to experimental manipulations of carbon dioxide. Using a systems biology approach, we studied the response of the marine diatom Phaeodactylum tricornutum to changing atmospheric carbon concentrations on an ocean-wide scale. By integrating an available genome-scale metabolic model and a newly developed transcriptional regulatory network inferred from transcriptome sequencing expression data, we demonstrate that carbon metabolism and nitrogen metabolism are strongly connected and the genes involved are coregulated in this model diatom. These tight regulatory constraints could play a major role during the adaptation of P. tricornutum to increasing carbon levels. The transcriptional regulatory network developed can be further used to study the effects of different environmental perturbations on P. tricornutum’s metabolism.
Yu, Hua; Jiao, Bingke; Lu, Lu; Wang, Pengfei; Chen, Shuangcheng; Liang, Chengzhi; Liu, Wei
2018-01-01
Accurately reconstructing gene co-expression networks is of great importance for uncovering the genetic architecture underlying complex and varied phenotypes. The recent availability of high-throughput RNA-seq has made genome-wide detection and quantification of novel, rare and low-abundance transcripts practical. However, its potential merits in reconstructing gene co-expression networks have still not been well explored. Using massive-scale RNA-seq samples, we have designed an ensemble pipeline, called NetMiner, for building a genome-scale, high-quality Gene Co-expression Network (GCN) by integrating three frequently used inference algorithms. We constructed an RNA-seq-based GCN for one monocot species, rice. The quality of the network obtained by our method was verified and evaluated against curated gene functional association data sets, and it clearly outperformed each single method. In addition, the powerful capability of the network for associating genes with functions and agronomic traits was shown by enrichment analysis and case studies. In particular, we demonstrated the potential value of our proposed method to predict the biological roles of unknown protein-coding genes, long non-coding RNA (lncRNA) genes and circular RNA (circRNA) genes. Our results provide a valuable and highly reliable data source for selecting key candidate genes for subsequent experimental validation. To facilitate identification of novel genes regulating important biological processes and phenotypes in other plants or animals, we have published the source code of NetMiner, making it freely available at https://github.com/czllab/NetMiner.
Iranzo, Jaime; Koonin, Eugene V; Prangishvili, David; Krupovic, Mart
2016-12-15
Archaea and particularly hyperthermophilic crenarchaea are hosts to many unusual viruses with diverse virion shapes and distinct gene compositions. As is typical of viruses in general, there are no universal genes in the archaeal virosphere. Therefore, to obtain a comprehensive picture of the evolutionary relationships between viruses, network analysis methods are more productive than traditional phylogenetic approaches. Here we present a comprehensive comparative analysis of genomes and proteomes from all currently known taxonomically classified and unclassified, cultivated and uncultivated archaeal viruses. We constructed a bipartite network of archaeal viruses that includes two classes of nodes, the genomes and gene families that connect them. Dissection of this network using formal community detection methods reveals strong modularity, with 10 distinct modules and 3 putative supermodules. However, compared to similar previously analyzed networks of eukaryotic and bacterial viruses, the archaeal virus network is sparsely connected. With the exception of the tailed viruses related to bacteriophages of the order Caudovirales and the families Turriviridae and Sphaerolipoviridae that are linked to a distinct supermodule of eukaryotic and bacterial viruses, there are few connector genes shared by different archaeal virus modules. In contrast, most of these modules include, in addition to viruses, capsidless mobile elements, emphasizing tight evolutionary connections between the two types of entities in archaea. The relative contributions of distinct evolutionary origins, in particular from nonviral elements, and insufficient sampling to the sparsity of the archaeal virus network remain to be determined by further exploration of the archaeal virosphere. Viruses infecting archaea are among the most mysterious denizens of the virosphere. 
Many of these viruses display no genetic or even morphological relationship to viruses of bacteria and eukaryotes, raising questions regarding their origins and position in the global virosphere. Analysis of 5,740 protein sequences from 116 genomes allowed dissection of the archaeal virus network and showed that most groups of archaeal viruses are evolutionarily connected to capsidless mobile genetic elements, including various plasmids and transposons. This finding could reflect actual independent origins of the distinct groups of archaeal viruses from different nonviral elements, providing important insights into the emergence and evolution of the archaeal virome. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Gu, Yunyan; Wang, Hongwei; Qin, Yao; Zhang, Yujing; Zhao, Wenyuan; Qi, Lishuang; Zhang, Yuannv; Wang, Chenguang; Guo, Zheng
2013-03-01
The heterogeneity of genetic alterations in human cancer genomes presents a major challenge to advancing our understanding of cancer mechanisms and identifying cancer driver genes. To tackle this heterogeneity problem, many approaches have been proposed to investigate genetic alterations and predict driver genes at the individual pathway level. However, most of these approaches ignore the correlation of alteration events between pathways and miss many genes with rare alterations collectively contributing to carcinogenesis. Here, we devise a network-based approach to capture the cooperative functional modules hidden in genome-wide somatic mutation and copy number alteration profiles of glioblastoma (GBM) from The Cancer Genome Atlas (TCGA), where a module is a set of altered genes with dense interactions in the protein interaction network. We identify 7 pairs of significantly co-altered modules that involve the main pathways known to be altered in GBM (TP53, RB and RTK signaling pathways) and highlight the striking co-occurring alterations among these GBM pathways. By taking into account the non-random correlation of gene alterations, the property of co-alteration could distinguish oncogenic modules that contain driver genes involved in the progression of GBM. The collaboration among cancer pathways suggests that the redundant models and aggravating models could shed new light on the potential mechanisms during carcinogenesis and provide new indications for the design of cancer therapeutic strategies.
Schrock, Alexa B; Li, Shuyu D; Frampton, Garrett M; Suh, James; Braun, Eduardo; Mehra, Ranee; Buck, Steven C; Bufill, Jose A; Peled, Nir; Karim, Nagla Abdel; Hsieh, K Cynthia; Doria, Manuel; Knost, James; Chen, Rong; Ou, Sai-Hong Ignatius; Ross, Jeffrey S; Stephens, Philip J; Fishkin, Paul; Miller, Vincent A; Ali, Siraj M; Halmos, Balazs; Liu, Jane J
2017-06-01
Pulmonary sarcomatoid carcinoma (PSC) is a high-grade NSCLC characterized by poor prognosis and resistance to chemotherapy. Development of targeted therapeutic strategies for PSC has been hampered because of limited and inconsistent molecular characterization. Hybrid capture-based comprehensive genomic profiling was performed on DNA from formalin-fixed paraffin-embedded sections of 15,867 NSCLCs, including 125 PSCs (0.8%). Tumor mutational burden (TMB) was calculated from 1.11 megabases (Mb) of sequenced DNA. The median age of the patients with PSC was 67 years (range 32-87), 58% were male, and 78% had stage IV disease. Tumor protein p53 gene (TP53) genomic alterations (GAs) were identified in 74% of cases, which had genomics distinct from TP53 wild-type cases, and 62% featured a GA in KRAS (34%) or one of seven genes currently recommended for testing in the National Comprehensive Cancer Network NSCLC guidelines, including the following: hepatocyte growth factor receptor gene (MET) (13.6%), EGFR (8.8%), BRAF (7.2%), erb-b2 receptor tyrosine kinase 2 gene (HER2) (1.6%), and ret proto-oncogene (RET) (0.8%). MET exon 14 alterations were enriched in PSC (12%) compared with non-PSC NSCLCs (∼3%) (p < 0.0001) and were more prevalent in PSC cases with an adenocarcinoma component. The fraction of PSC with a high TMB (>20 mutations per Mb) was notably higher than in non-PSC NSCLC (20% versus 14%, p = 0.056). Of nine patients with PSC treated with targeted or immunotherapies, three had partial responses and three had stable disease. Potentially targetable GAs in National Comprehensive Cancer Network NSCLC genes (30%) or intermediate or high TMB (43%, >10 mutations per Mb) were identified in most of the PSC cases. Thus, the use of comprehensive genomic profiling in clinical care may provide important treatment options for a historically poorly characterized and difficult to treat disease. Copyright © 2017 International Association for the Study of Lung Cancer. 
Published by Elsevier Inc. All rights reserved.
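The tumor mutational burden figures in the abstract above reduce to simple arithmetic: a mutation count divided by the 1.11 Mb of sequenced DNA, then compared with the stated cutoffs (>20 mutations per Mb for high, >10 for intermediate or high). The following is a minimal sketch of that calculation only. The function names and the "low" label are assumptions made for illustration, and the abstract does not describe how mutations were counted or filtered.

```python
# Sketch of the TMB arithmetic; names and the "low" label are hypothetical.
SEQUENCED_MB = 1.11  # megabases of sequenced DNA, as stated in the abstract


def tmb_per_mb(mutation_count, sequenced_mb=SEQUENCED_MB):
    """Return mutations per megabase for a given mutation count."""
    if sequenced_mb <= 0:
        raise ValueError("sequenced_mb must be positive")
    return mutation_count / sequenced_mb


def tmb_category(tmb):
    """Classify a TMB value using the cutoffs quoted in the abstract."""
    if tmb > 20:
        return "high"
    if tmb > 10:
        return "intermediate"
    return "low"  # assumed label for everything at or below 10 per Mb


if __name__ == "__main__":
    for count in (11, 12, 25):
        value = tmb_per_mb(count)
        print(count, round(value, 2), tmb_category(value))
```

Under these assumptions, a sample with 25 counted mutations falls in the high group, while 11 mutations stays just below the intermediate cutoff.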
CoryneBase: Corynebacterium Genomic Resources and Analysis Tools at Your Fingertips
Tan, Mui Fern; Jakubovics, Nick S.; Wee, Wei Yee; Mutha, Naresh V. R.; Wong, Guat Jah; Ang, Mia Yang; Yazdi, Amir Hessam; Choo, Siew Woh
2014-01-01
Corynebacteria are used for a wide variety of industrial purposes but some species are associated with human diseases. With increasing number of corynebacterial genomes having been sequenced, comparative analysis of these strains may provide better understanding of their biology, phylogeny, virulence and taxonomy that may lead to the discoveries of beneficial industrial strains or contribute to better management of diseases. To facilitate the ongoing research of corynebacteria, a specialized central repository and analysis platform for the corynebacterial research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. Here we present CoryneBase, a genomic database for Corynebacterium with diverse functionality for the analysis of genomes aimed to provide: (1) annotated genome sequences of Corynebacterium where 165,918 coding sequences and 4,180 RNAs can be found in 27 species; (2) access to comprehensive Corynebacterium data through the use of advanced web technologies for interactive web interfaces; and (3) advanced bioinformatic analysis tools consisting of standard BLAST for homology search, VFDB BLAST for sequence homology search against the Virulence Factor Database (VFDB), Pairwise Genome Comparison (PGC) tool for comparative genomic analysis, and a newly designed Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomic analysis. CoryneBase offers the access of a range of Corynebacterium genomic resources as well as analysis tools for comparative genomics and pathogenomics. It is publicly available at http://corynebacterium.um.edu.my/. PMID:24466021
Eastman, Alexander W; Heinrichs, David E; Yuan, Ze-Chun
2014-10-03
Members of the genus Paenibacillus are important plant growth-promoting rhizobacteria that can serve as bio-reactors. Paenibacillus polymyxa promotes the growth of a variety of economically important crops. Our lab recently completed the genome sequence of Paenibacillus polymyxa CR1. As of January 2014, four P. polymyxa genomes have been completely sequenced but no comparative genomic analyses have been reported. Here we report the comparative and genetic analyses of four sequenced P. polymyxa genomes, which revealed a significantly conserved core genome. Complex metabolic pathways and regulatory networks were highly conserved and allow P. polymyxa to rapidly respond to dynamic environmental cues. Genes responsible for phytohormone synthesis, phosphate solubilization, iron acquisition, transcriptional regulation, σ-factors, stress responses, transporters and biomass degradation were well conserved, indicating an intimate association with plant hosts and the rhizosphere niche. In addition, genes responsible for antimicrobial resistance and non-ribosomal peptide/polyketide synthesis are present in both the core and accessory genome of each strain. Comparative analyses also reveal variations in the accessory genome, including large plasmids present in strains M1 and SC2. Furthermore, a considerable number of strain-specific genes and genomic islands are irregularly distributed throughout each genome. Although a variety of plant-growth promoting traits are encoded by all strains, only P. polymyxa CR1 encodes the unique nitrogen fixation cluster found in other Paenibacillus sp. Our study revealed that genomic loci relevant to host interaction and ecological fitness are highly conserved within the P. polymyxa genomes analysed, despite variations in the accessory genome. This work suggests that plant-growth promotion by P. polymyxa is mediated largely through phytohormone production, increased nutrient availability and bio-control mechanisms. 
This study provides an in-depth understanding of the genome architecture of this species, thus facilitating future genetic engineering and applications in agriculture, industry and medicine. Furthermore, this study highlights the current gap in our understanding of complex plant biomass metabolism in Gram-positive bacteria.
Thomashow, Mike
2018-02-06
The U.S. Department of Energy Joint Genome Institute (JGI) invited scientists interested in the application of genomics to bioenergy and environmental issues, as well as all current and prospective users and collaborators, to attend the annual DOE JGI Genomics of Energy & Environment Meeting held March 22-24, 2011 in Walnut Creek, Calif. The emphasis of this meeting was on the genomics of renewable energy strategies, carbon cycling, environmental gene discovery, and engineering of fuel-producing organisms. The meeting features presentations by leading scientists advancing these topics. Mike Thomashow of Michigan State University gives a presentation on "Low Temperature Regulatory Networks Controlling Cold Acclimation in Arabidopsis" at the 6th annual Genomics of Energy & Environment Meeting on March 23, 2011.
FCDECOMP: decomposition of metabolic networks based on flux coupling relations.
Rezvan, Abolfazl; Marashi, Sayed-Amir; Eslahchi, Changiz
2014-10-01
A metabolic network model provides a computational framework to study the metabolism of a cell at the system level. Due to their large sizes and complexity, rational decomposition of these networks into subsystems is a strategy to obtain better insight into the metabolic functions. Additionally, decomposing metabolic networks paves the way to use computational methods that will be otherwise very slow when run on the original genome-scale network. In the present study, we propose FCDECOMP decomposition method based on flux coupling relations (FCRs) between pairs of reaction fluxes. This approach utilizes a genetic algorithm (GA) to obtain subsystems that can be analyzed in isolation, i.e. without considering the reactions of the original network in the analysis. Therefore, we propose that our method is useful for discovering biologically meaningful modules in metabolic networks. As a case study, we show that when this method is applied to the metabolic networks of barley seeds and yeast, the modules are in good agreement with the biological compartments of these networks.
NASA Astrophysics Data System (ADS)
Tkačik, Gašper
2016-07-01
The article by O. Martin and colleagues provides a much needed systematic review of a body of work that relates the topological structure of genetic regulatory networks to evolutionary selection for function. This connection is very important. Using the current wealth of genomic data, statistical features of regulatory networks (e.g., degree distributions, motif composition, etc.) can be quantified rather easily; it is, however, often unclear how to interpret the results. On a graph theoretic level the statistical significance of the results can be evaluated by comparing observed graphs to "randomized" ones (bravely ignoring the issue of how precisely to randomize!) and comparing the frequency of appearance of a particular network structure relative to a randomized null expectation. While this is a convenient operational test for statistical significance, its biological meaning is questionable. In contrast, an in-silico genotype-to-phenotype model makes explicit the assumptions about the network function, and thus clearly defines the expected network structures that can be compared to the case of no selection for function and, ultimately, to data.
Merlet, Benjamin; Paulhe, Nils; Vinson, Florence; Frainay, Clément; Chazalviel, Maxime; Poupin, Nathalie; Gloaguen, Yoann; Giacomoni, Franck; Jourdan, Fabien
2016-01-01
This article describes a generic programmatic method for mapping chemical compound libraries on organism-specific metabolic networks from various databases (KEGG, BioCyc) and flat file formats (SBML and Matlab files). We show how this pipeline was successfully applied to decipher the coverage of chemical libraries set up by two metabolomics facilities MetaboHub (French National infrastructure for metabolomics and fluxomics) and Glasgow Polyomics (GP) on the metabolic networks available in the MetExplore web server. The present generic protocol is designed to formalize and reduce the volume of information transfer between the library and the network database. Matching of metabolites between libraries and metabolic networks is based on InChIs or InChIKeys and therefore requires that these identifiers are specified in both libraries and networks. In addition to providing covering statistics, this pipeline also allows the visualization of mapping results in the context of metabolic networks. In order to achieve this goal, we tackled issues on programmatic interaction between two servers, improvement of metabolite annotation in metabolic networks and automatic loading of a mapping in genome scale metabolic network analysis tool MetExplore. It is important to note that this mapping can also be performed on a single or a selection of organisms of interest and is thus not limited to large facilities.
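The matching step described above (pairing library compounds with network metabolites through shared InChIKeys, then reporting coverage) can be illustrated with a small sketch. This is not the MetExplore pipeline or its API. The function name, the dictionary layout, and the two coverage definitions are assumptions chosen for illustration, and the identifiers in the usage example are made-up placeholders rather than real InChIKeys.

```python
# Hypothetical sketch of identifier-based mapping; not the MetExplore API.
def map_library_to_network(library, network):
    """Match library compounds to network metabolites by InChIKey.

    library: dict of compound name -> InChIKey (or None if missing)
    network: dict of metabolite id -> InChIKey (or None if missing)
    Returns (mapping, library_coverage, network_coverage).
    """
    # Index the network by identifier; several metabolites may share a key.
    key_to_ids = {}
    for met_id, key in network.items():
        if key:
            key_to_ids.setdefault(key, []).append(met_id)

    # Entries without an identifier cannot be matched at all.
    mapping = {}
    for name, key in library.items():
        if key and key in key_to_ids:
            mapping[name] = sorted(key_to_ids[key])

    matched_ids = {m for ids in mapping.values() for m in ids}
    # Assumed definitions: fraction of library compounds found in the
    # network, and fraction of network metabolites hit by the library.
    library_coverage = len(mapping) / len(library) if library else 0.0
    network_coverage = len(matched_ids) / len(network) if network else 0.0
    return mapping, library_coverage, network_coverage


if __name__ == "__main__":
    library = {"compound_1": "PLACEHOLDER-KEY-A", "compound_2": None}
    network = {"M_1": "PLACEHOLDER-KEY-A", "M_2": "PLACEHOLDER-KEY-B"}
    print(map_library_to_network(library, network))
```

The sketch also shows why the abstract stresses that identifiers must be present on both sides: any entry whose InChIKey is missing is silently left out of the mapping and lowers the coverage.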
Contemporary Network Proteomics and Its Requirements
Goh, Wilson Wen Bin; Wong, Limsoon; Sng, Judy Chia Ghee
2013-01-01
The integration of networks with genomics (network genomics) is a familiar field. Conventional network analysis takes advantage of the larger coverage and relative stability of gene expression measurements. Network proteomics on the other hand has to develop further on two critical factors: (1) expanded data coverage and consistency, and (2) suitable reference network libraries, and data mining from them. Concerning (1) we discuss several contemporary themes that can improve data quality, which in turn will boost the outcome of downstream network analysis. For (2), we focus on network analysis developments, specifically, the need for context-specific networks and essential considerations for localized network analysis. PMID:24833333
Lee, Insuk; Li, Zhihua; Marcotte, Edward M.
2007-01-01
Background Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations. Methodology/Principal Findings We report a significantly improved version (v. 2) of a probabilistic functional gene network [1] of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. Conclusions/Significance YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org. PMID:17912365
Knowledge-driven genomic interactions: an application in ovarian cancer.
Kim, Dokyoon; Li, Ruowang; Dudek, Scott M; Frase, Alex T; Pendergrass, Sarah A; Ritchie, Marylyn D
2014-01-01
Effective cancer clinical outcome prediction for understanding of the mechanism of various types of cancer has been pursued using molecular-based data such as gene expression profiles, an approach that has promise for providing better diagnostics and supporting further therapies. However, clinical outcome prediction based on gene expression profiles varies between independent data sets. Further, single-gene expression outcome prediction is limited for cancer evaluation since genes do not act in isolation, but rather interact with other genes in complex signaling or regulatory networks. In addition, since pathways are more likely to co-operate together, it would be desirable to incorporate expert knowledge to combine pathways in a useful and informative manner. Thus, we propose a novel approach for identifying knowledge-driven genomic interactions and applying it to discover models associated with cancer clinical phenotypes using grammatical evolution neural networks (GENN). In order to demonstrate the utility of the proposed approach, an ovarian cancer dataset from the Cancer Genome Atlas (TCGA) was used for predicting clinical stage as a pilot project. We identified knowledge-driven genomic interactions associated with cancer stage not only from single knowledge bases such as sources of pathway-pathway interaction, but also across different sets of knowledge bases such as pathway-protein family interactions by integrating different types of information. Notably, an integration model from different sources of biological knowledge achieved 78.82% balanced accuracy and outperformed the top models with gene expression or single knowledge-based data types alone. Furthermore, the results from the models are more interpretable because they are framed in the context of specific biological pathways or other expert knowledge. 
The success of the pilot study we have presented herein will allow us to pursue further identification of models predictive of clinical cancer survival and recurrence. Understanding the underlying tumorigenesis and progression in ovarian cancer through the global view of interactions within/between different biological knowledge sources has the potential for providing more effective screening strategies and therapeutic targets for many types of cancer.
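The 78.82% figure above is a balanced accuracy, i.e. the mean of the per-class recalls, which is not inflated by a dominant class. The following is a minimal sketch of that metric only; the labels are hypothetical and are not taken from the TCGA ovarian cancer data.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; each class counts equally regardless of size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in zip(y_true, y_pred):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Hypothetical labels: 8 late-stage and 2 early-stage tumours.
y_true = ["late"] * 8 + ["early"] * 2
y_pred = ["late"] * 10  # a classifier that always predicts the majority class
print(balanced_accuracy(y_true, y_pred))  # 0.5, although plain accuracy is 0.8
```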
USDA-ARS's Scientific Manuscript database
The Collaborative African Genomics Network (CAfGEN) aims to establish sustainable genomics research programs in Botswana and Uganda through long-term training of PhD students from these countries at Baylor College of Medicine. Here, we present an overview of the CAfGEN PhD training program alongside...
Prediction of Ras-effector interactions using position energy matrices.
Kiel, Christina; Serrano, Luis
2007-09-01
One of the more challenging problems in biology is to determine the cellular protein interaction network. Progress has been made in predicting protein-protein interactions based on structural information, assuming that structurally similar proteins interact in a similar way. In a previous publication, we determined a genome-wide Ras-effector interaction network based on homology models, with a high accuracy of predicting binding and non-binding domains. However, for a prediction on a genome-wide scale, homology modelling is a time-consuming process. Therefore, we have developed a faster method using position energy matrices, where, based on different Ras-effector X-ray template structures, all amino acids in the effector binding domain are sequentially mutated to all other amino acid residues and the effect on binding energy is calculated. Those pre-calculated matrices can then be used to score any Ras or effector sequence for binding. Based on position energy matrices, the sequences of putative Ras-binding domains can be scanned quickly to calculate an energy sum value. By calibrating energy sum values using quantitative experimental binding data, thresholds can be defined and thus non-binding domains can be excluded quickly. Sequences which have energy sum values above this threshold are considered to be potential binding domains, and could be further analysed using homology modelling. This prediction method could be applied to other protein families sharing conserved interaction types, in order to rapidly determine large-scale cellular protein interaction networks. Thus, it could have an important impact on future in silico structural genomics approaches, in particular with regard to increasing structural proteomics efforts, aiming to determine all possible domain folds and interaction types. All matrices are deposited in the ADAN database (http://adan-embl.ibmc.umh.es/). Supplementary data are available at Bioinformatics online.
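The scanning step described above (look up a pre-calculated energy for each residue at each position, sum, compare with a calibrated threshold) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the matrix layout, the two-letter alphabet, and all numeric values are hypothetical, and the "above the threshold" convention simply follows the wording of the abstract.

```python
def energy_sum(sequence, matrix):
    """Sum the pre-calculated energy term for the residue at each position."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence and matrix must cover the same positions")
    return sum(position[residue] for residue, position in zip(sequence, matrix))

def is_candidate(sequence, matrix, threshold):
    """Keep sequences whose energy sum lies above the calibrated threshold."""
    return energy_sum(sequence, matrix) > threshold

# Toy 3-position matrix over a 2-letter alphabet (hypothetical values).
toy_matrix = [
    {"A": 1.0, "K": -0.5},
    {"A": 0.2, "K": 0.8},
    {"A": -1.0, "K": 0.5},
]
print(energy_sum("AKK", toy_matrix))         # 1.0 + 0.8 + 0.5
print(is_candidate("KAA", toy_matrix, 0.0))  # -0.5 + 0.2 - 1.0 is not above 0
```

Because the matrix is computed once per template structure, scoring a new sequence costs only one lookup per position, which is what makes a genome-wide scan cheap compared with building a homology model for each candidate.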
The Cancer Genome Atlas Pan-Cancer Analysis Project
Weinstein, John N.; Collisson, Eric A.; Mills, Gordon B.; Shaw, Kenna M.; Ozenberger, Brad A.; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M.
2014-01-01
Cancer can take hundreds of different forms depending on the location, cell of origin and spectrum of genomic alterations that promote oncogenesis and affect therapeutic response. Although many genomic events with direct phenotypic impact have been identified, much of the complex molecular landscape remains incompletely charted for most cancer lineages. For that reason, The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumours to discover molecular aberrations at the DNA, RNA, protein, and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences, and emergent themes across tumour lineages. The Pan-Cancer initiative compares the first twelve tumour types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumour types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile. PMID:24071849
Transcriptome analysis and related databases of Lactococcus lactis.
Kuipers, Oscar P; de Jong, Anne; Baerends, Richard J S; van Hijum, Sacha A F T; Zomer, Aldert L; Karsens, Harma A; den Hengst, Chris D; Kramer, Naomi E; Buist, Girbe; Kok, Jan
2002-08-01
Several complete genome sequences of Lactococcus lactis and their annotations will become available in the near future, next to the already published genome sequence of L. lactis ssp. lactis IL 1403. This will allow intraspecies comparative genomics studies as well as functional genomics studies aimed at a better understanding of physiological processes and regulatory networks operating in lactococci. This paper describes the initial set-up of a DNA-microarray facility in our group, to enable transcriptome analysis of various Gram-positive bacteria, including a ssp. lactis and a ssp. cremoris strain of Lactococcus lactis. Moreover a global description will be given of the hardware and software requirements for such a set-up, highlighting the crucial integration of relevant bioinformatics tools and methods. This includes the development of MolGenIS, an information system for transcriptome data storage and retrieval, and LactococCye, a metabolic pathway/genome database of Lactococcus lactis.
Base-By-Base: single nucleotide-level analysis of whole viral genome alignments.
Brodie, Ryan; Smith, Alex J; Roper, Rachel L; Tcherepanov, Vasily; Upton, Chris
2004-07-14
With ever increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1) rapidly identify and correct alignment errors in large, multiple genome alignments; and 2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our mySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes or use information from text files. Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.
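The core comparison described above, listing nucleotide-level differences between aligned genomes and relating them to annotated regions, can be illustrated with a short sketch. It is not Base-By-Base code (which is a Java application); the function names, the toy sequences, and the region coordinates, given here in alignment columns, are all hypothetical.

```python
def nucleotide_differences(aligned_a, aligned_b):
    """Return (column, base_a, base_b) for every differing alignment column.

    Columns are 1-based; a gap ('-') opposite a base counts as a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    return [(i, a, b) for i, (a, b) in enumerate(pairs, start=1) if a != b]

def region_of(column, regions):
    """Name the annotated region (start, end, label) containing a column, if any."""
    for start, end, label in regions:
        if start <= column <= end:
            return label
    return None

# Toy pairwise alignment and annotation (hypothetical).
strain_a = "ATGCCGTA-A"
strain_b = "ATGACGTAGA"
regions = [(1, 3, "promoter"), (4, 10, "coding")]

for column, base_a, base_b in nucleotide_differences(strain_a, strain_b):
    print(column, base_a, base_b, region_of(column, regions))
```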
The Immunological Genome Project: networks of gene expression in immune cells.
Heng, Tracy S P; Painter, Michio W
2008-10-01
The Immunological Genome Project combines immunology and computational biology laboratories in an effort to establish a complete 'road map' of gene-expression and regulatory networks in all immune cells.
Kujur, Alice; Saxena, Maneesha S; Bajaj, Deepak; Laxmi; Parida, Swarup K
2013-12-01
The enormous population growth, climate change and global warming are now considered major threats to agriculture and the world's food security. To improve the productivity and sustainability of agriculture, the development of high-yielding, durable abiotic and biotic stress-tolerant cultivars and climate-resilient crops is essential. Hence, understanding the molecular mechanisms and dissecting complex quantitative yield and stress tolerance traits is the prime objective in current agricultural biotechnology research. In recent years, tremendous progress has been made in plant genomics and molecular breeding research pertaining to conventional and next-generation whole genome, transcriptome and epigenome sequencing efforts, generation of huge genomic, transcriptomic and epigenomic resources and development of modern genomics-assisted breeding approaches in diverse crop genotypes with contrasting yield and abiotic stress tolerance traits. Unfortunately, the detailed molecular mechanisms and gene regulatory networks controlling such complex quantitative traits are not yet well understood in crop plants. Therefore, we propose integrated strategies involving the enormous and diverse traditional and modern -omics (structural, functional, comparative and epigenomics) approaches/resources available and genomics-assisted breeding methods that agricultural biotechnologists can adopt to dissect and decode the molecular and gene regulatory networks involved in complex quantitative yield and stress tolerance traits in crop plants. This would provide clues and much-needed inputs for rapid selection of novel functionally relevant molecular tags regulating such complex traits to expedite traditional and modern marker-assisted genetic enhancement studies in target crop species for developing high-yielding stress-tolerant varieties.
Two-component signal transduction systems of Xanthomonas spp.: a lesson from genomics.
Qian, Wei; Han, Zhong-Ji; He, Chaozu
2008-02-01
The two-component signal transduction systems (TCSTSs), consisting of a histidine kinase sensor (HK) and a response regulator (RR), are the dominant molecular mechanisms by which prokaryotes sense and respond to environmental stimuli. Genomes of Xanthomonas generally contain a large repertoire of TCSTS genes (approximately 92 to 121 for each genome), which encode diverse structural groups of HKs and RRs. Among them, although a core set of 70 TCSTS genes (about two-thirds in total) which accumulates point mutations with a slow rate are shared by these genomes, the other genes, especially hybrid HKs, experienced extensive genetic recombination, including genomic rearrangement, gene duplication, addition or deletion, and fusion or fission. The recombinations potentially promote the efficiency and complexity of TCSTSs in regulating gene expression. In addition, our analysis suggests that a co-evolutionary model, rather than a selfish operon model, is the major mechanism for the maintenance and microevolution of TCSTS genes in the genomes of Xanthomonas. Genomic annotation, secondary protein structure prediction, and comparative genomic analyses of TCSTS genes reviewed here provide insights into our understanding of signal networks in these important phytopathogenic bacteria.
Reduced Synchronization Persistence in Neural Networks Derived from Atm-Deficient Mice
Levine-Small, Noah; Yekutieli, Ziv; Aljadeff, Jonathan; Boccaletti, Stefano; Ben-Jacob, Eshel; Barzilai, Ari
2011-01-01
Many neurodegenerative diseases are characterized by malfunction of the DNA damage response. Therefore, it is important to understand the connection between system level neural network behavior and DNA. Neural networks drawn from genetically engineered animals, interfaced with micro-electrode arrays allowed us to unveil connections between networks’ system level activity properties and such genome instability. We discovered that Atm protein deficiency, which in humans leads to progressive motor impairment, leads to a reduced synchronization persistence compared to wild type synchronization, after chemically imposed DNA damage. Not only do these results suggest a role for DNA stability in neural network activity, they also establish an experimental paradigm for empirically determining the role a gene plays on the behavior of a neural network. PMID:21519382
Diving into marine genomics with CRISPR/Cas9 systems.
Momose, Tsuyoshi; Concordet, Jean-Paul
2016-12-01
More and more genomes are sequenced and a great range of biological questions can be examined at the genomic level in a growing number of organisms. Testing the function of genome features, from gene networks, genome organization, conserved non-coding sequences to microRNAs, and, more generally, experimentally addressing the genotype-phenotype relationship is now possible owing to the clustered, regularly interspaced, short palindromic repeats (CRISPR)-Cas9 revolution of genome editing. In the present review, we give a brief overview of the CRISPR/Cas9 toolbox and different strategies for genome editing currently available. We list the first examples of applications to marine organisms and also draw from studies in more common laboratory models to suggest both guidelines for design of genome editing experiments as well as discuss challenges specific to marine organisms. In addition, we discuss future perspectives, including applications of CRISPR/Cas9 to base editing and targeted reprogramming of gene transcription. Copyright © 2016 Elsevier B.V. All rights reserved.
Analysis of Existing International Policy Evidence in Public Health Genomics: Mapping Exercise
Syurina, Elena V.; in den Bäumen, Tobias Schulte; Feron, Frans J.M.; Brand, Angela
2012-01-01
Background In the last decades we have seen a constant growth in the fields of science related to the use of genome-based health information. However, there is a gap between basic science research and the Public Health everyday practice. For a successful introduction of genome-based technologies policy actions on the international level are needed. This work represents the initial stage of the PHGEN II (Public Health Genomics European Network II) project. In order to prepare a base for bridging genomics and Public Health, an inventory study of the existing legislative base dealing with controversies of genome-based knowledge was conducted. The work results in the mapping of the most and the least legislatively covered areas and some preliminary conclusions about the existing gaps. Design and Methods The collection of the evidence-based policies was done through the PHGEN II project. The mapping covered the meta-level (international, European general guidelines). The expert opinion of the partners of the project was required to reflect on and grade the collected evidence. Results An analysis of the evidence was made by the area of coverage: using the list of important policy areas for successful introduction of genome-based technologies into Public Health and the Public Health Genomics Wheel (originally Public Health Wheel developed by Institute of Medicine). Conclusions Severe inequalities in coverage of important issues of Public Health Genomics were found. The most attention was paid to clinical utility and clinical validity of the screening and the protection of human subjects. Important areas such as trade agreements, Public Health Genomics literacy, insurance issues, behaviour modification in response to genomics results etc. were paid less attention to. For the successful adoption of new technologies on the Public Health level the focus should be not only on the translation to clinical practice, but the translation from bench to Public Health policy and back. 
Coherent and consistent coverage of all aspects of the translation of genome-based information and technologies is of utmost importance. PMID:25170444
Hamilton, Joshua J; Dwivedi, Vivek; Reed, Jennifer L
2013-07-16
Constraint-based methods provide powerful computational techniques to allow understanding and prediction of cellular behavior. These methods rely on physiochemical constraints to eliminate infeasible behaviors from the space of available behaviors. One such constraint is thermodynamic feasibility, the requirement that intracellular flux distributions obey the laws of thermodynamics. The past decade has seen several constraint-based methods that interpret this constraint in different ways, including those that are limited to small networks, rely on predefined reaction directions, and/or neglect the relationship between reaction free energies and metabolite concentrations. In this work, we utilize one such approach, thermodynamics-based metabolic flux analysis (TMFA), to make genome-scale, quantitative predictions about metabolite concentrations and reaction free energies in the absence of prior knowledge of reaction directions, while accounting for uncertainties in thermodynamic estimates. We applied TMFA to a genome-scale network reconstruction of Escherichia coli and examined the effect of thermodynamic constraints on the flux space. We also assessed the predictive performance of TMFA against gene essentiality and quantitative metabolomics data, under both aerobic and anaerobic, and optimal and suboptimal growth conditions. Based on these results, we propose that TMFA is a useful tool for validating phenotypes and generating hypotheses, and that additional types of data and constraints can improve predictions of metabolite concentrations. Copyright © 2013 Biophysical Society. Published by Elsevier Inc. All rights reserved.
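The thermodynamic feasibility constraint mentioned above links a reaction's free energy to metabolite concentrations through the standard relation ΔG = ΔG° + RT Σ s_i ln(x_i), and a flux is allowed in the forward direction only if ΔG is negative. The sketch below evaluates that relation and the lowest ΔG reachable within concentration bounds and an uncertainty on ΔG°. It is a simplified illustration, not the TMFA formulation itself (which embeds these constraints in a mixed-integer program); the reaction, bounds, and energies are hypothetical.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)

def reaction_free_energy(dg0, stoich, conc, temperature=298.15):
    """dG = dG0 + R*T*sum(s_i * ln(x_i)), with s_i < 0 for substrates."""
    log_term = sum(s * math.log(conc[met]) for met, s in stoich.items())
    return dg0 + R * temperature * log_term

def min_free_energy(dg0, dg0_error, stoich, bounds, temperature=298.15):
    """Lowest dG reachable within concentration bounds and the dG0 uncertainty."""
    log_term = 0.0
    for met, s in stoich.items():
        low, high = bounds[met]
        # Substrates (s < 0) lower dG most at high concentration, products at low.
        log_term += s * math.log(high if s < 0 else low)
    return (dg0 - dg0_error) + R * temperature * log_term

# Hypothetical reaction A -> B with concentrations in mol/L.
stoich = {"A": -1, "B": 1}
print(reaction_free_energy(5.0, stoich, {"A": 1e-2, "B": 1e-4}))  # negative: forward allowed
bounds = {"A": (1e-5, 2e-2), "B": (1e-5, 2e-2)}
print(min_free_energy(30.0, 2.0, stoich, bounds))  # positive: forward infeasible anywhere in bounds
```

Checking the sign of the minimum (and, symmetrically, the maximum) ΔG is one way reaction directions can be inferred from thermodynamics rather than assumed in advance.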
Mason, Clifford W; Swaan, Peter W; Weiner, Carl P
2006-06-01
The transition from myometrial quiescence to activation is poorly understood, and the analysis of array data is limited by the available data mining tools. We applied functional analysis and logical operations along regulatory gene networks to identify molecular processes and pathways underlying quiescence and activation. We analyzed some 18,400 transcripts and variants in guinea pig myometrium at stages corresponding to quiescence and activation, and compared them to the nonpregnant (control) counterpart using a functional mapping tool, MetaCore (GeneGo, St Joseph, MI), to identify novel gene networks composed of biological pathways during mid (MP) and late (LP) pregnancy. Genes altered during quiescence and/or activation were identified following gene-specific comparisons with myometrium from nonpregnant animals, and then linked to curated pathways and formulated networks. The MP and LP networks were subtracted from each other to identify unique genomic events during those periods. For example, changes 2-fold or greater in genes mediating protein biosynthesis, programmed cell death, microtubule polymerization, and microtubule-based movement were noted during the transition to LP. We describe a novel approach combining microarrays and genetic data to identify networks associated with normal myometrial events. The resulting insights help identify potential biomarkers and permit future targeted investigations of these pathways or networks to confirm or refute their importance.
Recent Coselection in Human Populations Revealed by Protein–Protein Interaction Network
Qian, Wei; Zhou, Hang; Tang, Kun
2015-01-01
Genome-wide scans for signals of natural selection in human populations have identified a large number of candidate loci that underlie local adaptations. This is surprising given the relatively short evolutionary time since the divergence of the human population. One hypothesis that has not been formally examined is whether and how the recent human evolution may have been shaped by coselection in the context of complex molecular interactome. In this study, genome-wide signals of selection were scanned in East Asians, Europeans, and Africans using 1000 Genome data, and subsequently mapped onto the protein–protein interaction (PPI) network. We found that the candidate genes of recent positive selection localized significantly closer to each other on the PPI network than expected, revealing substantial clustering of selected genes. Furthermore, gene pairs of shorter PPI network distances showed higher similarities of their recent evolutionary paths than those further apart. Last, subnetworks enriched with recent coselection signals were identified, which are substantially overrepresented in biological pathways related to signal transduction, neurogenesis, and immune function. These results provide the first genome-wide evidence for association of recent selection signals with the PPI network, shedding light on the potential mechanisms of recent coselection in the human genome. PMID:25532814
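The claim that selected genes lie "significantly closer to each other on the PPI network than expected" is the kind of statement a permutation test on shortest-path distances can support. The sketch below shows one such test; the random-gene-set null model, the function names, and the toy network are assumptions for illustration, and the study's actual null model may differ (for example by matching node degree).

```python
import random
from collections import deque
from itertools import combinations

def shortest_path_length(graph, source, target):
    """Breadth-first search distance in an unweighted graph; None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

def mean_pairwise_distance(graph, genes):
    """Average distance over all connected pairs in the gene set."""
    dists = [shortest_path_length(graph, a, b) for a, b in combinations(genes, 2)]
    dists = [d for d in dists if d is not None]
    return sum(dists) / len(dists) if dists else float("inf")

def clustering_p_value(graph, selected, n_permutations=1000, seed=0):
    """Fraction of random gene sets at least as tightly clustered as `selected`."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(graph, selected)
    nodes = sorted(graph)
    hits = sum(
        mean_pairwise_distance(graph, rng.sample(nodes, len(selected))) <= observed
        for _ in range(n_permutations)
    )
    return (hits + 1) / (n_permutations + 1)

# Toy network: a chain a-b-c-d-e-f stored as node -> set of neighbours.
ppi = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
       "d": {"c", "e"}, "e": {"d", "f"}, "f": {"e"}}
print(mean_pairwise_distance(ppi, ["a", "b", "c"]))
print(clustering_p_value(ppi, ["a", "b", "c"], n_permutations=200))
```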
Gene and miRNA expression profiles in autism spectrum disorders.
Ghahramani Seno, Mohammad M; Hu, Pingzhao; Gwadry, Fuad G; Pinto, Dalila; Marshall, Christian R; Casallo, Guillermo; Scherer, Stephen W
2011-03-22
Accumulating data indicate that there is significant genetic heterogeneity underlying the etiology in individuals diagnosed with autism spectrum disorder (ASD). Some rare and highly-penetrant gene variants and copy number variation (CNV) regions including NLGN3, NLGN4, NRXN1, SHANK2, SHANK3, PTCHD1, 1q21.1, maternally-inherited duplication of 15q11-q13, 16p11.2, amongst others, have been identified to be involved in ASD. Genome-wide association studies have identified other apparently low risk loci and in some other cases, ASD arises as a co-morbid phenotype with other medical genetic conditions (e.g. fragile X). The progress studying the genetics of ASD has largely been accomplished using genomic analyses of germline-derived DNA. Here, we used gene and miRNA expression profiling using cell-line derived total RNA to evaluate possible transcripts and networks of molecules involved in ASD. Our analysis identified several novel dysregulated genes and miRNAs in ASD compared with controls, including HEY1, SOX9, miR-486 and miR-181b. All of these are involved in nervous system development and function and some others, for example, are involved in NOTCH signaling networks (e.g. HEY1). Further, we found significant enrichment in molecules associated with neurological disorders such as Rett syndrome and those associated with nervous system development and function including long-term potentiation. Our data will provide a valuable resource for discovery purposes and for comparison to other gene expression-based, genome-wide DNA studies and other functional data. Copyright © 2010 Elsevier B.V. All rights reserved.
Mutturi, Sarma
2017-06-27
Although a handful of tools are available for constraint-based flux analysis to generate knockout strains, most of these are based on bilevel-MIP or its modifications. Metaheuristic approaches, which are known for their flexibility and scalability, have been less studied. Moreover, existing tools have not considered sectioning of the search space to find optimal knockouts. Herein, a novel computational procedure, termed FOCuS (Flower-pOllination coupled Clonal Selection algorithm), was developed to find the optimal reaction knockouts from a metabolic network to maximize the production of specific metabolites. FOCuS derives its benefits from the nature-inspired flower pollination algorithm and the artificial immune system-inspired clonal selection algorithm to converge to an optimal solution. To evaluate the performance of FOCuS, reported results obtained from both MIP and other metaheuristic-based tools were compared in selected case studies. The results demonstrated the robustness of FOCuS irrespective of the size of the metabolic network and the number of knockouts. Moreover, sectioning of the search space, coupled with pooling of priority reactions based on their contribution to the objective function to generate a smaller search space, significantly reduced the computational time.
Context-specific metabolic networks are consistent with experiments.
Becker, Scott A; Palsson, Bernhard O
2008-05-16
Reconstructions of cellular metabolism are publicly available for a variety of different microorganisms and some mammalian genomes. To date, these reconstructions are "genome-scale" and strive to include all reactions implied by the genome annotation, as well as those with direct experimental evidence. Clearly, many of the reactions in a genome-scale reconstruction will not be active under particular conditions or in a particular cell type. Methods to tailor these comprehensive genome-scale reconstructions into context-specific networks will aid predictive in silico modeling for a particular situation. We present a method called Gene Inactivity Moderated by Metabolism and Expression (GIMME) to achieve this goal. The GIMME algorithm uses quantitative gene expression data and one or more presupposed metabolic objectives to produce the context-specific reconstruction that is most consistent with the available data. Furthermore, the algorithm provides a quantitative inconsistency score indicating how consistent a set of gene expression data is with a particular metabolic objective. We show that this algorithm produces results consistent with biological experiments and intuition for adaptive evolution of bacteria, rational design of metabolic engineering strains, and human skeletal muscle cells. This work represents progress towards producing constraint-based models of metabolism that are specific to the conditions where the expression profiling data is available.
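The "quantitative inconsistency score" mentioned above can be illustrated with a small sketch. It assumes the commonly described GIMME penalty, in which a reaction whose expression falls below a cutoff is penalised by the shortfall multiplied by the absolute flux it carries; the linear program that minimises this score subject to the required metabolic objective is omitted. Reaction names, expression values, and fluxes are hypothetical.

```python
def reaction_penalties(expression, cutoff):
    """Penalty per reaction: how far its expression falls below the cutoff."""
    return {rxn: max(0.0, cutoff - level) for rxn, level in expression.items()}

def inconsistency_score(fluxes, expression, cutoff):
    """Sum of penalty * |flux|; 0 means no flux through below-cutoff reactions."""
    penalties = reaction_penalties(expression, cutoff)
    return sum(penalties.get(rxn, 0.0) * abs(v) for rxn, v in fluxes.items())

# Hypothetical expression levels and two candidate flux distributions
# that both meet the same metabolic objective.
expression = {"R1": 12.0, "R2": 3.0, "R3": 8.0}
route_a = {"R1": 5.0, "R2": 0.0, "R3": 5.0}  # avoids the poorly expressed R2
route_b = {"R1": 5.0, "R2": 5.0, "R3": 0.0}  # relies on R2

print(inconsistency_score(route_a, expression, cutoff=10.0))  # 2 * 5 = 10.0
print(inconsistency_score(route_b, expression, cutoff=10.0))  # 7 * 5 = 35.0
```

Under this scoring, the flux distribution with the lower score is the one more consistent with the expression data, so route_a would be preferred over route_b here.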
King, Carly J.; Woodward, Josha; Schwartzman, Jacob; Coleman, Daniel J.; Lisac, Robert; Wang, Nicholas J.; Van Hook, Kathryn; Gao, Lina; Urrutia, Joshua; Dane, Mark A.; Heiser, Laura M.; Alumkal, Joshi J.
2017-01-01
Recent work demonstrates that castration-resistant prostate cancer (CRPC) tumors harbor countless genomic aberrations that control many hallmarks of cancer. While some specific mutations in CRPC may be actionable, many others are not. We hypothesized that genomic aberrations in cancer may operate in concert to promote drug resistance and tumor progression, and that organization of these genomic aberrations into therapeutically targetable pathways may improve our ability to treat CRPC. To identify the molecular underpinnings of enzalutamide-resistant CRPC, we performed transcriptional and copy number profiling studies using paired enzalutamide-sensitive and resistant LNCaP prostate cancer cell lines. Gene networks associated with enzalutamide resistance were revealed by performing an integrative genomic analysis with the PAthway Representation and Analysis by Direct Reference on Graphical Models (PARADIGM) tool. Amongst the pathways enriched in the enzalutamide-resistant cells were those associated with MEK, EGFR, RAS, and NFKB. Functional validation studies of 64 genes identified 10 candidate genes whose suppression led to greater effects on cell viability in enzalutamide-resistant cells as compared to sensitive parental cells. Examination of a patient cohort demonstrated that several of our functionally-validated gene hits are deregulated in metastatic CRPC tumor samples, suggesting that they may be clinically relevant therapeutic targets for patients with enzalutamide-resistant CRPC. Altogether, our approach demonstrates the potential of integrative genomic analyses to clarify determinants of drug resistance and rational co-targeting strategies to overcome resistance. PMID:29340039
Caspi, Ron; Altman, Tomer; Dale, Joseph M.; Dreher, Kate; Fulcher, Carol A.; Gilham, Fred; Kaipa, Pallavi; Karthikeyan, Athikkattuvalasu S.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Paley, Suzanne; Popescu, Liviu; Pujar, Anuradha; Shearer, Alexander G.; Zhang, Peifen; Karp, Peter D.
2010-01-01
The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathways reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway hole-fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism. PMID:19850718
USDA-ARS's Scientific Manuscript database
While more than a thousand protein kinases (PK) have been identified in the Arabidopsis thaliana genome, relatively little progress has been made towards identifying their individual client proteins. Herein we describe use of a mass spectrometry-based in vitro phosphorylation strategy, termed Kinase...
Evolutionary versatility of eukaryotic protein domains revealed by their bigram networks
2011-01-01
Background Protein domains are globular structures of independently folded polypeptides that exert catalytic or binding activities. Their sequences are recognized as evolutionary units that, through genome recombination, constitute protein repertoires of linkage patterns. Via mutations, domains acquire modified functions that contribute to the fitness of cells and organisms. Recent studies have addressed the evolutionary selection that may have shaped the functions of individual domains and the emergence of particular domain combinations, which led to new cellular functions in multi-cellular animals. This study focuses on modeling domain linkage globally and investigates evolutionary implications that may be revealed by novel computational analysis. Results A survey of 77 completely sequenced eukaryotic genomes implies a potential hierarchical and modular organization of biological functions in most living organisms. Domains in a genome or multiple genomes are modeled as a network of hetero-duplex covalent linkages, termed bigrams. A novel computational technique is introduced to decompose such networks, whereby the notion of domain "networking versatility" is derived and measured. The most and least "versatile" domains (termed "core domains" and "peripheral domains" respectively) are examined both computationally via sequence conservation measures and experimentally using selected domains. Our study suggests that such a versatility measure extracted from the bigram networks correlates with the adaptivity of domains during evolution, where the network core domains are highly adaptive, significantly contrasting the network peripheral domains. Conclusions Domain recombination has played a major part in the evolution of eukaryotes attributing to genome complexity. From a system point of view, as the results of selection and constant refinement, networks of domain linkage are structured in a hierarchical modular fashion. 
Domains with high degree of networking versatility appear to be evolutionary adaptive, potentially through functional innovations. Domain bigram networks are informative as a model of biological functions. The networking versatility indices extracted from such networks for individual domains reflect the strength of evolutionary selection that the domains have experienced. PMID:21849086
Evolutionary versatility of eukaryotic protein domains revealed by their bigram networks.
Xie, Xueying; Jin, Jing; Mao, Yongyi
2011-08-18
Protein domains are globular structures of independently folded polypeptides that exert catalytic or binding activities. Their sequences are recognized as evolutionary units that, through genome recombination, constitute protein repertoires of linkage patterns. Via mutations, domains acquire modified functions that contribute to the fitness of cells and organisms. Recent studies have addressed the evolutionary selection that may have shaped the functions of individual domains and the emergence of particular domain combinations, which led to new cellular functions in multi-cellular animals. This study focuses on modeling domain linkage globally and investigates evolutionary implications that may be revealed by novel computational analysis. A survey of 77 completely sequenced eukaryotic genomes implies a potential hierarchical and modular organization of biological functions in most living organisms. Domains in a genome or multiple genomes are modeled as a network of hetero-duplex covalent linkages, termed bigrams. A novel computational technique is introduced to decompose such networks, whereby the notion of domain "networking versatility" is derived and measured. The most and least "versatile" domains (termed "core domains" and "peripheral domains" respectively) are examined both computationally via sequence conservation measures and experimentally using selected domains. Our study suggests that such a versatility measure extracted from the bigram networks correlates with the adaptivity of domains during evolution, where the network core domains are highly adaptive, in significant contrast to the network peripheral domains. Domain recombination has played a major part in the evolution of eukaryotes, contributing to genome complexity. From a systems point of view, as a result of selection and constant refinement, networks of domain linkage are structured in a hierarchical modular fashion. 
Domains with a high degree of networking versatility appear to be evolutionarily adaptive, potentially through functional innovations. Domain bigram networks are informative as a model of biological functions. The networking versatility indices extracted from such networks for individual domains reflect the strength of evolutionary selection that the domains have experienced.
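The bigram model described above can be illustrated with a minimal sketch. It assumes that a bigram is a pair of different domains that sit next to each other in a protein's domain architecture, and it uses plain node degree as a stand-in for versatility; the abstract does not specify the authors' decomposition technique, so this is not their index. The domain names and architectures are toy data, and `build_bigram_network` and `rank_by_degree` are hypothetical helper names.

```python
from collections import defaultdict


def build_bigram_network(architectures):
    """Link every pair of different domains that are adjacent in a protein."""
    neighbors = defaultdict(set)
    for domains in architectures:
        for left, right in zip(domains, domains[1:]):
            if left != right:  # keep hetero pairs only (assumed meaning of "bigram")
                neighbors[left].add(right)
                neighbors[right].add(left)
    return dict(neighbors)


def rank_by_degree(neighbors):
    """Order domains by number of distinct partners (a crude versatility proxy)."""
    return sorted(neighbors, key=lambda d: (-len(neighbors[d]), d))


# Toy domain architectures, one list per protein (illustrative only).
architectures = [
    ["SH3", "SH2", "Pkinase"],
    ["SH2", "SH2", "Pkinase"],
    ["PH", "SH2"],
    ["C2", "Pkinase"],
]

network = build_bigram_network(architectures)
ranking = rank_by_degree(network)
print(ranking)  # domains with the most distinct partners come first
```

In this toy network the domain with the most distinct partners plays the role of a "core" domain and the single-partner domains play the role of "peripheral" ones; a real analysis would substitute the published decomposition for the degree ranking.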
Genome-scale cold stress response regulatory networks in ten Arabidopsis thaliana ecotypes
2013-01-01
Background: Low temperature leads to major crop losses every year. Although several studies have been conducted focusing on diversity of cold tolerance level in multiple phenotypically divergent Arabidopsis thaliana (A. thaliana) ecotypes, genome-scale molecular understanding is still lacking. Results: In this study, we report genome-scale transcript response diversity of 10 A. thaliana ecotypes originating from different geographical locations to non-freezing cold stress (10°C). To analyze the transcriptional response diversity, we initially compared transcriptome changes in all 10 ecotypes using Arabidopsis NimbleGen ATH6 microarrays. In total 6061 transcripts were significantly cold regulated (p < 0.01) in 10 ecotypes, including 498 transcription factors and 315 transposable elements. The majority of the transcripts (75%) showed ecotype-specific expression patterns. By using sequence data available from the Arabidopsis thaliana 1001 genome project, we further investigated sequence polymorphisms in the core cold stress regulon genes. Significant numbers of non-synonymous amino acid changes were observed in the coding region of the CBF regulon genes. Considering the limited knowledge about regulatory interactions between transcription factors and their target genes in the model plant A. thaliana, we have adopted a powerful systems genetics approach, Network Component Analysis (NCA), to construct an in-silico transcriptional regulatory network model during response to cold stress. The resulting regulatory network contained 1,275 nodes and 7,720 connections, with 178 transcription factors and 1,331 target genes. Conclusions: A. thaliana ecotypes exhibit considerable variation in transcriptome level responses to non-freezing cold stress treatment. Ecotype-specific transcripts and related gene ontology (GO) categories were identified to delineate natural variation of cold stress regulated differential gene expression in the model plant A. thaliana. 
The predicted regulatory network model was able to identify new ecotype specific transcription factors and their regulatory interactions, which might be crucial for their local geographic adaptation to cold temperature. Additionally, since the approach presented here is general, it could be adapted to study networks regulating biological process in any biological systems. PMID:24148294
ERIC Educational Resources Information Center
Baumler, David J.; Banta, Lois M.; Hung, Kai F.; Schwarz, Jodi A.; Cabot, Eric L.; Glasner, Jeremy D.; Perna, Nicole T.
2012-01-01
Genomics and bioinformatics are topics of increasing interest in undergraduate biological science curricula. Many existing exercises focus on gene annotation and analysis of a single genome. In this paper, we present two educational modules designed to enable students to learn and apply fundamental concepts in comparative genomics using examples…
Reconstruction of a composite comparative map composed of ten legume genomes.
Lee, Chaeyoung; Yu, Dongwoon; Choi, Hong-Kyu; Kim, Ryan W
2017-01-01
The Fabaceae (legume family) is the third largest family of flowering plants and the second in agricultural importance. In this study, we report the reconstruction of a composite comparative map composed of ten legume genomes, including seven species from the galegoid clade (Medicago truncatula, Medicago sativa, Lens culinaris, Pisum sativum, Lotus japonicus, Cicer arietinum, Vicia faba) and three species from the phaseoloid clade (Vigna radiata, Phaseolus vulgaris, Glycine max). To accomplish this comparison, a total of 209 cross-species gene-derived markers were employed. The comparative analysis resulted in a single extensive genetic/genomic network composed of 93 chromosomes or linkage groups, from which 110 synteny blocks and other evolutionary events (e.g., 13 inversions) were identified. This comparative map also allowed us to deduce several large-scale evolutionary events, such as chromosome fusion/fission, which might explain differences in chromosome numbers among the compared species or between the two clades. As a result, the useful properties of cross-species genic markers were re-verified as an efficient tool for cross-species translation of genomic information, and similar approaches, combined with a high-throughput bioinformatic marker design program, should be effective for applying the knowledge of trait-associated genes to other important crop species for breeding purposes. Here, we provide a basic comparative framework for the ten legume species, and expect it to be usefully applied towards crop improvement in legume breeding.
KEGG Bioinformatics Resource for Plant Genomics and Metabolomics.
Kanehisa, Minoru
2016-01-01
In the era of high-throughput biology it is necessary to develop not only elaborate computational methods but also well-curated databases that can be used as reference for data interpretation. KEGG (http://www.kegg.jp/) is such a reference knowledge base with two specific aims. One is to compile knowledge on high-level functions of the cell and the organism in terms of the molecular interaction and reaction networks, which is implemented in KEGG pathway maps, BRITE functional hierarchies, and KEGG modules. The other is to expand knowledge on genes and proteins involved in the molecular networks from experimentally observed organisms to other organisms using the concept of orthologs, which is implemented in the KEGG Orthology (KO) system. Thus, KEGG is a generic resource applicable to all organisms and enables interpretation of high-level functions from genomic and molecular data. Here we first present a brief overview of the entire KEGG resource, and then give an introduction to using KEGG in plant genomics and metabolomics research.
NASA Astrophysics Data System (ADS)
Ao, Ping
2011-03-01
There has been tremendous progress in cancer research. However, the currently dominant framework, which regards cancer as a disease of the genome, appears to have led to an impasse. Naturally, questions have been asked as to whether it is possible to develop alternative frameworks that connect both to mutations and other genetic/genomic effects and to environmental factors. Furthermore, such a framework should be quantitative, with predictions that are experimentally testable. In this talk, I will present a positive answer to this call. I will explain our construction of an endogenous network theory based on molecular-cellular agencies as dynamical variables. This cancer theory explicitly demonstrates a profound connection to many fundamental concepts in physics, such as stochastic non-equilibrium processes, "energy" landscape, metastability, etc. It suggests that beneath cancer's daunting complexity may lie a simplicity that gives grounds for hope. The rationales behind the theory, its predictions, and its initial experimental verifications will be presented. Supported by USA NIH and China NSF.
The role of networks and artificial intelligence in nanotechnology design and analysis.
Hudson, D L; Cohen, M E
2004-05-01
Techniques with their origins in artificial intelligence have had a great impact on many areas of biomedicine. Expert-based systems have been used to develop computer-assisted decision aids. Neural networks have been used extensively in disease classification and more recently in many bioinformatics applications including genomics and drug design. Network theory in general has proved useful in modeling all aspects of biomedicine from healthcare organizational structure to biochemical pathways. These methods show promise in applications involving nanotechnology both in the design phase and in interpretation of system functioning.
Garcillán-Barcia, M. Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M.; de la Cruz, Fernando
2014-01-01
Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ–proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages. PMID:25522143
Lanza, Val F; de Toro, María; Garcillán-Barcia, M Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M; de la Cruz, Fernando
2014-12-01
Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ-proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages.
Network-assisted target identification for haploinsufficiency and homozygous profiling screens
Wang, Sheng
2017-01-01
Chemical genomic screens have recently emerged as a systematic approach to drug discovery on a genome-wide scale. Drug target identification and elucidation of the mechanism of action (MoA) of hits from these noisy high-throughput screens remain difficult. Here, we present GIT (Genetic Interaction Network-Assisted Target Identification), a network analysis method for drug target identification in haploinsufficiency profiling (HIP) and homozygous profiling (HOP) screens. With the drug-induced phenotypic fitness defect of the deletion of a gene, GIT also incorporates the fitness defects of the gene’s neighbors in the genetic interaction network. On three genome-scale yeast chemical genomic screens, GIT substantially outperforms previous scoring methods on target identification on HIP and HOP assays, respectively. Finally, we showed that by combining HIP and HOP assays, GIT further boosts target identification and reveals potential drug’s mechanism of action. PMID:28574983
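The idea of combining a gene's own fitness defect with those of its network neighbors can be sketched as follows. This is a hedged illustration of neighbor-assisted scoring in general, not the published GIT formula: the mixing parameter `alpha`, the weight normalization, the function name `network_assisted_scores`, and the toy genes and values are all assumptions made for the example.

```python
def network_assisted_scores(fitness_defect, interactions, alpha=0.5):
    """Add a weighted average of neighbors' fitness defects to each gene's own.

    fitness_defect: gene -> drug-induced fitness defect score
    interactions:   gene -> {neighbor: genetic interaction weight}
    alpha:          how strongly neighbors contribute (hypothetical parameter)
    """
    scores = {}
    for gene, own in fitness_defect.items():
        nbrs = interactions.get(gene, {})
        total_weight = sum(abs(w) for w in nbrs.values())
        if total_weight > 0:
            nbr_term = sum(
                w * fitness_defect.get(n, 0.0) for n, w in nbrs.items()
            ) / total_weight
        else:
            nbr_term = 0.0  # isolated gene: fall back to its own score
        scores[gene] = own + alpha * nbr_term
    return scores


# Toy screen: D has the largest raw defect, but A is also supported by neighbor B.
fitness_defect = {"A": 2.0, "B": 1.8, "C": 0.2, "D": 2.1}
interactions = {"A": {"B": 1.0, "C": 1.0}, "D": {"C": 1.0}}

scores = network_assisted_scores(fitness_defect, interactions)
top_candidate = max(scores, key=scores.get)
print(top_candidate, scores)
```

The example is constructed so that the candidate with the best raw score is overtaken once neighbor evidence is added, which is the kind of noise reduction the abstract attributes to using the genetic interaction network.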
Li, Qi; Lin, Feibi; Yang, Chen; Wang, Juanping; Lin, Yan; Shen, Mengyuan; Park, Min S.; Li, Tao; Zhao, Jindong
2018-01-01
Cyanobacterial blooms are worldwide issues of societal concern and scientific interest. Lake Taihu and Lake Dianchi, two of the largest lakes in China, have been suffering from annual Microcystis-based blooms over the past two decades. These two eutrophic lakes differ in both nutrient load and environmental parameters, where Microcystis microbiota consisting of different Microcystis morphospecies and associated bacteria (epibionts) have dominated. We conducted a comprehensive metagenomic study that analyzed species diversity, community structure, functional components, metabolic pathways and networks to investigate functional interactions among the members of six Microcystis-epibiont communities in these two lakes. Our integrated metagenomic pipeline consisted of efficient assembly, binning, annotation, and quality assurance methods that ensured high-quality genome reconstruction. This study provides a total of 68 reconstructed genomes including six complete Microcystis genomes and 28 high-quality bacterial genomes of epibionts belonging to 14 distinct taxa. This metagenomic dataset constitutes the largest reference genome catalog available for genome-centric studies of the Microcystis microbiome. Epibiont community composition appears to be dynamic rather than fixed, and the functional profiles of communities were related to the environment of origin. This study demonstrates mutualistic interactions between Microcystis and epibionts at genetic and metabolic levels. Metabolic pathway reconstruction provided evidence for functional complementation in nitrogen and sulfur cycles, fatty acid catabolism, vitamin synthesis, and aromatic compound degradation among community members. Thus, bacterial social interactions within Microcystis-epibiont communities not only shape species composition, but also stabilize the communities' functional profiles. These interactions appear to play an important role in environmental adaptation of Microcystis colonies. PMID:29731741
Mesquita, Rafael D.; Vionette-Amaral, Raquel J.; Lowenberger, Carl; Rivera-Pomar, Rolando; Monteiro, Fernando A.; Minx, Patrick; Spieth, John; Carvalho, A. Bernardo; Panzera, Francisco; Lawson, Daniel; Torres, André Q.; Ribeiro, Jose M. C.; Sorgine, Marcos H. F.; Waterhouse, Robert M.; Abad-Franch, Fernando; Alves-Bezerra, Michele; Amaral, Laurence R.; Araujo, Helena M.; Aravind, L.; Atella, Georgia C.; Azambuja, Patricia; Berni, Mateus; Bittencourt-Cunha, Paula R.; Braz, Gloria R. C.; Calderón-Fernández, Gustavo; Carareto, Claudia M. A.; Christensen, Mikkel B.; Costa, Igor R.; Costa, Samara G.; Dansa, Marilvia; Daumas-Filho, Carlos R. O.; De-Paula, Iron F.; Dias, Felipe A.; Dimopoulos, George; Emrich, Scott J.; Esponda-Behrens, Natalia; Fampa, Patricia; Fernandez-Medina, Rita D.; da Fonseca, Rodrigo N.; Fontenele, Marcio; Fronick, Catrina; Fulton, Lucinda A.; Gandara, Ana Caroline; Garcia, Eloi S.; Genta, Fernando A.; Giraldo-Calderón, Gloria I.; Gomes, Bruno; Gondim, Katia C.; Granzotto, Adriana; Guarneri, Alessandra A.; Guigó, Roderic; Harry, Myriam; Hughes, Daniel S. T.; Jablonka, Willy; Jacquin-Joly, Emmanuelle; Juárez, M. Patricia; Koerich, Leonardo B.; Lange, Angela B.; Latorre-Estivalis, José Manuel; Lavore, Andrés; Lawrence, Gena G.; Lazoski, Cristiano; Lazzari, Claudio R.; Lopes, Raphael R.; Lorenzo, Marcelo G.; Lugon, Magda D.; Marcet, Paula L.; Mariotti, Marco; Masuda, Hatisaburo; Megy, Karine; Missirlis, Fanis; Mota, Theo; Noriega, Fernando G.; Nouzova, Marcela; Nunes, Rodrigo D.; Oliveira, Raquel L. L.; Oliveira-Silveira, Gilbert; Ons, Sheila; Orchard, Ian; Pagola, Lucia; Paiva-Silva, Gabriela O.; Pascual, Agustina; Pavan, Marcio G.; Pedrini, Nicolás; Peixoto, Alexandre A.; Pereira, Marcos H.; Pike, Andrew; Polycarpo, Carla; Prosdocimi, Francisco; Ribeiro-Rodrigues, Rodrigo; Robertson, Hugh M.; Salerno, Ana Paula; Salmon, Didier; Santesmasses, Didac; Schama, Renata; Seabra-Junior, Eloy S.; Silva-Cardoso, Livia; Silva-Neto, Mario A. 
C.; Souza-Gomes, Matheus; Sterkel, Marcos; Taracena, Mabel L.; Tojo, Marta; Tu, Zhijian Jake; Tubio, Jose M. C.; Ursic-Bedoya, Raul; Venancio, Thiago M.; Walter-Nuno, Ana Beatriz; Wilson, Derek; Warren, Wesley C.; Wilson, Richard K.; Huebner, Erwin; Dotson, Ellen M.; Oliveira, Pedro L.
2015-01-01
Rhodnius prolixus not only has served as a model organism for the study of insect physiology, but also is a major vector of Chagas disease, an illness that affects approximately seven million people worldwide. We sequenced the genome of R. prolixus, generated assembled sequences covering 95% of the genome (∼702 Mb), including 15,456 putative protein-coding genes, and completed comprehensive genomic analyses of this obligate blood-feeding insect. Although immune-deficiency (IMD)-mediated immune responses were observed, R. prolixus putatively lacks key components of the IMD pathway, suggesting a reorganization of the canonical immune signaling network. Although both Toll and IMD effectors controlled intestinal microbiota, neither affected Trypanosoma cruzi, the causal agent of Chagas disease, implying the existence of evasion or tolerance mechanisms. R. prolixus has experienced an extensive loss of selenoprotein genes, with its repertoire reduced to only two proteins, one of which is a selenocysteine-based glutathione peroxidase, the first found in insects. The genome contained actively transcribed, horizontally transferred genes from Wolbachia sp., which showed evidence of codon use evolution toward the insect use pattern. Comparative protein analyses revealed many lineage-specific expansions and putative gene absences in R. prolixus, including tandem expansions of genes related to chemoreception, feeding, and digestion that possibly contributed to the evolution of a blood-feeding lifestyle. The genome assembly and these associated analyses provide critical information on the physiology and evolution of this important vector species and should be instrumental for the development of innovative disease control methods. PMID:26627243
Rebernig, Carolin A.; Weiss-Schneeweiss, Hanna; Blöch, Cordula; Turner, Barbara; Stuessy, Tod F.; Obermayer, Renate; Villaseñor, Jose L.; Schneeweiss, Gerald M.
2014-01-01
Premise of the study Polyploidy plays an important role in race differentiation and eventually speciation. Underlying mechanisms include chromosomal and genomic changes facilitating reproductive isolation and/or stabilization of hybrids. A prerequisite for studying these processes is a sound knowledge on the origin of polyploids. A well-suited group for studying polyploid evolution consists of the three species of Melampodium ser. Leucantha (Asteraceae): M. argophyllum, M. cinereum, and M. leucanthum. Methods The origin of polyploids was inferred using network and tree-based phylogenetic analyses of several plastid and nuclear DNA sequences and of fingerprint data (AFLP). Genome evolution was assessed via genome size measurements, karyotype analysis, and in situ hybridization of ribosomal DNA. Key results Tetraploid cytotypes of the phylogenetically distinct M. cinereum and M. leucanthum had, compared to the diploid cytotypes, doubled genome sizes and no evidence of gross chromosomal rearrangements. Hexaploid M. argophyllum constituted a separate lineage with limited intermixing with the other species, except in analyses from nuclear ITS. Its genome size was lower than expected if M. cinereum and/or M. leucanthum were involved in its origin, and no chromosomal rearrangements were evident. Conclusions Polyploids in M. cinereum and M. leucanthum are of recent autopolyploid origin in line with the lack of significant genomic changes. Hexaploid M. argophyllum also appears to be of autopolyploid origin against the previous hypothesis of an allopolyploid origin involving the other two species, but some gene flow with the other species in early phases of differentiation cannot be excluded. PMID:22645096
Mesquita, Rafael D; Vionette-Amaral, Raquel J; Lowenberger, Carl; Rivera-Pomar, Rolando; Monteiro, Fernando A; Minx, Patrick; Spieth, John; Carvalho, A Bernardo; Panzera, Francisco; Lawson, Daniel; Torres, André Q; Ribeiro, Jose M C; Sorgine, Marcos H F; Waterhouse, Robert M; Montague, Michael J; Abad-Franch, Fernando; Alves-Bezerra, Michele; Amaral, Laurence R; Araujo, Helena M; Araujo, Ricardo N; Aravind, L; Atella, Georgia C; Azambuja, Patricia; Berni, Mateus; Bittencourt-Cunha, Paula R; Braz, Gloria R C; Calderón-Fernández, Gustavo; Carareto, Claudia M A; Christensen, Mikkel B; Costa, Igor R; Costa, Samara G; Dansa, Marilvia; Daumas-Filho, Carlos R O; De-Paula, Iron F; Dias, Felipe A; Dimopoulos, George; Emrich, Scott J; Esponda-Behrens, Natalia; Fampa, Patricia; Fernandez-Medina, Rita D; da Fonseca, Rodrigo N; Fontenele, Marcio; Fronick, Catrina; Fulton, Lucinda A; Gandara, Ana Caroline; Garcia, Eloi S; Genta, Fernando A; Giraldo-Calderón, Gloria I; Gomes, Bruno; Gondim, Katia C; Granzotto, Adriana; Guarneri, Alessandra A; Guigó, Roderic; Harry, Myriam; Hughes, Daniel S T; Jablonka, Willy; Jacquin-Joly, Emmanuelle; Juárez, M Patricia; Koerich, Leonardo B; Lange, Angela B; Latorre-Estivalis, José Manuel; Lavore, Andrés; Lawrence, Gena G; Lazoski, Cristiano; Lazzari, Claudio R; Lopes, Raphael R; Lorenzo, Marcelo G; Lugon, Magda D; Majerowicz, David; Marcet, Paula L; Mariotti, Marco; Masuda, Hatisaburo; Megy, Karine; Melo, Ana C A; Missirlis, Fanis; Mota, Theo; Noriega, Fernando G; Nouzova, Marcela; Nunes, Rodrigo D; Oliveira, Raquel L L; Oliveira-Silveira, Gilbert; Ons, Sheila; Orchard, Ian; Pagola, Lucia; Paiva-Silva, Gabriela O; Pascual, Agustina; Pavan, Marcio G; Pedrini, Nicolás; Peixoto, Alexandre A; Pereira, Marcos H; Pike, Andrew; Polycarpo, Carla; Prosdocimi, Francisco; Ribeiro-Rodrigues, Rodrigo; Robertson, Hugh M; Salerno, Ana Paula; Salmon, Didier; Santesmasses, Didac; Schama, Renata; Seabra-Junior, Eloy S; Silva-Cardoso, Livia; Silva-Neto, Mario A C; 
Souza-Gomes, Matheus; Sterkel, Marcos; Taracena, Mabel L; Tojo, Marta; Tu, Zhijian Jake; Tubio, Jose M C; Ursic-Bedoya, Raul; Venancio, Thiago M; Walter-Nuno, Ana Beatriz; Wilson, Derek; Warren, Wesley C; Wilson, Richard K; Huebner, Erwin; Dotson, Ellen M; Oliveira, Pedro L
2015-12-01
Rhodnius prolixus not only has served as a model organism for the study of insect physiology, but also is a major vector of Chagas disease, an illness that affects approximately seven million people worldwide. We sequenced the genome of R. prolixus, generated assembled sequences covering 95% of the genome (∼ 702 Mb), including 15,456 putative protein-coding genes, and completed comprehensive genomic analyses of this obligate blood-feeding insect. Although immune-deficiency (IMD)-mediated immune responses were observed, R. prolixus putatively lacks key components of the IMD pathway, suggesting a reorganization of the canonical immune signaling network. Although both Toll and IMD effectors controlled intestinal microbiota, neither affected Trypanosoma cruzi, the causal agent of Chagas disease, implying the existence of evasion or tolerance mechanisms. R. prolixus has experienced an extensive loss of selenoprotein genes, with its repertoire reduced to only two proteins, one of which is a selenocysteine-based glutathione peroxidase, the first found in insects. The genome contained actively transcribed, horizontally transferred genes from Wolbachia sp., which showed evidence of codon use evolution toward the insect use pattern. Comparative protein analyses revealed many lineage-specific expansions and putative gene absences in R. prolixus, including tandem expansions of genes related to chemoreception, feeding, and digestion that possibly contributed to the evolution of a blood-feeding lifestyle. The genome assembly and these associated analyses provide critical information on the physiology and evolution of this important vector species and should be instrumental for the development of innovative disease control methods.
Ghatak, Sandeep; Blom, Jochen; Das, Samir; Sanjukta, Rajkumari; Puro, Kekungu; Mawlong, Michael; Shakuntala, Ingudam; Sen, Arnab; Goesmann, Alexander; Kumar, Ashok; Ngachan, S V
2016-07-01
Aeromonas species are important pathogens of fishes and aquatic animals capable of infecting humans and other animals via food. Due to the paucity of pan-genomic studies on aeromonads, the present study was undertaken to analyse the pan-genome of three clinically important Aeromonas species (A. hydrophila, A. veronii, A. caviae). Results of pan-genome analysis revealed an open pan-genome for all three species with pan-genome sizes of 9181, 7214 and 6884 genes for A. hydrophila, A. veronii and A. caviae, respectively. Core-genome: pan-genome ratio (RCP) indicated greater genomic diversity for A. hydrophila and interestingly RCP emerged as an effective indicator to gauge genomic diversity which could possibly be extended to other organisms too. Phylogenomic network analysis highlighted the influence of homologous recombination and lateral gene transfer in the evolution of Aeromonas spp. Prediction of virulence factors indicated no significant difference among the three species though analysis of pathogenic potential and acquired antimicrobial resistance genes revealed greater hazards from A. hydrophila. In conclusion, the present study highlighted the usefulness of whole genome analyses to infer evolutionary cues for Aeromonas species which indicated considerable phylogenomic diversity for A. hydrophila and hitherto unknown genomic evidence for pathogenic potential of A. hydrophila compared to A. veronii and A. caviae.
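The core-genome to pan-genome ratio (RCP) mentioned above can be computed from a gene presence table, as in the following minimal sketch. It assumes that genes have already been clustered into families so that each strain is just a set of family identifiers; the strain names, gene identifiers, and the function name `core_pan_ratio` are hypothetical.

```python
def core_pan_ratio(genomes):
    """Return (core size, pan size, core/pan ratio) for a dict of gene sets."""
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        raise ValueError("at least one genome is required")
    pan = set().union(*gene_sets)         # genes found in any strain
    core = set.intersection(*gene_sets)   # genes found in every strain
    if not pan:
        raise ValueError("genomes contain no genes")
    return len(core), len(pan), len(core) / len(pan)


# Toy strains with made-up gene family identifiers.
genomes = {
    "strain_1": {"g1", "g2", "g3"},
    "strain_2": {"g1", "g2", "g4"},
    "strain_3": {"g1", "g2", "g5", "g6"},
}

core_size, pan_size, rcp = core_pan_ratio(genomes)
print(core_size, pan_size, round(rcp, 3))
```

Under this definition a smaller ratio means that shared genes make up a smaller share of the gene pool, which would plausibly correspond to greater genomic diversity; the abstract itself does not state the direction, so that reading is an assumption.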
Bogenpohl, James W; Mignogna, Kristin M; Smith, Maren L; Miles, Michael F
2017-01-01
Complex behavioral traits, such as alcohol abuse, are caused by an interplay of genetic and environmental factors, producing deleterious functional adaptations in the central nervous system. The long-term behavioral consequences of such changes are of substantial cost to both the individual and society. Substantial progress has been made in the last two decades in understanding elements of brain mechanisms underlying responses to ethanol in animal models and risk factors for alcohol use disorder (AUD) in humans. However, treatments for AUD remain largely ineffective and few medications for this disease state have been licensed. Genome-wide genetic polymorphism analysis (GWAS) in humans, behavioral genetic studies in animal models and brain gene expression studies produced by microarrays or RNA-seq have the potential to produce nonbiased and novel insight into the underlying neurobiology of AUD. However, the complexity of such information, both statistical and informational, has slowed progress toward identifying new targets for intervention in AUD. This chapter describes one approach for integrating behavioral, genetic, and genomic information across animal model and human studies. The goal of this approach is to identify networks of genes functioning in the brain that are most relevant to the underlying mechanisms of a complex disease such as AUD. We illustrate an example of how genomic studies in animal models can be used to produce robust gene networks that have functional implications, and to integrate such animal model genomic data with human genetic studies such as GWAS for AUD. We describe several useful analysis tools for such studies: ComBAT, WGCNA, and EW_dmGWAS. The end result of this analysis is a ranking of gene networks and identification of their cognate hub genes, which might provide eventual targets for future therapeutic development. 
Furthermore, this combined approach may also improve our understanding of basic mechanisms underlying gene x environmental interactions affecting brain functioning in health and disease.
Bogenpohl, James W.; Mignogna, Kristin M.; Smith, Maren L.; Miles, Michael F.
2016-01-01
Complex behavioral traits, such as alcohol abuse, are caused by an interplay of genetic and environmental factors, producing deleterious functional adaptations in the central nervous system. The long-term behavioral consequences of such changes are of substantial cost to both the individual and society. Substantial progress has been made in the last two decades in understanding elements of brain mechanisms underlying responses to ethanol in animal models and risk factors for alcohol use disorder (AUD) in humans. However, treatments for AUD remain largely ineffective and few medications for this disease state have been licensed. Genome-wide genetic polymorphism analysis (GWAS) in humans, behavioral genetic studies in animal models and brain gene expression studies produced by microarrays or RNA-seq have the potential to produce non-biased and novel insight into the underlying neurobiology of AUD. However, the complexity of such information, both statistical and informational, has slowed progress toward identifying new targets for intervention in AUD. This chapter describes one approach for integrating behavioral, genetic, and genomic information across animal model and human studies. The goal of this approach is to identify networks of genes functioning in the brain that are most relevant to the underlying mechanisms of a complex disease such as AUD. We illustrate an example of how genomic studies in animal models can be used to produce robust gene networks that have functional implications, and to integrate such animal model genomic data with human genetic studies such as GWAS for AUD. We describe several useful analysis tools for such studies: ComBAT, WGCNA and EW_dmGWAS. The end result of this analysis is a ranking of gene networks and identification of their cognate hub genes, which might provide eventual targets for future therapeutic development. 
Furthermore, this combined approach may also improve our understanding of basic mechanisms underlying gene x environmental interactions affecting brain functioning in health and disease. PMID:27933543
Cecil, Alexander; Ohlsen, Knut; Menzel, Thomas; François, Patrice; Schrenzel, Jacques; Fischer, Adrien; Dörries, Kirsten; Selle, Martina; Lalk, Michael; Hantzschmann, Julia; Dittrich, Marcus; Liang, Chunguang; Bernhardt, Jörg; Ölschläger, Tobias A; Bringmann, Gerhard; Bruhn, Heike; Unger, Matthias; Ponte-Sucre, Alicia; Lehmann, Leane; Dandekar, Thomas
2015-01-01
Isoquinolines (IQs) are natural substances with an antibiotic potential we aim to optimize. Specifically, IQ-238 is a synthetic analog of the novel-type N,C-coupled naphthylisoquinoline (NIQ) alkaloid ancisheynine. Recently, we developed and tested other IQs such as IQ-143. By utilizing genome-wide gene expression data, metabolic network modelling and Voronoi tessellation-based data analysis, as well as cytotoxicity measurements, chemical properties calculations and principal component analysis of the NIQs, we show that IQ-238 has strong antibiotic potential for staphylococci and low cytotoxicity against murine or human cells. Compared to IQ-143, systemic effects are less pronounced. Most enzyme activity changes due to IQ-238 are located in the carbohydrate metabolism. Validation includes metabolite measurements on biological replicates. IQ-238 delineates key properties and a chemical space for a good therapeutic window. The combination of analysis methods allows suggestions for further lead development and yields an in-depth look at staphylococcal adaptation and network changes after antibiosis. Results are compared to eukaryotic host cells. Copyright © 2014 Elsevier GmbH. All rights reserved.
Analyzing and interpreting genome data at the network level with ConsensusPathDB.
Herwig, Ralf; Hardt, Christopher; Lienhard, Matthias; Kamburov, Atanas
2016-10-01
ConsensusPathDB consists of a comprehensive collection of human (as well as mouse and yeast) molecular interaction data integrated from 32 different public repositories and a web interface featuring a set of computational methods and visualization tools to explore these data. This protocol describes the use of ConsensusPathDB (http://consensuspathdb.org) with respect to the functional and network-based characterization of biomolecules (genes, proteins and metabolites) that are submitted to the system either as a priority list or together with associated experimental data such as RNA-seq. The tool reports interaction network modules, biochemical pathways and functional information that are significantly enriched by the user's input, applying computational methods for statistical over-representation, enrichment and graph analysis. The results of this protocol can be observed within a few minutes, even with genome-wide data. The resulting network associations can be used to interpret high-throughput data mechanistically, to characterize and prioritize biomarkers, to integrate different omics levels, to design follow-up functional assay experiments and to generate topology for kinetic models at different scales.
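The statistical over-representation step mentioned in this abstract is commonly computed as a one-sided hypergeometric test. The following is a minimal sketch under that assumption; it is not ConsensusPathDB's own code, and the function name, parameter names and gene counts are hypothetical illustrations.

```python
from math import comb


def overrepresentation_pvalue(universe, annotated, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe  -- number of genes in the background set (hypothetical value below)
    annotated -- background genes that belong to the pathway
    selected  -- size of the user's gene list
    overlap   -- user genes that fall in the pathway
    """
    upper = min(annotated, selected)
    total = comb(universe, selected)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 5 of 50 submitted genes hit a 100-gene pathway
# in a 20,000-gene background.
p = overrepresentation_pvalue(universe=20000, annotated=100, selected=50, overlap=5)
print(f"p = {p:.3g}")
```

In practice such p-values would also be corrected for multiple testing across all pathways examined; that step is omitted here.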
Identifying gene networks underlying the neurobiology of ethanol and alcoholism.
Wolen, Aaron R; Miles, Michael F
2012-01-01
For complex disorders such as alcoholism, identifying the genes linked to these diseases and their specific roles is difficult. Traditional genetic approaches, such as genetic association studies (including genome-wide association studies) and analyses of quantitative trait loci (QTLs) in both humans and laboratory animals already have helped identify some candidate genes. However, because of technical obstacles, such as the small impact of any individual gene, these approaches only have limited effectiveness in identifying specific genes that contribute to complex diseases. The emerging field of systems biology, which allows for analyses of entire gene networks, may help researchers better elucidate the genetic basis of alcoholism, both in humans and in animal models. Such networks can be identified using approaches such as high-throughput molecular profiling (e.g., through microarray-based gene expression analyses) or strategies referred to as genetical genomics, such as the mapping of expression QTLs (eQTLs). Characterization of gene networks can shed light on the biological pathways underlying complex traits and provide the functional context for identifying those genes that contribute to disease development.
LAILAPS: the plant science search engine.
Esch, Maria; Chen, Jinbo; Colmsee, Christian; Klapperstück, Matthias; Grafahrend-Belau, Eva; Scholz, Uwe; Lange, Matthias
2015-01-01
With the number of sequenced plant genomes growing, the number of predicted genes and functional annotations is also increasing. The association between genes and phenotypic traits is currently of great interest. Unfortunately, the information available today is widely scattered over a number of different databases. Information retrieval (IR) has become an all-encompassing bioinformatics methodology for extracting knowledge from complex, heterogeneous and distributed databases, and therefore can be a useful tool for obtaining a comprehensive view of plant genomics, from genes to traits. Here we describe LAILAPS (http://lailaps.ipk-gatersleben.de), an IR system designed to link plant genomic data in the context of phenotypic attributes for a detailed forward genetic research. LAILAPS comprises around 65 million indexed documents, encompassing >13 major life science databases with around 80 million links to plant genomic resources. The LAILAPS search engine allows fuzzy querying for candidate genes linked to specific traits over a loosely integrated system of indexed and interlinked genome databases. Query assistance and an evidence-based annotation system enable time-efficient and comprehensive information retrieval. An artificial neural network incorporating user feedback and behavior tracking allows relevance sorting of results. We fully describe LAILAPS's functionality and capabilities by comparing this system's performance with other widely used systems and by reporting both a validation in maize and a knowledge discovery use-case focusing on candidate genes in barley. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
Li, XiaoChing; Wang, Xiu-Jie; Tannenhauser, Jonathan; Podell, Sheila; Mukherjee, Piali; Hertel, Moritz; Biane, Jeremy; Masuda, Shoko; Nottebohm, Fernando; Gaasterland, Terry
2007-01-01
Vocal learning and neuronal replacement have been studied extensively in songbirds, but until recently, few molecular and genomic tools for songbird research existed. Here we describe new molecular/genomic resources developed in our laboratory. We made cDNA libraries from zebra finch (Taeniopygia guttata) brains at different developmental stages. A total of 11,000 cDNA clones from these libraries, representing 5,866 unique gene transcripts, were randomly picked and sequenced from the 3′ ends. A web-based database was established for clone tracking, sequence analysis, and functional annotations. Our cDNA libraries were not normalized. Sequencing ESTs without normalization produced many developmental stage-specific sequences, yielding insights into patterns of gene expression at different stages of brain development. In particular, the cDNA library made from brains at posthatching day 30–50, corresponding to the period of rapid song system development and song learning, has the most diverse and richest set of genes expressed. We also identified five microRNAs whose sequences are highly conserved between zebra finch and other species. We printed cDNA microarrays and profiled gene expression in the high vocal center of both adult male zebra finches and canaries (Serinus canaria). Genes differentially expressed in the high vocal center were identified from the microarray hybridization results. Selected genes were validated by in situ hybridization. Networks among the regulated genes were also identified. These resources provide songbird biologists with tools for genome annotation, comparative genomics, and microarray gene expression analysis. PMID:17426146
Chen, Tsute; Siddiqui, Huma; Olsen, Ingar
2017-01-01
Currently, genome sequences of a total of 19 Porphyromonas gingivalis strains are available, including eight completed genomes (strains W83, ATCC 33277, TDC60, HG66, A7436, AJW4, 381, and A7A1-28) and 11 high-coverage draft sequences (JCVI SC001, F0185, F0566, F0568, F0569, F0570, SJD2, W4087, W50, Ando, and MP4-504) that are assembled into fewer than 300 contigs. The objective was to compare these genomes at both nucleotide and protein sequence levels in order to understand their phylogenetic and functional relatedness. Four copies of 16S rRNA gene sequences were identified in each of the eight complete genomes and one in the other 11 unfinished genomes. These 43 16S rRNA sequences represent only 24 unique sequences and the derived phylogenetic tree suggests a possible evolutionary history for these strains. Phylogenomic comparison based on shared proteins and whole genome nucleotide sequences consistently showed two groups with closely related members: one consisted of ATCC 33277, 381, and HG66, another of W83, W50, and A7436. At least 1,037 core/shared proteins were identified in the 19 P. gingivalis genomes based on the most stringent detecting parameters. Comparative functional genomics based on genome-wide comparisons between NCBI and RAST annotations, as well as additional approaches, revealed functions that are unique or missing in individual P. gingivalis strains, or species-specific in all P. gingivalis strains, when compared to a neighboring species P. asaccharolytica. All the comparative results of this study are available online for download at ftp://www.homd.org/publication_data/20160425/. PMID:28261563
The Systems Biology Markup Language (SBML) Level 3 Package: Flux Balance Constraints.
Olivier, Brett G; Bergmann, Frank T
2015-09-04
Constraint-based modeling is a well-established modelling methodology used to analyze and study biological networks on both a medium and genome scale. Due to their large size, genome scale models are typically analysed using constraint-based optimization techniques. One widely used method is Flux Balance Analysis (FBA) which, for example, requires a modelling description to include: the definition of a stoichiometric matrix, an objective function and bounds on the values that fluxes can obtain at steady state. The Flux Balance Constraints (FBC) Package extends SBML Level 3 and provides a standardized format for the encoding, exchange and annotation of constraint-based models. It includes support for modelling concepts such as objective functions, flux bounds and model component annotation that facilitates reaction balancing. The FBC package establishes a base level for the unambiguous exchange of genome-scale, constraint-based models that can be built upon by the community to meet future needs (e.g., by extending it to cover dynamic FBC models).
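The three ingredients this abstract lists for FBA (a stoichiometric matrix, an objective function and flux bounds) can be illustrated with a toy example. The sketch below only checks whether a candidate flux vector satisfies the steady-state and bound constraints and evaluates the objective; it does not solve the linear program and does not read or write SBML. The three-reaction network and all names are hypothetical.

```python
# Toy linear pathway (hypothetical): uptake -> A, A -> B, B -> biomass.
# Rows are metabolites (A, B); columns are reactions
# (v_uptake, v_convert, v_biomass).
S = [
    [1, -1, 0],   # A: produced by uptake, consumed by conversion
    [0, 1, -1],   # B: produced by conversion, consumed by biomass
]
bounds = [(0, 10), (0, 1000), (0, 1000)]   # (lower, upper) per reaction
objective = [0, 0, 1]                      # maximise flux through v_biomass


def is_steady_state(S, v, tol=1e-9):
    """True if S . v = 0, i.e. no metabolite accumulates or is depleted."""
    return all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)


def within_bounds(v, bounds):
    """True if every flux lies inside its (lower, upper) bound."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(v, bounds))


def objective_value(c, v):
    """Linear objective c . v."""
    return sum(ci * x for ci, x in zip(c, v))


candidate = [10, 10, 10]   # all flux routed through the pathway
feasible = is_steady_state(S, candidate) and within_bounds(candidate, bounds)
print(feasible, objective_value(objective, candidate))
```

For this linear chain the uptake bound of 10 caps the biomass flux, so the candidate shown is also the optimum; a real model would pass the same three ingredients to a linear programming solver.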
Legionella pathogenicity: genome structure, regulatory networks and the host cell response.
Steinert, Michael; Heuner, Klaus; Buchrieser, Carmen; Albert-Weissenberger, Christiane; Glöckner, Gernot
2007-11-01
Legionella spp., the causative agents of Legionnaires' disease, are naturally found in fresh water, where the bacteria parasitize protozoa intracellularly. Upon aerosol formation via man-made water systems, Legionella can enter the human lung and cause a severe form of pneumonia. Here we review results from systematic comparative genome analysis of Legionella species with different pathogenic potentials. The complete genomes reveal that horizontal gene transfer has played an important role during the evolution of Legionella and indicate the importance of secretion machineries for the intracellular lifestyle of this pathogen. Moreover, we highlight recent findings on the in vivo transcriptional program of L. pneumophila and the regulatory networks involved in the biphasic life cycle. To understand how Legionella effectively subverts host cell functions for its own benefit, the transcriptional host cell response upon infection of the model amoeba Dictyostelium discoideum was studied. The use of this model organism made it possible to develop a roadmap of host cell factors that significantly contribute to the uptake of L. pneumophila and the establishment of an ER-associated replicative vacuole.
Draft genome of the red harvester ant Pogonomyrmex barbatus.
Smith, Chris R; Smith, Christopher D; Robertson, Hugh M; Helmkampf, Martin; Zimin, Aleksey; Yandell, Mark; Holt, Carson; Hu, Hao; Abouheif, Ehab; Benton, Richard; Cash, Elizabeth; Croset, Vincent; Currie, Cameron R; Elhaik, Eran; Elsik, Christine G; Favé, Marie-Julie; Fernandes, Vilaiwan; Gibson, Joshua D; Graur, Dan; Gronenberg, Wulfila; Grubbs, Kirk J; Hagen, Darren E; Viniegra, Ana Sofia Ibarraran; Johnson, Brian R; Johnson, Reed M; Khila, Abderrahman; Kim, Jay W; Mathis, Kaitlyn A; Munoz-Torres, Monica C; Murphy, Marguerite C; Mustard, Julie A; Nakamura, Rin; Niehuis, Oliver; Nigam, Surabhi; Overson, Rick P; Placek, Jennifer E; Rajakumar, Rajendhran; Reese, Justin T; Suen, Garret; Tao, Shu; Torres, Candice W; Tsutsui, Neil D; Viljakainen, Lumi; Wolschin, Florian; Gadau, Jürgen
2011-04-05
We report the draft genome sequence of the red harvester ant, Pogonomyrmex barbatus. The genome was sequenced using 454 pyrosequencing, and the current assembly and annotation were completed in less than 1 y. Analyses of conserved gene groups (more than 1,200 manually annotated genes to date) suggest a high-quality assembly and annotation comparable to recently sequenced insect genomes using Sanger sequencing. The red harvester ant is a model for studying reproductive division of labor, phenotypic plasticity, and sociogenomics. Although the genome of P. barbatus is similar to other sequenced hymenopterans (Apis mellifera and Nasonia vitripennis) in GC content and compositional organization, and possesses a complete CpG methylation toolkit, its predicted genomic CpG content differs markedly from the other hymenopterans. Gene networks involved in generating key differences between the queen and worker castes (e.g., wings and ovaries) show signatures of increased methylation and suggest that ants and bees may have independently co-opted the same gene regulatory mechanisms for reproductive division of labor. Gene family expansions (e.g., 344 functional odorant receptors) and pseudogene accumulation in chemoreception and P450 genes compared with A. mellifera and N. vitripennis are consistent with major life-history changes during the adaptive radiation of Pogonomyrmex spp., perhaps in parallel with the development of the North American deserts.
Investigation of the Evolutionary Development of the Genus Bifidobacterium by Comparative Genomics
Lugli, Gabriele Andrea; Milani, Christian; Turroni, Francesca; Duranti, Sabrina; Ferrario, Chiara; Viappiani, Alice; Mancabelli, Leonardo; Mangifesta, Marta; Taminiau, Bernard; Delcenserie, Véronique; van Sinderen, Douwe
2014-01-01
The Bifidobacterium genus currently encompasses 48 recognized taxa, which have been isolated from different ecosystems. However, the current phylogeny of bifidobacteria is hampered by the relative paucity of genotypic data. Here, we reassessed the taxonomy of this bacterial genus using genome-based approaches, which demonstrated that the previous taxonomic view of bifidobacteria contained several inconsistencies. In particular, high levels of genetic relatedness were shown to exist between particular Bifidobacterium taxa which would not justify their status as separate species. The results presented here are based on average nucleotide identity analysis involving the genome sequences for each type strain of the 48 bifidobacterial taxa, as well as phylogenetic comparative analysis of the predicted core genome of the Bifidobacterium genus. The results of this study demonstrate that the availability of complete genome sequences allows the reconstruction of a more robust bifidobacterial phylogeny than that obtained from a single gene-based sequence comparison, thus discouraging the assignment of a new or separate bifidobacterial taxon without such a genome-based validation. PMID:25107967
Glinsky, Gennadi V.
2016-01-01
Thousands of candidate human-specific regulatory sequences (HSRS) have been identified, supporting the hypothesis that unique-to-human phenotypes result from human-specific alterations of genomic regulatory networks. Collectively, a compendium of multiple diverse families of HSRS that are functionally and structurally divergent from Great Apes could be defined as the backbone of human-specific genomic regulatory networks. Here, the conservation patterns analysis of 18,364 candidate HSRS was carried out requiring that 100% of bases must remap during the alignments of human, chimpanzee, and bonobo sequences. A total of 5,535 candidate HSRS were identified that are: (i) highly conserved in Great Apes; (ii) evolved by the exaptation of highly conserved ancestral DNA; (iii) defined by either the acceleration of mutation rates on the human lineage or the functional divergence from non-human primates. The exaptation of highly conserved ancestral DNA pathway seems mechanistically distinct from the evolution of regulatory DNA segments driven by the species-specific expansion of transposable elements. Genome-wide proximity placement analysis of HSRS revealed that a small fraction of topologically associating domains (TADs) contain more than half of HSRS from four distinct families. TADs that are enriched for HSRS and termed rapidly evolving in humans TADs (revTADs) comprise 0.8–10.3% of 3,127 TADs in the hESC genome. RevTADs manifest distinct correlation patterns between placements of human accelerated regions, human-specific transcription factor-binding sites, and recombination rates. There is a significant enrichment within revTAD boundaries of hESC-enhancers, primate-specific CTCF-binding sites, human-specific RNAPII-binding sites, hCONDELs, and H3K4me3 peaks with human-specific enrichment at TSS in prefrontal cortex neurons (P < 0.0001 in all instances). 
Present analysis supports the idea that phenotypic divergence of Homo sapiens is driven by the evolution of human-specific genomic regulatory networks via at least two mechanistically distinct pathways of creation of divergent sequences of regulatory DNA: (i) recombination-associated exaptation of the highly conserved ancestral regulatory DNA segments; (ii) human-specific insertions of transposable elements. PMID:27503290
Jiang, Li; Edwards, Stefan M; Thomsen, Bo; Workman, Christopher T; Guldbrandtsen, Bernt; Sørensen, Peter
2014-09-24
Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization. We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance. We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. 
Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.
Veiga, Diogo F. T.; Dutta, Bhaskar; Balaźsi, Gábor
2011-01-01
The escalating amount of genome-scale data demands a pragmatic stance from the research community. How can we utilize this deluge of information to better understand biology, cure diseases, or engage cells in bioremediation or biomaterial production for various purposes? A research pipeline moving new sequence, expression and binding data towards practical end goals seems to be necessary. While most individual researchers are not motivated by such well-articulated pragmatic end goals, the scientific community has already self-organized itself to successfully convert genomic data into fundamentally new biological knowledge and practical applications. Here we review two important steps in this workflow: network inference and network response identification, applied to transcriptional regulatory networks. Among network inference methods, we concentrate on relevance networks due to their conceptual simplicity. We classify and discuss network response identification approaches as either data-centric or network-centric. Finally, we conclude with an outlook on what is still missing from these approaches and what may be ahead on the road to biological discovery. PMID:20174676
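The relevance-network idea mentioned in this abstract can be sketched in a few lines: compute a pairwise similarity between expression profiles and keep only the pairs above a cutoff. The sketch below assumes Pearson correlation as the similarity measure (mutual information is a common alternative); the gene names, expression values and the 0.9 threshold are hypothetical, and this is not the reviewed authors' code.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def relevance_network(profiles, threshold):
    """Return edges (gene1, gene2, r) whose |r| meets the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.1, 8.0],
    "geneC": [4.0, 1.0, 3.5, 2.0],
}
edges = relevance_network(profiles, threshold=0.9)
for g1, g2, r in edges:
    print(g1, g2, round(r, 3))
```

The threshold controls network density; with real data it is usually chosen from a permutation-based null distribution rather than fixed by hand.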
LIU, YU; PATEL, SANJAY; NIBBE, ROD; MAXWELL, SEAN; CHOWDHURY, SALIM A.; KOYUTURK, MEHMET; ZHU, XIAOFENG; LARKIN, EMMA K.; BUXBAUM, SARAH G; PUNJABI, NARESH M.; GHARIB, SINA A.; REDLINE, SUSAN; CHANCE, MARK R.
2015-01-01
The precise molecular etiology of obstructive sleep apnea (OSA) is unknown; however recent research indicates that several interconnected aberrant pathways and molecular abnormalities are contributors to OSA. Identifying the genes and pathways associated with OSA can help to expand our understanding of the risk factors for the disease as well as provide new avenues for potential treatment. Towards these goals, we have integrated relevant high dimensional data from various sources, such as genome-wide expression data (microarray), protein-protein interaction (PPI) data and results from genome-wide association studies (GWAS) in order to define sub-network elements that connect some of the known pathways related to the disease as well as define novel regulatory modules related to OSA. Two distinct approaches are applied to identify sub-networks significantly associated with OSA. In the first case we used a biased approach based on sixty genes/proteins with known associations with sleep disorders and/or metabolic disease to seed a search using commercial software to discover networks associated with disease followed by information theoretic (mutual information) scoring of the sub-networks. In the second case we used an unbiased approach and generated an interactome constructed from publicly available gene expression profiles and PPI databases, followed by scoring of the network with p-values from GWAS data derived from OSA patients to uncover sub-networks significant for the disease phenotype. A comparison of the approaches reveals a number of proteins that have been previously known to be associated with OSA or sleep. In addition, our results indicate a novel association of Phosphoinositide 3-kinase, the STAT family of proteins and its related pathways with OSA. PMID:21121029
Cheng, Feixiong; Zhao, Junfei; Fooksa, Michaela; Zhao, Zhongming
2016-07-01
Development of computational approaches and tools to effectively integrate multidomain data is urgently needed for the development of newly targeted cancer therapeutics. We proposed an integrative network-based infrastructure to identify new druggable targets and anticancer indications for existing drugs through targeting significantly mutated genes (SMGs) discovered in the human cancer genomes. The underlying assumption is that a drug would have a high potential for anticancer indication if its up-/down-regulated genes from the Connectivity Map tended to be SMGs or their neighbors in the human protein interaction network. We assembled and curated 693 SMGs in 29 cancer types and found 121 proteins currently targeted by known anticancer or noncancer (repurposed) drugs. We found that the approved or experimental cancer drugs could potentially target these SMGs in 33.3% of the mutated cancer samples, and this number increased to 68.0% by drug repositioning through surveying exome-sequencing data in approximately 5000 normal-tumor pairs from The Cancer Genome Atlas. Furthermore, we identified 284 potential new indications connecting 28 cancer types and 48 existing drugs (adjusted P < .05), with a 66.7% success rate validated by literature data. Several existing drugs (e.g., niclosamide, valproic acid, captopril, and resveratrol) were predicted to have potential indications for multiple cancer types. Finally, we used integrative analysis to showcase a potential mechanism-of-action for resveratrol in breast and lung cancer treatment whereby it targets several SMGs (ARNTL, ASPM, CTTN, EIF4G1, FOXP1, and STIP1). In summary, we demonstrated that our integrative network-based infrastructure is a promising strategy to identify potential druggable targets and uncover new indications for existing drugs to speed up molecularly targeted cancer therapeutics. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. 
All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei
2013-01-01
No attention has been paid to comparing a set of genome sequences crossing genetic components and biological categories with far divergence over large size range. We define it as the systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, and intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. 
These have set up the methodology of systematic comparative genomics based on the genome fingerprint analysis.
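The quantitative comparison described in this abstract rests on a geometric center per fingerprint and a Euclidean distance between two genomes. The sketch below illustrates only that part, under the assumption that each fingerprint is already available as a list of three-dimensional points; how GenomeFingerprinter derives those coordinates from a sequence is not given in the abstract, so the points here are hypothetical, and the differentiate rates are not reproduced.

```python
from math import dist


def geometric_center(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def center_distance(fingerprint_a, fingerprint_b):
    """Euclidean distance between the geometric centers of two fingerprints."""
    return dist(geometric_center(fingerprint_a), geometric_center(fingerprint_b))


# Hypothetical fingerprints; real ones would hold one point per sequence position.
genome_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
genome_b = [(1.0, 2.0, 4.0), (1.0, 4.0, 4.0)]
print(center_distance(genome_a, genome_b))
```

A single centroid distance discards the shape of the trajectory, which is presumably why the authors also define rate-based measures.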
Conserved Regulators of Nucleolar Size Revealed by Global Phenotypic Analyses
Neumüller, Ralph A.; Gross, Thomas; Samsonova, Anastasia A.; Vinayagam, Arunachalam; Buckner, Michael; Founk, Karen; Hu, Yanhui; Sharifpoor, Sara; Rosebrock, Adam P.; Andrews, Brenda; Winston, Fred; Perrimon, Norbert
2014-01-01
Regulation of cell growth is a fundamental process in development and disease that integrates a vast array of extra- and intracellular information. A central player in this process is RNA polymerase I (Pol I), which transcribes ribosomal RNA (rRNA) genes in the nucleolus. Rapidly growing cancer cells are characterized by increased Pol I–mediated transcription and, consequently, nucleolar hypertrophy. To map the genetic network underlying the regulation of nucleolar size and of Pol I–mediated transcription, we performed comparative, genome-wide loss-of-function analyses of nucleolar size in Saccharomyces cerevisiae and Drosophila melanogaster coupled with mass spectrometry–based analyses of the ribosomal DNA (rDNA) promoter. With this approach, we identified a set of conserved and nonconserved molecular complexes that control nucleolar size. Furthermore, we characterized a direct role of the histone information regulator (HIR) complex in repressing rRNA transcription in yeast. Our study provides a full-genome, cross-species analysis of a nuclear subcompartment and shows that this approach can identify conserved molecular modules. PMID:23962978
KnowEnG: a knowledge engine for genomics.
Sinha, Saurabh; Song, Jun; Weinshilboum, Richard; Jongeneel, Victor; Han, Jiawei
2015-11-01
We describe here the vision, motivations, and research plans of the National Institutes of Health Center for Excellence in Big Data Computing at the University of Illinois, Urbana-Champaign. The Center is organized around the construction of "Knowledge Engine for Genomics" (KnowEnG), an E-science framework for genomics where biomedical scientists will have access to powerful methods of data mining, network mining, and machine learning to extract knowledge out of genomics data. The scientist will come to KnowEnG with their own data sets in the form of spreadsheets and ask KnowEnG to analyze those data sets in the light of a massive knowledge base of community data sets called the "Knowledge Network" that will be at the heart of the system. The Center is undertaking discovery projects aimed at testing the utility of KnowEnG for transforming big data to knowledge. These projects span a broad range of biological enquiry, from pharmacogenomics (in collaboration with Mayo Clinic) to transcriptomics of human behavior. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Genetic variability of mutans streptococci revealed by wide whole-genome sequencing
2013-01-01
Background Mutans streptococci are a group of bacteria significantly contributing to tooth decay. Their genetic variability, however, is still not well understood. Results Genomes of 6 clinical S. mutans isolates of different origins, one isolate of S. sobrinus (DSM 20742) and one isolate of S. ratti (DSM 20564) were sequenced and comparatively analyzed. Genome alignment revealed a mosaic-like structure of genome arrangement. Genes related to pathogenicity were found to vary highly among the strains, whereas genes for oxidative stress resistance are well conserved, indicating the importance of this trait in the dental biofilm community. Analysis of genome-scale metabolic networks revealed significant differences in 42 pathways. A striking dissimilarity is the unique presence of two lactate oxidases in S. sobrinus DSM 20742, probably indicating an unusual capability of this strain in producing H2O2 and expanding its ecological niche. In addition, lactate oxidases may form with other enzymes a novel energetic pathway in S. sobrinus DSM 20742 that can remedy its deficiency in the citrate utilization pathway. Using 67 S. mutans genomes currently available, including the strains sequenced in this study, we estimated the theoretical core genome size of S. mutans and modeled the S. mutans pan-genome by applying different fitting models. An “open” pan-genome was inferred. Conclusions The comparative genome analyses revealed diversity in the mutans streptococci group, especially with respect to virulence-related genes and metabolic pathways. The results are helpful for better understanding the evolution and adaptive mechanisms of these oral pathogens and for combating them. PMID:23805886
Inferring causal genomic alterations in breast cancer using gene expression data
2011-01-01
Background One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies. Results We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in the regions. By inferring CNV regions across many datasets we were able to identify 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments. Conclusions To our knowledge, this is the first effort to systematically identify and validate drivers for expression-based CNV regions in breast cancer. The framework, in which wavelet analysis of expression-based copy number alteration is coupled with gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and other novel types of cancer data such as next-generation sequencing-based expression (RNA-Seq) as well as CNV data. PMID:21806811
Song, Hyun-Seob; Goldberg, Noam; Mahajan, Ashutosh; Ramkrishna, Doraiswami
2017-08-01
Elementary (flux) modes (EMs) have served as a valuable tool for investigating structural and functional properties of metabolic networks. Identification of the full set of EMs in genome-scale networks remains challenging due to combinatorial explosion of EMs in complex networks. Often, however, only a small subset of relevant EMs needs to be known, for which optimization-based sequential computation is a useful alternative. Most of the currently available methods along this line are based on the iterative use of mixed integer linear programming (MILP), the effectiveness of which significantly deteriorates as the number of iterations builds up. To alleviate the computational burden associated with the MILP implementation, we here present a novel optimization algorithm termed alternate integer linear programming (AILP). Our algorithm was designed to iteratively solve a pair of integer programming (IP) and linear programming (LP) problems to compute EMs in a sequential manner. In each step, the IP identifies a minimal subset of reactions, the deletion of which disables all previously identified EMs. Thus, a subsequent LP solution subject to this reaction deletion constraint becomes a distinct EM. In cases where no feasible LP solution is available, IP-derived reaction deletion sets represent minimal cut sets (MCSs). Despite the additional computation of MCSs, AILP reduced the time needed to compute EMs by orders of magnitude. The proposed AILP algorithm not only offers a computational advantage in the EM analysis of genome-scale networks, but also improves the understanding of the linkage between EMs and MCSs. The software is implemented in Matlab and is provided as supplementary information. Contact: hyunseob.song@pnnl.gov. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
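The IP step described in the abstract above (find a minimal set of reactions whose deletion disables every EM found so far) can be illustrated on a toy case. This is only a minimal sketch under stated assumptions, not the authors' Matlab implementation: a brute-force search stands in for the integer program, the LP step is omitted, and the reaction names and EM supports are hypothetical.

```python
from itertools import combinations

def minimal_deletion_set(em_supports, reactions):
    """Return a smallest set of reactions that intersects every EM support.

    Deleting these reactions disables all previously identified EMs,
    which is the role the IP plays in each iteration. Brute force is
    used here in place of an integer program (illustration only).
    """
    for size in range(len(reactions) + 1):
        for candidate in combinations(reactions, size):
            chosen = set(candidate)
            # Every known EM must lose at least one of its reactions.
            if all(chosen & support for support in em_supports):
                return chosen
    return None  # reached only if some EM has an empty support

# Hypothetical toy data: each EM is given as the set of reactions it uses.
reactions = ["r1", "r2", "r3", "r4"]
known_ems = [{"r1", "r2"}, {"r2", "r3"}, {"r3", "r4"}]
deletion_set = minimal_deletion_set(known_ems, reactions)
print(deletion_set)
```

In the full procedure one would then look for a flux distribution that avoids the returned reactions; if none exists, the deletion set is a minimal cut set, as the abstract notes. Brute force scales exponentially, so a real implementation would need a proper IP solver.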
Network reconstruction and systems analysis of plant cell wall deconstruction by Neurospora crassa.
Samal, Areejit; Craig, James P; Coradetti, Samuel T; Benz, J Philipp; Eddy, James A; Price, Nathan D; Glass, N Louise
2017-01-01
Plant biomass degradation by fungal-derived enzymes is rapidly expanding in economic importance as a clean and efficient source for biofuels. The ability to rationally engineer filamentous fungi would facilitate biotechnological applications for degradation of plant cell wall polysaccharides. However, incomplete knowledge of biomolecular networks responsible for plant cell wall deconstruction impedes experimental efforts in this direction. To expand this knowledge base, a detailed network of reactions important for deconstruction of plant cell wall polysaccharides into simple sugars was constructed for the filamentous fungus Neurospora crassa. To reconstruct this network, information was integrated from five heterogeneous data types: functional genomics, transcriptomics, proteomics, genetics, and biochemical characterizations. The combined information was encapsulated into a feature matrix and the evidence weighted to assign annotation confidence scores for each gene within the network. Comparative analyses of RNA-seq and ChIP-seq data shed light on the regulation of the plant cell wall degradation network, leading to a novel hypothesis for degradation of the hemicellulose mannan. The transcription factor CLR-2 was subsequently experimentally shown to play a key role in the mannan degradation pathway of N. crassa. Here we built a network that serves as a scaffold for integration of diverse experimental datasets. This approach led to the elucidation of regulatory design principles for plant cell wall deconstruction by filamentous fungi and a novel function for the transcription factor CLR-2. This expanding network will aid in efforts to rationally engineer industrially relevant hyper-production strains.
Parallel or convergent evolution in human population genomic data revealed by genotype networks.
R Vahdati, Ali; Wagner, Andreas
2016-08-02
Genotype networks are representations of genetic variation data that are complementary to phylogenetic trees. A genotype network is a graph whose nodes are genotypes (DNA sequences) with the same broadly defined phenotype. Two nodes are connected if they differ in some minimal way, e.g., in a single nucleotide. We analyze human genome variation data from the 1000 Genomes Project, and construct haploid genotype (haplotype) networks for 12,235 protein coding genes. The structure of these networks varies widely among genes, indicating different patterns of variation despite a shared evolutionary history. We focus on those genes whose genotype networks show many cycles, which can indicate homoplasy, i.e., parallel or convergent evolution, on the sequence level. For 42 genes, the observed number of cycles is so large that it cannot be explained by either chance homoplasy or recombination. When analyzing possible explanations, we discovered evidence for positive selection in 21 of these genes and, in addition, a potential role for constrained variation and purifying selection. Balancing selection plays at most a small role. The 42 genes with excess cycles are enriched in functions related to immunity and response to pathogens. Genotype networks are representations of genetic variation data that can help understand unusual patterns of genomic variation.
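The network definition in the abstract above (genotypes as nodes, edges between genotypes that differ in a single nucleotide) is easy to make concrete. The sketch below builds such a network and counts its independent cycles as the cyclomatic number E - V + C. The haplotypes are hypothetical, and the cyclomatic number is an assumption chosen for illustration; the abstract does not say how the authors counted cycles.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def genotype_network(genotypes):
    """Connect genotypes that differ at exactly one nucleotide."""
    return [(a, b) for a, b in combinations(genotypes, 2) if hamming(a, b) == 1]

def count_components(nodes, edges):
    """Count connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})

def independent_cycles(nodes, edges):
    """Cyclomatic number E - V + C: the number of independent cycles."""
    return len(edges) - len(nodes) + count_components(nodes, edges)

# Hypothetical haplotypes of one gene (not real 1000 Genomes data).
haplotypes = ["AA", "AT", "TA", "TT", "GG"]
edges = genotype_network(haplotypes)
cycles = independent_cycles(haplotypes, edges)
print(len(edges), cycles)
```

In this toy example the four haplotypes AA, AT, TT and TA form a square: the same A-to-T change appears on two different backgrounds. That is the kind of cycle the abstract associates with homoplasy or recombination.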
Megchelenbrink, Wout; Huynen, Martijn; Marchiori, Elena
2014-01-01
Constraint-based models of metabolic networks are typically underdetermined, because they contain more reactions than metabolites. Therefore the solutions to this system do not consist of unique flux rates for each reaction, but rather a space of possible flux rates. By uniformly sampling this space, an estimated probability distribution for each reaction's flux in the network can be obtained. However, sampling a high dimensional network is time-consuming. Furthermore, the constraints imposed on the network give rise to an irregularly shaped solution space. Therefore, more tailored, efficient sampling methods are needed. We propose an efficient sampling algorithm (called optGpSampler), which implements the Artificial Centering Hit-and-Run algorithm in a different manner than the sampling algorithm implemented in the COBRA Toolbox for metabolic network analysis, here called gpSampler. Results of extensive experiments on different genome-scale metabolic networks show that optGpSampler is up to 40 times faster than gpSampler. Application of existing convergence diagnostics on small network reconstructions indicates that optGpSampler converges roughly ten times faster than gpSampler towards similar sampling distributions. For networks of higher dimension (i.e. containing more than 500 reactions), we observed significantly better convergence of optGpSampler and a large deviation between the samples generated by the two algorithms. optGpSampler for Matlab and Python is available for non-commercial use at: http://cs.ru.nl/~wmegchel/optGpSampler/.
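To make the idea of sampling a constrained flux space concrete, the following sketch implements plain hit-and-run on a small polytope {x : A x <= b}. It is not optGpSampler or gpSampler, and it is not Artificial Centering Hit-and-Run, which (unlike this sketch) picks directions using a running center of earlier samples. The two-flux constraint set is hypothetical, and the steady-state equality constraints of a real model are assumed to have been eliminated already.

```python
import random

def hit_and_run(A, b, start, n_samples, rng):
    """Sample from the bounded polytope {x : A x <= b} with plain hit-and-run.

    `start` must be a feasible point. Each step draws a random direction,
    finds the feasible chord through the current point, and moves to a
    uniformly chosen point on that chord.
    """
    x = list(start)
    samples = []
    for _ in range(n_samples):
        # Isotropic random direction (normalization is not needed).
        d = [rng.gauss(0.0, 1.0) for _ in x]
        lo, hi = float("-inf"), float("inf")
        for row, bound in zip(A, b):
            slope = sum(r * di for r, di in zip(row, d))
            slack = bound - sum(r * xi for r, xi in zip(row, x))
            if slope > 1e-12:
                hi = min(hi, slack / slope)   # constraint limits forward steps
            elif slope < -1e-12:
                lo = max(lo, slack / slope)   # constraint limits backward steps
        t = rng.uniform(lo, hi)
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples

# Hypothetical constraints: 0 <= v1, v2 <= 10 and v1 + v2 <= 12.
A = [[1, 0], [0, 1], [-1, 0], [0, -1], [1, 1]]
b = [10, 10, 0, 0, 12]
samples = hit_and_run(A, b, start=[1.0, 1.0], n_samples=1000,
                      rng=random.Random(0))
mean_v1 = sum(s[0] for s in samples) / len(samples)
print(round(mean_v1, 2))
```

A histogram of one coordinate of `samples` approximates the marginal flux distribution that the abstract refers to. In elongated, high-dimensional solution spaces plain hit-and-run mixes slowly, which is the motivation for centering variants.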
Public health and valorization of genome-based technologies: a new model.
Lal, Jonathan A; Schulte In den Bäumen, Tobias; Morré, Servaas A; Brand, Angela
2011-12-05
The success rate of timely translation of genome-based technologies into commercially feasible products and services applicable in health care systems is notably low. We found that both industry and scientists neglect health policy aspects when commercializing their technology, specifically Public Health Assessment Tools (PHAT) and the early involvement of the decision makers on whom market authorization and reimbursement depend. While Technology Transfer (TT) aims to facilitate translation of ideas into products, Health Technology Assessment, one component of PHAT, facilitates translation of products and processes into healthcare services and eventually produces recommendations for decision makers. We propose a new model of valorization to optimize integration of genome-based technologies into the healthcare system. The method used to develop our model is an adapted version of the Fish Trap Model and the Basic Design Cycle. We found that, although TT and PHAT differ, similarities exist between them. Their potential to be mutually beneficial justified our proposal that they be initiated in relative parallel. We observed that the Public Health Genomics Wheel should be included in this relative parallel activity to ensure that all societal and policy aspects are dealt with preemptively by both stakeholders. On further analysis, we found that this whole process depends on the Value of Information. As a result, we present our LAL (Learning Adapting Leveling) model, which proposes that, based on market demand, TT and PHAT should advocate for relevant technologies through consultation and bilateral communication. This can be achieved by public-private partnerships (PPPs). These widely defined PPPs create the innovation network, a developing, consultative and collaborative networking platform between TT and PHAT. This network is iterative and requires learning, assimilating and using the knowledge developed, a capability called absorption capacity.
We hypothesize that the higher the absorption capacity, the higher the likelihood of success. Our model, however, does not address the phasing out of technology, although we believe the same model can be used to simultaneously phase out a technology. The model aims to optimize and shorten the timeframe of integration into healthcare. It also helps industry and researchers reach a strategic decision about the technology being developed at an early stage, thus saving resources and minimizing failures.
Genomic Heterogeneity of Osteosarcoma - Shift from Single Candidates to Functional Modules
Maugg, Doris; Eckstein, Gertrud; Baumhoer, Daniel; Nathrath, Michaela; Korsching, Eberhard
2015-01-01
Osteosarcoma (OS), a bone tumor, exhibits a complex karyotype. On the genomic level, a highly variable degree of alterations in nearly all chromosomal regions and between individual tumors is observable. This hampers the identification of common drivers in OS biology. To identify the common molecular mechanisms involved in the maintenance of OS, we follow the hypothesis that all the copy number-associated differences between the patients are intercepted on the level of the functional modules. The implementation is based on a network approach utilizing copy number-associated genes in OS, paired expression data and protein interaction data. The resulting functional modules of tightly connected genes were interpreted regarding their biological functions in OS and their potential prognostic significance. We identified an osteosarcoma network assembling well-known and lesser-known candidates. The derived network shows significant connectivity and modularity, suggesting that the genes affected by the heterogeneous genetic alterations share the same biological context. The network modules participate in several critical aspects of cancer biology, such as DNA damage response, cell growth, and cell motility, which is in line with the hypothesis of specifically deregulated but functional modules in cancer. Further, we could deduce genes with possible prognostic significance in OS for further investigation (e.g. EZR, CDKN2A, MAP3K5). Several of those module genes were located on chromosome 6q. The given systems biological approach provides evidence that heterogeneity on the genomic and expression level is ordered by the biological system on the level of the functional modules. Different genomic aberrations point to the same cellular network vicinity to form vital, but already neoplastically altered, functional modules maintaining OS.
This observation has been under discussion for a long time, but often only in a hypothetical manner; here it is exemplified for OS. PMID:25848766
Complete Genome Sequence and Comparative Genomics of a Novel Myxobacterium Myxococcus hansupus
Sharma, Gaurav; Narwani, Tarun; Subramanian, Srikrishna
2016-01-01
Myxobacteria, a group of Gram-negative aerobes, belong to the class δ-proteobacteria and order Myxococcales. Unlike anaerobic δ-proteobacteria, they exhibit several unusual physiogenomic properties like gliding motility, desiccation-resistant myxospores and large genomes with high coding density. Here we report a 9.5 Mbp complete genome of Myxococcus hansupus that encodes 7,753 proteins. Phylogenomic and genome-genome distance based analysis suggest that Myxococcus hansupus is a novel member of the genus Myxococcus. Comparative genome analysis with other members of the genus Myxococcus was performed to explore their genome diversity. The variation in number of unique proteins observed across different species is suggestive of diversity at the genus level while the overrepresentation of several Pfam families indicates the extent and mode of genome expansion as compared to non-Myxococcales δ-proteobacteria. PMID:26900859
Woldesemayat, Adugna Abdi; Van Heusden, Peter; Ndimba, Bongani K; Christoffels, Alan
2017-12-22
Drought is the most disastrous abiotic stress that severely affects agricultural productivity worldwide. Understanding the biological basis of drought-regulated traits requires identification and in-depth characterization of genetic determinants using model organisms and high-throughput technologies. However, studies on drought tolerance have generally been limited to the traditional candidate gene approach, which targets only a single gene in a pathway that is related to a trait. In this study, we used sorghum, one of the model crops that is well adapted to arid regions, to mine genes and define determinants for drought tolerance using drought expression libraries and RNA-seq data. We provide an integrated and comparative in silico candidate gene identification, characterization and annotation approach, with an emphasis on genes playing a prominent role in conferring drought tolerance in sorghum. A total of 470 non-redundant functionally annotated drought responsive genes (DRGs) were identified using experimental data from drought responses by employing pairwise sequence similarity searches, pathway and InterPro domain analysis, expression profiling and orthology relation. Comparison of the genomic locations between these genes and sorghum quantitative trait loci (QTLs) showed that 40% of these genes were co-localized with QTLs known for drought tolerance. The genome reannotation conducted using the Program to Assemble Spliced Alignment (PASA) resulted in 9.6% of existing single gene models being updated. In addition, 210 putative novel genes were identified using AUGUSTUS and PASA based analysis on the expression dataset. Among these, 50% were single-exon genes, 69.5% were drought responsive, and 5.7% were complete gene structure models. Analysis of biochemical metabolism revealed 14 metabolic pathways that are related to drought tolerance and also formed a strong biological network among the categories of genes involved.
Identification of these pathways signifies the interplay of biochemical reactions that make up the metabolic network, constituting a fundamental interface for the sorghum defence mechanism against drought stress. This study suggests untapped natural variability in sorghum that could be used for developing drought tolerance. The data presented here may be regarded as an initial reference point in functional and comparative genomics in the Gramineae family.
Senapedis, William T.; Kennedy, Caleb J.; Boyle, Patrick M.; Silver, Pamela A.
2011-01-01
Forkhead transcription factors (FOXOs) alter a diverse array of cellular processes including the cell cycle, oxidative stress resistance, and aging. Insulin/Akt activation directs phosphorylation and cytoplasmic sequestration of FOXO away from its target genes and serves as an endpoint of a complex signaling network. Using a human genome small interfering RNA (siRNA) library in a cell-based assay, we identified an extensive network of proteins involved in nuclear export, focal adhesion, and mitochondrial respiration not previously implicated in FOXO localization. Furthermore, a detailed examination of mitochondrial factors revealed that loss of uncoupling protein 5 (UCP5) modifies the energy balance and increases free radicals through up-regulation of uncoupling protein 3 (UCP3). The increased superoxide content induces c-Jun N-terminal kinase 1 (JNK1) kinase activity, which in turn affects FOXO localization through a compensatory dephosphorylation of Akt. The resulting nuclear FOXO increases expression of target genes, including mitochondrial superoxide dismutase. By connecting free radical defense and mitochondrial uncoupling to Akt/FOXO signaling, these results have implications in obesity and type 2 diabetes development and the potential for therapeutic intervention. PMID:21460183
Senapedis, William T; Kennedy, Caleb J; Boyle, Patrick M; Silver, Pamela A
2011-05-15
Forkhead transcription factors (FOXOs) alter a diverse array of cellular processes including the cell cycle, oxidative stress resistance, and aging. Insulin/Akt activation directs phosphorylation and cytoplasmic sequestration of FOXO away from its target genes and serves as an endpoint of a complex signaling network. Using a human genome small interfering RNA (siRNA) library in a cell-based assay, we identified an extensive network of proteins involved in nuclear export, focal adhesion, and mitochondrial respiration not previously implicated in FOXO localization. Furthermore, a detailed examination of mitochondrial factors revealed that loss of uncoupling protein 5 (UCP5) modifies the energy balance and increases free radicals through up-regulation of uncoupling protein 3 (UCP3). The increased superoxide content induces c-Jun N-terminal kinase 1 (JNK1) kinase activity, which in turn affects FOXO localization through a compensatory dephosphorylation of Akt. The resulting nuclear FOXO increases expression of target genes, including mitochondrial superoxide dismutase. By connecting free radical defense and mitochondrial uncoupling to Akt/FOXO signaling, these results have implications in obesity and type 2 diabetes development and the potential for therapeutic intervention.
Networking Biology: The Origins of Sequence-Sharing Practices in Genomics.
Stevens, Hallam
2015-10-01
The wide sharing of biological data, especially nucleotide sequences, is now considered to be a key feature of genomics. Historians and sociologists have attempted to account for the rise of this sharing by pointing to precedents in model organism communities and in natural history. This article supplements these approaches by examining the role that electronic networking technologies played in generating the specific forms of sharing that emerged in genomics. The links between early computer users at the Stanford Artificial Intelligence Laboratory in the 1960s, biologists using local computer networks in the 1970s, and GenBank in the 1980s, show how networking technologies carried particular practices of communication, circulation, and data distribution from computing into biology. In particular, networking practices helped to transform sequences themselves into objects that had value as a community resource.
The emerging genomics and systems biology research lead to systems genomics studies.
Yang, Mary Qu; Yoshigoe, Kenji; Yang, William; Tong, Weida; Qin, Xiang; Dunker, A; Chen, Zhongxue; Arbania, Hamid R; Liu, Jun S; Niemierko, Andrzej; Yang, Jack Y
2014-01-01
Synergistically integrating multi-layer genomic data at the systems level can not only lead to deeper insights into the molecular mechanisms related to disease initiation and progression, but also guide pathway-based biomarker and drug target identification. With the advent of high-throughput next-generation sequencing technologies, sequencing both DNA and RNA has generated multi-layer genomic data that can provide DNA polymorphism, non-coding RNA, messenger RNA, gene expression, isoform and alternative splicing information. Systems biology, on the other hand, studies complex biological systems, particularly the complex molecular interactions within specific cells or organisms. Genomics and molecular systems biology can be merged into the study of genomic profiles and implicated biological functions at the cellular or organism level. This emerging field can be referred to as systems genomics or genomic systems biology. The Mid-South Bioinformatics Centre (MBC) and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and University of Arkansas for Medical Sciences are particularly interested in promoting education and research advancement in this emerging field. Based on past investigations and research outcomes, MBC is further utilizing differential gene and isoform/exon expression from RNA-seq and co-regulation from ChIP-seq specific for different phenotypes, in combination with protein-protein and protein-DNA interactions, to construct high-level gene networks for an integrative genome-phenome investigation at the systems biology level.
Exploring Wound-Healing Genomic Machinery with a Network-Based Approach
Vitali, Francesca; Marini, Simone; Balli, Martina; Grosemans, Hanne; Sampaolesi, Maurilio; Lussier, Yves A.; Cusella De Angelis, Maria Gabriella; Bellazzi, Riccardo
2017-01-01
The molecular mechanisms underlying tissue regeneration and wound healing are still poorly understood despite their importance. In this paper we develop a bioinformatics approach, combining biology and network theory to drive experiments for better understanding the genetic underpinnings of wound healing mechanisms and for selecting potential drug targets. We start by selecting literature-relevant genes in murine wound healing, and inferring from them a Protein-Protein Interaction (PPI) network. Then, we analyze the network to rank wound healing-related genes according to their topological properties. Lastly, we perform a procedure for in-silico simulation of a treatment action in a biological pathway. The findings obtained by applying the developed pipeline, including gene expression analysis, confirm that a network-based bioinformatics method is able to prioritize candidate genes for in vitro analysis, thus speeding up the understanding of molecular mechanisms and supporting the discovery of potential drug targets. PMID:28635674
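The ranking step mentioned in the abstract above (ordering genes by topological properties of the PPI network) can be sketched with the simplest such property, node degree. The abstract does not name the measures the authors used, so degree is an assumption made for illustration, and the gene labels and interactions are placeholders rather than real data.

```python
from collections import defaultdict

def rank_by_degree(edges):
    """Rank nodes of an undirected network by number of interaction partners."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Sort by descending degree, then by name so ties have a stable order.
    return sorted(((gene, len(partners)) for gene, partners in neighbors.items()),
                  key=lambda item: (-item[1], item[0]))

# Hypothetical interaction list; the gene labels are placeholders.
ppi_edges = [("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneD"),
             ("geneB", "geneC"), ("geneD", "geneE")]
ranking = rank_by_degree(ppi_edges)
for gene, degree in ranking:
    print(gene, degree)
```

Other topological measures, such as betweenness or closeness centrality, could be substituted for degree without changing the overall shape of the ranking step.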
Xu, Song; Liu, Renwang; Da, Yurong
2018-06-05
This study compared tumor-related signaling pathways with known compounds to determine potential agents for lung adenocarcinoma (LUAD) treatment. Kyoto Encyclopedia of Genes and Genomes signaling pathway analyses were performed based on LUAD differentially expressed genes from The Cancer Genome Atlas (TCGA) project and Genotype-Tissue Expression (GTEx) controls. These results were compared to various known compounds using the Connectivity Mapping dataset. The clinical significance of the hub genes identified by overlapping pathway enrichment analysis was further investigated using data mining from multiple sources. A drug-pathway network for LUAD was constructed, and molecular docking was carried out. After the integration of 57 LUAD-related pathways and 35 pathways affected by small molecules, five overlapping pathways were revealed. Among these five pathways, the p53 signaling pathway was the most significant, with CCNB1, CCNB2, CDK1, CDKN2A, and CHEK1 being identified as hub genes. The p53 signaling pathway is implicated as a risk factor for LUAD tumorigenesis and survival. A total of 88 molecules significantly inhibiting the five LUAD-related oncogenic pathways were involved in the LUAD drug-pathway network. Daunorubicin, mycophenolic acid, and pyrvinium could potentially target the hub gene CHEK1 directly. Our study highlights the critical pathways that should be targeted in the search for potential LUAD treatments, most importantly, the p53 signaling pathway. Some compounds, such as ciclopirox and AG-028671, may have potential roles in LUAD treatment but require further experimental verification. © 2018 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.
Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas
2016-09-19
Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.
Neuroblastoma, a Paradigm for Big Data Science in Pediatric Oncology
Salazar, Brittany M.; Balczewski, Emily A.; Ung, Choong Yong; Zhu, Shizhen
2016-01-01
Pediatric cancers rarely exhibit recurrent mutational events when compared to most adult cancers. This poses a challenge in understanding how cancers initiate, progress, and metastasize in early childhood. Also, due to limited detected driver mutations, it is difficult to benchmark key genes for drug development. In this review, we use neuroblastoma, a pediatric solid tumor of neural crest origin, as a paradigm for exploring “big data” applications in pediatric oncology. Computational strategies derived from big data science (network- and machine learning-based modeling and drug repositioning) hold the promise of shedding new light on the molecular mechanisms driving neuroblastoma pathogenesis and identifying potential therapeutics to combat this devastating disease. These strategies integrate robust data input, from genomic and transcriptomic studies, clinical data, and in vivo and in vitro experimental models specific to neuroblastoma and other types of cancers that closely mimic its biological characteristics. We discuss contexts in which “big data” and computational approaches, especially network-based modeling, may advance neuroblastoma research, describe currently available data and resources, and propose future models of strategic data collection and analyses for neuroblastoma and other related diseases. PMID:28035989
Neuroblastoma, a Paradigm for Big Data Science in Pediatric Oncology.
Salazar, Brittany M; Balczewski, Emily A; Ung, Choong Yong; Zhu, Shizhen
2016-12-27
Pediatric cancers rarely exhibit recurrent mutational events when compared to most adult cancers. This poses a challenge in understanding how cancers initiate, progress, and metastasize in early childhood. Also, due to limited detected driver mutations, it is difficult to benchmark key genes for drug development. In this review, we use neuroblastoma, a pediatric solid tumor of neural crest origin, as a paradigm for exploring "big data" applications in pediatric oncology. Computational strategies derived from big data science (network- and machine learning-based modeling and drug repositioning) hold the promise of shedding new light on the molecular mechanisms driving neuroblastoma pathogenesis and identifying potential therapeutics to combat this devastating disease. These strategies integrate robust data input, from genomic and transcriptomic studies, clinical data, and in vivo and in vitro experimental models specific to neuroblastoma and other types of cancers that closely mimic its biological characteristics. We discuss contexts in which "big data" and computational approaches, especially network-based modeling, may advance neuroblastoma research, describe currently available data and resources, and propose future models of strategic data collection and analyses for neuroblastoma and other related diseases.
A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.
Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave; Sinha, Saurabh
2018-04-26
Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn't hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.
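The contrast between random CV and clustering-based CV described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the greedy fold-balancing rule, and the toy condition labels are all assumptions made for illustration, and the cluster labels are assumed to have been computed beforehand.

```python
import random
from collections import defaultdict

def random_cv_folds(n_samples, k, seed=0):
    """Random CV: shuffle sample indices and deal them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]

def clustered_cv_folds(cluster_labels, k):
    """Cluster-based CV: all samples of one cluster share a fold.

    cluster_labels[i] is the precomputed cluster of sample i. Whole
    clusters are assigned greedily, largest first, to the currently
    smallest fold so that fold sizes stay roughly balanced.
    """
    members = defaultdict(list)
    for i, label in enumerate(cluster_labels):
        members[label].append(i)
    folds = [[] for _ in range(k)]
    # Sort by decreasing cluster size, then by name for determinism.
    for label in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[label])
    return [sorted(f) for f in folds]

# Hypothetical example: 8 samples drawn from 4 experimental conditions.
labels = ["heat", "heat", "cold", "cold", "salt", "salt", "dark", "dark"]
ccv = clustered_cv_folds(labels, k=2)
rcv = random_cv_folds(len(labels), k=2)
```

With cluster-based folds, no condition appears in both the training and the test portion of a split, which is what makes the test set more "distinct" from the training set than a random split would.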
Although the MYC oncogene has been implicated in cancer, a systematic assessment of alterations of MYC, related transcription factors, and co-regulatory proteins, forming the proximal MYC network (PMN), across human cancers is lacking. Using computational approaches, we define genomic and proteomic features associated with MYC and the PMN across the 33 cancers of The Cancer Genome Atlas. Pan-cancer, 28% of all samples had at least one of the MYC paralogs amplified.
Dereeper, Alexis; Nicolas, Stéphane; Le Cunff, Loïc; Bacilieri, Roberto; Doligez, Agnès; Peros, Jean-Pierre; Ruiz, Manuel; This, Patrice
2011-05-05
High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates: 1) a pipeline, freely accessible through the internet, combining existing software with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNPs and indel events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D). Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices. 2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects.
It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies. SNiPlay is available at: http://sniplay.cirad.fr/.
Plechakova, Olga; Tranchant-Dubreuil, Christine; Benedet, Fabrice; Couderc, Marie; Tinaut, Alexandra; Viader, Véronique; De Block, Petra; Hamon, Perla; Campa, Claudine; de Kochko, Alexandre; Hamon, Serge; Poncet, Valérie
2009-01-01
Background: In the past few years, functional genomics information has been rapidly accumulating on Rubiaceae species and especially on those belonging to the Coffea genus (coffee trees). An increasing number of expressed sequence tag (EST) data and EST- or genomic-derived microsatellite markers have been generated, together with Conserved Ortholog Set (COS) markers. This considerably facilitates comparative genomics or map-based genetic studies through the common use of orthologous loci across different species. Similar genomic information is available for e.g. tomato or potato, members of the Solanaceae family. Since both Rubiaceae and Solanaceae belong to the Euasterids I (lamiids), integration of information on genetic markers would be possible and lead to more efficient analyses and discovery of key loci involved in important traits such as fruit development, quality, and maturation, or adaptation. Our goal was to develop a comprehensive web data source for integrated information on validated orthologous markers in Rubiaceae. Description: MoccaDB is an online MySQL-PHP driven relational database that houses annotated and/or mapped microsatellite markers in Rubiaceae. In its current release, the database stores 638 markers that have been defined on 259 ESTs and 379 genomic sequences. Marker information was retrieved from 11 published works, and completed with original data on 132 microsatellite markers validated in our laboratory. DNA sequences were derived from three Coffea species/hybrids. Microsatellite markers were checked for similarity, in vitro tested for cross-amplification and diversity/polymorphism status in up to 38 Rubiaceae species belonging to the Cinchonoideae and Rubioideae subfamilies. Functional annotation was provided and some markers associated with described metabolic pathways were also integrated. Users can search the database for marker, sequence, map or diversity information through multi-option query forms.
The retrieved data can be browsed and downloaded, along with protocols used, using a standard web browser. MoccaDB also integrates bioinformatics tools (CMap viewer and local BLAST) and hyperlinks to related external data sources (NCBI GenBank and PubMed, SOL Genomic Network database). Conclusion: We believe that MoccaDB will be extremely useful for all researchers working in the areas of comparative and functional genomics and molecular evolution, in general, and population analysis and association mapping of Rubiaceae and Solanaceae species, in particular. PMID:19788737
What can we learn about lyssavirus genomes using 454 sequencing?
Höper, Dirk; Finke, Stefan; Freuling, Conrad M; Hoffmann, Bernd; Beer, Martin
2012-01-01
The main task of individual project number four, "Whole genome sequencing, virus-host adaptation, and molecular epidemiological analyses of lyssaviruses", within the network "Lyssaviruses: a potential re-emerging public health threat", is to provide high-quality complete genome sequences from lyssaviruses. These sequences are analysed in depth with regard to the diversity of the viral populations, covering both quasi-species and so-called defective interfering RNAs. Moreover, the sequence data will facilitate further epidemiological analyses, will provide insight into the evolution of lyssaviruses and will be the basis for the design of novel nucleic acid based diagnostics. The first results presented here indicate not only that high-quality full-length lyssavirus genome sequences can be generated, but also that efficient analysis of the viral population becomes feasible.
TOPSAN: a dynamic web database for structural genomics.
Ellrott, Kyle; Zmasek, Christian M; Weekes, Dana; Sri Krishna, S; Bakolitsa, Constantina; Godzik, Adam; Wooley, John
2011-01-01
The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.
Carbone, Alessandra; Madden, Richard
2005-10-01
Codon bias is related to metabolic functions in translationally biased organisms, and we argue two points. First, genes with high codon bias describe in meaningful ways the metabolic characteristics of the organism; important metabolic pathways corresponding to crucial characteristics of the lifestyle of an organism, such as photosynthesis, nitrification, anaerobic versus aerobic respiration, sulfate reduction, methanogenesis, and others, happen to involve especially biased genes. Second, gene transcriptional levels of sets of experiments representing a significant variation of biological conditions strikingly confirm, in the case of Saccharomyces cerevisiae, that metabolic preferences are detectable by purely statistical analysis: the high metabolic activity of yeast during fermentation is encoded in the high bias of enzymes involved in the associated pathways, suggesting that this genome was affected by a strong evolutionary pressure that favored a predominantly fermentative metabolism of yeast in the wild. The ensemble of metabolic pathways involving enzymes with high codon bias is rather well defined and remains consistent across many species; even species that have not been considered translationally biased, such as Helicobacter pylori, reveal some weak form of translational bias. We provide numerical evidence, supported by experimental data, of these facts and conclude that the metabolic networks of translationally biased genomes, observable today as projections of eons of evolutionary pressure, can be analyzed numerically and predictions of the role of specific pathways during evolution can be derived. The new concepts of Comparative Pathway Index, used to compare organisms with respect to their metabolic networks, and Evolutionary Pathway Index, used to detect evolutionarily meaningful bias in the genetic code from transcriptional data, are introduced.
Cost-effective cloud computing: a case study using the comparative genomics tool, roundup.
Kudtarkar, Parul; Deluca, Todd F; Fusaro, Vincent A; Tonellato, Peter J; Wall, Dennis P
2010-12-22
Comparative genomics resources, such as ortholog detection tools and repositories are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource, Roundup, using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon's Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon's computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure.
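Why the submission order matters can be shown with a small sketch. The abstract does not give the runtime model, so the one used here (runtime proportional to the product of gene counts) is purely a hypothetical stand-in, as are the function names and the toy numbers. The scheduling rule shown is the generic longest-job-first heuristic, not necessarily the ordering the authors derived.

```python
import heapq

def estimate_hours(genes_a, genes_b, pairs_per_hour=1_000_000):
    """Hypothetical runtime model: time grows with the product of gene counts."""
    return genes_a * genes_b / pairs_per_hour

def makespan(runtimes, n_nodes):
    """Hand jobs, in the given order, to whichever node frees up first.

    Returns the time at which the last node finishes. A cluster billed
    per node-hour until completion costs roughly n_nodes * makespan.
    """
    nodes = [0.0] * n_nodes
    heapq.heapify(nodes)
    for hours in runtimes:
        earliest_free = heapq.heappop(nodes)
        heapq.heappush(nodes, earliest_free + hours)
    return max(nodes)

# Four small genome comparisons followed by one large one (toy sizes).
pairs = [(1000, 2000)] * 4 + [(2000, 4000)]
as_submitted = [estimate_hours(a, b) for a, b in pairs]
longest_first = sorted(as_submitted, reverse=True)

print(makespan(as_submitted, n_nodes=2))   # large job arrives last
print(makespan(longest_first, n_nodes=2))  # large job starts first
```

In this toy case the unordered submission leaves one node idle while the other works through the large job at the end; starting the large job first lets the small jobs fill the remaining capacity.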
Darwinian evolution in the light of genomics
Koonin, Eugene V.
2009-01-01
Comparative genomics and systems biology offer unprecedented opportunities for testing central tenets of evolutionary biology formulated by Darwin in the Origin of Species in 1859 and expanded in the Modern Synthesis 100 years later. Evolutionary-genomic studies show that natural selection is only one of the forces that shape genome evolution and is not quantitatively dominant, whereas non-adaptive processes are much more prominent than previously suspected. Major contributions of horizontal gene transfer and diverse selfish genetic elements to genome evolution undermine the Tree of Life concept. An adequate depiction of evolution requires the more complex concept of a network or ‘forest’ of life. There is no consistent tendency of evolution towards increased genomic complexity, and when complexity increases, this appears to be a non-adaptive consequence of evolution under weak purifying selection rather than an adaptation. Several universals of genome evolution were discovered including the invariant distributions of evolutionary rates among orthologous genes from diverse genomes and of paralogous gene family sizes, and the negative correlation between gene expression level and sequence evolution rate. Simple, non-adaptive models of evolution explain some of these universals, suggesting that a new synthesis of evolutionary biology might become feasible in a not so remote future. PMID:19213802
A Risk Stratification Model for Lung Cancer Based on Gene Coexpression Network and Deep Learning
2018-01-01
Risk stratification models for lung cancer based on gene expression profiles are of great interest. Instead of previous models based on individual prognostic genes, we aimed to develop a novel system-level risk stratification model for lung adenocarcinoma based on gene coexpression networks. Using multiple microarray datasets, gene coexpression network analysis was performed to identify survival-related networks. A deep learning based risk stratification model was constructed with representative genes of these networks. The model was validated in two test sets. Survival analysis was performed using the output of the model to evaluate whether it could predict patients' survival independent of clinicopathological variables. Five networks were significantly associated with patients' survival. Considering prognostic significance and representativeness, genes of the two survival-related networks were selected for input of the model. The output of the model was significantly associated with patients' survival in the training set and in both test sets (p < 0.00001, p < 0.0001 and p = 0.02 for the training set and test sets 1 and 2, respectively). In multivariate analyses, the model was associated with patients' prognosis independent of other clinicopathological features. Our study presents a new perspective on incorporating gene coexpression networks into the gene expression signature and clinical application of deep learning in genomic data science for prognosis prediction. PMID:29581968
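As a rough illustration of what a gene coexpression network is, the sketch below links two genes whenever the absolute Pearson correlation of their expression profiles exceeds a cutoff. The abstract does not state which network construction method was used, so the hard threshold, the function names, and the toy expression values are assumptions for illustration only.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def coexpression_edges(expr, threshold=0.9):
    """Return gene pairs whose |correlation| is at least `threshold`.

    expr maps a gene name to its expression values across samples.
    """
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(expr), 2)
        if abs(pearson(expr[g1], expr[g2])) >= threshold
    ]

# Toy expression matrix: 3 hypothetical genes measured in 4 samples.
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.5],   # rises with G1
    "G3": [4.0, 1.0, 3.0, 2.0],   # weakly anti-correlated with G1
}
edges = coexpression_edges(expr)
```

Groups of densely connected genes in such a graph are the "networks" (modules) whose representative genes could then serve as inputs to a downstream predictive model.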
Centeno, Tonatiuh Pena; Shomroni, Orr; Hennion, Magali; Halder, Rashi; Vidal, Ramon; Rahman, Raza-Ur; Bonn, Stefan
2016-10-11
Recent evidence suggests that the formation and maintenance of memory requires epigenetic changes. In an effort to understand the spatio-temporal extent of learning and memory-related epigenetic changes we have charted genome-wide histone and DNA methylation profiles, in two different brain regions, two cell types, and three time-points, before and after learning. In this data descriptor we provide detailed information on data generation, give insights into the rationale of experiments, highlight necessary steps to assess data quality, offer guidelines for future use of the data and supply ready-to-use code to replicate the analysis results. The data provides a blueprint of the gene regulatory network underlying short- and long-term memory formation and maintenance. This 'healthy' gene regulatory network of learning can now be compared to changes in neurological or psychiatric diseases, providing mechanistic insights into brain disorders and highlighting potential therapeutic avenues.
Naithani, Sushma; Jaiswal, Pankaj
2017-01-01
The species-specific plant Pathway Genome Databases (PGDBs) based on the BioCyc platform provide a conceptual model of the cellular metabolic network of an organism. Such frameworks allow analysis of the genome-scale expression data to understand changes in the overall metabolism of an organism (or organs, tissues, and cells) in response to various intrinsic (e.g. developmental and differentiation) and/or extrinsic signals (e.g. pathogens and abiotic stresses) from the surrounding environment. Using FragariaCyc, a pathway database for the diploid strawberry Fragaria vesca, we show (1) the basic navigation across a PGDB; (2) a case study of pathway comparison across plant species; and (3) an example of RNA-Seq data analysis using the Omics Viewer tool. The protocols described here generally apply to other Pathway Tools-based PGDBs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verhaak, Roel GW; Hoadley, Katherine A; Purdom, Elizabeth
The Cancer Genome Atlas Network recently cataloged recurrent genomic abnormalities in glioblastoma multiforme (GBM). We describe a robust gene expression-based molecular classification of GBM into Proneural, Neural, Classical, and Mesenchymal subtypes and integrate multidimensional genomic data to establish patterns of somatic mutations and DNA copy number. Aberrations and gene expression of EGFR, NF1, and PDGFRA/IDH1 each define the Classical, Mesenchymal, and Proneural subtypes, respectively. Gene signatures of normal brain cell types show a strong relationship between subtypes and different neural lineages. Additionally, response to aggressive therapy differs by subtype, with the greatest benefit in the Classical subtype and no benefit in the Proneural subtype. We provide a framework that unifies transcriptomic and genomic dimensions for GBM molecular stratification with important implications for future studies.
Exploring of the molecular mechanism of rhinitis via bioinformatics methods
Song, Yufen; Yan, Zhaohui
2018-01-01
The aim of this study was to analyze gene expression profiles for exploring the function and regulatory network of differentially expressed genes (DEGs) in pathogenesis of rhinitis by a bioinformatics method. The gene expression profile of GSE43523 was downloaded from the Gene Expression Omnibus database. The dataset contained 7 seasonal allergic rhinitis samples and 5 non-allergic normal samples. DEGs between rhinitis samples and normal samples were identified via the limma package of R. The WebGestalt database was used to identify enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of the DEGs. The differentially co-expressed pairs of the DEGs were identified via the DCGL package in R, and the differential co-expression network was constructed based on these pairs. A protein-protein interaction (PPI) network of the DEGs was constructed based on the Search Tool for the Retrieval of Interacting Genes database. A total of 263 DEGs were identified in rhinitis samples compared with normal samples, including 125 downregulated ones and 138 upregulated ones. The DEGs were enriched in 7 KEGG pathways. 308 differential co-expression gene pairs were obtained. A differential co-expression network was constructed, containing 212 nodes. In total, 148 PPI pairs of the DEGs were identified, and a PPI network was constructed based on these pairs. Bioinformatics methods could help us identify significant genes and pathways related to the pathogenesis of rhinitis. Steroid biosynthesis pathway and metabolic pathways might play important roles in the development of allergic rhinitis (AR). Genes such as CDC42 effector protein 5, solute carrier family 39 member A11 and PR/SET domain 10 might be also associated with the pathogenesis of AR, which provided references for the molecular mechanisms of AR. PMID:29257233
Cancer genomics: technology, discovery, and translation.
Tran, Ben; Dancey, Janet E; Kamel-Reid, Suzanne; McPherson, John D; Bedard, Philippe L; Brown, Andrew M K; Zhang, Tong; Shaw, Patricia; Onetto, Nicole; Stein, Lincoln; Hudson, Thomas J; Neel, Benjamin G; Siu, Lillian L
2012-02-20
In recent years, the increasing awareness that somatic mutations and other genetic aberrations drive human malignancies has led us within reach of personalized cancer medicine (PCM). The implementation of PCM is based on the following premises: genetic aberrations exist in human malignancies; a subset of these aberrations drive oncogenesis and tumor biology; these aberrations are actionable (defined as having the potential to affect management recommendations based on diagnostic, prognostic, and/or predictive implications); and there are highly specific anticancer agents available that effectively modulate these targets. This article highlights the technology underlying cancer genomics and examines the early results of genome sequencing and the challenges met in the discovery of new genetic aberrations. Finally, drawing from experiences gained in a feasibility study of somatic mutation genotyping and targeted exome sequencing led by Princess Margaret Hospital-University Health Network and the Ontario Institute for Cancer Research, the processes, challenges, and issues involved in the translation of cancer genomics to the clinic are discussed.
Castrillo, Juan I; Lista, Simone; Hampel, Harald; Ritchie, Craig W
2018-01-01
Alzheimer's disease (AD) is a complex multifactorial disease, involving a combination of genomic, interactome, and environmental factors, with essential participation of (a) intrinsic genomic susceptibility and (b) a constant dynamic interplay between impaired pathways and central homeostatic networks of nerve cells. The proper investigation of the complexity of AD requires new holistic systems-level approaches, at both the experimental and computational level. Systems biology methods offer the potential to unveil new fundamental insights, basic mechanisms, and networks and their interplay. These may lead to the characterization of mechanism-based molecular signatures, and AD hallmarks at the earliest molecular and cellular levels (and beyond), for characterization of AD subtypes and stages, toward targeted interventions according to the evolving precision medicine paradigm. In this work, an update on advanced systems biology methods and strategies for holistic studies of multifactorial diseases, particularly AD, is presented. This includes next-generation genomics, neuroimaging and multi-omics methods, experimental and computational approaches, relevant disease models, and latest genome editing and single-cell technologies. Their progressive incorporation into basic research, cohort studies, and trials is beginning to provide novel insights into AD essential mechanisms, molecular signatures, and markers toward mechanism-based classification and staging, and tailored interventions. Selected methods which can be applied in cohort studies and trials, with the European Prevention of Alzheimer's Dementia (EPAD) project as a reference example, are presented and discussed.
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; ...
2016-11-24
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. A multitude of technologies, abstractions, and interpretive frameworks have emerged to answer the challenges presented by genome function and regulatory network inference. Here, we propose a new approach for producing biologically meaningful clusters of coexpressed genes, called Atomic Regulons (ARs), based on expression data, gene context, and functional relationships. We demonstrate this new approach by computing ARs for Escherichia coli, which we compare with the coexpressed gene clusters predicted by two prevalent existing methods: hierarchical clustering and k-means clustering. We test the consistency of ARs predicted by all methods against expected interactions predicted by the Context Likelihood of Relatedness (CLR) mutual information based method, finding that the ARs produced by our approach show better agreement with CLR interactions. We then apply our method to compute ARs for four other genomes: Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus. We compare the AR clusters from all genomes to study the similarity of coexpression among a phylogenetically diverse set of species, identifying subsystems that show remarkable similarity over wide phylogenetic distances. We also study the sensitivity of our method for computing ARs to the expression data used in the computation, showing that our new approach requires less data than competing approaches to converge to a near final configuration of ARs. We go on to use our sensitivity analysis to identify the specific experiments that lead most rapidly to the final set of ARs for E. coli. As a result, this analysis produces insights into improving the design of gene expression experiments.
Xie, Zhengwei; Zhang, Tianyu; Ouyang, Qi
2018-02-01
One of the long-expected goals of genome-scale metabolic modelling is to evaluate the influence of the perturbed enzymes on flux distribution. Both ordinary differential equation (ODE) models and constraint-based models, like flux balance analysis (FBA), lack the capacity to perform metabolic control analysis (MCA) for large-scale networks. In this study, we developed a hyper-cube shrink algorithm (HCSA) to incorporate the enzymatic properties into the FBA model by introducing a pseudo reaction V constrained by enzymatic parameters. Our algorithm uses the enzymatic information quantitatively rather than qualitatively. We first demonstrate the concept by applying HCSA to a simple three-node network, whereby we obtained a good correlation between flux and enzyme abundance. We then validate its prediction by comparison with ODE and with a synthetic network producing violacein and analogues in Saccharomyces cerevisiae. We show that HCSA can mimic the steady-state results of ODE. Finally, we show its capability of predicting the flux distribution in genome-scale networks by applying it to sporulation in yeast. We show the ability of HCSA to operate without biomass flux and perform MCA to determine rate-limiting reactions. The algorithm was implemented in Matlab and C++. The code is available at https://github.com/kekegg/HCSA. Contact: xiezhengwei@hsc.pku.edu.cn or qi@pku.edu.cn. Supplementary data are available at Bioinformatics online.
Network Thermodynamic Curation of Human and Yeast Genome-Scale Metabolic Models
Martínez, Verónica S.; Quek, Lake-Ee; Nielsen, Lars K.
2014-01-01
Genome-scale models are used for an ever-widening range of applications. Although there has been much focus on specifying the stoichiometric matrix, the predictive power of genome-scale models equally depends on reaction directions. Two-thirds of reactions in the two eukaryotic reconstructions Homo sapiens Recon 1 and Yeast 5 are specified as irreversible. However, these specifications are mainly based on biochemical textbooks or on their similarity to other organisms and are rarely underpinned by detailed thermodynamic analysis. In this study, a workflow that is, to our knowledge, new and that combines network-embedded thermodynamic and flux variability analysis was used to evaluate existing irreversibility constraints in Recon 1 and Yeast 5 and to identify new ones. A total of 27 and 16 new irreversible reactions were identified in Recon 1 and Yeast 5, respectively, whereas only four reactions were found with directions incorrectly specified against thermodynamics (three in Yeast 5 and one in Recon 1). The workflow further identified for both models several isolated internal loops that require further curation. The framework also highlighted the need for substrate channeling (in human) and ATP hydrolysis (in yeast) for the essential reaction catalyzed by phosphoribosylaminoimidazole carboxylase in purine metabolism. Finally, the framework highlighted differences in proline metabolism between yeast (cytosolic anabolism and mitochondrial catabolism) and humans (exclusively mitochondrial metabolism). We conclude that network-embedded thermodynamics facilitates the specification and validation of irreversibility constraints in compartmentalized metabolic models, at the same time providing further insight into network properties. PMID:25028891
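The thermodynamic reasoning behind an irreversibility constraint can be sketched for a single reaction using the standard relation ΔG' = ΔG'° + RT ln Q. This is a deliberately simplified, single-reaction illustration and not the network-embedded workflow of the study, which propagates constraints across the whole network. The concentration bounds (1 µM to 10 mM), the function names, and the example ΔG'° values are assumptions.

```python
from math import log

R = 8.314e-3  # gas constant in kJ/(mol*K)

def delta_g_bounds(dg0, n_substrates, n_products, temp=298.15,
                   c_min=1e-6, c_max=1e-2):
    """Lowest and highest attainable reaction Gibbs energy (kJ/mol).

    dg0 is the standard transformed Gibbs energy; n_substrates and
    n_products count molecules on each side (stoichiometry summed up).
    Every metabolite may range between c_min and c_max (mol/L), and
    activities are approximated by concentrations.
    """
    rt = R * temp
    lowest = dg0 + rt * (n_products * log(c_min) - n_substrates * log(c_max))
    highest = dg0 + rt * (n_products * log(c_max) - n_substrates * log(c_min))
    return lowest, highest

def classify_direction(dg0, n_substrates, n_products):
    """Label a reaction by the sign of its feasible Gibbs energy range."""
    lowest, highest = delta_g_bounds(dg0, n_substrates, n_products)
    if highest < 0:
        return "forward only"   # negative for every allowed concentration
    if lowest > 0:
        return "reverse only"   # positive for every allowed concentration
    return "reversible"         # sign depends on concentrations

for dg0 in (-30.0, -5.0, 30.0):  # hypothetical standard Gibbs energies
    print(dg0, classify_direction(dg0, n_substrates=1, n_products=1))
```

A reaction is treated as irreversible here only when the sign of ΔG' is the same over the entire allowed concentration range; tightening the concentration bounds, as network-wide constraints effectively do, can turn a "reversible" label into a directional one.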
Measuring semantic similarities by combining gene ontology annotations and gene co-function networks
Peng, Jiajie; Uygun, Sahra; Kim, Taehyong; ...
2015-02-14
Background: Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms. Results: We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstrate that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families. Conclusions: Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure could be significantly improved after incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited.
Beauregard-Racine, Julie; Bicep, Cédric; Schliep, Klaus; Lopez, Philippe; Lapointe, François-Joseph; Bapteste, Eric
2011-07-20
We introduce several forest-based and network-based methods for exploring microbial evolution, and apply them to the study of thousands of genes from 30 strains of E. coli. This case study illustrates how additional analyses could offer fast heuristic alternatives to standard tree of life (TOL) approaches. We use gene networks to identify genes with atypical modes of evolution, and genome networks to characterize the evolution of genetic partnerships between E. coli and mobile genetic elements. We develop a novel polychromatic quartet method to capture patterns of recombination within E. coli, to update the clanistic toolkit, and to search for the impact of lateral gene transfer and of pathogenicity on gene evolution in two large forests of trees bearing E. coli. We unravel high rates of lateral gene transfer involving E. coli (about 40% of the trees under study), and show that both core genes and shell genes of E. coli are affected by non-tree-like evolutionary processes. We show that pathogenic lifestyle impacted the structure of 30% of the gene trees, and that pathogenic strains are more likely to transfer genes with one another than with non-pathogenic strains. In addition, we propose five groups of genes as candidate mobile modules of pathogenicity. We also present strong evidence for recent lateral gene transfer between E. coli and mobile genetic elements. Depending on which evolutionary questions biologists want to address (e.g. the identification of modules, genetic partnerships, recombination, lateral gene transfer, or genes with atypical evolutionary modes), forest-based and network-based methods are preferable to the reconstruction of a single tree, because they provide insights and produce hypotheses about the dynamics of genome evolution, rather than the relative branching order of species and lineages.
Such a methodological pluralism - the use of woods and webs - is to be encouraged to analyse the evolutionary processes at play in microbial evolution. This manuscript was reviewed by: Ford Doolittle, Tal Pupko, Richard Burian, James McInerney, Didier Raoult, and Yan Boucher.
Ataman, Meric
2017-01-01
Genome-scale metabolic reconstructions have proven to be valuable resources in enhancing our understanding of metabolic networks as they encapsulate all known metabolic capabilities of the organisms from genes to proteins to their functions. However, the complexity of these large metabolic networks often hinders their utility in various practical applications. Although reduced models are commonly used for modeling and in integrating experimental data, they are often inconsistent across different studies and laboratories due to different criteria and detail, which can compromise transferability of the findings and also integration of experimental data from different groups. In this study, we have developed a systematic semi-automatic approach to reduce genome-scale models into core models in a consistent and logical manner focusing on the central metabolism or subsystems of interest. The method minimizes the loss of information using an approach that combines graph-based search and optimization methods. The resulting core models are shown to be able to capture key properties of the genome-scale models and preserve consistency in terms of biomass and by-product yields, flux and concentration variability and gene essentiality. The development of these “consistently-reduced” models will help to clarify and facilitate integration of different experimental data to draw new understanding that can be directly extended to genome-scale models. PMID:28727725
Breeding and Genetics Symposium: networks and pathways to guide genomic selection.
Snelling, W M; Cushman, R A; Keele, J W; Maltecca, C; Thomas, M G; Fortes, M R S; Reverter, A
2013-02-01
Many traits affecting profitability and sustainability of meat, milk, and fiber production are polygenic, with no single gene having an overwhelming influence on observed variation. No knowledge of the specific genes controlling these traits has been needed to make substantial improvement through selection. Significant gains have been made through phenotypic selection enhanced by pedigree relationships and continually improving statistical methodology. Genomic selection, recently enabled by assays for dense SNP located throughout the genome, promises to increase selection accuracy and accelerate genetic improvement by emphasizing the SNP most strongly correlated to phenotype although the genes and sequence variants affecting phenotype remain largely unknown. These genomic predictions theoretically rely on linkage disequilibrium (LD) between genotyped SNP and unknown functional variants, but familial linkage may increase effectiveness when predicting individuals related to those in the training data. Genomic selection with functional SNP genotypes should be less reliant on LD patterns shared by training and target populations, possibly allowing robust prediction across unrelated populations. Although the specific variants causing polygenic variation may never be known with certainty, a number of tools and resources can be used to identify those most likely to affect phenotype. Associations of dense SNP genotypes with phenotype provide a 1-dimensional approach for identifying genes affecting specific traits; in contrast, associations with multiple traits allow defining networks of genes interacting to affect correlated traits. Such networks are especially compelling when corroborated by existing functional annotation and established molecular pathways. The SNP occurring within network genes, obtained from public databases or derived from genome and transcriptome sequences, may be classified according to expected effects on gene products. 
As illustrated by functionally informed genomic predictions being more accurate than naive whole-genome predictions of beef tenderness, coupling evidence from livestock genotypes, phenotypes, gene expression, and genomic variants with existing knowledge of gene functions and interactions may provide greater insight into the genes and genomic mechanisms affecting polygenic traits and facilitate functional genomic selection for economically important traits.
Preciat Gonzalez, German A; El Assal, Lemmer R P; Noronha, Alberto; Thiele, Ines; Haraldsdóttir, Hulda S; Fleming, Ronan M T
2017-06-14
The mechanism of each chemical reaction in a metabolic network can be represented as a set of atom mappings, each of which relates an atom in a substrate metabolite to an atom of the same element in a product metabolite. Genome-scale metabolic network reconstructions typically represent biochemistry at the level of reaction stoichiometry. However, a more detailed representation at the underlying level of atom mappings opens the possibility for a broader range of biological, biomedical and biotechnological applications than with stoichiometry alone. Complete manual acquisition of atom mapping data for a genome-scale metabolic network is a laborious process. However, many algorithms exist to predict atom mappings. How do their predictions compare to each other and to manually curated atom mappings? For more than four thousand metabolic reactions in the latest human metabolic reconstruction, Recon 3D, we compared the atom mappings predicted by six atom mapping algorithms. We also compared these predictions to those obtained by manual curation of atom mappings for over five hundred reactions distributed among all top level Enzyme Commission number classes. Five of the evaluated algorithms had similarly high prediction accuracy of over 91% when compared to manually curated atom mapped reactions. On average, the accuracy of the prediction was highest for reactions catalysed by oxidoreductases and lowest for reactions catalysed by ligases. In addition to prediction accuracy, the algorithms were evaluated on their accessibility, their advanced features, such as the ability to identify equivalent atoms, and their ability to map hydrogen atoms. In addition to prediction accuracy, we found that software accessibility and advanced features were fundamental to the selection of an atom mapping algorithm in practice.
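The accuracy comparison described above can be sketched as follows. This is a minimal illustration that assumes each atom mapping is stored as a dictionary from substrate atom index to product atom index, and that a prediction counts as correct only when it matches the curated mapping exactly. That exact-match criterion is an assumption of the sketch: it ignores equivalent atoms, which the abstract lists as an advanced feature. The reaction identifiers and mappings are hypothetical.

```python
def mapping_accuracy(predicted, curated):
    """Fraction of curated reactions whose predicted mapping matches exactly.

    Both arguments map reaction id -> {substrate atom index: product atom index}.
    Only reactions present in both collections are scored.
    """
    shared = [rxn for rxn in curated if rxn in predicted]
    if not shared:
        raise ValueError("no reactions in common between prediction and curation")
    correct = sum(predicted[rxn] == curated[rxn] for rxn in shared)
    return correct / len(shared)


# Hypothetical curated mappings for three reactions.
curated = {
    "R1": {1: 1, 2: 2, 3: 3},
    "R2": {1: 2, 2: 1},
    "R3": {1: 1, 2: 3, 3: 2},
}
# Hypothetical output of one mapping algorithm; R3 differs from curation.
predicted = {
    "R1": {1: 1, 2: 2, 3: 3},
    "R2": {1: 2, 2: 1},
    "R3": {1: 1, 2: 2, 3: 3},
}
print(mapping_accuracy(predicted, curated))
```

Grouping the reaction identifiers by Enzyme Commission class before calling the function would give per-class accuracies of the kind reported for oxidoreductases and ligases.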
Systems biology approach in plant abiotic stresses.
Mohanta, Tapan Kumar; Bashir, Tufail; Hashem, Abeer; Abd Allah, Elsayed Fathi
2017-12-01
Plant abiotic stresses are the major constraint on plant growth and development, causing enormous crop losses across the world. Plants have unique features to defend themselves against these challenging adverse stress conditions. They modulate their phenotypes upon changes in physiological, biochemical, molecular and genetic information, thus making them tolerant against abiotic stresses. It is of paramount importance to determine the stress-tolerant traits of a diverse range of genotypes of plant species and integrate those traits for crop improvement. Stress-tolerant traits can be identified by conducting genome-wide analysis of stress-tolerant genotypes through the highly advanced structural and functional genomics approach. Specifically, whole-genome sequencing, development of molecular markers, genome-wide association studies and comparative analysis of interaction networks between tolerant and susceptible crop varieties grown under stress conditions can greatly facilitate discovery of novel agronomic traits that protect plants against abiotic stresses. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Nurses collaborating with cross disciplinary networks: starting to integrate genomics into practice.
Adegbola, Maxine
2010-07-01
Nurses and other health-care providers are poised to include genetic discoveries into practice settings and to translate such knowledge for consumer benefit within culturally appropriate contexts. Nurses must seek collaboration with multi-disciplinary networks both locally and internationally. They must also capitalize on the expertise of other seasoned researchers in order to gain national and international exposure, recognition, and funding. Scholarly tailgating is using network relationships to achieve one's professional goals, and capitalizing on expert knowledge from seasoned researchers, educators, and practitioners from diverse international groups. By using scholarly tailgating principles, nurses can become important agents of change for multi-disciplinary networks, and thereby assist in decreasing health disparities. The purpose of this document is to encourage and inspire nurses to seek collaborative multi-disciplinary networks to enable genomic integration into health-care practice and education. Strategies for integrating genomics into practice settings are discussed.
Identification of hub subnetwork based on topological features of genes in breast cancer
ZHUANG, DA-YONG; JIANG, LI; HE, QING-QING; ZHOU, PENG; YUE, TAO
2015-01-01
The aim of this study was to provide functional insight into the identification of hub subnetworks by aggregating the behavior of genes connected in a protein-protein interaction (PPI) network. We applied a protein network-based approach to identify subnetworks, rather than individual genes, which may provide new insight into the functions of pathways involved in breast cancer. Five groups of breast cancer data were downloaded from the Gene Expression Omnibus (GEO) database of high-throughput gene expression data and analyzed to identify gene signatures using the genome-wide global significance (GWGS) method. A PPI network was constructed using Cytoscape, and clusters that focused on highly connected nodes were obtained using the molecular complex detection (MCODE) clustering algorithm. Pathway analysis was performed to assess the functional relevance of selected gene signatures based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Topological centrality was used to characterize the biological importance of gene signatures, pathways and clusters. The results revealed that cluster1 and the cell cycle and oocyte meiosis pathways were significant subnetworks in the analysis of degree and other centralities, and that hub nodes were mostly distributed within them. The most important hub nodes, those with top-ranked centrality, were similar to the genes common to the intersection of these three subnetworks, which was viewed as a hub subnetwork that is more reproducible than individual critical genes selected without network information. This hub subnetwork was attributed to a single biological process that is essential to cell growth and death. This increased the accuracy of identifying gene interactions that took place within the same functional process and was potentially useful for the development of biomarkers and networks for breast cancer. PMID:25573623
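Degree centrality, the simplest of the topological measures mentioned above, can be sketched in a few lines. This is a minimal illustration on a toy undirected network, not the Cytoscape or MCODE workflow used in the study; the node names G1 to G5 and the helper names are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each node divided by (number of nodes - 1).

    edges is an iterable of (node_a, node_b) pairs of an undirected network;
    self-loops are ignored and duplicate edges are counted once.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


def top_hubs(edges, k):
    """Return the k nodes with highest degree centrality (ties broken by name)."""
    centrality = degree_centrality(edges)
    return sorted(centrality, key=lambda node: (-centrality[node], node))[:k]


# Toy interaction list with placeholder gene names.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5")]
print(degree_centrality(toy_edges))
print(top_hubs(toy_edges, 2))
```

Other centralities, such as betweenness or closeness, would need shortest-path computations and are left out of this sketch.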
A new strategy for genome assembly using short sequence reads and reduced representation libraries.
Young, Andrew L; Abaan, Hatice Ozel; Zerbino, Daniel; Mullikin, James C; Birney, Ewan; Margulies, Elliott H
2010-02-01
We have developed a novel approach for using massively parallel short-read sequencing to generate fast and inexpensive de novo genomic assemblies comparable to those generated by capillary-based methods. The ultrashort (<100 base) sequences generated by this technology pose specific biological and computational challenges for de novo assembly of large genomes. To account for this, we devised a method for experimentally partitioning the genome using reduced representation (RR) libraries prior to assembly. We use two restriction enzymes independently to create a series of overlapping fragment libraries, each containing a tractable subset of the genome. Together, these libraries allow us to reassemble the entire genome without the need for a reference sequence. As proof of concept, we applied this approach to sequence and assemble the majority of the 125-Mb Drosophila melanogaster genome. We subsequently demonstrate the accuracy of our assembly method with meaningful comparisons against the currently available D. melanogaster reference genome (dm3). The ease of assembly and accuracy for comparative genomics suggest that our approach will scale to future mammalian genome-sequencing efforts, saving both time and money without sacrificing quality.
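The partitioning idea can be illustrated with an in-silico digest. The sketch below cuts a linear sequence at every occurrence of a recognition site and keeps fragments within a size window as one reduced representation library. The abstract does not name the enzymes or size ranges, so the EcoRI-like site GAATTC (cut after the first base), the toy sequence and the size limits are assumptions chosen only for illustration.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the position inside the site where the strand is cut,
    for example 1 for a site cut after its first base.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return [frag for frag in fragments if frag]


def reduced_representation(sequence, site, cut_offset, min_len, max_len):
    """Keep only digest fragments whose length lies in [min_len, max_len]."""
    return [
        frag
        for frag in digest(sequence, site, cut_offset)
        if min_len <= len(frag) <= max_len
    ]


toy_genome = "AAGAATTCCCCGAATTCTT"  # hypothetical sequence
print(digest(toy_genome, "GAATTC", 1))
print(reduced_representation(toy_genome, "GAATTC", 1, 5, 8))
```

Running the digest a second time with a different recognition site yields fragments whose boundaries fall elsewhere, so the two sets overlap, which is what lets separately assembled libraries be joined.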
Kim, Dokyoon; Joung, Je-Gun; Sohn, Kyung-Ah; Shin, Hyunjung; Park, Yu Rang; Ritchie, Marylyn D; Kim, Ju Han
2015-01-01
Objective: Cancer can involve gene dysregulation via multiple mechanisms, so no single level of genomic data fully elucidates tumor behavior due to the presence of numerous genomic variations within or between levels in a biological system. We have previously proposed a graph-based integration approach that combines multi-omics data including copy number alteration, methylation, miRNA, and gene expression data for predicting clinical outcome in cancer. However, genomic features likely interact with other genomic features in complex signaling or regulatory networks, since cancer is caused by alterations in pathways or complete processes. Methods: Here we propose a new graph-based framework for integrating multi-omics data and genomic knowledge to improve power in predicting clinical outcomes and elucidate interplay between different levels. To highlight the validity of our proposed framework, we used an ovarian cancer dataset from The Cancer Genome Atlas for predicting stage, grade, and survival outcomes. Results: Integrating multi-omics data with genomic knowledge to construct pre-defined features resulted in higher performance in clinical outcome prediction and higher stability. For the grade outcome, the model with gene expression data produced an area under the receiver operating characteristic curve (AUC) of 0.7866. However, models of the integration with pathway, Gene Ontology, chromosomal gene set, and motif gene set consistently outperformed the model with genomic data only, attaining AUCs of 0.7873, 0.8433, 0.8254, and 0.8179, respectively. Conclusions: Integrating multi-omics data and genomic knowledge to improve understanding of molecular pathogenesis and underlying biology in cancer should improve diagnostic and prognostic indicators and the effectiveness of therapies. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association. PMID:25002459
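The AUC values quoted above can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; it is a generic computation, not the evaluation code of the study, and the labels and scores are made-up toy values.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels are 1 (positive) or 0 (negative); ties between a positive and a
    negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four samples, two per class.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

The double loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation gives the same value more efficiently on large datasets.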
A Multi-Method Approach for Proteomic Network Inference in 11 Human Cancers.
Şenbabaoğlu, Yasin; Sümer, Selçuk Onur; Sánchez-Vega, Francisco; Bemis, Debra; Ciriello, Giovanni; Schultz, Nikolaus; Sander, Chris
2016-02-01
Protein expression and post-translational modification levels are tightly regulated in neoplastic cells to maintain cellular processes known as 'cancer hallmarks'. The first Pan-Cancer initiative of The Cancer Genome Atlas (TCGA) Research Network has aggregated protein expression profiles for 3,467 patient samples from 11 tumor types using the antibody-based reverse phase protein array (RPPA) technology. The resultant proteomic data can be utilized to computationally infer protein-protein interaction (PPI) networks and to study the commonalities and differences across tumor types. In this study, we compare the performance of 13 established network inference methods in their capacity to retrieve the curated Pathway Commons interactions from RPPA data. We observe that no single method has the best performance in all tumor types, but a group of six methods, including diverse techniques such as correlation, mutual information, and regression, consistently rank highly among the tested methods. We utilize the high-performing methods to obtain a consensus network and identify four robust and densely connected modules that reveal biological processes as well as suggest antibody-related technical biases. Mapping the consensus network interactions to Reactome gene lists confirms the pan-cancer importance of signal transduction pathways, innate and adaptive immune signaling, cell cycle, metabolism, and DNA repair, and also suggests several biological processes that may be specific to a subset of tumor types. Our results illustrate the utility of the RPPA platform as a tool to study proteomic networks in cancer.
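One common way to combine several inference methods into a consensus network is to average the rank each method gives to every candidate edge. The abstract does not state how the consensus was built, so the rank-averaging rule below is an assumption used purely for illustration, and the protein names P1 to P3 and the scores are hypothetical.

```python
def consensus_edges(method_scores, top_k):
    """Rank-average edges across methods and return the top_k edges.

    method_scores is a list of dicts, one per method, mapping an edge
    (a tuple of two node names in sorted order) to a score where higher
    means stronger support. Edges a method did not score receive the
    worst possible rank for that method.
    """
    all_edges = set().union(*method_scores)
    worst_rank = len(all_edges)
    rank_sum = {edge: 0.0 for edge in all_edges}
    for scores in method_scores:
        ordered = sorted(scores, key=scores.get, reverse=True)
        ranks = {edge: i + 1 for i, edge in enumerate(ordered)}
        for edge in all_edges:
            rank_sum[edge] += ranks.get(edge, worst_rank)
    average = {edge: rank_sum[edge] / len(method_scores) for edge in all_edges}
    # Lower average rank is better; ties are broken alphabetically by edge.
    return sorted(average, key=lambda edge: (average[edge], edge))[:top_k]


# Two hypothetical methods scoring the same three candidate interactions.
method_a = {("P1", "P2"): 0.9, ("P1", "P3"): 0.5, ("P2", "P3"): 0.1}
method_b = {("P1", "P2"): 0.8, ("P2", "P3"): 0.6, ("P1", "P3"): 0.2}
print(consensus_edges([method_a, method_b], 2))
```

Averaging ranks rather than raw scores avoids having to put correlation, mutual information and regression coefficients on a common scale.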
Genome-Scale Screening of Drug-Target Associations Relevant to Ki Using a Chemogenomics Approach
Cao, Dong-Sheng; Liang, Yi-Zeng; Deng, Zhe; Hu, Qian-Nan; He, Min; Xu, Qing-Song; Zhou, Guang-Hua; Zhang, Liu-Xia; Deng, Zi-xin; Liu, Shao
2013-01-01
The identification of interactions between drugs and target proteins plays a key role in genomic drug discovery. In the present study, the quantitative binding affinities of drug-target pairs are differentiated as a measurement to define whether a drug interacts with a protein or not, and then a chemogenomics framework using an unbiased set of general integrated features and random forest (RF) is employed to construct a predictive model which can accurately classify drug-target pairs. The predictability of the model is further investigated and validated by several independent validation sets. The built model is used to predict drug-target associations, some of which were confirmed by comparing experimental data from public biological resources. A drug-target interaction network with high confidence drug-target pairs was also reconstructed. This network provides further insight for the action of drugs and targets. Finally, a web-based server called PreDPI-Ki was developed to predict drug-target interactions for drug discovery. In addition to providing a high-confidence list of drug-target associations for subsequent experimental investigation guidance, these results also contribute to the understanding of drug-target interactions. We can also see that quantitative information of drug-target associations could greatly promote the development of more accurate models. The PreDPI-Ki server is freely available via: http://sdd.whu.edu.cn/dpiki. PMID:23577055
Orlando, Lori A.; Sperber, Nina R.; Voils, Corrine; Nichols, Marshall; Myers, Rachel A.; Wu, R. Ryanne; Rakhra-Burris, Tejinder; Levy, Kenneth D.; Levy, Mia; Pollin, Toni I.; Guan, Yue; Horowitz, Carol R.; Ramos, Michelle; Kimmel, Stephen E.; McDonough, Caitrin W.; Madden, Ebony B.; Damschroder, Laura J.
2017-01-01
Purpose: Implementation research provides a structure for evaluating the clinical integration of genomic medicine interventions. This paper describes the Implementing GeNomics In PracTicE (IGNITE) Network’s efforts to promote: 1) a broader understanding of genomic medicine implementation research; and 2) the sharing of knowledge generated in the network. Methods: To facilitate this goal the IGNITE Network Common Measures Working Group (CMG) members adopted the Consolidated Framework for Implementation Research (CFIR) to guide their approach to: identifying constructs and measures relevant to evaluating genomic medicine as a whole, standardizing data collection across projects, and combining data in a centralized resource for cross network analyses. Results: CMG identified ten high-priority CFIR constructs as important for genomic medicine. Of those, eight didn’t have standardized measurement instruments. Therefore, we developed four survey tools to address this gap. In addition, we identified seven high-priority constructs related to patients, families, and communities that did not map to CFIR constructs. Both sets of constructs were combined to create a draft genomic medicine implementation model. Conclusion: We developed processes to identify constructs deemed valuable for genomic medicine implementation and codified them in a model. These resources are freely available to facilitate knowledge generation and sharing across the field. PMID:28914267
Maleki, Ehsan; Babashah, Hossein; Koohi, Somayyeh; Kavehvash, Zahra
2017-07-01
This paper presents an optical processing approach for exploring a large number of genome sequences. Specifically, we propose an optical correlator for global alignment and an extended moiré matching technique for local analysis of spatially coded DNA, whose output is fed to a novel three-dimensional artificial neural network for local DNA alignment. All-optical implementation of the proposed 3D artificial neural network is developed and its accuracy is verified in Zemax. Thanks to its parallel processing capability, the proposed structure performs local alignment of 4 million sequences of 150 base pairs in a few seconds, which is much faster than its electrical counterparts, such as the basic local alignment search tool.
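A digital analogue of correlation-based matching may help make the idea concrete. The sketch below slides a query along a reference and counts matching bases at every ungapped offset, which is what a cross-correlation of one-hot encoded sequences computes. It is not a model of the optical correlator, the moiré technique or the neural network described above; the function names and sequences are hypothetical.

```python
def match_profile(reference, query):
    """Number of matching bases at every ungapped offset of query on reference."""
    span = len(reference) - len(query)
    return [
        sum(r == q for r, q in zip(reference[shift:shift + len(query)], query))
        for shift in range(span + 1)
    ]


def best_offset(reference, query):
    """Return (offset, match count) of the correlation peak."""
    profile = match_profile(reference, query)
    if not profile:
        raise ValueError("query is longer than reference")
    peak = max(profile)
    return profile.index(peak), peak


print(match_profile("ACGTACGT", "TACG"))
print(best_offset("ACGTACGT", "TACG"))
```

Each offset is evaluated independently of the others, which is the property that a parallel implementation can exploit.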
McCarty, Catherine A; Chisholm, Rex L; Chute, Christopher G; Kullo, Iftikhar J; Jarvik, Gail P; Larson, Eric B; Li, Rongling; Masys, Daniel R; Ritchie, Marylyn D; Roden, Dan M; Struewing, Jeffery P; Wolf, Wendy A
2011-01-26
The eMERGE (electronic MEdical Records and GEnomics) Network is an NHGRI-supported consortium of five institutions to explore the utility of DNA repositories coupled to Electronic Medical Record (EMR) systems for advancing discovery in genome science. eMERGE also includes a special emphasis on the ethical, legal and social issues related to these endeavors. The five sites are supported by an Administrative Coordinating Center. Setting of network goals is initiated by working groups: (1) Genomics, (2) Informatics, and (3) Consent & Community Consultation, which also includes active participation by investigators outside the eMERGE funded sites, and (4) Return of Results Oversight Committee. The Steering Committee, comprised of site PIs and representatives and NHGRI staff, meet three times per year, once per year with the External Scientific Panel. The primary site-specific phenotypes for which samples have undergone genome-wide association study (GWAS) genotyping are cataract and HDL, dementia, electrocardiographic QRS duration, peripheral arterial disease, and type 2 diabetes. A GWAS is also being undertaken for resistant hypertension in ≈ 2,000 additional samples identified across the network sites, to be added to data available for samples already genotyped. Funded by ARRA supplements, secondary phenotypes have been added at all sites to leverage the genotyping data, and hypothyroidism is being analyzed as a cross-network phenotype. Results are being posted in dbGaP. Other key eMERGE activities include evaluation of the issues associated with cross-site deployment of common algorithms to identify cases and controls in EMRs, data privacy of genomic and clinically-derived data, developing approaches for large-scale meta-analysis of GWAS data across five sites, and a community consultation and consent initiative at each site. Plans are underway to expand the network in diversity of populations and incorporation of GWAS findings into clinical care. 
By combining advanced clinical informatics, genome science, and community consultation, eMERGE represents a first step in the development of data-driven approaches to incorporate genomic information into routine healthcare delivery.
Catic, Aida; Gurbeta, Lejla; Kurtovic-Kozaric, Amina; Mehmedbasic, Senad; Badnjevic, Almir
2018-02-13
The use of Artificial Neural Networks (ANNs) for genome-enabled classification and for establishing genome-phenotype correlations has been investigated more extensively over the past few years. The reason for this is that ANNs are good approximators of complex functions, so classification can be performed without the need for an explicitly defined input-output model. This engineering tool can be applied to optimize existing methods for disease/syndrome classification. Cytogenetic and molecular analyses are the most frequent tests used in prenatal diagnostics for the early detection of Turner, Klinefelter, Patau, Edwards and Down syndrome. These procedures can be lengthy and repetitive, and often employ invasive techniques, so a robust automated method for classifying and reporting prenatal diagnostics would greatly help clinicians with their routine work. The database consisted of data collected from 2500 pregnant women who came to the Institute of Gynecology, Infertility and Perinatology "Mehmedbasic" for routine antenatal care between January 2000 and December 2016. During the first trimester, all women underwent a screening test in which the values of maternal serum pregnancy-associated plasma protein A (PAPP-A) and free beta human chorionic gonadotropin (β-hCG) were measured. In addition, fetal nuchal translucency thickness and the presence or absence of the nasal bone were observed using ultrasound. The architectures of linear feedforward and feedback neural networks were investigated for various training data distributions and numbers of neurons in the hidden layer. The feedback neural network architecture outperformed the feedforward neural network architecture in predictive ability for all five prenatal aneuploidy syndrome classes. The feedforward neural network with 15 neurons in the hidden layer achieved a classification sensitivity of 92.00%. The classification sensitivity of the feedback (Elman's) neural network was 99.00%.
The average accuracy of the feedforward neural network was 89.6%, and that of the feedback neural network was 98.8%. The results presented in this paper prove that an expert diagnostic system based on neural networks can be efficiently used for classification of the five aneuploidy syndromes covered by this study, based on first trimester maternal serum screening data, ultrasonographic findings and patient demographics. The developed expert system proved to be simple, robust, and powerful in properly classifying prenatal aneuploidy syndromes.
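For readers unfamiliar with the metrics quoted in this abstract, the following is a minimal sketch of how per-class sensitivity and overall accuracy are commonly computed from a confusion matrix. It is not the authors' code, and the abstract does not state exactly how its figures were derived. The function names and the three-class toy counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
def per_class_sensitivity(confusion):
    """Return the sensitivity (recall) of each class.

    confusion[i][j] is the number of cases whose true class is i
    and whose predicted class is j.
    """
    sensitivities = []
    for i, row in enumerate(confusion):
        total = sum(row)  # all cases that truly belong to class i
        sensitivities.append(row[i] / total if total else float("nan"))
    return sensitivities


def overall_accuracy(confusion):
    """Return the fraction of all cases that were classified correctly."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical toy counts for three classes (NOT data from the study).
toy_confusion = [
    [9, 1, 0],
    [2, 8, 0],
    [0, 0, 10],
]

sens = per_class_sensitivity(toy_confusion)
acc = overall_accuracy(toy_confusion)
print(sens, acc)
```

Sensitivity is computed separately for each class, whereas accuracy pools all cases, so the two measures can differ for the same classifier.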
Viral dark matter and virus–host interactions resolved from publicly available microbial genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roux, Simon; Hallam, Steven J.; Woyke, Tanja
The ecological importance of viruses is now widely recognized, yet our limited knowledge of viral sequence space and virus–host interactions precludes accurate prediction of their roles and impacts. In this study, we mined publicly available bacterial and archaeal genomic data sets to identify 12,498 high-confidence viral genomes linked to their microbial hosts. These data augment public data sets 10-fold, provide first viral sequences for 13 new bacterial phyla including ecologically abundant phyla, and help taxonomically identify 7–38% of ‘unknown’ sequence space in viromes. Genome- and network-based classification was largely consistent with accepted viral taxonomy and suggested that (i) 264 new viral genera were identified (doubling known genera) and (ii) cross-taxon genomic recombination is limited. Further analyses provided empirical data on extrachromosomal prophages and coinfection prevalences, as well as evaluation of in silico virus–host linkage predictions. Together these findings illustrate the value of mining viral signal from microbial genomes.
Viral dark matter and virus-host interactions resolved from publicly available microbial genomes.
Roux, Simon; Hallam, Steven J; Woyke, Tanja; Sullivan, Matthew B
2015-07-22
The ecological importance of viruses is now widely recognized, yet our limited knowledge of viral sequence space and virus-host interactions precludes accurate prediction of their roles and impacts. In this study, we mined publicly available bacterial and archaeal genomic data sets to identify 12,498 high-confidence viral genomes linked to their microbial hosts. These data augment public data sets 10-fold, provide first viral sequences for 13 new bacterial phyla including ecologically abundant phyla, and help taxonomically identify 7-38% of 'unknown' sequence space in viromes. Genome- and network-based classification was largely consistent with accepted viral taxonomy and suggested that (i) 264 new viral genera were identified (doubling known genera) and (ii) cross-taxon genomic recombination is limited. Further analyses provided empirical data on extrachromosomal prophages and coinfection prevalences, as well as evaluation of in silico virus-host linkage predictions. Together these findings illustrate the value of mining viral signal from microbial genomes.