Gerlt, John A
2017-08-22
The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.
2017-01-01
The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of “genomic enzymology” web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence–function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems. PMID:28826221
DCODE.ORG Anthology of Comparative Genomic Tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loots, G G; Ovcharenko, I
2005-01-11
Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the noncoding encryption of gene regulation across genomes. To facilitate the use of comparative genomics to practical applications in genetics and genomics we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools: zPicture and Mulan; a phylogenetic shadowing tool: eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools: rVista and multiTF; a toolmore » for extracting cis-regulatory modules governing the expression of co-regulated genes, CREME; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ web site.« less
Dcode.org anthology of comparative genomic tools.
Loots, Gabriela G; Ovcharenko, Ivan
2005-07-01
Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the non-coding encryption of gene regulation across genomes. To facilitate the practical application of comparative sequence analysis to genetics and genomics, we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools, zPicture and Mulan; a phylogenetic shadowing tool, eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools, rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here, we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ website.
Genomics Portals: integrative web-platform for mining genomics data.
Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario
2010-01-13
A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.
Genomics Portals: integrative web-platform for mining genomics data
2010-01-01
Background A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org. PMID:20070909
BEACON: automated tool for Bacterial GEnome Annotation ComparisON.
Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B
2015-08-18
Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
Garazha, Andrew; Ivanova, Alena; Suntsova, Maria; Malakhova, Galina; Roumiantsev, Sergey; Zhavoronkov, Alex; Buzdin, Anton
2015-01-01
Endogenous retroviruses (ERVs) and LTR retrotransposons (LRs) occupy ∼8% of human genome. Deep sequencing technologies provide clues to understanding of functional relevance of individual ERVs/LRs by enabling direct identification of transcription factor binding sites (TFBS) and other landmarks of functional genomic elements. Here, we performed the genome-wide identification of human ERVs/LRs containing TFBS according to the ENCODE project. We created the first interactive ERV/LRs database that groups the individual inserts according to their familial nomenclature, number of mapped TFBS and divergence from their consensus sequence. Information on any particular element can be easily extracted by the user. We also created a genome browser tool, which enables quick mapping of any ERV/LR insert according to genomic coordinates, known human genes and TFBS. These tools can be used to easily explore functionally relevant individual ERV/LRs, and for studying their impact on the regulation of human genes. Overall, we identified ∼110,000 ERV/LR genomic elements having TFBS. We propose a hypothesis of "domestication" of ERV/LR TFBS by the genome milieu including subsequent stages of initial epigenetic repression, partial functional release, and further mutation-driven reshaping of TFBS in tight coevolution with the enclosing genomic loci.
Comparative genome analysis in the integrated microbial genomes (IMG) system.
Markowitz, Victor M; Kyrpides, Nikos C
2007-01-01
Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries, and occurrence profile tools for comparing genes, pathways, and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.
Enabling functional genomics with genome engineering
Hilton, Isaac B.; Gersbach, Charles A.
2015-01-01
Advances in genome engineering technologies have made the precise control over genome sequence and regulation possible across a variety of disciplines. These tools can expand our understanding of fundamental biological processes and create new opportunities for therapeutic designs. The rapid evolution of these methods has also catalyzed a new era of genomics that includes multiple approaches to functionally characterize and manipulate the regulation of genomic information. Here, we review the recent advances of the most widely adopted genome engineering platforms and their application to functional genomics. This includes engineered zinc finger proteins, TALEs/TALENs, and the CRISPR/Cas9 system as nucleases for genome editing, transcription factors for epigenome editing, and other emerging applications. We also present current and potential future applications of these tools, as well as their current limitations and areas for future advances. PMID:26430154
RELATIONSHIP BETWEEN PHYLOGENETIC DISTRIBUTION AND GENOMIC FEATURES IN NEUROSPORA CRASSA
USDA-ARS?s Scientific Manuscript database
In the post-genome era, insufficient functional annotation of predicted genes greatly restricts the potential of mining genome data. We demonstrate that an evolutionary approach, which is independent of functional annotation, has great potential as a tool for genome analysis. We chose the genome o...
Enabling functional genomics with genome engineering.
Hilton, Isaac B; Gersbach, Charles A
2015-10-01
Advances in genome engineering technologies have made the precise control over genome sequence and regulation possible across a variety of disciplines. These tools can expand our understanding of fundamental biological processes and create new opportunities for therapeutic designs. The rapid evolution of these methods has also catalyzed a new era of genomics that includes multiple approaches to functionally characterize and manipulate the regulation of genomic information. Here, we review the recent advances of the most widely adopted genome engineering platforms and their application to functional genomics. This includes engineered zinc finger proteins, TALEs/TALENs, and the CRISPR/Cas9 system as nucleases for genome editing, transcription factors for epigenome editing, and other emerging applications. We also present current and potential future applications of these tools, as well as their current limitations and areas for future advances. © 2015 Hilton and Gersbach; Published by Cold Spring Harbor Laboratory Press.
Using the Saccharomyces Genome Database (SGD) for analysis of genomic information
Skrzypek, Marek S.; Hirschman, Jodi
2011-01-01
Analysis of genomic data requires access to software tools that place the sequence-derived information in the context of biology. The Saccharomyces Genome Database (SGD) integrates functional information about budding yeast genes and their products with a set of analysis tools that facilitate exploring their biological details. This unit describes how the various types of functional data available at SGD can be searched, retrieved, and analyzed. Starting with the guided tour of the SGD Home page and Locus Summary page, this unit highlights how to retrieve data using YeastMine, how to visualize genomic information with GBrowse, how to explore gene expression patterns with SPELL, and how to use Gene Ontology tools to characterize large-scale datasets. PMID:21901739
Rice functional genomics research in China.
Han, Bin; Xue, Yongbiao; Li, Jiayang; Deng, Xing-Wang; Zhang, Qifa
2007-06-29
Rice functional genomics is a scientific approach that seeks to identify and define the function of rice genes, and uncover when and how genes work together to produce phenotypic traits. Rapid progress in rice genome sequencing has facilitated research in rice functional genomics in China. The Ministry of Science and Technology of China has funded two major rice functional genomics research programmes for building up the infrastructures of the functional genomics study such as developing rice functional genomics tools and resources. The programmes were also aimed at cloning and functional analyses of a number of genes controlling important agronomic traits from rice. National and international collaborations on rice functional genomics study are accelerating rice gene discovery and application.
USDA-ARS?s Scientific Manuscript database
Increasing availability of genomic data and sophistication of analytical methodology in fungi has elevated the need for functional genomics tools in these organisms. Gene deletion is a critical tool for functional analysis. The targeted deletion of genes requires both a suitable method for the trans...
PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes
Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J
2008-01-01
Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client side software setup or installation required. Source code is freely available to researchers interested in setting up a local version of PSAT for analysis of genomes not available through the public server. Access to the public web server and instructions for obtaining source code can be found at . PMID:18366802
Functional genomics (FG) screens, using RNAi or CRISPR technology, have become a standard tool for systematic, genome-wide loss-of-function studies for therapeutic target discovery. As in many large-scale assays, however, off-target effects, variable reagents' potency and experimental noise must be accounted for appropriately control for false positives.
Pre-genomic, genomic and post-genomic study of microbial communities involved in bioenergy.
Rittmann, Bruce E; Krajmalnik-Brown, Rosa; Halden, Rolf U
2008-08-01
Microorganisms can produce renewable energy in large quantities and without damaging the environment or disrupting food supply. The microbial communities must be robust and self-stabilizing, and their essential syntrophies must be managed. Pre-genomic, genomic and post-genomic tools can provide crucial information about the structure and function of these microbial communities. Applying these tools will help accelerate the rate at which microbial bioenergy processes move from intriguing science to real-world practice.
Cytoscape: the network visualization tool for GenomeSpace workflows.
Demchak, Barry; Hull, Tim; Reich, Michael; Liefeld, Ted; Smoot, Michael; Ideker, Trey; Mesirov, Jill P
2014-01-01
Modern genomic analysis often requires workflows incorporating multiple best-of-breed tools. GenomeSpace is a web-based visual workbench that combines a selection of these tools with mechanisms that create data flows between them. One such tool is Cytoscape 3, a popular application that enables analysis and visualization of graph-oriented genomic networks. As Cytoscape runs on the desktop, and not in a web browser, integrating it into GenomeSpace required special care in creating a seamless user experience and enabling appropriate data flows. In this paper, we present the design and operation of the Cytoscape GenomeSpace app, which accomplishes this integration, thereby providing critical analysis and visualization functionality for GenomeSpace users. It has been downloaded over 850 times since the release of its first version in September, 2013.
Cytoscape: the network visualization tool for GenomeSpace workflows
Demchak, Barry; Hull, Tim; Reich, Michael; Liefeld, Ted; Smoot, Michael; Ideker, Trey; Mesirov, Jill P.
2014-01-01
Modern genomic analysis often requires workflows incorporating multiple best-of-breed tools. GenomeSpace is a web-based visual workbench that combines a selection of these tools with mechanisms that create data flows between them. One such tool is Cytoscape 3, a popular application that enables analysis and visualization of graph-oriented genomic networks. As Cytoscape runs on the desktop, and not in a web browser, integrating it into GenomeSpace required special care in creating a seamless user experience and enabling appropriate data flows. In this paper, we present the design and operation of the Cytoscape GenomeSpace app, which accomplishes this integration, thereby providing critical analysis and visualization functionality for GenomeSpace users. It has been downloaded over 850 times since the release of its first version in September, 2013. PMID:25165537
From data to function: functional modeling of poultry genomics data.
McCarthy, F M; Lyons, E
2013-09-01
One of the challenges of functional genomics is to create a better understanding of the biological system being studied so that the data produced are leveraged to provide gains for agriculture, human health, and the environment. Functional modeling enables researchers to make sense of these data as it reframes a long list of genes or gene products (mRNA, ncRNA, and proteins) by grouping based upon function, be it individual molecular functions or interactions between these molecules or broader biological processes, including metabolic and signaling pathways. However, poultry researchers have been hampered by a lack of functional annotation data, tools, and training to use these data and tools. Moreover, this lack is becoming more critical as new sequencing technologies enable us to generate data not only for an increasingly diverse range of species but also individual genomes and populations of individuals. We discuss the impact of these new sequencing technologies on poultry research, with a specific focus on what functional modeling resources are available for poultry researchers. We also describe key strategies for researchers who wish to functionally model their own data, providing background information about functional modeling approaches, the data and tools to support these approaches, and the strengths and limitations of each. Specifically, we describe methods for functional analysis using Gene Ontology (GO) functional summaries, functional enrichment analysis, and pathways and network modeling. As annotation efforts begin to provide the fundamental data that underpin poultry functional modeling (such as improved gene identification, standardized gene nomenclature, temporal and spatial expression data and gene product function), tool developers are incorporating these data into new and existing tools that are used for functional modeling, and cyberinfrastructure is being developed to provide the necessary extendibility and scalability for storing and analyzing these data. This process will support the efforts of poultry researchers to make sense of their functional genomics data sets, and we provide here a starting point for researchers who wish to take advantage of these tools.
CuGene as a tool to view and explore genomic data
NASA Astrophysics Data System (ADS)
Haponiuk, Michał; Pawełkowicz, Magdalena; Przybecki, Zbigniew; Nowak, Robert M.
2017-08-01
Integrated CuGene is an easy-to-use, open-source, on-line tool that can be used to browse, analyze, and query genomic data and annotations. It places annotation tracks beneath genome coordinate positions, allowing rapid visual correlation of different types of information. It also allows users to upload and display their own experimental results or annotation sets. An important functionality of the application is a possibility to find similarity between sequences by applying four different algorithms of different accuracy. The presented tool was tested on real genomic data and is extensively used by Polish Consortium of Cucumber Genome Sequencing.
AgBase: supporting functional modeling in agricultural organisms
McCarthy, Fiona M.; Gresham, Cathy R.; Buza, Teresia J.; Chouvarine, Philippe; Pillai, Lakshmi R.; Kumar, Ranjit; Ozkan, Seval; Wang, Hui; Manda, Prashanti; Arick, Tony; Bridges, Susan M.; Burgess, Shane C.
2011-01-01
AgBase (http://www.agbase.msstate.edu/) provides resources to facilitate modeling of functional genomics data and structural and functional annotation of agriculturally important animal, plant, microbe and parasite genomes. The website is redesigned to improve accessibility and ease of use, including improved search capabilities. Expanded capabilities include new dedicated pages for horse, cat, dog, cotton, rice and soybean. We currently provide 590 240 Gene Ontology (GO) annotations to 105 454 gene products in 64 different species, including GO annotations linked to transcripts represented on agricultural microarrays. For many of these arrays, this provides the only functional annotation available. GO annotations are available for download and we provide comprehensive, species-specific GO annotation files for 18 different organisms. The tools available at AgBase have been expanded and several existing tools improved based upon user feedback. One of seven new tools available at AgBase, GOModeler, supports hypothesis testing from functional genomics data. We host several associated databases and provide genome browsers for three agricultural pathogens. Moreover, we provide comprehensive training resources (including worked examples and tutorials) via links to Educational Resources at the AgBase website. PMID:21075795
GREAT: a web portal for Genome Regulatory Architecture Tools
Bouyioukos, Costas; Bucchini, François; Elati, Mohamed; Képès, François
2016-01-01
GREAT (Genome REgulatory Architecture Tools) is a novel web portal for tools designed to generate user-friendly and biologically useful analysis of genome architecture and regulation. The online tools of GREAT are freely accessible and compatible with essentially any operating system which runs a modern browser. GREAT is based on the analysis of genome layout -defined as the respective positioning of co-functional genes- and its relation with chromosome architecture and gene expression. GREAT tools allow users to systematically detect regular patterns along co-functional genomic features in an automatic way consisting of three individual steps and respective interactive visualizations. In addition to the complete analysis of regularities, GREAT tools enable the use of periodicity and position information for improving the prediction of transcription factor binding sites using a multi-view machine learning approach. The outcome of this integrative approach features a multivariate analysis of the interplay between the location of a gene and its regulatory sequence. GREAT results are plotted in web interactive graphs and are available for download either as individual plots, self-contained interactive pages or as machine readable tables for downstream analysis. The GREAT portal can be reached at the following URL https://absynth.issb.genopole.fr/GREAT and each individual GREAT tool is available for downloading. PMID:27151196
Reverse Genetics and High Throughput Sequencing Methodologies for Plant Functional Genomics
Ben-Amar, Anis; Daldoul, Samia; Reustle, Götz M.; Krczal, Gabriele; Mliki, Ahmed
2016-01-01
In the post-genomic era, increasingly sophisticated genetic tools are being developed with the long-term goal of understanding how the coordinated activity of genes gives rise to a complex organism. With the advent of the next generation sequencing associated with effective computational approaches, wide variety of plant species have been fully sequenced giving a wealth of data sequence information on structure and organization of plant genomes. Since thousands of gene sequences are already known, recently developed functional genomics approaches provide powerful tools to analyze plant gene functions through various gene manipulation technologies. Integration of different omics platforms along with gene annotation and computational analysis may elucidate a complete view in a system biology level. Extensive investigations on reverse genetics methodologies were deployed for assigning biological function to a specific gene or gene product. We provide here an updated overview of these high throughout strategies highlighting recent advances in the knowledge of functional genomics in plants. PMID:28217003
Zhou, Weiqiang; Sherwood, Ben; Ji, Hongkai
2017-01-01
Technological advances have led to an explosive growth of high-throughput functional genomic data. Exploiting the correlation among different data types, it is possible to predict one functional genomic data type from other data types. Prediction tools are valuable in understanding the relationship among different functional genomic signals. They also provide a cost-efficient solution to inferring the unknown functional genomic profiles when experimental data are unavailable due to resource or technological constraints. The predicted data may be used for generating hypotheses, prioritizing targets, interpreting disease variants, facilitating data integration, quality control, and many other purposes. This article reviews various applications of prediction methods in functional genomics, discusses analytical challenges, and highlights some common and effective strategies used to develop prediction methods for functional genomic data. PMID:28076869
RATT: Rapid Annotation Transfer Tool
Otto, Thomas D.; Dillon, Gary P.; Degrave, Wim S.; Berriman, Matthew
2011-01-01
Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at http://ratt.sourceforge.net. PMID:21306991
Updating the Micro-Tom TILLING platform.
Okabe, Yoshihiro; Ariizumi, Tohru; Ezura, Hiroshi
2013-03-01
The dwarf tomato variety Micro-Tom is regarded as a model system for functional genomics studies in tomato. Various tomato genomic tools in the genetic background of Micro-Tom have been established, such as mutant collections, genome information and a metabolomic database. Recent advances in tomato genome sequencing have brought about a significant need for reverse genetics tools that are accessible to the larger community, because a great number of gene sequences have become available from public databases. To meet the requests from the tomato research community, we have developed the Micro-Tom Targeting-Induced Local Lesions IN Genomes (TILLING) platform, which is comprised of more than 5000 EMS-mutagenized lines. The platform serves as a reverse genetics tool for efficiently identifying mutant alleles in parallel with the development of Micro-Tom mutant collections. The combination of Micro-Tom mutant libraries and the TILLING approach enables researchers to accelerate the isolation of desirable mutants for unraveling gene function or breeding. To upgrade the genomic tool of Micro-Tom, the development of a new mutagenized population is underway. In this paper, the current status of the Micro-Tom TILLING platform and its future prospects are described.
Inferring transposons activity chronology by TRANScendence - TEs database and de-novo mining tool.
Startek, Michał Piotr; Nogły, Jakub; Gromadka, Agnieszka; Grzebelus, Dariusz; Gambin, Anna
2017-10-16
The constant progress in sequencing technology leads to ever increasing amounts of genomic data. In the light of current evidence transposable elements (TEs for short) are becoming useful tools for learning about the evolution of host genome. Therefore the software for genome-wide detection and analysis of TEs is of great interest. Here we describe the computational tool for mining, classifying and storing TEs from newly sequenced genomes. This is an online, web-based, user-friendly service, enabling users to upload their own genomic data, and perform de-novo searches for TEs. The detected TEs are automatically analyzed, compared to reference databases, annotated, clustered into families, and stored in TEs repository. Also, the genome-wide nesting structure of found elements are detected and analyzed by new method for inferring evolutionary history of TEs. We illustrate the functionality of our tool by performing a full-scale analyses of TE landscape in Medicago truncatula genome. TRANScendence is an effective tool for the de-novo annotation and classification of transposable elements in newly-acquired genomes. Its streamlined interface makes it well-suited for evolutionary studies.
RNA interference for functional genomics and improvement of cotton (Gossypium species)
USDA-ARS?s Scientific Manuscript database
RNA interference (RNAi), is a powerful new technology in the discovery of genetic sequence functions, and has become a valuable tool for functional genomics of cotton (Gossypium ssp.). The rapid adoption of RNAi has replaced previous antisense technology. RNAi has aided in the discovery of function ...
Médigue, Claudine; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Gautreau, Guillaume; Josso, Adrien; Lajus, Aurélie; Langlois, Jordan; Pereira, Hugo; Planel, Rémi; Roche, David; Rollin, Johan; Rouy, Zoe; Vallenet, David
2017-09-12
The overwhelming list of new bacterial genomes becoming available on a daily basis makes accurate genome annotation an essential step that ultimately determines the relevance of thousands of genomes stored in public databanks. The MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Starting from the results of our syntactic, functional and relational annotation pipelines, MicroScope provides an integrated environment for the expert annotation and comparative analysis of prokaryotic genomes. It combines tools and graphical interfaces to analyze genomes and to perform the manual curation of gene function in a comparative genomics and metabolic context. In this article, we describe the free-of-charge MicroScope services for the annotation and analysis of microbial (meta)genomes, transcriptomic and re-sequencing data. Then, the functionalities of the platform are presented in a way providing practical guidance and help to the nonspecialists in bioinformatics. Newly integrated analysis tools (i.e. prediction of virulence and resistance genes in bacterial genomes) and original method recently developed (the pan-genome graph representation) are also described. Integrated environments such as MicroScope clearly contribute, through the user community, to help maintaining accurate resources. © The Author 2017. Published by Oxford University Press.
GREAT: a web portal for Genome Regulatory Architecture Tools.
Bouyioukos, Costas; Bucchini, François; Elati, Mohamed; Képès, François
2016-07-08
GREAT (Genome REgulatory Architecture Tools) is a novel web portal for tools designed to generate user-friendly and biologically useful analysis of genome architecture and regulation. The online tools of GREAT are freely accessible and compatible with essentially any operating system which runs a modern browser. GREAT is based on the analysis of genome layout -defined as the respective positioning of co-functional genes- and its relation with chromosome architecture and gene expression. GREAT tools allow users to systematically detect regular patterns along co-functional genomic features in an automatic way consisting of three individual steps and respective interactive visualizations. In addition to the complete analysis of regularities, GREAT tools enable the use of periodicity and position information for improving the prediction of transcription factor binding sites using a multi-view machine learning approach. The outcome of this integrative approach features a multivariate analysis of the interplay between the location of a gene and its regulatory sequence. GREAT results are plotted in web interactive graphs and are available for download either as individual plots, self-contained interactive pages or as machine readable tables for downstream analysis. The GREAT portal can be reached at the following URL https://absynth.issb.genopole.fr/GREAT and each individual GREAT tool is available for downloading. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
USDA-ARS?s Scientific Manuscript database
Site-specific genome modification is an important tool for mosquito functional genomics studies that help to uncover gene functions, identify gene regulatory elements, and perform comparative gene expression studies, all of which contribute to a better understanding of mosquito biology and are thus ...
Synthetic biology: Novel approaches for microbiology.
Padilla-Vaca, Felipe; Anaya-Velázquez, Fernando; Franco, Bernardo
2015-06-01
In the past twenty years, molecular genetics has created powerful tools for genetic manipulation of living organisms. Whole genome sequencing has provided necessary information to assess knowledge on gene function and protein networks. In addition, new tools permit to modify organisms to perform desired tasks. Gene function analysis is speed up by novel approaches that couple both high throughput data generation and mining. Synthetic biology is an emerging field that uses tools for generating novel gene networks, whole genome synthesis and engineering. New applications in biotechnological, pharmaceutical and biomedical research are envisioned for synthetic biology. In recent years these new strategies have opened up the possibilities to study gene and genome editing, creation of novel tools for functional studies in virus, parasites and pathogenic bacteria. There is also the possibility to re-design organisms to generate vaccine subunits or produce new pharmaceuticals to combat multi-drug resistant pathogens. In this review we provide our opinion on the applicability of synthetic biology strategies for functional studies of pathogenic organisms and some applications such as genome editing and gene network studies to further comprehend virulence factors and determinants in pathogenic organisms. We also discuss what we consider important ethical issues for this field of molecular biology, especially for potential misuse of the new technologies. Copyright© by the Spanish Society for Microbiology and Institute for Catalan Studies.
Dereplication, Aggregation and Scoring Tool (DAS Tool) v1.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
SIEBER, CHRISTIAN
Communities of uncultivated microbes are critical to ecosystem function and microorganism health, and a key objective of metagenomic studies is to analyze organism-specific metabolic pathways and reconstruct community interaction networks. This requires accurate assignment of genes to genomes, yet existing binning methods often fail to predict a reasonable number of genomes and report many bins of low quality and completeness. Furthermore, the performance of existing algorithms varies between samples and biotypes. Here, we present a dereplication, aggregation and scoring strategy, DAS Tool, that combines the strengths of a flexible set of established binning algorithms. DAS Tools applied to a constructedmore » community generated more accurate bins than any automated method. Further, when applied to samples of different complexity, including soil, natural oil seeps, and the human gut, DAS Tool recovered substantially more near-complete genomes than any single binning method alone. Included were three genomes from a novel lineage . The ability to reconstruct many near-complete genomes from metagenomics data will greatly advance genome-centric analyses of ecosystems.« less
Applications of CRISPR Genome Engineering in Cell Biology
Wang, Fangyuan; Qi, Lei S.
2016-01-01
Recent advances in genome engineering are starting a revolution in biological research and translational applications. The CRISPR-associated RNA-guided endonuclease Cas9 and its variants enable diverse manipulations of genome function. In this review, we describe the development of Cas9 tools for a variety of applications in cell biology research, including the study of functional genomics, the creation of transgenic animal models, and genomic imaging. Novel genome engineering methods offer a new avenue to understand the causality between genome and phenotype, thus promising a fuller understanding of cell biology. PMID:27599850
Martin, Guillaume; Baurens, Franc-Christophe; Droc, Gaëtan; Rouard, Mathieu; Cenci, Alberto; Kilian, Andrzej; Hastie, Alex; Doležel, Jaroslav; Aury, Jean-Marc; Alberti, Adriana; Carreel, Françoise; D'Hont, Angélique
2016-03-16
Recent advances in genomics indicate functional significance of a majority of genome sequences and their long range interactions. As a detailed examination of genome organization and function requires very high quality genome sequence, the objective of this study was to improve reference genome assembly of banana (Musa acuminata). We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools proposed an expert mode for a user who can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5% of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70%. Unknown sites (N) were reduced from 17.3 to 10.0%. The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loots, G G; Ovcharenko, I; Collette, N
2007-02-26
Generating the sequence of the human genome represents a colossal achievement for science and mankind. The technical use for the human genome project information holds great promise to cure disease, prevent bioterror threats, as well as to learn about human origins. Yet converting the sequence data into biological meaningful information has not been immediately obvious, and we are still in the preliminary stages of understanding how the genome is organized, what are the functional building blocks and how do these sequences mediate complex biological processes. The overarching goal of this program was to develop novel methods and high throughput strategiesmore » for determining the functions of ''anonymous'' human genes that are evolutionarily deeply conserved in other vertebrates. We coupled analytical tool development and computational predictions regarding gene function with novel high throughput experimental strategies and tested biological predictions in the laboratory. The tools required for comparative genomic data-mining are fundamentally the same whether they are applied to scientific studies of related microbes or the search for functions of novel human genes. For this reason the tools, conceptual framework and the coupled informatics-experimental biology paradigm we developed in this LDRD has many potential scientific applications relevant to LLNL multidisciplinary research in bio-defense, bioengineering, bionanosciences and microbial and environmental genomics.« less
User Guidelines for the Brassica Database: BRAD.
Wang, Xiaobo; Cheng, Feng; Wang, Xiaowu
2016-01-01
The genome sequence of Brassica rapa was first released in 2011. Since then, further Brassica genomes have been sequenced or are undergoing sequencing. It is therefore necessary to develop tools that help users to mine information from genomic data efficiently. This will greatly aid scientific exploration and breeding application, especially for those with low levels of bioinformatic training. Therefore, the Brassica database (BRAD) was built to collect, integrate, illustrate, and visualize Brassica genomic datasets. BRAD provides useful searching and data mining tools, and facilitates the search of gene annotation datasets, syntenic or non-syntenic orthologs, and flanking regions of functional genomic elements. It also includes genome-analysis tools such as BLAST and GBrowse. One of the important aims of BRAD is to build a bridge between Brassica crop genomes with the genome of the model species Arabidopsis thaliana, thus transferring the bulk of A. thaliana gene study information for use with newly sequenced Brassica crops.
PGMapper: a web-based tool linking phenotype to genes.
Xiong, Qing; Qiu, Yuhui; Gu, Weikuan
2008-04-01
With the availability of whole genome sequence in many species, linkage analysis, positional cloning and microarray are gradually becoming powerful tools for investigating the links between phenotype and genotype or genes. However, in these methods, causative genes underlying a quantitative trait locus, or a disease, are usually located within a large genomic region or a large set of genes. Examining the function of every gene is very time consuming and needs to retrieve and integrate the information from multiple databases or genome resources. PGMapper is a software tool for automatically matching phenotype to genes from a defined genome region or a group of given genes by combining the mapping information from the Ensembl database and gene function information from the OMIM and PubMed databases. PGMapper is currently available for candidate gene search of human, mouse, rat, zebrafish and 12 other species. Available online at http://www.genediscovery.org/pgmapper/index.jsp.
GAP Final Technical Report 12-14-04
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andrew J. Bordner, PhD, Senior Research Scientist
2004-12-14
The Genomics Annotation Platform (GAP) was designed to develop new tools for high throughput functional annotation and characterization of protein sequences and structures resulting from genomics and structural proteomics, benchmarking and application of those tools. Furthermore, this platform integrated the genomic scale sequence and structural analysis and prediction tools with the advanced structure prediction and bioinformatics environment of ICM. The development of GAP was primarily oriented towards the annotation of new biomolecular structures using both structural and sequence data. Even though the amount of protein X-ray crystal data is growing exponentially, the volume of sequence data is growing even moremore » rapidly. This trend was exploited by leveraging the wealth of sequence data to provide functional annotation for protein structures. The additional information provided by GAP is expected to assist the majority of the commercial users of ICM, who are involved in drug discovery, in identifying promising drug targets as well in devising strategies for the rational design of therapeutics directed at the protein of interest. The GAP also provided valuable tools for biochemistry education, and structural genomics centers. In addition, GAP incorporates many novel prediction and analysis methods not available in other molecular modeling packages. This development led to signing the first Molsoft agreement in the structural genomics annotation area with the University of oxford Structural Genomics Center. This commercial agreement validated the Molsoft efforts under the GAP project and provided the basis for further development of the large scale functional annotation platform.« less
Analysis tools for the interplay between genome layout and regulation.
Bouyioukos, Costas; Elati, Mohamed; Képès, François
2016-06-06
Genome layout and gene regulation appear to be interdependent. Understanding this interdependence is key to exploring the dynamic nature of chromosome conformation and to engineering functional genomes. Evidence for non-random genome layout, defined as the relative positioning of either co-functional or co-regulated genes, stems from two main approaches. Firstly, the analysis of contiguous genome segments across species, has highlighted the conservation of gene arrangement (synteny) along chromosomal regions. Secondly, the study of long-range interactions along a chromosome has emphasised regularities in the positioning of microbial genes that are co-regulated, co-expressed or evolutionarily correlated. While one-dimensional pattern analysis is a mature field, it is often powerless on biological datasets which tend to be incomplete, and partly incorrect. Moreover, there is a lack of comprehensive, user-friendly tools to systematically analyse, visualise, integrate and exploit regularities along genomes. Here we present the Genome REgulatory and Architecture Tools SCAN (GREAT:SCAN) software for the systematic study of the interplay between genome layout and gene expression regulation. SCAN is a collection of related and interconnected applications currently able to perform systematic analyses of genome regularities as well as to improve transcription factor binding sites (TFBS) and gene regulatory network predictions based on gene positional information. We demonstrate the capabilities of these tools by studying on one hand the regular patterns of genome layout in the major regulons of the bacterium Escherichia coli. On the other hand, we demonstrate the capabilities to improve TFBS prediction in microbes. Finally, we highlight, by visualisation of multivariate techniques, the interplay between position and sequence information for effective transcription regulation.
USDA-ARS?s Scientific Manuscript database
Although well-accepted as the ultimate method for cotton functional genomics, Agrobacterium tumefaciens-mediated cotton transformation is not widely used for functional analyses of cotton genes and their promoters since regeneration of cotton in tissue culture is lengthy and labor intensive. In cer...
McNeil, Leslie Klis; Reich, Claudia; Aziz, Ramy K; Bartels, Daniela; Cohoon, Matthew; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Hwang, Kaitlyn; Kubal, Michael; Margaryan, Gohar Rem; Meyer, Folker; Mihalo, William; Olsen, Gary J; Olson, Robert; Osterman, Andrei; Paarmann, Daniel; Paczian, Tobias; Parrello, Bruce; Pusch, Gordon D; Rodionov, Dmitry A; Shi, Xinghua; Vassieva, Olga; Vonstein, Veronika; Zagnitko, Olga; Xia, Fangfang; Zinner, Jenifer; Overbeek, Ross; Stevens, Rick
2007-01-01
The National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org) is a National Institute of Allergy and Infections Disease (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the set of genes in common or that differentiates two groups of organisms. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development.
Methods of epigenome editing for probing the function of genomic imprinting.
Rienecker, Kira DA; Hill, Matthew J; Isles, Anthony R
2016-10-01
The curious patterns of imprinted gene expression draw interest from several scientific disciplines to the functional consequences of genomic imprinting. Methods of probing the function of imprinting itself have largely been indirect and correlational, relying heavily on conventional transgenics. Recently, the burgeoning field of epigenome editing has provided new tools and suggested strategies for asking causal questions with site specificity. This perspective article aims to outline how these new methods may be applied to questions of functional imprinting and, with this aim in mind, to suggest new dimensions for the expansion of these epigenome-editing tools.
EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome
Thibaud-Nissen, Françoise; Campbell, Matthew; Hamilton, John P; Zhu, Wei; Buell, C Robin
2007-01-01
Background Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate 1) the submission of gene annotation to an annotation project, 2) the review of the submitted models by project annotators, and 3) the incorporation of the submitted models in the ongoing annotation effort. Results We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CA) has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website , as well as in the Community Annotation track of the Genome Browser. Conclusion We have applied EuCAP to rice. As of July 2007, the structural and/or functional annotation of 1,094 genes representing 57 families have been deposited and integrated into the current gene set. All of the EuCAP components are open-source, thereby allowing the implementation of EuCAP for the annotation of other genomes. EuCAP is available at . PMID:17961238
[Short interspersed repetitive sequences (SINEs) and their use as a phylogenetic tool].
Kramerov, D A; Vasetskiĭ, N S
2009-01-01
The data on one of the most common repetitive elements of eukaryotic genomes, short interspersed elements (SINEs), are reviewed. Their structure, origin, and functioning in the genome are discussed. The variation and abundance of these neutral genomic markers makes them a convenient and reliable tool for phylogenetic analysis. The main methods of such analysis are presented, and the potential and limitations of this approach are discussed using specific examples.
Resources for Functional Genomics Studies in Drosophila melanogaster
Mohr, Stephanie E.; Hu, Yanhui; Kim, Kevin; Housden, Benjamin E.; Perrimon, Norbert
2014-01-01
Drosophila melanogaster has become a system of choice for functional genomic studies. Many resources, including online databases and software tools, are now available to support design or identification of relevant fly stocks and reagents or analysis and mining of existing functional genomic, transcriptomic, proteomic, etc. datasets. These include large community collections of fly stocks and plasmid clones, “meta” information sites like FlyBase and FlyMine, and an increasing number of more specialized reagents, databases, and online tools. Here, we introduce key resources useful to plan large-scale functional genomics studies in Drosophila and to analyze, integrate, and mine the results of those studies in ways that facilitate identification of highest-confidence results and generation of new hypotheses. We also discuss ways in which existing resources can be used and might be improved and suggest a few areas of future development that would further support large- and small-scale studies in Drosophila and facilitate use of Drosophila information by the research community more generally. PMID:24653003
EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity
Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D
2006-01-01
Background Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio; ) to begin to address this. Description EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL-relational databases and dynamic page generation, and calls numerous custom programs. Conclusion EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150
The most common technologies and tools for functional genome analysis.
Gasperskaja, Evelina; Kučinskas, Vaidutis
2017-01-01
Since the sequence of the human genome is complete, the main issue is how to understand the information written in the DNA sequence. Despite numerous genome-wide studies that have already been performed, the challenge to determine the function of genes, gene products, and also their interaction is still open. As changes in the human genome are highly likely to cause pathological conditions, functional analysis is vitally important for human health. For many years there have been a variety of technologies and tools used in functional genome analysis. However, only in the past decade there has been rapid revolutionizing progress and improvement in high-throughput methods, which are ranging from traditional real-time polymerase chain reaction to more complex systems, such as next-generation sequencing or mass spectrometry. Furthermore, not only laboratory investigation, but also accurate bioinformatic analysis is required for reliable scientific results. These methods give an opportunity for accurate and comprehensive functional analysis that involves various fields of studies: genomics, epigenomics, proteomics, and interactomics. This is essential for filling the gaps in the knowledge about dynamic biological processes at both cellular and organismal level. However, each method has both advantages and limitations that should be taken into account before choosing the right method for particular research in order to ensure successful study. For this reason, the present review paper aims to describe the most frequent and widely-used methods for the comprehensive functional analysis.
AGORA : Organellar genome annotation from the amino acid and nucleotide references.
Jung, Jaehee; Kim, Jong Im; Jeong, Young-Sik; Yi, Gangman
2018-03-29
Next-generation sequencing (NGS) technologies have led to the accumulation of highthroughput sequence data from various organisms in biology. To apply gene annotation of organellar genomes for various organisms, more optimized tools for functional gene annotation are required. Almost all gene annotation tools are mainly focused on the chloroplast genome of land plants or the mitochondrial genome of animals.We have developed a web application AGORA for the fast, user-friendly, and improved annotations of organellar genomes. AGORA annotates genes based on a BLAST-based homology search and clustering with selected reference sequences from the NCBI database or user-defined uploaded data. AGORA can annotate the functional genes in almost all mitochondrion and plastid genomes of eukaryotes. The gene annotation of a genome with an exon-intron structure within a gene or inverted repeat region is also available. It provides information of start and end positions of each gene, BLAST results compared with the reference sequence, and visualization of gene map by OGDRAW. Users can freely use the software, and the accessible URL is https://bigdata.dongguk.edu/gene_project/AGORA/.The main module of the tool is implemented by the python and php, and the web page is built by the HTML and CSS to support all browsers. gangman@dongguk.edu.
Stagonospora nodorum: From pathology to genomics and host resistance
USDA-ARS?s Scientific Manuscript database
Stagonospora nodorum is a major necrotrophic pathogen of wheat that causes the diseases Stagonospora nodorum leaf and glume blotch. A series of tools and resources, including functional genomics, a genome sequence, proteomics and metabolomics, host-mapping populations, and a worldwide collection of ...
Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu
2015-05-27
Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.
Development and characterization of rice mutants for functional genomic studies and breeding
USDA-ARS?s Scientific Manuscript database
Mutagenesis is a powerful tool for creating genetic materials for studying functional genomics, breeding, and understanding the molecular basis of disease resistance. Approximately 100,000 putative mutants of rice (Oryza sativa L.) have been generated with mutagens. Numerous mutant genes involved in...
Genomics as the key to unlocking the polyploid potential of wheat.
Borrill, Philippa; Adamski, Nikolai; Uauy, Cristobal
2015-12-01
Polyploidy has played a central role in plant genome evolution and in the formation of new species such as tetraploid pasta wheat and hexaploid bread wheat. Until recently, the high sequence conservation between homoeologous genes, together with the large genome size of polyploid wheat, had hindered genomic analyses in this important crop species. In the past 5 yr, however, the advent of next-generation sequencing has radically changed the wheat genomics landscape. Here, we review a series of advances in genomic resources and tools for functional genomics that are shifting the paradigm of what is possible in wheat molecular genetics and breeding. We discuss how understanding the relationship between homoeologues can inform approaches to modulate the response of quantitative traits in polyploid wheat; we also argue that functional redundancy has 'locked up' a wide range of phenotypic variation in wheat. We explore how genomics provides key tools to inform targeted manipulation of multiple homoeologues, thereby allowing researchers and plant breeders to unlock the full polyploid potential of wheat. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grigoriev, Igor
The JGI Fungal Genomics Program aims to scale up sequencing and analysis of fungal genomes to explore the diversity of fungi important for energy and the environment, and to promote functional studies on a system level. Combining new sequencing technologies and comparative genomics tools, JGI is now leading the world in fungal genome sequencing and analysis. Over 120 sequenced fungal genomes with analytical tools are available via MycoCosm (www.jgi.doe.gov/fungi), a web-portal for fungal biologists. Our model of interacting with user communities, unique among other sequencing centers, helps organize these communities, improves genome annotation and analysis work, and facilitates new larger-scalemore » genomic projects. This resulted in 20 high-profile papers published in 2011 alone and contributing to the Genomics Encyclopedia of Fungi, which targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts). Our next grand challenges include larger scale exploration of fungal diversity (1000 fungal genomes), developing molecular tools for DOE-relevant model organisms, and analysis of complex systems and metagenomes.« less
Applications of CRISPR Genome Engineering in Cell Biology.
Wang, Fangyuan; Qi, Lei S
2016-11-01
Recent advances in genome engineering are starting a revolution in biological research and translational applications. The clustered regularly interspaced short palindromic repeats (CRISPR)-associated RNA-guided endonuclease CRISPR associated protein 9 (Cas9) and its variants enable diverse manipulations of genome function. In this review, we describe the development of Cas9 tools for a variety of applications in cell biology research, including the study of functional genomics, the creation of transgenic animal models, and genomic imaging. Novel genome engineering methods offer a new avenue to understand the causality between the genome and phenotype, thus promising a fuller understanding of cell biology. Copyright © 2016 Elsevier Ltd. All rights reserved.
Yazaki, Junshi; Kikuchi, Shoshi
2005-01-01
We now have the various genomics tools for monocot (Oryza sativa) and a dicot (Arabidopsis thaliana) plant. Plant is not only a very important agricultural resource but also a model organism for biological research. It is important that the interaction between ABA and GA is investigated for controlling the transition from embryogenesis to germination in seeds using genomics tools. These studies have investigated the relationship between dormancy and germination using genomics tools. Genomics tools identified genes that had never before been annotated as ABA- or GA-responsive genes in plant, detected new interactions between genes responsive to the two hormones, comprehensively characterized cis-elements of hormone-responsive genes, and characterized cis-elements of rice and Arabidopsis. In these research, ABA- and GA-regulated genes have been classified as functional proteins (proteins that probably function in stress or PR tolerance) and regulatory proteins (protein factors involved in further regulation of signal transduction). Comparison between ABA and/or GA-responsive genes in rice and those in Arabidopsis has shown that the cis-element has specificity in each species. cis-Elements for the dehydration-stress response have been specified in Arabidopsis but not in rice. cis-Elements for protein storage are remarkably richer in the upstream regions of the rice gene than in those of Arabidopsis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grigoriev, Igor
Genomes of fungi relevant to energy and environment are in focus of the Fungal Genomic Program at the US Department of Energy Joint Genome Institute (JGI). Its key project, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts), and explores fungal diversity by means of genome sequencing and analysis. Over 150 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supportedmore » by functional genomics leads to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such parts suggested by comparative genomics and functional analysis in these areas are presented here.« less
A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis
Down, Thomas A.; Rakyan, Vardhman K.; Turner, Daniel J.; Flicek, Paul; Li, Heng; Kulesha, Eugene; Gräf, Stefan; Johnson, Nathan; Herrero, Javier; Tomazou, Eleni M.; Thorne, Natalie P.; Bäckdahl, Liselotte; Herberth, Marlis; Howe, Kevin L.; Jackson, David K.; Miretti, Marcos M.; Marioni, John C.; Birney, Ewan; Hubbard, Tim J. P.; Durbin, Richard; Tavaré, Simon; Beck, Stephan
2009-01-01
DNA methylation is an indispensible epigenetic modification of mammalian genomes. Consequently there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm – Bayesian Tool for Methylation Analysis (Batman) – for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation. PMID:18612301
MaGnET: Malaria Genome Exploration Tool.
Sharman, Joanna L; Gerloff, Dietlind L
2013-09-15
The Malaria Genome Exploration Tool (MaGnET) is a software tool enabling intuitive 'exploration-style' visualization of functional genomics data relating to the malaria parasite, Plasmodium falciparum. MaGnET provides innovative integrated graphic displays for different datasets, including genomic location of genes, mRNA expression data, protein-protein interactions and more. Any selection of genes to explore made by the user is easily carried over between the different viewers for different datasets, and can be changed interactively at any point (without returning to a search). Free online use (Java Web Start) or download (Java application archive and MySQL database; requires local MySQL installation) at http://malariagenomeexplorer.org joanna.sharman@ed.ac.uk or dgerloff@ffame.org Supplementary data are available at Bioinformatics online.
IMG ER: a system for microbial genome annotation expert review and curation.
Markowitz, Victor M; Mavromatis, Konstantinos; Ivanova, Natalia N; Chen, I-Min A; Chu, Ken; Kyrpides, Nikos C
2009-09-01
A rapidly increasing number of microbial genomes are sequenced by organizations worldwide and are eventually included into various public genome data resources. The quality of the annotations depends largely on the original dataset providers, with erroneous or incomplete annotations often carried over into the public resources and difficult to correct. We have developed an Expert Review (ER) version of the Integrated Microbial Genomes (IMG) system, with the goal of supporting systematic and efficient revision of microbial genome annotations. IMG ER provides tools for the review and curation of annotations of both new and publicly available microbial genomes within IMG's rich integrated genome framework. New genome datasets are included into IMG ER prior to their public release either with their native annotations or with annotations generated by IMG ER's annotation pipeline. IMG ER tools allow addressing annotation problems detected with IMG's comparative analysis tools, such as genes missed by gene prediction pipelines or genes without an associated function. Over the past year, IMG ER was used for improving the annotations of about 150 microbial genomes.
Genetic, genomic, and molecular tools for studying the protoploid yeast, L. waltii.
Di Rienzi, Sara C; Lindstrom, Kimberly C; Lancaster, Ragina; Rolczynski, Lisa; Raghuraman, M K; Brewer, Bonita J
2011-02-01
Sequencing of the yeast Kluyveromyces waltii (recently renamed Lachancea waltii) provided evidence of a whole genome duplication event in the lineage leading to the well-studied Saccharomyces cerevisiae. While comparative genomic analyses of these yeasts have proven to be extremely instructive in modeling the loss or maintenance of gene duplicates, experimental tests of the ramifications following such genome alterations remain difficult. To transform L. waltii from an organism of the computational comparative genomic literature into an organism of the functional comparative genomic literature, we have developed genetic, molecular and genomic tools for working with L. waltii. In particular, we have characterized basic properties of L. waltii (growth, ploidy, molecular karyotype, mating type and the sexual cycle), developed transformation, cell cycle arrest and synchronization protocols, and have created centromeric and non-centromeric vectors as well as a genome browser for L. waltii. We hope that these tools will be used by the community to follow up on the ideas generated by sequence data and lead to a greater understanding of eukaryotic biology and genome evolution. 2010 John Wiley & Sons, Ltd.
Genetic, genomic, and molecular tools for studying the protoploid yeast, L. waltii
Di Rienzi, Sara C.; Lindstrom, Kimberly C.; Lancaster, Ragina; Rolczynski, Lisa; Raghuraman, M. K.; Brewer, Bonita J.
2011-01-01
Sequencing of the yeast Kluyveromyces waltii (recently renamed Lachancea waltii) provided evidence of a whole genome duplication event in the lineage leading to the well-studied Saccharomyces cerevisiae. While comparative genomic analyses of these yeasts have proven to be extremely instructive in modeling the loss or maintenance of gene duplicates, experimental tests of the ramifications following such genome alterations remain difficult. To transform L. waltii from an organism of the computational comparative genomic literature into an organism of the functional comparative genomic literature, we have developed genetic, molecular and genomic tools for working with L. waltii. In particular, we have characterized basic properties of L. waltii (growth, ploidy, molecular karyotype, mating type and the sexual cycle), developed transformation, cell cycle arrest and synchronization protocols, and have created centromeric and non-centromeric vectors as well as a genome browser for L. waltii. We hope that these tools will be used by the community to follow up on the ideas generated by sequence data and lead to a greater understanding of eukaryotic biology and genome evolution. PMID:21246627
Hu, Yanhui; Comjean, Aram; Roesel, Charles; Vinayagam, Arunachalam; Flockhart, Ian; Zirin, Jonathan; Perkins, Lizabeth; Perrimon, Norbert; Mohr, Stephanie E.
2017-01-01
The FlyRNAi database of the Drosophila RNAi Screening Center (DRSC) and Transgenic RNAi Project (TRiP) at Harvard Medical School and associated DRSC/TRiP Functional Genomics Resources website (http://fgr.hms.harvard.edu) serve as a reagent production tracking system, screen data repository, and portal to the community. Through this portal, we make available protocols, online tools, and other resources useful to researchers at all stages of high-throughput functional genomics screening, from assay design and reagent identification to data analysis and interpretation. In this update, we describe recent changes and additions to our website, database and suite of online tools. Recent changes reflect a shift in our focus from a single technology (RNAi) and model species (Drosophila) to the application of additional technologies (e.g. CRISPR) and support of integrated, cross-species approaches to uncovering gene function using functional genomics and other approaches. PMID:27924039
CoryneBase: Corynebacterium Genomic Resources and Analysis Tools at Your Fingertips
Tan, Mui Fern; Jakubovics, Nick S.; Wee, Wei Yee; Mutha, Naresh V. R.; Wong, Guat Jah; Ang, Mia Yang; Yazdi, Amir Hessam; Choo, Siew Woh
2014-01-01
Corynebacteria are used for a wide variety of industrial purposes but some species are associated with human diseases. With increasing number of corynebacterial genomes having been sequenced, comparative analysis of these strains may provide better understanding of their biology, phylogeny, virulence and taxonomy that may lead to the discoveries of beneficial industrial strains or contribute to better management of diseases. To facilitate the ongoing research of corynebacteria, a specialized central repository and analysis platform for the corynebacterial research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. Here we present CoryneBase, a genomic database for Corynebacterium with diverse functionality for the analysis of genomes aimed to provide: (1) annotated genome sequences of Corynebacterium where 165,918 coding sequences and 4,180 RNAs can be found in 27 species; (2) access to comprehensive Corynebacterium data through the use of advanced web technologies for interactive web interfaces; and (3) advanced bioinformatic analysis tools consisting of standard BLAST for homology search, VFDB BLAST for sequence homology search against the Virulence Factor Database (VFDB), Pairwise Genome Comparison (PGC) tool for comparative genomic analysis, and a newly designed Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomic analysis. CoryneBase offers the access of a range of Corynebacterium genomic resources as well as analysis tools for comparative genomics and pathogenomics. It is publicly available at http://corynebacterium.um.edu.my/. PMID:24466021
Toward mapping the biology of the genome.
Chanock, Stephen
2012-09-01
This issue of Genome Research presents new results, methods, and tools from The ENCODE Project (ENCyclopedia of DNA Elements), which collectively represents an important step in moving beyond a parts list of the genome and promises to shape the future of genomic research. This collection sheds light on basic biological questions and frames the current debate over the optimization of tools and methodological challenges necessary to compare and interpret large complex data sets focused on how the genome is organized and regulated. In a number of instances, the authors have highlighted the strengths and limitations of current computational and technical approaches, providing the community with useful standards, which should stimulate development of new tools. In many ways, these papers will ripple through the scientific community, as those in pursuit of understanding the "regulatory genome" will heavily traverse the maps and tools. Similarly, the work should have a substantive impact on how genetic variation contributes to specific diseases and traits by providing a compendium of functional elements for follow-up study. The success of these papers should not only be measured by the scope of the scientific insights and tools but also by their ability to attract new talent to mine existing and future data.
Gobe: an interactive, web-based tool for comparative genomic visualization.
Pedersen, Brent S; Tang, Haibao; Freeling, Michael
2011-04-01
Gobe is a web-based tool for viewing comparative genomic data. It supports viewing multiple genomic regions simultaneously. Its simple text format and flash-based rendering make it an interactive, exploratory research tool. Gobe can be used without installation through our web service, or downloaded and customized with stylesheets and javascript callback functions. Gobe is a flash application that runs in all modern web-browsers. The full source-code, including that for the online web application is available under the MIT license at: http://github.com/brentp/gobe. Sample applications are hosted at http://try-gobe.appspot.com/ and http://synteny.cnr.berkeley.edu/gobe-app/.
Expanded microbial genome coverage and improved protein family annotation in the COG database
Galperin, Michael Y.; Makarova, Kira S.; Wolf, Yuri I.; Koonin, Eugene V.
2015-01-01
Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the COGs is expected to become an important tool for microbial genomics. PMID:25428365
The CRISPR/Cas Genome-Editing Tool: Application in Improvement of Crops
Khatodia, Surender; Bhatotia, Kirti; Passricha, Nishat; Khurana, S. M. P.; Tuteja, Narendra
2016-01-01
The Clustered Regularly Interspaced Short Palindromic Repeats associated Cas9/sgRNA system is a novel targeted genome-editing technique derived from bacterial immune system. It is an inexpensive, easy, most user friendly and rapidly adopted genome editing tool transforming to revolutionary paradigm. This technique enables precise genomic modifications in many different organisms and tissues. Cas9 protein is an RNA guided endonuclease utilized for creating targeted double-stranded breaks with only a short RNA sequence to confer recognition of the target in animals and plants. Development of genetically edited (GE) crops similar to those developed by conventional or mutation breeding using this potential technique makes it a promising and extremely versatile tool for providing sustainable productive agriculture for better feeding of rapidly growing population in a changing climate. The emerging areas of research for the genome editing in plants include interrogating gene function, rewiring the regulatory signaling networks and sgRNA library for high-throughput loss-of-function screening. In this review, we have described the broad applicability of the Cas9 nuclease mediated targeted plant genome editing for development of designer crops. The regulatory uncertainty and social acceptance of plant breeding by Cas9 genome editing have also been described. With this powerful and innovative technique the designer GE non-GM plants could further advance climate resilient and sustainable agriculture in the future and maximizing yield by combating abiotic and biotic stresses. PMID:27148329
Vallenet, David; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Lajus, Aurélie; Josso, Adrien; Mercier, Jonathan; Renaux, Alexandre; Rollin, Johan; Rouy, Zoe; Roche, David; Scarpelli, Claude; Médigue, Claudine
2017-01-01
The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations. PMID:27899624
A genome-wide 20 K citrus microarray for gene expression analysis
Martinez-Godoy, M Angeles; Mauri, Nuria; Juarez, Jose; Marques, M Carmen; Santiago, Julia; Forment, Javier; Gadea, Jose
2008-01-01
Background Understanding of genetic elements that contribute to key aspects of citrus biology will impact future improvements in this economically important crop. Global gene expression analysis demands microarray platforms with a high genome coverage. In the last years, genome-wide EST collections have been generated in citrus, opening the possibility to create new tools for functional genomics in this crop plant. Results We have designed and constructed a publicly available genome-wide cDNA microarray that include 21,081 putative unigenes of citrus. As a functional companion to the microarray, a web-browsable database [1] was created and populated with information about the unigenes represented in the microarray, including cDNA libraries, isolated clones, raw and processed nucleotide and protein sequences, and results of all the structural and functional annotation of the unigenes, like general description, BLAST hits, putative Arabidopsis orthologs, microsatellites, putative SNPs, GO classification and PFAM domains. We have performed a Gene Ontology comparison with the full set of Arabidopsis proteins to estimate the genome coverage of the microarray. We have also performed microarray hybridizations to check its usability. Conclusion This new cDNA microarray replaces the first 7K microarray generated two years ago and allows gene expression analysis at a more global scale. We have followed a rational design to minimize cross-hybridization while maintaining its utility for different citrus species. Furthermore, we also provide access to a website with full structural and functional annotation of the unigenes represented in the microarray, along with the ability to use this site to directly perform gene expression analysis using standard tools at different publicly available servers. Furthermore, we show how this microarray offers a good representation of the citrus genome and present the usefulness of this genomic tool for global studies in citrus by using it to catalogue genes expressed in citrus globular embryos. PMID:18598343
IMG 4 version of the integrated microbial genomes comparative analysis system
Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N.; Kyrpides, Nikos C.
2014-01-01
The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu). PMID:24165883
IMG 4 version of the integrated microbial genomes comparative analysis system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna
The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Finally, different IMG datamarts providemore » support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).« less
IMG 4 version of the integrated microbial genomes comparative analysis system.
Markowitz, Victor M; Chen, I-Min A; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C
2014-01-01
The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG's data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG's annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).
Khang, Chang Hyun; Park, Sook-Young; Lee, Yong-Hwan; Kang, Seogchan
2005-06-01
Rapid progress in fungal genome sequencing presents many new opportunities for functional genomic analysis of fungal biology through the systematic mutagenesis of the genes identified through sequencing. However, the lack of efficient tools for targeted gene replacement is a limiting factor for fungal functional genomics, as it often necessitates the screening of a large number of transformants to identify the desired mutant. We developed an efficient method of gene replacement and evaluated factors affecting the efficiency of this method using two plant pathogenic fungi, Magnaporthe grisea and Fusarium oxysporum. This method is based on Agrobacterium tumefaciens-mediated transformation with a mutant allele of the target gene flanked by the herpes simplex virus thymidine kinase (HSVtk) gene as a conditional negative selection marker against ectopic transformants. The HSVtk gene product converts 5-fluoro-2'-deoxyuridine to a compound toxic to diverse fungi. Because ectopic transformants express HSVtk, while gene replacement mutants lack HSVtk, growing transformants on a medium amended with 5-fluoro-2'-deoxyuridine facilitates the identification of targeted mutants by counter-selecting against ectopic transformants. In addition to M. grisea and F. oxysporum, the method and associated vectors are likely to be applicable to manipulating genes in a broad spectrum of fungi, thus potentially serving as an efficient, universal functional genomic tool for harnessing the growing body of fungal genome sequence data to study fungal biology.
PGSB PlantsDB: updates to the database framework for comparative plant genome research.
Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai C; Martis, Mihaela M; Seidel, Michael; Kugler, Karl G; Gundlach, Heidrun; Mayer, Klaus F X
2016-01-04
PGSB (Plant Genome and Systems Biology: formerly MIPS) PlantsDB (http://pgsb.helmholtz-muenchen.de/plant/index.jsp) is a database framework for the comparative analysis and visualization of plant genome data. The resource has been updated with new data sets and types as well as specialized tools and interfaces to address user demands for intuitive access to complex plant genome data. In its latest incarnation, we have re-worked both the layout and navigation structure and implemented new keyword search options and a new BLAST sequence search functionality. Actively involved in corresponding sequencing consortia, PlantsDB has dedicated special efforts to the integration and visualization of complex triticeae genome data, especially for barley, wheat and rye. We enhanced CrowsNest, a tool to visualize syntenic relationships between genomes, with data from the wheat sub-genome progenitor Aegilops tauschii and added functionality to the PGSB RNASeqExpressionBrowser. GenomeZipper results were integrated for the genomes of barley, rye, wheat and perennial ryegrass and interactive access is granted through PlantsDB interfaces. Data exchange and cross-linking between PlantsDB and other plant genome databases is stimulated by the transPLANT project (http://transplantdb.eu/). © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Gene context analysis in the Integrated Microbial Genomes (IMG) data management system.
Mavromatis, Konstantinos; Chu, Ken; Ivanova, Natalia; Hooper, Sean D; Markowitz, Victor M; Kyrpides, Nikos C
2009-11-24
Computational methods for determining the function of genes in newly sequenced genomes have been traditionally based on sequence similarity to genes whose function has been identified experimentally. Function prediction methods can be extended using gene context analysis approaches such as examining the conservation of chromosomal gene clusters, gene fusion events and co-occurrence profiles across genomes. Context analysis is based on the observation that functionally related genes are often having similar gene context and relies on the identification of such events across phylogenetically diverse collection of genomes. We have used the data management system of the Integrated Microbial Genomes (IMG) as the framework to implement and explore the power of gene context analysis methods because it provides one of the largest available genome integrations. Visualization and search tools to facilitate gene context analysis have been developed and applied across all publicly available archaeal and bacterial genomes in IMG. These computations are now maintained as part of IMG's regular genome content update cycle. IMG is available at: http://img.jgi.doe.gov.
Microbial genome analysis: the COG approach.
Galperin, Michael Y; Kristensen, David M; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V
2017-09-14
For the past 20 years, the Clusters of Orthologous Genes (COG) database had been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COG have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
Automatic Tool for Local Assembly Structures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whole community shotgun sequencing of total DNA (i.e. metagenomics) and total RNA (i.e. metatranscriptomics) has provided a wealth of information in the microbial community structure, predicted functions, metabolic networks, and is even able to reconstruct complete genomes directly. Here we present ATLAS (Automatic Tool for Local Assembly Structures) a comprehensive pipeline for assembly, annotation, genomic binning of metagenomic and metatranscriptomic data with an integrated framework for Multi-Omics. This will provide an open source tool for the Multi-Omic community at large.
Genetic resources offer efficient tools for rice functional genomics research.
Lo, Shuen-Fang; Fan, Ming-Jen; Hsing, Yue-Ie; Chen, Liang-Jwu; Chen, Shu; Wen, Ien-Chie; Liu, Yi-Lun; Chen, Ku-Ting; Jiang, Mirng-Jier; Lin, Ming-Kuang; Rao, Meng-Yen; Yu, Lin-Chih; Ho, Tuan-Hua David; Yu, Su-May
2016-05-01
Rice is an important crop and major model plant for monocot functional genomics studies. With the establishment of various genetic resources for rice genomics, the next challenge is to systematically assign functions to predicted genes in the rice genome. Compared with the robustness of genome sequencing and bioinformatics techniques, progress in understanding the function of rice genes has lagged, hampering the utilization of rice genes for cereal crop improvement. The use of transfer DNA (T-DNA) insertional mutagenesis offers the advantage of uniform distribution throughout the rice genome, but preferentially in gene-rich regions, resulting in direct gene knockout or activation of genes within 20-30 kb up- and downstream of the T-DNA insertion site and high gene tagging efficiency. Here, we summarize the recent progress in functional genomics using the T-DNA-tagged rice mutant population. We also discuss important features of T-DNA activation- and knockout-tagging and promoter-trapping of the rice genome in relation to mutant and candidate gene characterizations and how to more efficiently utilize rice mutant populations and datasets for high-throughput functional genomics and phenomics studies by forward and reverse genetics approaches. These studies may facilitate the translation of rice functional genomics research to improvements of rice and other cereal crops. © 2015 John Wiley & Sons Ltd.
Strategies to explore functional genomics data sets in NCBI's GEO database.
Wilhite, Stephen E; Barrett, Tanya
2012-01-01
The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze, and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries.
Strategies to Explore Functional Genomics Data Sets in NCBI’s GEO Database
Wilhite, Stephen E.; Barrett, Tanya
2012-01-01
The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries. PMID:22130872
Christen, Matthias; Deutsch, Samuel; Christen, Beat
2015-08-21
Recent advances in synthetic biology have resulted in an increasing demand for the de novo synthesis of large-scale DNA constructs. Any process improvement that enables fast and cost-effective streamlining of digitized genetic information into fabricable DNA sequences holds great promise to study, mine, and engineer genomes. Here, we present Genome Calligrapher, a computer-aided design web tool intended for whole genome refactoring of bacterial chromosomes for de novo DNA synthesis. By applying a neutral recoding algorithm, Genome Calligrapher optimizes GC content and removes obstructive DNA features known to interfere with the synthesis of double-stranded DNA and the higher order assembly into large DNA constructs. Subsequent bioinformatics analysis revealed that synthesis constraints are prevalent among bacterial genomes. However, a low level of codon replacement is sufficient for refactoring bacterial genomes into easy-to-synthesize DNA sequences. To test the algorithm, 168 kb of synthetic DNA comprising approximately 20 percent of the synthetic essential genome of the cell-cycle bacterium Caulobacter crescentus was streamlined and then ordered from a commercial supplier of low-cost de novo DNA synthesis. The successful assembly into eight 20 kb segments indicates that Genome Calligrapher algorithm can be efficiently used to refactor difficult-to-synthesize DNA. Genome Calligrapher is broadly applicable to recode biosynthetic pathways, DNA sequences, and whole bacterial genomes, thus offering new opportunities to use synthetic biology tools to explore the functionality of microbial diversity. The Genome Calligrapher web tool can be accessed at https://christenlab.ethz.ch/GenomeCalligrapher .
Improved maize reference genome with single-molecule technologies
USDA-ARS?s Scientific Manuscript database
Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate elucidation of biological processes and support translation of research findings into improved and sustainable agricultural technolog...
COGNAT: a web server for comparative analysis of genomic neighborhoods.
Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y
2017-11-22
In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionary related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.
Functional genomic approaches for understanding the mode of action of Bacillus sp biocontrol strains
USDA-ARS?s Scientific Manuscript database
Complete genome sequencing of several Bacillus sp. strains has shed new light on the mode of action of these antagonists of plant pathogens. The use of genomic data mining tools provided the ability to quickly determine the potential of these strains to produce bioactive secondary metabolites. Our B...
USDA-ARS?s Scientific Manuscript database
Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology. Here we ...
A Guide to the PLAZA 3.0 Plant Comparative Genomic Database.
Vandepoele, Klaas
2017-01-01
PLAZA 3.0 is an online resource for comparative genomics and offers a versatile platform to study gene functions and gene families or to analyze genome organization and evolution in the green plant lineage. Starting from genome sequence information for over 35 plant species, precomputed comparative genomic data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, and genomic colinearity information within and between species. Complementary functional data sets, a Workbench, and interactive visualization tools are available through a user-friendly web interface, making PLAZA an excellent starting point to translate sequence or omics data sets into biological knowledge. PLAZA is available at http://bioinformatics.psb.ugent.be/plaza/ .
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grigoriev, Igor V.
2011-03-14
Genomes of energy and environment fungi are in focus of the Fungal Genomic Program at the US Department of Energy Joint Genome Institute (JGI). Its key project, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts), and explores fungal diversity by means of genome sequencing and analysis. Over 50 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supported by functionalmore » genomics leads to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such 'parts' suggested by comparative genomics and functional analysis in these areas are presented here« less
Periwal, Vinita
2017-07-01
Genome editing with engineered nucleases (zinc finger nucleases, TAL effector nucleases s and Clustered regularly inter-spaced short palindromic repeats/CRISPR-associated) has recently been shown to have great promise in a variety of therapeutic and biotechnological applications. However, their exploitation in genetic analysis and clinical settings largely depends on their specificity for the intended genomic target. Large and complex genomes often contain highly homologous/repetitive sequences, which limits the specificity of genome editing tools and could result in off-target activity. Over the past few years, various computational approaches have been developed to assist the design process and predict/reduce the off-target activity of these nucleases. These tools could be efficiently used to guide the design of constructs for engineered nucleases and evaluate results after genome editing. This review provides a comprehensive overview of various databases, tools, web servers and resources for genome editing and compares their features and functionalities. Additionally, it also describes tools that have been developed to analyse post-genome editing results. The article also discusses important design parameters that could be considered while designing these nucleases. This review is intended to be a quick reference guide for experimentalists as well as computational biologists working in the field of genome editing with engineered nucleases. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
MIPS: a database for genomes and protein sequences
Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.
2002-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246
MIPS: a database for genomes and protein sequences.
Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B
2002-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).
RGmatch: matching genomic regions to proximal genes in omics data integration.
Furió-Tarí, Pedro; Conesa, Ana; Tarazona, Sonia
2016-11-22
The integrative analysis of multiple genomics data often requires that genome coordinates-based signals have to be associated with proximal genes. The relative location of a genomic region with respect to the gene (gene area) is important for functional data interpretation; hence algorithms that match regions to genes should be able to deliver insight into this information. In this work we review the tools that are publicly available for making region-to-gene associations. We also present a novel method, RGmatch, a flexible and easy-to-use Python tool that computes associations either at the gene, transcript, or exon level, applying a set of rules to annotate each region-gene association with the region location within the gene. RGmatch can be applied to any organism as long as genome annotation is available. Furthermore, we qualitatively and quantitatively compare RGmatch to other tools. RGmatch simplifies the association of a genomic region with its closest gene. At the same time, it is a powerful tool because the rules used to annotate these associations are very easy to modify according to the researcher's specific interests. Some important differences between RGmatch and other similar tools already in existence are RGmatch's flexibility, its wide range of user options, compatibility with any annotatable organism, and its comprehensive and user-friendly output.
RNAi for functional genomics in plants.
McGinnis, Karen M
2010-03-01
RNAi refers to several different types of gene silencing mediated by small, dsRNA molecules. Over the course of 20 years, the scientific understanding of RNAi has developed from the initial observation of unexpected expression patterns to a sophisticated understanding of a multi-faceted, evolutionarily conserved network of mechanisms that regulate gene expression in many organisms. It has also been developed as a genetic tool that can be exploited in a wide range of species. Because transgene-induced RNAi has been effective at silencing one or more genes in a wide range of plants, this technology also bears potential as a powerful functional genomics tool across the plant kingdom. Transgene-induced RNAi has indeed been shown to be an effective mechanism for silencing many genes in many organisms, but the results from multiple projects which attempted to exploit RNAi on a genome-wide scale suggest that there is a great deal of variation in the silencing efficacy between transgenic events, silencing targets and silencing-induced phenotype. The results from these projects indicate several important variables that should be considered in experimental design prior to the initiation of functional genomics efforts based on RNAi silencing. In recent years, alternative strategies have been developed for targeted gene silencing, and a combination of approaches may also enhance the use of targeted gene silencing for functional genomics.
Gene calling and bacterial genome annotation with BG7.
Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo
2015-01-01
New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is sequence error tolerant maintaining their capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).
CRISPR Mediated Genome Engineering and its Application in Industry.
Kaboli, Saeed; Babazada, Hasan
2018-01-01
The CRISPR (clustered regularly interspaced short palindromic repeat)-Cas9 (CRISPR-associated nuclease 9) method has been dramatically changing the field of genome engineering. It is a rapid, highly efficient and versatile tool for precise modification of genome that uses a guide RNA (gRNA) to target Cas9 to a specific sequence. This novel RNA-guided genome-editing technique has become a revolutionary tool in biomedical science and has many innovative applications in different fields. In this review, we briefly introduce the Cas9-mediated genome-editing tool, summarize the recent advances in CRISPR/Cas9 technology to engineer the genomes of a wide variety of organisms, and discuss their applications to treatment of fungal and viral disease. We also discuss advantageous of CRISPR/Cas9 technology to drug design, creation of animal model, and to food, agricultural and energy sciences. Adoption of the CRISPR/Cas9 technology in biomedical and biotechnological researches would create innovative applications of it not only for breeding of strains exhibiting desired traits for specific industrial and medical applications, but also for investigation of genome function.
CRISPR-Cas9 for medical genetic screens: applications and future perspectives.
Xue, Hui-Ying; Ji, Li-Juan; Gao, Ai-Mei; Liu, Ping; He, Jing-Dong; Lu, Xiao-Jie
2016-02-01
CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats-CRISPR associated nuclease 9) systems have emerged as versatile and convenient (epi)genome editing tools and have become an important player in medical genetic research. CRISPR-Cas9 and its variants such as catalytically inactivated Cas9 (dead Cas9, dCas9) and scaffold-incorporating single guide sgRNA (scRNA) have been applied in various genomic screen studies. CRISPR screens enable high-throughput interrogation of gene functions in health and diseases. Compared with conventional RNAi screens, CRISPR screens incur less off-target effects and are more versatile in that they can be used in multiple formats such as knockout, knockdown and activation screens, and can target coding and non-coding regions throughout the genome. This powerful screen platform holds the potential of revolutionising functional genomic studies in the near future. Herein, we introduce the mechanisms of (epi)genome editing mediated by CRISPR-Cas9 and its variants, introduce the procedures and applications of CRISPR screen in functional genomics, compare it with conventional screen tools and at last discuss current challenges and opportunities and propose future directions. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Vallenet, David; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Lajus, Aurélie; Josso, Adrien; Mercier, Jonathan; Renaux, Alexandre; Rollin, Johan; Rouy, Zoe; Roche, David; Scarpelli, Claude; Médigue, Claudine
2017-01-04
The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes.
Hallin, Peter F; Stærfeldt, Hans-Henrik; Rotenberg, Eva; Binnewies, Tim T; Benham, Craig J; Ussery, David W
2009-09-25
We present an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool (GeneWiz browser) allows users to carry out various analyses such as mapping alignments of homologous genes to other genomes, mapping of short sequencing reads to a reference chromosome, and calculating DNA properties such as curvature or stacking energy along the chromosome. The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides, without changing the size of the plot. Its ability to disproportionally zoom provides optimal readability and increased functionality compared to other browsers. The tool allows the user to select the display of various genomic features, color setting and data ranges. Custom numerical data can be added to the plot allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online from http://www.cbs.dtu.dk/services/gwBrowser. Supplemental material including interactive atlases is available online at http://www.cbs.dtu.dk/services/gwBrowser/suppl/.
MaGnET: Malaria Genome Exploration Tool
Sharman, Joanna L.; Gerloff, Dietlind L.
2013-01-01
Summary: The Malaria Genome Exploration Tool (MaGnET) is a software tool enabling intuitive ‘exploration-style’ visualization of functional genomics data relating to the malaria parasite, Plasmodium falciparum. MaGnET provides innovative integrated graphic displays for different datasets, including genomic location of genes, mRNA expression data, protein–protein interactions and more. Any selection of genes to explore made by the user is easily carried over between the different viewers for different datasets, and can be changed interactively at any point (without returning to a search). Availability and Implementation: Free online use (Java Web Start) or download (Java application archive and MySQL database; requires local MySQL installation) at http://malariagenomeexplorer.org Contact: joanna.sharman@ed.ac.uk or dgerloff@ffame.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23894142
The COG database: a tool for genome-scale analysis of protein functions and evolution
Tatusov, Roman L.; Galperin, Michael Y.; Natale, Darren A.; Koonin, Eugene V.
2000-01-01
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt on a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG ). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56–83% of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes. PMID:10592175
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leung, Elo; Huang, Amy; Cadag, Eithon
In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resultingmore » functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequencebased genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.« less
Leung, Elo; Huang, Amy; Cadag, Eithon; ...
2016-01-20
In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resultingmore » functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequencebased genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.« less
Fungal Genomics for Energy and Environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grigoriev, Igor V.
2013-03-11
Genomes of fungi relevant to energy and environment are in focus of the Fungal Genomic Program at the US Department of Energy Joint Genome Institute (JGI). One of its projects, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts) by means of genome sequencing and analysis. New chapters of the Encyclopedia can be opened with user proposals to the JGI Community Sequencing Program (CSP). Another JGI project, the 1000 fungal genomes, explores fungal diversity on genome level at scale and is open for usersmore » to nominate new species for sequencing. Over 200 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supported by functional genomics leads to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such parts suggested by comparative genomics and functional analysis in these areas are presented here.« less
ProGeRF: Proteome and Genome Repeat Finder Utilizing a Fast Parallel Hash Function
Moraes, Walas Jhony Lopes; Rodrigues, Thiago de Souza; Bartholomeu, Daniella Castanheira
2015-01-01
Repetitive element sequences are adjacent, repeating patterns, also called motifs, and can be of different lengths; repetitions can involve their exact or approximate copies. They have been widely used as molecular markers in population biology. Given the sizes of sequenced genomes, various bioinformatics tools have been developed for the extraction of repetitive elements from DNA sequences. However, currently available tools do not provide options for identifying repetitive elements in the genome or proteome, displaying a user-friendly web interface, and performing-exhaustive searches. ProGeRF is a web site for extracting repetitive regions from genome and proteome sequences. It was designed to be efficient, fast, and accurate and primarily user-friendly web tool allowing many ways to view and analyse the results. ProGeRF (Proteome and Genome Repeat Finder) is freely available as a stand-alone program, from which the users can download the source code, and as a web tool. It was developed using the hash table approach to extract perfect and imperfect repetitive regions in a (multi)FASTA file, while allowing a linear time complexity. PMID:25811026
G23D: Online tool for mapping and visualization of genomic variants on 3D protein structures.
Solomon, Oz; Kunik, Vered; Simon, Amos; Kol, Nitzan; Barel, Ortal; Lev, Atar; Amariglio, Ninette; Somech, Raz; Rechavi, Gidi; Eyal, Eran
2016-08-26
Evaluation of the possible implications of genomic variants is an increasingly important task in the current high throughput sequencing era. Structural information however is still not routinely exploited during this evaluation process. The main reasons can be attributed to the partial structural coverage of the human proteome and the lack of tools which conveniently convert genomic positions, which are the frequent output of genomic pipelines, to proteins and structure coordinates. We present G23D, a tool for conversion of human genomic coordinates to protein coordinates and protein structures. G23D allows mapping of genomic positions/variants on evolutionary related (and not only identical) protein three dimensional (3D) structures as well as on theoretical models. By doing so it significantly extends the space of variants for which structural insight is feasible. To facilitate interpretation of the variant consequence, pathogenic variants, functional sites and polymorphism sites are displayed on protein sequence and structure diagrams alongside the input variants. G23D also provides modeling of the mutant structure, analysis of intra-protein contacts and instant access to functional predictions and predictions of thermo-stability changes. G23D is available at http://www.sheba-cancer.org.il/G23D . G23D extends the fraction of variants for which structural analysis is applicable and provides better and faster accessibility for structural data to biologists and geneticists who routinely work with genomic information.
Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar
2007-01-01
MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813
Functional assessment of human enhancer activities using whole-genome STARR-sequencing.
Liu, Yuwen; Yu, Shan; Dhiman, Vineet K; Brunetti, Tonya; Eckart, Heather; White, Kevin P
2017-11-20
Genome-wide quantification of enhancer activity in the human genome has proven to be a challenging problem. Recent efforts have led to the development of powerful tools for enhancer quantification. However, because of genome size and complexity, these tools have yet to be applied to the whole human genome. In the current study, we use a human prostate cancer cell line, LNCaP as a model to perform whole human genome STARR-seq (WHG-STARR-seq) to reliably obtain an assessment of enhancer activity. This approach builds upon previously developed STARR-seq in the fly genome and CapSTARR-seq techniques in targeted human genomic regions. With an improved library preparation strategy, our approach greatly increases the library complexity per unit of starting material, which makes it feasible and cost-effective to explore the landscape of regulatory activity in the much larger human genome. In addition to our ability to identify active, accessible enhancers located in open chromatin regions, we can also detect sequences with the potential for enhancer activity that are located in inaccessible, closed chromatin regions. When treated with the histone deacetylase inhibitor, Trichostatin A, genes nearby this latter class of enhancers are up-regulated, demonstrating the potential for endogenous functionality of these regulatory elements. WHG-STARR-seq provides an improved approach to current pipelines for analysis of high complexity genomes to gain a better understanding of the intricacies of transcriptional regulation.
Genetic screens and functional genomics using CRISPR/Cas9 technology.
Hartenian, Ella; Doench, John G
2015-04-01
Functional genomics attempts to understand the genome by perturbing the flow of information from DNA to RNA to protein, in order to learn how gene dysfunction leads to disease. CRISPR/Cas9 technology is the newest tool in the geneticist's toolbox, allowing researchers to edit DNA with unprecedented ease, speed and accuracy, and representing a novel means to perform genome-wide genetic screens to discover gene function. In this review, we first summarize the discovery and characterization of CRISPR/Cas9, and then compare it to other genome engineering technologies. We discuss its initial use in screening applications, with a focus on optimizing on-target activity and minimizing off-target effects. Finally, we comment on future challenges and opportunities afforded by this technology. © 2015 FEBS.
Brachypodium distachyon genetic resources
USDA-ARS?s Scientific Manuscript database
Brachypodium distachyon is a well-established model species for the grass family Poaceae. It possesses an array of features that make it suited for this purpose, including a small sequenced genome, simple transformation methods, and additional functional genomics tools. However, the most critical to...
Fueling the Future with Fungal Genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grigoriev, Igor V.
2014-10-27
Genomes of fungi relevant to energy and environment are in focus of the JGI Fungal Genomic Program. One of its projects, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts and pathogens) and biorefinery processes (cellulose degradation and sugar fermentation) by means of genome sequencing and analysis. New chapters of the Encyclopedia can be opened with user proposals to the JGI Community Science Program (CSP). Another JGI project, the 1000 fungal genomes, explores fungal diversity on genome level at scale and is open for users to nominate new species for sequencing. Over 400 fungal genomes have beenmore » sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supported by functional genomics will lead to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such ‘parts’ suggested by comparative genomics and functional analysis in these areas are presented here.« less
Decoding the genome with an integrative analysis tool: combinatorial CRM Decoder.
Kang, Keunsoo; Kim, Joomyeong; Chung, Jae Hoon; Lee, Daeyoup
2011-09-01
The identification of genome-wide cis-regulatory modules (CRMs) and characterization of their associated epigenetic features are fundamental steps toward the understanding of gene regulatory networks. Although integrative analysis of available genome-wide information can provide new biological insights, the lack of novel methodologies has become a major bottleneck. Here, we present a comprehensive analysis tool called combinatorial CRM decoder (CCD), which utilizes the publicly available information to identify and characterize genome-wide CRMs in a species of interest. CCD first defines a set of the epigenetic features which is significantly associated with a set of known CRMs as a code called 'trace code', and subsequently uses the trace code to pinpoint putative CRMs throughout the genome. Using 61 genome-wide data sets obtained from 17 independent mouse studies, CCD successfully catalogued ∼12 600 CRMs (five distinct classes) including polycomb repressive complex 2 target sites as well as imprinting control regions. Interestingly, we discovered that ∼4% of the identified CRMs belong to at least two different classes named 'multi-functional CRM', suggesting their functional importance for regulating spatiotemporal gene expression. From these examples, we show that CCD can be applied to any potential genome-wide datasets and therefore will shed light on unveiling genome-wide CRMs in various species.
Toward Genomics-Based Breeding in C3 Cool-Season Perennial Grasses.
Talukder, Shyamal K; Saha, Malay C
2017-01-01
Most important food and feed crops in the world belong to the C3 grass family. The future of food security is highly reliant on achieving genetic gains of those grasses. Conventional breeding methods have already reached a plateau for improving major crops. Genomics tools and resources have opened an avenue to explore genome-wide variability and make use of the variation for enhancing genetic gains in breeding programs. Major C3 annual cereal breeding programs are well equipped with genomic tools; however, genomic research of C3 cool-season perennial grasses is lagging behind. In this review, we discuss the currently available genomics tools and approaches useful for C3 cool-season perennial grass breeding. Along with a general review, we emphasize the discussion focusing on forage grasses that were considered orphan and have little or no genetic information available. Transcriptome sequencing and genotype-by-sequencing technology for genome-wide marker detection using next-generation sequencing (NGS) are very promising as genomics tools. Most C3 cool-season perennial grass members have no prior genetic information; thus NGS technology will enhance collinear study with other C3 model grasses like Brachypodium and rice. Transcriptomics data can be used for identification of functional genes and molecular markers, i.e., polymorphism markers and simple sequence repeats (SSRs). Genome-wide association study with NGS-based markers will facilitate marker identification for marker-assisted selection. With limited genetic information, genomic selection holds great promise to breeders for attaining maximum genetic gain of the cool-season C3 perennial grasses. Application of all these tools can ensure better genetic gains, reduce length of selection cycles, and facilitate cultivar development to meet the future demand for food and fodder.
Expanded microbial genome coverage and improved protein family annotation in the COG database.
Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V
2015-01-01
Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the COGs is expected to become an important tool for microbial genomics. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by US Government employees and is in the public domain in the US.
AncestrySNPminer: A bioinformatics tool to retrieve and develop ancestry informative SNP panels
Amirisetty, Sushil; Khurana Hershey, Gurjit K.; Baye, Tesfaye M.
2012-01-01
A wealth of genomic information is available in public and private databases. However, this information is underutilized for uncovering population specific and functionally relevant markers underlying complex human traits. Given the huge amount of SNP data available from the annotation of human genetic variation, data mining is a faster and cost effective approach for investigating the number of SNPs that are informative for ancestry. In this study, we present AncestrySNPminer, the first web-based bioinformatics tool specifically designed to retrieve Ancestry Informative Markers (AIMs) from genomic data sets and link these informative markers to genes and ontological annotation classes. The tool includes an automated and simple “scripting at the click of a button” functionality that enables researchers to perform various population genomics statistical analyses methods with user friendly querying and filtering of data sets across various populations through a single web interface. AncestrySNPminer can be freely accessed at https://research.cchmc.org/mershalab/AncestrySNPminer/login.php. PMID:22584067
bwtool: a tool for bigWig files
Pohl, Andy; Beato, Miguel
2014-01-01
BigWig files are a compressed, indexed, binary format for genome-wide signal data for calculations (e.g. GC percent) or experiments (e.g. ChIP-seq/RNA-seq read depth). bwtool is a tool designed to read bigWig files rapidly and efficiently, providing functionality for extracting data and summarizing it in several ways, globally or at specific regions. Additionally, the tool enables the conversion of the positions of signal data from one genome assembly to another, also known as ‘lifting’. We believe bwtool can be useful for the analyst frequently working with bigWig data, which is becoming a standard format to represent functional signals along genomes. The article includes supplementary examples of running the software. Availability and implementation: The C source code is freely available under the GNU public license v3 at http://cromatina.crg.eu/bwtool. Contact: andrew.pohl@crg.eu, andypohl@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24489365
Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi
2013-04-10
Sequencing of microbial genomes is important because of microbial-carrying antibiotic and pathogenetic activities. However, even with the help of new assembling software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for ongoing sequencing genomes. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembling tool, a functional annotation pipeline, and a high-performance GI predicting module, in a support vector machine (SVM)-based method called genomic island genomic profile scanning (GI-GPS). The draft genomes of the ongoing genome projects in contigs or scaffolds can be submitted to our Web server, and it provides the functional annotation and highly probable GI-predicting results. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information include possible GIs, coding/non-coding sequences and functional analysis from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.
Rastogi, Achal; Murik, Omer; Bowler, Chris; Tirichine, Leila
2016-07-01
With the emerging interest in phytoplankton research, the need to establish genetic tools for the functional characterization of genes is indispensable. The CRISPR/Cas9 system is now well recognized as an efficient and accurate reverse genetic tool for genome editing. Several computational tools have been published allowing researchers to find candidate target sequences for the engineering of the CRISPR vectors, while searching possible off-targets for the predicted candidates. These tools provide built-in genome databases of common model organisms that are used for CRISPR target prediction. Although their predictions are highly sensitive, the applicability to non-model genomes, most notably protists, makes their design inadequate. This motivated us to design a new CRISPR target finding tool, PhytoCRISP-Ex. Our software offers CRIPSR target predictions using an extended list of phytoplankton genomes and also delivers a user-friendly standalone application that can be used for any genome. The software attempts to integrate, for the first time, most available phytoplankton genomes information and provide a web-based platform for Cas9 target prediction within them with high sensitivity. By offering a standalone version, PhytoCRISP-Ex maintains an independence to be used with any organism and widens its applicability in high throughput pipelines. PhytoCRISP-Ex out pars all the existing tools by computing the availability of restriction sites over the most probable Cas9 cleavage sites, which can be ideal for mutant screens. PhytoCRISP-Ex is a simple, fast and accurate web interface with 13 pre-indexed and presently updating phytoplankton genomes. The software was also designed as a UNIX-based standalone application that allows the user to search for target sequences in the genomes of a variety of other species.
Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano
2004-01-01
The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features. PMID:15215464
Improving Microbial Genome Annotations in an Integrated Database Context
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.
2013-01-01
Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/. PMID:23424620
Current status of genome editing in vector mosquitoes: A review.
Reegan, Appadurai Daniel; Ceasar, Stanislaus Antony; Paulraj, Michael Gabriel; Ignacimuthu, Savarimuthu; Al-Dhabi, Naif Abdullah
2017-01-16
Mosquitoes pose a major threat to human health as they spread many deadly diseases like malaria, dengue, chikungunya, filariasis, Japanese encephalitis and Zika. Identification and use of novel molecular tools are essential to combat the spread of vector borne diseases. Genome editing tools have been used for the precise alterations of the gene of interest for producing the desirable trait in mosquitoes. Deletion of functional genes or insertion of toxic genes in vector mosquitoes will produce either knock-out or knock-in mutants that will check the spread of vector-borne diseases. Presently, three types of genome editing tools viz., zinc finger nuclease (ZFN), transcription activator-like effector nucleases (TALEN) and clustered regulatory interspaced short palindromic repeats (CRISPR) and CRISPR associated protein 9 (Cas9) are widely used for the editing of the genomes of diverse organisms. These tools are also applied in vector mosquitoes to control the spread of vector-borne diseases. A few studies have been carried out on genome editing to control the diseases spread by vector mosquitoes and more studies need to be performed with the utilization of more recently invented tools like CRISPR/Cas9 to combat the spread of deadly diseases by vector mosquitoes. The high specificity and flexibility of CRISPR/Cas9 system may offer possibilities for novel genome editing for the control of important diseases spread by vector mosquitoes. In this review, we present the current status of genome editing research on vector mosquitoes and also discuss the future applications of vector mosquito genome editing to control the spread of vectorborne diseases.
Sequence Search and Comparative Genomic Analysis of SUMO-Activating Enzymes Using CoGe.
Carretero-Paulet, Lorenzo; Albert, Victor A
2016-01-01
The growing number of genome sequences completed during the last few years has made necessary the development of bioinformatics tools for the easy access and retrieval of sequence data, as well as for downstream comparative genomic analyses. Some of these are implemented as online platforms that integrate genomic data produced by different genome sequencing initiatives with data mining tools as well as various comparative genomic and evolutionary analysis possibilities.Here, we use the online comparative genomics platform CoGe ( http://www.genomevolution.org/coge/ ) (Lyons and Freeling. Plant J 53:661-673, 2008; Tang and Lyons. Front Plant Sci 3:172, 2012) (1) to retrieve the entire complement of orthologous and paralogous genes belonging to the SUMO-Activating Enzymes 1 (SAE1) gene family from a set of species representative of the Brassicaceae plant eudicot family with genomes fully sequenced, and (2) to investigate the history, timing, and molecular mechanisms of the gene duplications driving the evolutionary expansion and functional diversification of the SAE1 family in Brassicaceae.
Thakur, Shalabh; Guttman, David S
2016-06-30
Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are number of excellent tools developed for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly since the homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package accompanies script for automated installation of necessary external programs on Ubuntu Linux; however, the pipeline should be also compatible with other Linux and Unix systems after necessary external programs are installed. DeNoGAP is freely available at https://sourceforge.net/projects/denogap/ .
Kisand, Veljo; Lettieri, Teresa
2013-04-01
De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (<450 bps), which are presumed to aid in the analysis of uncharacterized genomes. The array of tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize unknown bacteria with modest effort.
Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.
Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J
2017-01-01
The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.
eShadow: A tool for comparing closely related sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ovcharenko, Ivan; Boffelli, Dario; Loots, Gabriela G.
2004-01-15
Primate sequence comparisons are difficult to interpret due to the high degree of sequence similarity shared between such closely related species. Recently, a novel method, phylogenetic shadowing, has been pioneered for predicting functional elements in the human genome through the analysis of multiple primate sequence alignments. We have expanded this theoretical approach to create a computational tool, eShadow, for the identification of elements under selective pressure in multiple sequence alignments of closely related genomes, such as in comparisons of human to primate or mouse to rat DNA. This tool integrates two different statistical methods and allows for the dynamic visualizationmore » of the resulting conservation profile. eShadow also includes a versatile optimization module capable of training the underlying Hidden Markov Model to differentially predict functional sequences. This module grants the tool high flexibility in the analysis of multiple sequence alignments and in comparing sequences with different divergence rates. Here, we describe the eShadow comparative tool and its potential uses for analyzing both multiple nucleotide and protein alignments to predict putative functional elements. The eShadow tool is publicly available at http://eshadow.dcode.org/« less
Genome Editing Redefines Precision Medicine in the Cardiovascular Field
Lahm, Harald; Dreßen, Martina; Lange, Rüdiger; Wu, Sean M.; Krane, Markus
2018-01-01
Genome editing is a powerful tool to study the function of specific genes and proteins important for development or disease. Recent technologies, especially CRISPR/Cas9 which is characterized by convenient handling and high precision, revolutionized the field of genome editing. Such tools have enormous potential for basic science as well as for regenerative medicine. Nevertheless, there are still several hurdles that have to be overcome, but patient-tailored therapies, termed precision medicine, seem to be within reach. In this review, we focus on the achievements and limitations of genome editing in the cardiovascular field. We explore different areas of cardiac research and highlight the most important developments: (1) the potential of genome editing in human pluripotent stem cells in basic research for disease modelling, drug screening, or reprogramming approaches and (2) the potential and remaining challenges of genome editing for regenerative therapies. Finally, we discuss social and ethical implications of these new technologies. PMID:29731778
Adapting CRISPR/Cas9 for functional genomics screens.
Malina, Abba; Katigbak, Alexandra; Cencic, Regina; Maïga, Rayelle Itoua; Robert, Francis; Miura, Hisashi; Pelletier, Jerry
2014-01-01
The use of CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein) for targeted genome editing has been widely adopted and is considered a "game changing" technology. The ease and rapidity by which this approach can be used to modify endogenous loci in a wide spectrum of cell types and organisms makes it a powerful tool for customizable genetic modifications as well as for large-scale functional genomics. The development of retrovirus-based expression platforms to simultaneously deliver the Cas9 nuclease and single guide (sg) RNAs provides unique opportunities by which to ensure stable and reproducible expression of the editing tools and a broad cell targeting spectrum, while remaining compatible with in vivo genetic screens. Here, we describe methods and highlight considerations for designing and generating sgRNA libraries in all-in-one retroviral vectors for such applications.
Using biotechnology and genomics to improve biotic and abiotic stress in apple
USDA-ARS?s Scientific Manuscript database
Genomic sequencing, molecular biology, and transformation technologies are providing valuable tools to better understand the complexity of how plants develop, function, and respond to biotic and abiotic stress. These approaches should complement but not replace a solid understanding of whole plant ...
Functional precision cancer medicine-moving beyond pure genomics.
Letai, Anthony
2017-09-08
The essential job of precision medicine is to match the right drugs to the right patients. In cancer, precision medicine has been nearly synonymous with genomics. However, sobering recent studies have generally shown that most patients with cancer who receive genomic testing do not benefit from a genomic precision medicine strategy. Although some call the entire project of precision cancer medicine into question, I suggest instead that the tools employed must be broadened. Instead of relying exclusively on big data measurements of initial conditions, we should also acquire highly actionable functional information by perturbing-for example, with cancer therapies-viable primary tumor cells from patients with cancer.
Next generation tools for genomic data generation, distribution, and visualization
2010-01-01
Background With the rapidly falling cost and availability of high throughput sequencing and microarray technologies, the bottleneck for effectively using genomic analysis in the laboratory and clinic is shifting to one of effectively managing, analyzing, and sharing genomic data. Results Here we present three open-source, platform independent, software tools for generating, analyzing, distributing, and visualizing genomic data. These include a next generation sequencing/microarray LIMS and analysis project center (GNomEx); an application for annotating and programmatically distributing genomic data using the community vetted DAS/2 data exchange protocol (GenoPub); and a standalone Java Swing application (GWrap) that makes cutting edge command line analysis tools available to those who prefer graphical user interfaces. Both GNomEx and GenoPub use the rich client Flex/Flash web browser interface to interact with Java classes and a relational database on a remote server. Both employ a public-private user-group security model enabling controlled distribution of patient and unpublished data alongside public resources. As such, they function as genomic data repositories that can be accessed manually or programmatically through DAS/2-enabled client applications such as the Integrated Genome Browser. Conclusions These tools have gained wide use in our core facilities, research laboratories and clinics and are freely available for non-profit use. See http://sourceforge.net/projects/gnomex/, http://sourceforge.net/projects/genoviz/, and http://sourceforge.net/projects/useq. PMID:20828407
Nussbaumer, Thomas; Kugler, Karl G; Schweiger, Wolfgang; Bader, Kai C; Gundlach, Heidrun; Spannagl, Manuel; Poursarebani, Naser; Pfeifer, Matthias; Mayer, Klaus F X
2014-12-10
Over the last years reference genome sequences of several economically and scientifically important cereals and model plants became available. Despite the agricultural significance of these crops only a small number of tools exist that allow users to inspect and visualize the genomic position of genes of interest in an interactive manner. We present chromoWIZ, a web tool that allows visualizing the genomic positions of relevant genes and comparing these data between different plant genomes. Genes can be queried using gene identifiers, functional annotations, or sequence homology in four grass species (Triticum aestivum, Hordeum vulgare, Brachypodium distachyon, Oryza sativa). The distribution of the anchored genes is visualized along the chromosomes by using heat maps. Custom gene expression measurements, differential expression information, and gene-to-group mappings can be uploaded and can be used for further filtering. This tool is mainly designed for breeders and plant researchers, who are interested in the location and the distribution of candidate genes as well as in the syntenic relationships between different grass species. chromoWIZ is freely available and online accessible at http://mips.helmholtz-muenchen.de/plant/chromoWIZ/index.jsp.
pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.
Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael
2013-08-01
With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.
Lee, Taein; Cheng, Chun-Huai; Ficklin, Stephen; Yu, Jing; Humann, Jodi; Main, Dorrie
2017-01-01
Abstract Tripal is an open-source database platform primarily used for development of genomic, genetic and breeding databases. We report here on the release of the Chado Loader, Chado Data Display and Chado Search modules to extend the functionality of the core Tripal modules. These new extension modules provide additional tools for (1) data loading, (2) customized visualization and (3) advanced search functions for supported data types such as organism, marker, QTL/Mendelian Trait Loci, germplasm, map, project, phenotype, genotype and their respective metadata. The Chado Loader module provides data collection templates in Excel with defined metadata and data loaders with front end forms. The Chado Data Display module contains tools to visualize each data type and the metadata which can be used as is or customized as desired. The Chado Search module provides search and download functionality for the supported data types. Also included are the tools to visualize map and species summary. The use of materialized views in the Chado Search module enables better performance as well as flexibility of data modeling in Chado, allowing existing Tripal databases with different metadata types to utilize the module. These Tripal Extension modules are implemented in the Genome Database for Rosaceae (rosaceae.org), CottonGen (cottongen.org), Citrus Genome Database (citrusgenomedb.org), Genome Database for Vaccinium (vaccinium.org) and the Cool Season Food Legume Database (coolseasonfoodlegume.org). Database URL: https://www.citrusgenomedb.org/, https://www.coolseasonfoodlegume.org/, https://www.cottongen.org/, https://www.rosaceae.org/, https://www.vaccinium.org/
Novel genetic tools for studying food-borne Salmonella.
Andrews-Polymenis, Helene L; Santiviago, Carlos A; McClelland, Michael
2009-04-01
Nontyphoidal Salmonellae are highly prevalent food-borne pathogens. High-throughput sequencing of Salmonella genomes is expanding our knowledge of the evolution of serovars and epidemic isolates. Genome sequences have also allowed the creation of complete microarrays. Microarrays have improved the throughput of in vivo expression technology (IVET) used to uncover promoters active during infection. In another method, signature tagged mutagenesis (STM), pools of mutants are subjected to selection. Changes in the population are monitored on a microarray, revealing genes under selection. Complete genome sequences permit the construction of pools of targeted in-frame deletions that have improved STM by minimizing the number of clones and the polarity of each mutant. Together, genome sequences and the continuing development of new tools for functional genomics will drive a revolution in the understanding of Salmonellae in many different niches that are critical for food safety.
FLP recombinase-mediated site-specific recombination in silkworm, Bombyx mori
USDA-ARS?s Scientific Manuscript database
A comprehensive understanding of gene function and the production of site-specific genetically modified mutants are two major goals of genetic engineering in the post-genomic era. Although site-specific recombination systems have been powerful tools for genome manipulation of many organisms, they h...
Plant genome and transcriptome annotations: from misconceptions to simple solutions
Bolger, Marie E; Arsova, Borjana; Usadel, Björn
2018-01-01
Abstract Next-generation sequencing has triggered an explosion of available genomic and transcriptomic resources in the plant sciences. Although genome and transcriptome sequencing has become orders of magnitudes cheaper and more efficient, often the functional annotation process is lagging behind. This might be hampered by the lack of a comprehensive enumeration of simple-to-use tools available to the plant researcher. In this comprehensive review, we present (i) typical ontologies to be used in the plant sciences, (ii) useful databases and resources used for functional annotation, (iii) what to expect from an annotated plant genome, (iv) an automated annotation pipeline and (v) a recipe and reference chart outlining typical steps used to annotate plant genomes/transcriptomes using publicly available resources. PMID:28062412
CRISPR-Cas9: from Genome Editing to Cancer Research
Chen, Si; Sun, Heng; Miao, Kai; Deng, Chu-Xia
2016-01-01
Cancer development is a multistep process triggered by innate and acquired mutations, which cause the functional abnormality and determine the initiation and progression of tumorigenesis. Gene editing is a widely used engineering tool for generating mutations that enhance tumorigenesis. The recent developed clustered regularly interspaced short palindromic repeats-CRISPR-associated 9 (CRISPR-Cas9) system renews the genome editing approach into a more convenient and efficient way. By rapidly introducing genetic modifications in cell lines, organs and animals, CRISPR-Cas9 system extends the gene editing into whole genome screening, both in loss-of-function and gain-of-function manners. Meanwhile, the system accelerates the establishment of animal cancer models, promoting in vivo studies for cancer research. Furthermore, CRISPR-Cas9 system is modified into diverse innovative tools for observing the dynamic bioprocesses in cancer studies, such as image tracing for targeted DNA, regulation of transcription activation or repression. Here, we view recent technical advances in the application of CRISPR-Cas9 system in cancer genetics, large-scale cancer driver gene hunting, animal cancer modeling and functional studies. PMID:27994508
Integrative computational approach for genome-based study of microbial lipid-degrading enzymes.
Vorapreeda, Tayvich; Thammarongtham, Chinae; Laoteng, Kobkul
2016-07-01
Lipid-degrading or lipolytic enzymes have gained enormous attention in academic and industrial sectors. Several efforts are underway to discover new lipase enzymes from a variety of microorganisms with particular catalytic properties to be used for extensive applications. In addition, various tools and strategies have been implemented to unravel the functional relevance of the versatile lipid-degrading enzymes for special purposes. This review highlights the study of microbial lipid-degrading enzymes through an integrative computational approach. The identification of putative lipase genes from microbial genomes and metagenomic libraries using homology-based mining is discussed, with an emphasis on sequence analysis of conserved motifs and enzyme topology. Molecular modelling of three-dimensional structure on the basis of sequence similarity is shown to be a potential approach for exploring the structural and functional relationships of candidate lipase enzymes. The perspectives on a discriminative framework of cutting-edge tools and technologies, including bioinformatics, computational biology, functional genomics and functional proteomics, intended to facilitate rapid progress in understanding lipolysis mechanism and to discover novel lipid-degrading enzymes of microorganisms are discussed.
Function-selective domain architecture plasticity potentials in eukaryotic genome evolution
Linkeviciute, Viktorija; Rackham, Owen J.L.; Gough, Julian; Oates, Matt E.; Fang, Hai
2015-01-01
To help evaluate how protein function impacts on genome evolution, we introduce a new concept of ‘architecture plasticity potential’ – the capacity to form distinct domain architectures – both for an individual domain, or more generally for a set of domains grouped by shared function. We devise a scoring metric to measure the plasticity potential for these domain sets, and evaluate how function has changed over time for different species. Applying this metric to a phylogenetic tree of eukaryotic genomes, we find that the involvement of each function is not random but highly selective. For certain lineages there is strong bias for evolution to involve domains related to certain functions. In general eukaryotic genomes, particularly animals, expand complex functional activities such as signalling and regulation, but at the cost of reducing metabolic processes. We also observe differential evolution of transcriptional regulation and a unique evolutionary role of channel regulators; crucially this is only observable in terms of the architecture plasticity potential. Our findings provide a new layer of information to understand the significance of function in eukaryotic genome evolution. A web search tool, available at http://supfam.org/Pevo, offers a wide spectrum of options for exploring functional importance in eukaryotic genome evolution. PMID:25980317
[CRISPR/CAS9, the King of Genome Editing Tools].
Bannikov, A V; Lavrov, A V
2017-01-01
The discovery of CRISPR/Cas9 brought a hope for having an efficient, reliable, and readily available tool for genome editing. CRISPR/Cas9 is certainly easy to use, while its efficiency and reliability remain the focus of studies. The review describes the general principles of the organization and function of Cas nucleases and a number of important issues to be considered while planning genome editing experiments with CRISPR/Cas9. The issues include evaluation of the efficiency and specificity for Cas9, sgRNA selection, Cas9 variants designed artificially, and use of homologous recombination and nonhomologous end joining in DNA editing.
Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes.
Papudeshi, Bhavya; Haggerty, J Matthew; Doane, Michael; Morris, Megan M; Walsh, Kevin; Beattie, Douglas T; Pande, Dnyanada; Zaeri, Parisa; Silva, Genivaldo G Z; Thompson, Fabiano; Edwards, Robert A; Dinsdale, Elizabeth A
2017-11-28
Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics includes sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstruction of DNA fragments into genomes from metagenomes (called metagenome-assembled genomes) assigns unknown fragments to taxa/function and facilitates discovery of novel organisms. Genome reconstruction incorporates sequence assembly and sorting of assembled sequences into bins, characteristic of a genome. However, the microbial community composition, including taxonomic and phylogenetic diversity may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects that had variable sequencing platforms (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select for optimal assembly and binning tools. We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of assemblers using statistics that including contig continuity and contig chimerism and the effectiveness of binning tools using genome completeness and taxonomic identification. We concluded that SPAdes, assembled more contigs (143,718 ± 124 contigs) of longer length (N50 = 1632 ± 108 bp), and incorporated the most sequences (sequences-assembled = 19.65%). The microbial richness and evenness were maintained across the assembly, suggesting low contig chimeras. SPAdes assembly was responsive to the biological and technological variations within the project, compared with other assemblers. Among binning tools, we conclude that MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects of which 66 bins were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete, but show low similarity to genomes within databases. In conclusion, we present a set of biologically relevant parameters for evaluation to select for optimal assembly and binning tools. For the tools we tested, SPAdes assembler and MetaBat binning tools reconstructed quality metagenome-assembled genomes for the four projects. We also conclude that metagenomes from microbial communities that have high coverage of phylogenetically distinct, and low taxonomic diversity results in highest quality metagenome-assembled genomes.
Sawkins, M C; Farmer, A D; Hoisington, D; Sullivan, J; Tolopko, A; Jiang, Z; Ribaut, J-M
2004-10-01
In the past few decades, a wealth of genomic data has been produced in a wide variety of species using a diverse array of functional and molecular marker approaches. In order to unlock the full potential of the information contained in these independent experiments, researchers need efficient and intuitive means to identify common genomic regions and genes involved in the expression of target phenotypic traits across diverse conditions. To address this need, we have developed a Comparative Map and Trait Viewer (CMTV) tool that can be used to construct dynamic aggregations of a variety of types of genomic datasets. By algorithmically determining correspondences between sets of objects on multiple genomic maps, the CMTV can display syntenic regions across taxa, combine maps from separate experiments into a consensus map, or project data from different maps into a common coordinate framework using dynamic coordinate translations between source and target maps. We present a case study that illustrates the utility of the tool for managing large and varied datasets by integrating data collected by CIMMYT in maize drought tolerance research with data from public sources. This example will focus on one of the visualization features for Quantitative Trait Locus (QTL) data, using likelihood ratio (LR) files produced by generic QTL analysis software and displaying the data in a unique visual manner across different combinations of traits, environments and crosses. Once a genomic region of interest has been identified, the CMTV can search and display additional QTLs meeting a particular threshold for that region, or other functional data such as sets of differentially expressed genes located in the region; it thus provides an easily used means for organizing and manipulating data sets that have been dynamically integrated under the focus of the researcher's specific hypothesis.
Unlocking Triticeae genomics to sustainably feed the future
Mochida, Keiichi; Shinozaki, Kazuo
2013-01-01
The tribe Triticeae includes the major crops wheat and barley. Within the last few years, the whole genomes of four Triticeae species—barley, wheat, Tausch’s goatgrass (Aegilops tauschii) and wild einkorn wheat (Triticum urartu)—have been sequenced. The availability of these genomic resources for Triticeae plants and innovative analytical applications using next-generation sequencing technologies are helping to revitalize our approaches in genetic work and to accelerate improvement of the Triticeae crops. Comparative genomics and integration of genomic resources from Triticeae plants and the model grass Brachypodium distachyon are aiding the discovery of new genes and functional analyses of genes in Triticeae crops. Innovative approaches and tools such as analysis of next-generation populations, evolutionary genomics and systems approaches with mathematical modeling are new strategies that will help us discover alleles for adaptive traits to future agronomic environments. In this review, we provide an update on genomic tools for use with Triticeae plants and Brachypodium and describe emerging approaches toward crop improvements in Triticeae. PMID:24204022
Blast2GO goes grid: developing a grid-enabled prototype for functional genomics analysis.
Aparicio, G; Götz, S; Conesa, A; Segrelles, D; Blanquer, I; García, J M; Hernandez, V; Robles, M; Talon, M
2006-01-01
The vast amount in complexity of data generated in Genomic Research implies that new dedicated and powerful computational tools need to be developed to meet their analysis requirements. Blast2GO (B2G) is a bioinformatics tool for Gene Ontology-based DNA or protein sequence annotation and function-based data mining. The application has been developed with the aim of affering an easy-to-use tool for functional genomics research. Typical B2G users are middle size genomics labs carrying out sequencing, ETS and microarray projects, handling datasets up to several thousand sequences. In the current version of B2G. The power and analytical potential of both annotation and function data-mining is somehow restricted to the computational power behind each particular installation. In order to be able to offer the possibility of an enhanced computational capacity within this bioinformatics application, a Grid component is being developed. A prototype has been conceived for the particular problem of speeding up the Blast searches to obtain fast results for large datasets. Many efforts have been done in the literature concerning the speeding up of Blast searches, but few of them deal with the use of large heterogeneous production Grid Infrastructures. These are the infrastructures that could reach the largest number of resources and the best load balancing for data access. The Grid Service under development will analyse requests based on the number of sequences, splitting them accordingly to the available resources. Lower-level computation will be performed through MPIBLAST. The software architecture is based on the WSRF standard.
USDA-ARS?s Scientific Manuscript database
Among various genome editing tools available for functional genomic studies, reagents based on clustered regularly interspersed palindromic repeats (CRISPR) have gained popularity due to ease and versatility. CRISPR reagents consists of ribonucleoprotein (RNP) complexes formed by combining guide RNA...
Genome wide association studies on yield components using a lentil genetic diversity panel
USDA-ARS?s Scientific Manuscript database
The cool season food legume research community are now at the threshold of deploying the cutting-edge molecular genetics and genomics tools that have led to significant and rapid expansion of gene discovery, knowledge of gene function (including tolerance to biotic and abiotic stresses) and genetic ...
Advances and perspectives on the use of CRISPR/Cas9 systems in plant genomics research
Liu, Degao; Hu, Rongbin; Palla, Kaitlin J.; ...
2016-02-18
Genome editing with site-specific nucleases has become a powerful tool for functional characterization of plant genes and genetic improvement of agricultural crops. Among the various site-specific nuclease-based technologies available for genome editing, the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) systems have shown the greatest potential for rapid and efficient editing of genomes in plant species. Here, this article reviews the current status of application of CRISPR/Cas9 to plant genomics research, with a focus on loss-of-function and gain-of-function analysis of individual genes in the context of perennial plants and the potential application of CRISPR/Cas9 to perturbation ofmore » gene expression, as well as identification and analysis of gene modules as part of an accelerated domestication and synthetic biology effort.« less
Advances and perspectives on the use of CRISPR/Cas9 systems in plant genomics research
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Degao; Hu, Rongbin; Palla, Kaitlin J.
Genome editing with site-specific nucleases has become a powerful tool for functional characterization of plant genes and genetic improvement of agricultural crops. Among the various site-specific nuclease-based technologies available for genome editing, the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) systems have shown the greatest potential for rapid and efficient editing of genomes in plant species. Here, this article reviews the current status of application of CRISPR/Cas9 to plant genomics research, with a focus on loss-of-function and gain-of-function analysis of individual genes in the context of perennial plants and the potential application of CRISPR/Cas9 to perturbation ofmore » gene expression, as well as identification and analysis of gene modules as part of an accelerated domestication and synthetic biology effort.« less
AGAPE (Automated Genome Analysis PipelinE) for Pan-Genome Analysis of Saccharomyces cerevisiae
Song, Giltae; Dickins, Benjamin J. A.; Demeter, Janos; Engel, Stacia; Dunn, Barbara; Cherry, J. Michael
2015-01-01
The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community. PMID:25781462
Vallenet, David; Belda, Eugeni; Calteau, Alexandra; Cruveiller, Stéphane; Engelen, Stefan; Lajus, Aurélie; Le Fèvre, François; Longin, Cyrille; Mornico, Damien; Roche, David; Rouy, Zoé; Salvignol, Gregory; Scarpelli, Claude; Thil Smith, Adam Alexander; Weiman, Marion; Médigue, Claudine
2013-01-01
MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genomes annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest. PMID:23193269
ArrayExpress update--trends in database growth and links to data analysis tools.
Rustici, Gabriella; Kolesnikov, Nikolay; Brandizi, Marco; Burdett, Tony; Dylag, Miroslaw; Emam, Ibrahim; Farne, Anna; Hastings, Emma; Ison, Jon; Keays, Maria; Kurbatova, Natalja; Malone, James; Mani, Roby; Mupo, Annalisa; Pedro Pereira, Rui; Pilicheva, Ekaterina; Rung, Johan; Sharma, Anjan; Tang, Y Amy; Ternent, Tobias; Tikhonov, Andrew; Welter, Danielle; Williams, Eleanor; Brazma, Alvis; Parkinson, Helen; Sarkans, Ugis
2013-01-01
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in 2012, 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.
Biopathways representation and simulation on hybrid functional petri net.
Matsuno, Hiroshi; Tanaka, Yukiko; Aoshima, Hitoshi; Doi, Atsushi; Matsui, Mika; Miyano, Satoru
2011-01-01
The following two matters should be resolved in order for biosimulation tools to be accepted by users in biology/medicine: (1) remove issues which are irrelevant to biological importance, and (2) allow users to represent biopathways intuitively and understand/manage easily the details of representation and simulation mechanism. From these criteria, we firstly define a novel notion of Petri net called Hybrid Functional Petri Net (HFPN). Then, we introduce a software tool, Genomic Object Net, for representing and simulating biopathways, which we have developed by employing the architecture of HFPN. In order to show the usefulness of Genomic Object Net for representing and simulating biopathways, we show two HFPN representations of gene regulation mechanisms of Drosophila melanogaster (fruit fly) circadian rhythm and apoptosis induced by Fas ligand. The simulation results of these biopathways are also correlated with biological observations. The software is available to academic users from http://www.GenomicObject.Net/.
A Parvovirus B19 synthetic genome: sequence features and functional competence.
Manaresi, Elisabetta; Conti, Ilaria; Bua, Gloria; Bonvicini, Francesca; Gallinella, Giorgio
2017-08-01
Central to genetic studies for Parvovirus B19 (B19V) is the availability of genomic clones that may possess functional competence and ability to generate infectious virus. In our study, we established a new model genetic system for Parvovirus B19. A synthetic approach was followed, by design of a reference genome sequence, by generation of a corresponding artificial construct and its molecular cloning in a complete and functional form, and by setup of an efficient strategy to generate infectious virus, via transfection in UT7/EpoS1 cells and amplification in erythroid progenitor cells. The synthetic genome was able to generate virus with biological properties paralleling those of native virus, its infectious activity being dependent on the preservation of self-complementarity and sequence heterogeneity within the terminal regions. A virus of defined genome sequence, obtained from controlled cell culture conditions, can constitute a reference tool for investigation of the structural and functional characteristics of the virus. Copyright © 2017 Elsevier Inc. All rights reserved.
Cui, Miao; Lin, Che-Yi; Su, Yi-Hsien
2017-09-01
Studies on the gene regulatory networks (GRNs) of sea urchin embryos have provided a basic understanding of the molecular mechanisms controlling animal development. The causal links in GRNs have been verified experimentally through perturbation of gene functions. Microinjection of antisense morpholino oligonucleotides (MOs) into the egg is the most widely used approach for gene knockdown in sea urchin embryos. The modification of MOs into a membrane-permeable form (vivo-MOs) has allowed gene knockdown at later developmental stages. Recent advances in genome editing tools, such as zinc-finger nucleases, transcription activator-like effector-based nucleases and the clustered regularly interspaced short palindromic repeat/clustered regularly interspaced short palindromic repeat-associated protein 9 (CRISPR/Cas9) system, have provided methods for gene knockout in sea urchins. Here, we review the use of vivo-MOs and genome editing tools in sea urchin studies since the publication of its genome in 2006. Various applications of the CRISPR/Cas9 system and their potential in studying sea urchin development are also discussed. These new tools will provide more sophisticated experimental methods for studying sea urchin development. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Operon-mapper: A Web Server for Precise Operon Identification in Bacterial and Archaeal Genomes.
Taboada, Blanca; Estrada, Karel; Ciria, Ricardo; Merino, Enrique
2018-06-19
Operon-mapper is a web server that accurately, easily, and directly predicts the operons of any bacterial or archaeal genome sequence. The operon predictions are based on the intergenic distance of neighboring genes as well as the functional relationships of their protein-coding products. To this end, Operon-mapper finds all the ORFs within a given nucleotide sequence, along with their genomic coordinates, orthology groups, and functional relationships. We believe that Operon-mapper, due to its accuracy, simplicity and speed, as well as the relevant information that it generates, will be a useful tool for annotating and characterizing genomic sequences. http://biocomputo.ibt.unam.mx/operon_mapper/.
Genomics of Secondary Metabolism in Populus: Interactions with Biotic and Abiotic Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, F.; Liu, C.; Tschaplinski, T. J.
2009-09-01
Populus trees face constant challenges from the environment during their life cycle. To ensure their survival and reproduction, Populus trees deploy various types of defenses, one of which is the production of a myriad of secondary metabolites. Compounds derived from the shikimate-phenylpropanoid pathway are the most abundant class of secondary metabolites synthesized in Populus. Among other major classes of secondary metabolites in Populus are terpenoids and fatty acid-derivatives. Some of the secondary metabolites made by Populus trees have been functionally characterized. Some others have been associated with certain biological/ecological processes, such as defense against insects and microbial pathogens or acclimationmore » or adaptation to abiotic stresses. Functions of many Populus secondary metabolites remain unclear. The advent of various novel genomic tools will enable us to explore in greater detail the complexity of secondary metabolism in Populus. Detailed data mining of the Populus genome sequence can unveil candidate genes of secondary metabolism. Metabolomic analysis will continue to identify new metabolites synthesized in Populus. Integrated genomics that combines various 'omics' tools will prove to be the most powerful approach in revealing the molecular and biochemical basis underlying the biosynthesis of secondary metabolites in Populus. Characterization of the biological/ecological functions of secondary metabolites as well as their biosynthesis will provide knowledge and tools for genetically engineering the production of seconday metabolites that can lead to the generation of novel, improved Populus varieties.« less
Häcker, Irina; Harrell Ii, Robert A; Eichner, Gerrit; Pilitt, Kristina L; O'Brochta, David A; Handler, Alfred M; Schetelig, Marc F
2017-03-07
Site-specific genome modification (SSM) is an important tool for mosquito functional genomics and comparative gene expression studies, which contribute to a better understanding of mosquito biology and are thus a key to finding new strategies to eliminate vector-borne diseases. Moreover, it allows for the creation of advanced transgenic strains for vector control programs. SSM circumvents the drawbacks of transposon-mediated transgenesis, where random transgene integration into the host genome results in insertional mutagenesis and variable position effects. We applied the Cre/lox recombinase-mediated cassette exchange (RMCE) system to Aedes aegypti, the vector of dengue, chikungunya, and Zika viruses. In this context we created four target site lines for RMCE and evaluated their fitness costs. Cre-RMCE is functional in a two-step mechanism and with good efficiency in Ae. aegypti. The advantages of Cre-RMCE over existing site-specific modification systems for Ae. aegypti, phiC31-RMCE and CRISPR, originate in the preservation of the recombination sites, which 1) allows successive modifications and rapid expansion or adaptation of existing systems by repeated targeting of the same site; and 2) provides reversibility, thus allowing the excision of undesired sequences. Thereby, Cre-RMCE complements existing genomic modification tools, adding flexibility and versatility to vector genome targeting.
A survey of tools for variant analysis of next-generation genome sequencing data
Pabinger, Stephan; Dander, Andreas; Fischer, Maria; Snajder, Rene; Sperk, Michael; Efremova, Mirjana; Krabichler, Birgit; Speicher, Michael R.; Zschocke, Johannes
2014-01-01
Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers. PMID:23341494
Murat, Claude; Zampieri, Elisa; Vallino, Marta; Daghino, Stefania; Perotto, Silvia; Bonfante, Paola
2011-05-01
Characterization of genomic variation among different microbial species, or different strains of the same species, is a field of significant interest with a wide range of potential applications. We have investigated the genomic variation in mycorrhizal fungal genomes through genomic suppressive subtractive hybridization. The comparison was between phylogenetically distant and close truffle species (Tuber spp.), and between isolates of the ericoid mycorrhizal fungus Oidiodendron maius featuring different degrees of metal tolerance. In the interspecies experiment, almost all the sequences that were identified in the Tuber melanosporum genome and absent in Tuber borchii and Tuber indicum corresponded to transposable elements. In the intraspecies comparison, some specific sequences corresponded to regions coding for enzymes, among them a glutathione synthetase known to be involved in metal tolerance. This approach is a quick and rather inexpensive tool to develop molecular markers for mycorrhizal fungi tracking and barcoding, to identify functional genes and to investigate the genome plasticity, adaptation and evolution. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Nepolean, Thirunavukkarsau; Kaul, Jyoti; Mukri, Ganapati; Mittal, Shikha
2018-01-01
Breeding science has immensely contributed to the global food security. Several varieties and hybrids in different food crops including maize have been released through conventional breeding. The ever growing population, decreasing agricultural land, lowering water table, changing climate, and other variables pose tremendous challenge to the researchers to improve the production and productivity of food crops. Drought is one of the major problems to sustain and improve the productivity of food crops including maize in tropical and subtropical production systems. With advent of novel genomics and breeding tools, the way of doing breeding has been tremendously changed in the last two decades. Drought tolerance is a combination of several component traits with a quantitative mode of inheritance. Rapid DNA and RNA sequencing tools and high-throughput SNP genotyping techniques, trait mapping, functional characterization, genomic selection, rapid generation advancement, and other tools are now available to understand the genetics of drought tolerance and to accelerate the breeding cycle. Informatics play complementary role by managing the big-data generated from the large-scale genomics and breeding experiments. Genome editing is the latest technique to alter specific genes to improve the trait expression. Integration of novel genomics, next-generation breeding, and informatics tools will accelerate the stress breeding process and increase the genetic gain under different production systems. PMID:29696027
PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses
Purcell, Shaun ; Neale, Benjamin ; Todd-Brown, Kathe ; Thomas, Lori ; Ferreira, Manuel A. R. ; Bender, David ; Maller, Julian ; Sklar, Pamela ; de Bakker, Paul I. W. ; Daly, Mark J. ; Sham, Pak C.
2007-01-01
Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis. PMID:17701901
Peterson, Elena S; McCue, Lee Ann; Schrimpe-Rutledge, Alexandra C; Jensen, Jeffrey L; Walker, Hyunjoo; Kobold, Markus A; Webb, Samantha R; Payne, Samuel H; Ansong, Charles; Adkins, Joshua N; Cannon, William R; Webb-Robertson, Bobbie-Jo M
2012-04-05
The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information, however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA) is a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes though the integration of proteomics and transcriptomics data with current genome location coordinates. VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA or potential coding-regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to demonstrate the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data. VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.
2012-01-01
Background The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information, however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA) is a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes though the integration of proteomics and transcriptomics data with current genome location coordinates. Results VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA or potential coding-regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to demonstrate the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data. Conclusions VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php. PMID:22480257
An automated system for evaluation of the potential functionome: MAPLE version 2.1.0
Takami, Hideto; Taniguchi, Takeaki; Arai, Wataru; Takemoto, Kazuhiro; Moriya, Yuki; Goto, Susumu
2016-01-01
Metabolic and physiological potential evaluator (MAPLE) is an automatic system that can perform a series of steps used in the evaluation of potential comprehensive functions (functionome) harboured in the genome and metagenome. MAPLE first assigns KEGG Orthology (KO) to the query gene, maps the KO-assigned genes to the Kyoto Encyclopedia of Genes and Genomes (KEGG) functional modules, and then calculates the module completion ratio (MCR) of each functional module to characterize the potential functionome in the user’s own genomic and metagenomic data. In this study, we added two more useful functions to calculate module abundance and Q-value, which indicate the functional abundance and statistical significance of the MCR results, respectively, to the new version of MAPLE for more detailed comparative genomic and metagenomic analyses. Consequently, MAPLE version 2.1.0 reported significant differences in the potential functionome, functional abundance, and diversity of contributors to each function among four metagenomic datasets generated by the global ocean sampling expedition, one of the most popular environmental samples to use with this system. MAPLE version 2.1.0 is now available through the web interface (http://www.genome.jp/tools/maple/) 17 June 2016, date last accessed. PMID:27374611
SNPit: a federated data integration system for the purpose of functional SNP annotation.
Shen, Terry H; Carlson, Christopher S; Tarczy-Hornoch, Peter
2009-08-01
Genome wide association studies can potentially identify the genetic causes behind the majority of human diseases. With the advent of more advanced genotyping techniques, there is now an explosion of data gathered on single nucleotide polymorphisms (SNPs). The need exists for an integrated system that can provide up-to-date functional annotation information on SNPs. We have developed the SNP Integration Tool (SNPit) system to address this need. Built upon a federated data integration system, SNPit provides current information on a comprehensive list of SNP data sources. Additional logical inference analysis was included through an inference engine plug in. The SNPit web servlet is available online for use. SNPit allows users to go to one source for up-to-date information on the functional annotation of SNPs. A tool that can help to integrate and analyze the potential functional significance of SNPs is important for understanding the results from genome wide association studies.
[Advances in CRISPR-Cas-mediated genome editing system in plants].
Wang, Chun; Wang, Kejian
2017-10-25
Targeted genome editing technology is an important tool to study the function of genes and to modify organisms at the genetic level. Recently, CRISPR-Cas (clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins) system has emerged as an efficient tool for specific genome editing in animals and plants. CRISPR-Cas system uses CRISPR-associated endonuclease and a guide RNA to generate double-strand breaks at the target DNA site, subsequently leading to genetic modifications. CRISPR-Cas system has received widespread attention for manipulating the genomes with simple, easy and high specificity. This review summarizes recent advances of diverse applications of the CRISPR-Cas toolkit in plant research and crop breeding, including expanding the range of genome editing, precise editing of a target base, and efficient DNA-free genome editing technology. This review also discusses the potential challenges and application prospect in the future, and provides a useful reference for researchers who are interested in this field.
Hériché, Jean-Karim; Lees, Jon G.; Morilla, Ian; Walter, Thomas; Petrova, Boryana; Roberti, M. Julia; Hossain, M. Julius; Adler, Priit; Fernández, José M.; Krallinger, Martin; Haering, Christian H.; Vilo, Jaak; Valencia, Alfonso; Ranea, Juan A.; Orengo, Christine; Ellenberg, Jan
2014-01-01
The advent of genome-wide RNA interference (RNAi)–based screens puts us in the position to identify genes for all functions human cells carry out. However, for many functions, assay complexity and cost make genome-scale knockdown experiments impossible. Methods to predict genes required for cell functions are therefore needed to focus RNAi screens from the whole genome on the most likely candidates. Although different bioinformatics tools for gene function prediction exist, they lack experimental validation and are therefore rarely used by experimentalists. To address this, we developed an effective computational gene selection strategy that represents public data about genes as graphs and then analyzes these graphs using kernels on graph nodes to predict functional relationships. To demonstrate its performance, we predicted human genes required for a poorly understood cellular function—mitotic chromosome condensation—and experimentally validated the top 100 candidates with a focused RNAi screen by automated microscopy. Quantitative analysis of the images demonstrated that the candidates were indeed strongly enriched in condensation genes, including the discovery of several new factors. By combining bioinformatics prediction with experimental validation, our study shows that kernels on graph nodes are powerful tools to integrate public biological data and predict genes involved in cellular functions of interest. PMID:24943848
PanACEA: a bioinformatics tool for the exploration and visualization of bacterial pan-chromosomes.
Clarke, Thomas H; Brinkac, Lauren M; Inman, Jason M; Sutton, Granger; Fouts, Derrick E
2018-06-27
Bacterial pan-genomes, comprised of conserved and variable genes across multiple sequenced bacterial genomes, allow for identification of genomic regions that are phylogenetically discriminating or functionally important. Pan-genomes consist of large amounts of data, which can restrict researchers ability to locate and analyze these regions. Multiple software packages are available to visualize pan-genomes, but currently their ability to address these concerns are limited by using only pre-computed data sets, prioritizing core over variable gene clusters, or by not accounting for pan-chromosome positioning in the viewer. We introduce PanACEA (Pan-genome Atlas with Chromosome Explorer and Analyzer), which utilizes locally-computed interactive web-pages to view ordered pan-genome data. It consists of multi-tiered, hierarchical display pages that extend from pan-chromosomes to both core and variable regions to single genes. Regions and genes are functionally annotated to allow for rapid searching and visual identification of regions of interest with the option that user-supplied genomic phylogenies and metadata can be incorporated. PanACEA's memory and time requirements are within the capacities of standard laptops. The capability of PanACEA as a research tool is demonstrated by highlighting a variable region important in differentiating strains of Enterobacter hormaechei. PanACEA can rapidly translate the results of pan-chromosome programs into an intuitive and interactive visual representation. It will empower researchers to visually explore and identify regions of the pan-chromosome that are most biologically interesting, and to obtain publication quality images of these regions.
GenPlay Multi-Genome, a tool to compare and analyze multiple human genomes in a graphical interface.
Lajugie, Julien; Fourel, Nicolas; Bouhassira, Eric E
2015-01-01
Parallel visualization of multiple individual human genomes is a complex endeavor that is rapidly gaining importance with the increasing number of personal, phased and cancer genomes that are being generated. It requires the display of variants such as SNPs, indels and structural variants that are unique to specific genomes and the introduction of multiple overlapping gaps in the reference sequence. Here, we describe GenPlay Multi-Genome, an application specifically written to visualize and analyze multiple human genomes in parallel. GenPlay Multi-Genome is ideally suited for the comparison of allele-specific expression and functional genomic data obtained from multiple phased genomes in a graphical interface with access to multiple-track operation. It also allows the analysis of data that have been aligned to custom genomes rather than to a standard reference and can be used as a variant calling format file browser and as a tool to compare different genome assembly, such as hg19 and hg38. GenPlay is available under the GNU public license (GPL-3) from http://genplay.einstein.yu.edu. The source code is available at https://github.com/JulienLajugie/GenPlay. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Demir, E; Babur, O; Dogrusoz, U; Gursoy, A; Nisanci, G; Cetin-Atalay, R; Ozturk, M
2002-07-01
Availability of the sequences of entire genomes shifts the scientific curiosity towards the identification of function of the genomes in large scale as in genome studies. In the near future, data produced about cellular processes at molecular level will accumulate with an accelerating rate as a result of proteomics studies. In this regard, it is essential to develop tools for storing, integrating, accessing, and analyzing this data effectively. We define an ontology for a comprehensive representation of cellular events. The ontology presented here enables integration of fragmented or incomplete pathway information and supports manipulation and incorporation of the stored data, as well as multiple levels of abstraction. Based on this ontology, we present the architecture of an integrated environment named Patika (Pathway Analysis Tool for Integration and Knowledge Acquisition). Patika is composed of a server-side, scalable, object-oriented database and client-side editors to provide an integrated, multi-user environment for visualizing and manipulating network of cellular events. This tool features automated pathway layout, functional computation support, advanced querying and a user-friendly graphical interface. We expect that Patika will be a valuable tool for rapid knowledge acquisition, microarray generated large-scale data interpretation, disease gene identification, and drug development. A prototype of Patika is available upon request from the authors.
Prunus transcription factors: breeding perspectives
Bianchi, Valmor J.; Rubio, Manuel; Trainotti, Livio; Verde, Ignazio; Bonghi, Claudio; Martínez-Gómez, Pedro
2015-01-01
Many plant processes depend on differential gene expression, which is generally controlled by complex proteins called transcription factors (TFs). In peach, 1533 TFs have been identified, accounting for about 5.5% of the 27,852 protein-coding genes. These TFs are the reference for the rest of the Prunus species. TF studies in Prunus have been performed on the gene expression analysis of different agronomic traits, including control of the flowering process, fruit quality, and biotic and abiotic stress resistance. These studies, using quantitative RT-PCR, have mainly been performed in peach, and to a lesser extent in other species, including almond, apricot, black cherry, Fuji cherry, Japanese apricot, plum, and sour and sweet cherry. Other tools have also been used in TF studies, including cDNA-AFLP, LC-ESI-MS, RNA, and DNA blotting or mapping. More recently, new tools assayed include microarray and high-throughput DNA sequencing (DNA-Seq) and RNA sequencing (RNA-Seq). New functional genomics opportunities include genome resequencing and the well-known synteny among Prunus genomes and transcriptomes. These new functional studies should be applied in breeding programs in the development of molecular markers. With the genome sequences available, some strategies that have been used in model systems (such as SNP genotyping assays and genotyping-by-sequencing) may be applicable in the functional analysis of Prunus TFs as well. In addition, the knowledge of the gene functions and position in the peach reference genome of the TFs represents an additional advantage. These facts could greatly facilitate the isolation of genes via QTL (quantitative trait loci) map-based cloning in the different Prunus species, following the association of these TFs with the identified QTLs using the peach reference genome. PMID:26124770
2014-01-01
Background Cis-regulatory modules (CRMs), or the DNA sequences required for regulating gene expression, play the central role in biological researches on transcriptional regulation in metazoan species. Nowadays, the systematic understanding of CRMs still mainly resorts to computational methods due to the time-consuming and small-scale nature of experimental methods. But the accuracy and reliability of different CRM prediction tools are still unclear. Without comparative cross-analysis of the results and combinatorial consideration with extra experimental information, there is no easy way to assess the confidence of the predicted CRMs. This limits the genome-wide understanding of CRMs. Description It is known that transcription factor binding and epigenetic profiles tend to determine functions of CRMs in gene transcriptional regulation. Thus integration of the genome-wide epigenetic profiles with systematically predicted CRMs can greatly help researchers evaluate and decipher the prediction confidence and possible transcriptional regulatory functions of these potential CRMs. However, these data are still fragmentary in the literatures. Here we performed the computational genome-wide screening for potential CRMs using different prediction tools and constructed the pioneer database, cisMEP (cis-regulatory module epigenetic profile database), to integrate these computationally identified CRMs with genomic epigenetic profile data. cisMEP collects the literature-curated TFBS location data and nine genres of epigenetic data for assessing the confidence of these potential CRMs and deciphering the possible CRM functionality. Conclusions cisMEP aims to provide a user-friendly interface for researchers to assess the confidence of different potential CRMs and to understand the functions of CRMs through experimentally-identified epigenetic profiles. The deposited potential CRMs and experimental epigenetic profiles for confidence assessment provide experimentally testable hypotheses for the molecular mechanisms of metazoan gene regulation. We believe that the information deposited in cisMEP will greatly facilitate the comparative usage of different CRM prediction tools and will help biologists to study the modular regulatory mechanisms between different TFs and their target genes. PMID:25521507
Web Apollo: a web-based genomic annotation editing platform.
Lee, Eduardo; Helt, Gregg A; Reese, Justin T; Munoz-Torres, Monica C; Childers, Chris P; Buels, Robert M; Stein, Lincoln; Holmes, Ian H; Elsik, Christine G; Lewis, Suzanna E
2013-08-30
Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world.
Web Apollo: a web-based genomic annotation editing platform
2013-01-01
Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world. PMID:24000942
NASA Astrophysics Data System (ADS)
Holtorf, Hauke; Guitton, Marie-Christine; Reski, Ralf
2002-04-01
Functional genome analysis of plants has entered the high-throughput stage. The complete genome information from key species such as Arabidopsis thaliana and rice is now available and will further boost the application of a range of new technologies to functional plant gene analysis. To broadly assign functions to unknown genes, different fast and multiparallel approaches are currently used and developed. These new technologies are based on known methods but are adapted and improved to accommodate for comprehensive, large-scale gene analysis, i.e. such techniques are novel in the sense that their design allows researchers to analyse many genes at the same time and at an unprecedented pace. Such methods allow analysis of the different constituents of the cell that help to deduce gene function, namely the transcripts, proteins and metabolites. Similarly the phenotypic variations of entire mutant collections can now be analysed in a much faster and more efficient way than before. The different methodologies have developed to form their own fields within the functional genomics technological platform and are termed transcriptomics, proteomics, metabolomics and phenomics. Gene function, however, cannot solely be inferred by using only one such approach. Rather, it is only by bringing together all the information collected by different functional genomic tools that one will be able to unequivocally assign functions to unknown plant genes. This review focuses on current technical developments and their impact on the field of plant functional genomics. The lower plant Physcomitrella is introduced as a new model system for gene function analysis, owing to its high rate of homologous recombination.
Contemporary molecular tools in microbial ecology and their application to advancing biotechnology.
Rashid, Mamoon; Stingl, Ulrich
2015-12-01
Novel methods in microbial ecology are revolutionizing our understanding of the structure and function of microbes in the environment, but concomitant advances in applications of these tools to biotechnology are mostly lagging behind. After more than a century of efforts to improve microbial culturing techniques, about 70-80% of microbial diversity - recently called the "microbial dark matter" - remains uncultured. In early attempts to identify and sample these so far uncultured taxonomic lineages, methods that amplify and sequence ribosomal RNA genes were extensively used. Recent developments in cell separation techniques, DNA amplification, and high-throughput DNA sequencing platforms have now made the discovery of genes/genomes of uncultured microorganisms from different environments possible through the use of metagenomic techniques and single-cell genomics. When used synergistically, these metagenomic and single-cell techniques create a powerful tool to study microbial diversity. These genomics techniques have already been successfully exploited to identify sources for i) novel enzymes or natural products for biotechnology applications, ii) novel genes from extremophiles, and iii) whole genomes or operons from uncultured microbes. More can be done to utilize these tools more efficiently in biotechnology. Copyright © 2015 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chai, Juanjuan; Kora, Guruprasad; Ahn, Tae-Hyuk
2014-10-09
To supply some background, phylogenetic studies have provided detailed knowledge on the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life have now been covered by sequenced genomes in GenBank. This enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes. Our results show a total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accuratemore » comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. The GO term enrichment analysis differentiated the functional profiles between selected archaeal taxa. 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways in the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways in evolutionary history. In conclusion, our functional phylogenomics analysis shows divergent functional profiles of taxa and clades. Such function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. On the other hand, many cellular functions are sparsely dispersed across many clades with high parsimony scores. These different types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.« less
TabSQL: a MySQL tool to facilitate mapping user data to public databases.
Xia, Xiao-Qin; McClelland, Michael; Wang, Yipeng
2010-06-23
With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data.
TabSQL: a MySQL tool to facilitate mapping user data to public databases
2010-01-01
Background With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. Results We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. Conclusions TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data. PMID:20573251
PGSB/MIPS PlantsDB Database Framework for the Integration and Analysis of Plant Genome Data.
Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai; Gundlach, Heidrun; Mayer, Klaus F X
2017-01-01
Plant Genome and Systems Biology (PGSB), formerly Munich Institute for Protein Sequences (MIPS) PlantsDB, is a database framework for the integration and analysis of plant genome data, developed and maintained for more than a decade now. Major components of that framework are genome databases and analysis resources focusing on individual (reference) genomes providing flexible and intuitive access to data. Another main focus is the integration of genomes from both model and crop plants to form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny). Data exchange and integrated search functionality with/over many plant genome databases is provided within the transPLANT project.
Altermann, Eric; Lu, Jingli; McCulloch, Alan
2017-01-01
Expert curated annotation remains one of the critical steps in achieving a reliable biological relevant annotation. Here we announce the release of GAMOLA2, a user friendly and comprehensive software package to process, annotate and curate draft and complete bacterial, archaeal, and viral genomes. GAMOLA2 represents a wrapping tool to combine gene model determination, functional Blast, COG, Pfam, and TIGRfam analyses with structural predictions including detection of tRNAs, rRNA genes, non-coding RNAs, signal protein cleavage sites, transmembrane helices, CRISPR repeats and vector sequence contaminations. GAMOLA2 has already been validated in a wide range of bacterial and archaeal genomes, and its modular concept allows easy addition of further functionality in future releases. A modified and adapted version of the Artemis Genome Viewer (Sanger Institute) has been developed to leverage the additional features and underlying information provided by the GAMOLA2 analysis, and is part of the software distribution. In addition to genome annotations, GAMOLA2 features, among others, supplemental modules that assist in the creation of custom Blast databases, annotation transfers between genome versions, and the preparation of Genbank files for submission via the NCBI Sequin tool. GAMOLA2 is intended to be run under a Linux environment, whereas the subsequent visualization and manual curation in Artemis is mobile and platform independent. The development of GAMOLA2 is ongoing and community driven. New functionality can easily be added upon user requests, ensuring that GAMOLA2 provides information relevant to microbiologists. The software is available free of charge for academic use. PMID:28386247
Altermann, Eric; Lu, Jingli; McCulloch, Alan
2017-01-01
Expert curated annotation remains one of the critical steps in achieving a reliable biological relevant annotation. Here we announce the release of GAMOLA2, a user friendly and comprehensive software package to process, annotate and curate draft and complete bacterial, archaeal, and viral genomes. GAMOLA2 represents a wrapping tool to combine gene model determination, functional Blast, COG, Pfam, and TIGRfam analyses with structural predictions including detection of tRNAs, rRNA genes, non-coding RNAs, signal protein cleavage sites, transmembrane helices, CRISPR repeats and vector sequence contaminations. GAMOLA2 has already been validated in a wide range of bacterial and archaeal genomes, and its modular concept allows easy addition of further functionality in future releases. A modified and adapted version of the Artemis Genome Viewer (Sanger Institute) has been developed to leverage the additional features and underlying information provided by the GAMOLA2 analysis, and is part of the software distribution. In addition to genome annotations, GAMOLA2 features, among others, supplemental modules that assist in the creation of custom Blast databases, annotation transfers between genome versions, and the preparation of Genbank files for submission via the NCBI Sequin tool. GAMOLA2 is intended to be run under a Linux environment, whereas the subsequent visualization and manual curation in Artemis is mobile and platform independent. The development of GAMOLA2 is ongoing and community driven. New functionality can easily be added upon user requests, ensuring that GAMOLA2 provides information relevant to microbiologists. The software is available free of charge for academic use.
Chiu, Kuo Ping; Wong, Chee-Hong; Chen, Qiongyu; Ariyaratne, Pramila; Ooi, Hong Sain; Wei, Chia-Lin; Sung, Wing-Kin Ken; Ruan, Yijun
2006-08-25
We recently developed the Paired End diTag (PET) strategy for efficient characterization of mammalian transcriptomes and genomes. The paired end nature of short PET sequences derived from long DNA fragments raised a new set of bioinformatics challenges, including how to extract PETs from raw sequence reads, and correctly yet efficiently map PETs to reference genome sequences. To accommodate and streamline data analysis of the large volume PET sequences generated from each PET experiment, an automated PET data process pipeline is desirable. We designed an integrated computation program package, PET-Tool, to automatically process PET sequences and map them to the genome sequences. The Tool was implemented as a web-based application composed of four modules: the Extractor module for PET extraction; the Examiner module for analytic evaluation of PET sequence quality; the Mapper module for locating PET sequences in the genome sequences; and the Project Manager module for data organization. The performance of PET-Tool was evaluated through the analyses of 2.7 million PET sequences. It was demonstrated that PET-Tool is accurate and efficient in extracting PET sequences and removing artifacts from large volume dataset. Using optimized mapping criteria, over 70% of quality PET sequences were mapped specifically to the genome sequences. With a 2.4 GHz LINUX machine, it takes approximately six hours to process one million PETs from extraction to mapping. The speed, accuracy, and comprehensiveness have proved that PET-Tool is an important and useful component in PET experiments, and can be extended to accommodate other related analyses of paired-end sequences. The Tool also provides user-friendly functions for data quality check and system for multi-layer data management.
Ascribing Functions to Genes: Journey Towards Genetic Improvement of Rice Via Functional Genomics
Mustafiz, Ananda; Kumari, Sumita; Karan, Ratna
2016-01-01
Rice, one of the most important cereal crops for mankind, feeds more than half the world population. Rice has been heralded as a model cereal owing to its small genome size, amenability to easy transformation, high synteny to other cereal crops and availability of complete genome sequence. Moreover, sequence wealth in rice is getting more refined and precise due to resequencing efforts. This humungous resource of sequence data has confronted research fraternity with a herculean challenge as well as an excellent opportunity to functionally validate expressed as well as regulatory portions of the genome. This will not only help us in understanding the genetic basis of plant architecture and physiology but would also steer us towards developing improved cultivars. No single technique can achieve such a mammoth task. Functional genomics through its diverse tools viz. loss and gain of function mutants, multifarious omics strategies like transcriptomics, proteomics, metabolomics and phenomics provide us with the necessary handle. A paradigm shift in technological advances in functional genomics strategies has been instrumental in generating considerable amount of information w.r.t functionality of rice genome. We now have several databases and online resources for functionally validated genes but despite that we are far from reaching the desired milestone of functionally characterizing each and every rice gene. There is an urgent need for a common platform, for information already available in rice, and collaborative efforts between researchers in a concerted manner as well as healthy public-private partnership, for genetic improvement of rice crop better able to handle the pressures of climate change and exponentially increasing population. PMID:27252584
Traini, Alessandra; Iorizzo, Massimo; Mann, Harpartap; Bradeen, James M; Carputo, Domenico; Frusciante, Luigi; Chiusano, Maria Luisa
2013-01-01
Tuber-bearing potato species possess several genes that can be exploited to improve the genetic background of the cultivated potato Solanum tuberosum. Among them, S. bulbocastanum and S. commersonii are well known for their strong resistance to environmental stresses. However, scant information is available for these species in terms of genome organization, gene function, and regulatory networks. Consequently, genomic tools to assist breeding are meager, and efficient exploitation of these species has been limited so far. In this paper, we employed the reference genome sequences from cultivated potato and tomato and a collection of sequences of 1,423 potato Diversity Arrays Technology (DArT) markers that show polymorphic representation across the genomes of S. bulbocastanum and/or S. commersonii genotypes. Our results highlighted microscale genome sequence heterogeneity that may play a significant role in functional and structural divergence between related species. Our analytical approach provides knowledge of genome structural and sequence variability that could not be detected by transcriptome and proteome approaches.
The Future of CRISPR Applications in the Lab, the Clinic and Society.
Hough, Soren H; Ajetunmobi, Ayokunmi
2017-01-01
CRISPR (clustered regularly interspaced short palindromic repeats) has emerged as one of the premiere biological tools of the century. Even more so than older genome editing techniques such as TALENs and ZFNs, CRISPR provides speed and ease-of-use heretofore unheard of in agriculture, the environment and human health. The ability to map the function of virtually every component of the genome in a scalable, multiplexed manner is unprecedented. Once those regions have been explored, CRISPR also presents an opportunity to take advantage of endogenous cellular repair pathways to change and precisely edit the genome [1-3]. In the case of human health, CRISPR operates as both a tool of discovery and a solution to fundamental problems behind disease and undesirable mutations.
An integrative approach to ortholog prediction for disease-focused and other functional studies.
Hu, Yanhui; Flockhart, Ian; Vinayagam, Arunachalam; Bergwitz, Clemens; Berger, Bonnie; Perrimon, Norbert; Mohr, Stephanie E
2011-08-31
Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward. We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist). DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.
MPD: a pathogen genome and metagenome database
Zhang, Tingting; Miao, Jiaojiao; Han, Na; Qiang, Yujun; Zhang, Wen
2018-01-01
Abstract Advances in high-throughput sequencing have led to unprecedented growth in the amount of available genome sequencing data, especially for bacterial genomes, which has been accompanied by a challenge for the storage and management of such huge datasets. To facilitate bacterial research and related studies, we have developed the Mypathogen database (MPD), which provides access to users for searching, downloading, storing and sharing bacterial genomics data. The MPD represents the first pathogenic database for microbial genomes and metagenomes, and currently covers pathogenic microbial genomes (6604 genera, 11 071 species, 41 906 strains) and metagenomic data from host, air, water and other sources (28 816 samples). The MPD also functions as a management system for statistical and storage data that can be used by different organizations, thereby facilitating data sharing among different organizations and research groups. A user-friendly local client tool is provided to maintain the steady transmission of big sequencing data. The MPD is a useful tool for analysis and management in genomic research, especially for clinical Centers for Disease Control and epidemiological studies, and is expected to contribute to advancing knowledge on pathogenic bacteria genomes and metagenomes. Database URL: http://data.mypathogen.org PMID:29917040
Lehmann, Jason S.; Matthias, Michael A.; Vinetz, Joseph M.; Fouts, Derrick E.
2014-01-01
Leptospirosis, caused by pathogenic spirochetes belonging to the genus Leptospira, is a zoonosis with important impacts on human and animal health worldwide. Research on the mechanisms of Leptospira pathogenesis has been hindered due to slow growth of infectious strains, poor transformability, and a paucity of genetic tools. As a result of second generation sequencing technologies, there has been an acceleration of leptospiral genome sequencing efforts in the past decade, which has enabled a concomitant increase in functional genomics analyses of Leptospira pathogenesis. A pathogenomics approach, by coupling of pan-genomic analysis of multiple isolates with sequencing of experimentally attenuated highly pathogenic Leptospira, has resulted in the functional inference of virulence factors. The global Leptospira Genome Project supported by the U.S. National Institute of Allergy and Infectious Diseases to which key scientific contributions have been made from the international leptospirosis research community has provided a new roadmap for comprehensive studies of Leptospira and leptospirosis well into the future. This review describes functional genomics approaches to apply the data generated by the Leptospira Genome Project towards deepening our knowledge of virulence factors of Leptospira using the emerging discipline of pathogenomics. PMID:25437801
The Chinchilla Research Resource Database: resource for an otolaryngology disease model
Shimoyama, Mary; Smith, Jennifer R.; De Pons, Jeff; Tutaj, Marek; Khampang, Pawjai; Hong, Wenzhou; Erbe, Christy B.; Ehrlich, Garth D.; Bakaletz, Lauren O.; Kerschner, Joseph E.
2016-01-01
The long-tailed chinchilla (Chinchilla lanigera) is an established animal model for diseases of the inner and middle ear, among others. In particular, chinchilla is commonly used to study diseases involving viral and bacterial pathogens and polymicrobial infections of the upper respiratory tract and the ear, such as otitis media. The value of the chinchilla as a model for human diseases prompted the sequencing of its genome in 2012 and the more recent development of the Chinchilla Research Resource Database (http://crrd.mcw.edu) to provide investigators with easy access to relevant datasets and software tools to enhance their research. The Chinchilla Research Resource Database contains a complete catalog of genes for chinchilla and, for comparative purposes, human. Chinchilla genes can be viewed in the context of their genomic scaffold positions using the JBrowse genome browser. In contrast to the corresponding records at NCBI, individual gene reports at CRRD include functional annotations for Disease, Gene Ontology (GO) Biological Process, GO Molecular Function, GO Cellular Component and Pathway assigned to chinchilla genes based on annotations from the corresponding human orthologs. Data can be retrieved via keyword and gene-specific searches. Lists of genes with similar functional attributes can be assembled by leveraging the hierarchical structure of the Disease, GO and Pathway vocabularies through the Ontology Search and Browser tool. Such lists can then be further analyzed for commonalities using the Gene Annotator (GA) Tool. All data in the Chinchilla Research Resource Database is freely accessible and downloadable via the CRRD FTP site or using the download functions available in the search and analysis tools. The Chinchilla Research Resource Database is a rich resource for researchers using, or considering the use of, chinchilla as a model for human disease. Database URL: http://crrd.mcw.edu PMID:27173523
Interactome INSIDER: a structural interactome browser for genomic studies.
Meyer, Michael J; Beltrán, Juan Felipe; Liang, Siqi; Fragoza, Robert; Rumack, Aaron; Liang, Jin; Wei, Xiaomu; Yu, Haiyuan
2018-01-01
We present Interactome INSIDER, a tool to link genomic variant information with structural protein-protein interactomes. Underlying this tool is the application of machine learning to predict protein interaction interfaces for 185,957 protein interactions with previously unresolved interfaces in human and seven model organisms, including the entire experimentally determined human binary interactome. Predicted interfaces exhibit functional properties similar to those of known interfaces, including enrichment for disease mutations and recurrent cancer mutations. Through 2,164 de novo mutagenesis experiments, we show that mutations of predicted and known interface residues disrupt interactions at a similar rate and much more frequently than mutations outside of predicted interfaces. To spur functional genomic studies, Interactome INSIDER (http://interactomeinsider.yulab.org) enables users to identify whether variants or disease mutations are enriched in known and predicted interaction interfaces at various resolutions. Users may explore known population variants, disease mutations, and somatic cancer mutations, or they may upload their own set of mutations for this purpose.
Information management systems for pharmacogenomics.
Thallinger, Gerhard G; Trajanoski, Slave; Stocker, Gernot; Trajanoski, Zlatko
2002-09-01
The value of high-throughput genomic research is dramatically enhanced by association with key patient data. These data are generally available but of disparate quality and not typically directly associated. A system that could bring these disparate data sources into a common resource connected with functional genomic data would be tremendously advantageous. However, the integration of clinical and accurate interpretation of the generated functional genomic data requires the development of information management systems capable of effectively capturing the data as well as tools to make that data accessible to the laboratory scientist or to the clinician. In this review these challenges and current information technology solutions associated with the management, storage and analysis of high-throughput data are highlighted. It is suggested that the development of a pharmacogenomic data management system which integrates public and proprietary databases, clinical datasets, and data mining tools embedded in a high-performance computing environment should include the following components: parallel processing systems, storage technologies, network technologies, databases and database management systems (DBMS), and application services.
Jian, Bo; Hou, Wensheng; Wu, Cunxiang; Liu, Bin; Liu, Wei; Song, Shikui; Bi, Yurong; Han, Tianfu
2009-06-25
Transgenic approaches provide a powerful tool for gene function investigations in plants. However, some legumes are still recalcitrant to current transformation technologies, limiting the extent to which functional genomic studies can be performed on. Superroot of Lotus corniculatus is a continuous root cloning system allowing direct somatic embryogenesis and mass regeneration of plants. Recently, a technique to obtain transgenic L. corniculatus plants from Superroot-derived leaves through A. tumefaciens-mediated transformation was described. However, transformation efficiency was low and it took about six months from gene transfer to PCR identification. In the present study, we developed an A. rhizogenes-mediated transformation of Superroot-derived L. corniculatus for gene function investigation, combining the efficient A. rhizogenes-mediated transformation and the rapid regeneration system of Superroot. The transformation system using A. rhizogenes K599 harbouring pGFPGUSPlus was improved by validating some parameters which may influence the transformation frequency. Using stem sections with one node as explants, a 2-day pre-culture of explants, infection with K599 at OD(600) = 0.6, and co-cultivation on medium (pH 5.4) at 22 degrees C for 2 days enhanced the transformation frequency significantly. As proof of concept, Superroot-derived L. corniculatus was transformed with a gene from wheat encoding an Na+/H+ antiporter (TaNHX2) using the described system. Transgenic Superroot plants were obtained and had increased salt tolerance, as expected from the expression of TaNHX2. A rapid and efficient tool for gene function investigation in L. corniculatus was developed, combining the simplicity and high efficiency of the Superroot regeneration system and the availability of A. rhizogenes-mediated transformation. This system was improved by validating some parameters influencing the transformation frequency, which could reach 92% based on GUS detection. The combination of the highly efficient transformation and the regeneration system of Superroot provides a valuable tool for functional genomics studies in L. corniculatus.
2012-01-01
Background The complete sequences of chloroplast genomes provide wealthy information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, powerful computational tools annotating the genome sequences are in urgent need. Results We have developed a web server CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extractions of protein and mRNA sequences for given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as test set, we show that CPGAVAS performs comparably to another application DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensible tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920
An automated system for evaluation of the potential functionome: MAPLE version 2.1.0.
Takami, Hideto; Taniguchi, Takeaki; Arai, Wataru; Takemoto, Kazuhiro; Moriya, Yuki; Goto, Susumu
2016-07-03
Metabolic and physiological potential evaluator (MAPLE) is an automatic system that can perform a series of steps used in the evaluation of potential comprehensive functions (functionome) harboured in the genome and metagenome. MAPLE first assigns KEGG Orthology (KO) to the query gene, maps the KO-assigned genes to the Kyoto Encyclopedia of Genes and Genomes (KEGG) functional modules, and then calculates the module completion ratio (MCR) of each functional module to characterize the potential functionome in the user's own genomic and metagenomic data. In this study, we added two more useful functions to calculate module abundance and Q-value, which indicate the functional abundance and statistical significance of the MCR results, respectively, to the new version of MAPLE for more detailed comparative genomic and metagenomic analyses. Consequently, MAPLE version 2.1.0 reported significant differences in the potential functionome, functional abundance, and diversity of contributors to each function among four metagenomic datasets generated by the global ocean sampling expedition, one of the most popular environmental samples to use with this system. MAPLE version 2.1.0 is now available through the web interface (http://www.genome.jp/tools/maple/) 17 June 2016, date last accessed. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Brettin, Thomas; Davis, James J.; Disz, Terry; ...
2015-02-10
The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offersmore » a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.« less
DraGnET: Software for storing, managing and analyzing annotated draft genome sequence data
2010-01-01
Background New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioinformatic tools are available for the analysis and management of genome sequences, limitations still remain. For example, restrictions on the submission of data and use of these tools may be imposed, thereby making them unsuitable for sequencing projects that need to remain in-house or proprietary during their initial stages. Furthermore, the availability and use of next generation sequencing in industrial, governmental and academic environments requires biologist to have access to computational support for the curation and analysis of the data generated; however, this type of support is not always immediately available. Results To address these limitations, we have developed DraGnET (Draft Genome Evaluation Tool). DraGnET is an open source web application which allows researchers, with no experience in programming and database management, to setup their own in-house projects for storing, retrieving, organizing and managing annotated draft and complete genome sequence data. The software provides a web interface for the use of BLAST, allowing users to perform preliminary comparative analysis among multiple genomes. We demonstrate the utility of DraGnET for performing comparative genomics on closely related bacterial strains. Furthermore, DraGnET can be further developed to incorporate additional tools for more sophisticated analyses. Conclusions DraGnET is designed for use either by individual researchers or as a collaborative tool available through Internet (or Intranet) deployment. For genome projects that require genome sequencing data to initially remain proprietary, DraGnET provides the means for researchers to keep their data in-house for analysis using local programs or until it is made publicly available, at which point it may be uploaded to additional analysis software applications. The DraGnET home page is available at http://www.dragnet.cvm.iastate.edu and includes example files for examining the functionalities, a link for downloading the DraGnET setup package and a link to the DraGnET source code hosted with full documentation on SourceForge. PMID:20175920
GenomeRNAi: a database for cell-based RNAi phenotypes.
Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael
2007-01-01
RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasing large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at http://rnai.dkfz.de.
GenomeRNAi: a database for cell-based RNAi phenotypes
Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael
2007-01-01
RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasing large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at PMID:17135194
An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values.
Alberti, Claudio; Daniels, Noah; Hernaez, Mikel; Voges, Jan; Goldfeder, Rachel L; Hernandez-Lopez, Ana A; Mattavelli, Marco; Berger, Bonnie
2016-01-01
This paper provides the specification and an initial validation of an evaluation framework for the comparison of lossy compressors of genome sequencing quality values. The goal is to define reference data, test sets, tools and metrics that shall be used to evaluate the impact of lossy compression of quality values on human genome variant calling. The functionality of the framework is validated referring to two state-of-the-art genomic compressors. This work has been spurred by the current activity within the ISO/IEC SC29/WG11 technical committee (a.k.a. MPEG), which is investigating the possibility of starting a standardization activity for genomic information representation.
Analysis of Aspergillus nidulans metabolism at the genome-scale
David, Helga; Özçelik, İlknur Ş; Hofmann, Gerald; Nielsen, Jens
2008-01-01
Background Aspergillus nidulans is a member of a diverse group of filamentous fungi, sharing many of the properties of its close relatives with significance in the fields of medicine, agriculture and industry. Furthermore, A. nidulans has been a classical model organism for studies of development biology and gene regulation, and thus it has become one of the best-characterized filamentous fungi. It was the first Aspergillus species to have its genome sequenced, and automated gene prediction tools predicted 9,451 open reading frames (ORFs) in the genome, of which less than 10% were assigned a function. Results In this work, we have manually assigned functions to 472 orphan genes in the metabolism of A. nidulans, by using a pathway-driven approach and by employing comparative genomics tools based on sequence similarity. The central metabolism of A. nidulans, as well as biosynthetic pathways of relevant secondary metabolites, was reconstructed based on detailed metabolic reconstructions available for A. niger and Saccharomyces cerevisiae, and information on the genetics, biochemistry and physiology of A. nidulans. Thereby, it was possible to identify metabolic functions without a gene associated, and to look for candidate ORFs in the genome of A. nidulans by comparing its sequence to sequences of well-characterized genes in other species encoding the function of interest. A classification system, based on defined criteria, was developed for evaluating and selecting the ORFs among the candidates, in an objective and systematic manner. The functional assignments served as a basis to develop a mathematical model, linking 666 genes (both previously and newly annotated) to metabolic roles. The model was used to simulate metabolic behavior and additionally to integrate, analyze and interpret large-scale gene expression data concerning a study on glucose repression, thereby providing a means of upgrading the information content of experimental data and getting further insight into this phenomenon in A. nidulans. Conclusion We demonstrate how pathway modeling of A. nidulans can be used as an approach to improve the functional annotation of the genome of this organism. Furthermore we show how the metabolic model establishes functional links between genes, enabling the upgrade of the information content of transcriptome data. PMID:18405346
From Genome to Function: Systematic Analysis of the Soil Bacterium Bacillus Subtilis
Crawshaw, Samuel G.; Wipat, Anil
2001-01-01
Bacillus subtilis is a sporulating Gram-positive bacterium that lives primarily in the soil and associated water sources. Whilst this bacterium has been studied extensively in the laboratory, relatively few studies have been undertaken to study its activity in natural environments. The publication of the B. subtilis genome sequence and subsequent systematic functional analysis programme have provided an opportunity to develop tools for analysing the role and expression of Bacillus genes in situ. In this paper we discuss analytical approaches that are being developed to relate genes to function in environments such as the rhizosphere. PMID:18628943
FSPP: A Tool for Genome-Wide Prediction of smORF-Encoded Peptides and Their Functions
Li, Hui; Xiao, Li; Zhang, Lili; Wu, Jiarui; Wei, Bin; Sun, Ninghui; Zhao, Yi
2018-01-01
smORFs are small open reading frames of less than 100 codons. Recent low throughput experiments showed a lot of smORF-encoded peptides (SEPs) played crucial rule in processes such as regulation of transcription or translation, transportation through membranes and the antimicrobial activity. In order to gather more functional SEPs, it is necessary to have access to genome-wide prediction tools to give profound directions for low throughput experiments. In this study, we put forward a functional smORF-encoded peptides predictor (FSPP) which tended to predict authentic SEPs and their functions in a high throughput method. FSPP used the overlap of detected SEPs from Ribo-seq and mass spectrometry as target objects. With the expression data on transcription and translation levels, FSPP built two co-expression networks. Combing co-location relations, FSPP constructed a compound network and then annotated SEPs with functions of adjacent nodes. Tested on 38 sequenced samples of 5 human cell lines, FSPP successfully predicted 856 out of 960 annotated proteins. Interestingly, FSPP also highlighted 568 functional SEPs from these samples. After comparison, the roles predicted by FSPP were consistent with known functions. These results suggest that FSPP is a reliable tool for the identification of functional small peptides. FSPP source code can be acquired at https://www.bioinfo.org/FSPP. PMID:29675032
Applications of CRISPR/Cas9 in the Mammalian Central Nervous System
Savell, Katherine E.; Day, Jeremy J.
2017-01-01
Within the central nervous system, gene regulatory mechanisms are crucial regulators of cellular development and function, and dysregulation of these systems is commonly observed in major neuropsychiatric and neurological disorders. However, due to a lack of tools to specifically modulate the genome and epigenome in the central nervous system, many molecular and genetic mechanisms underlying cognitive function and behavior are still unknown. Although genome editing tools have been around for decades, the recent emergence of inexpensive, straightforward, and widely accessible CRISPR/Cas9 systems has led to a revolution in gene editing. The development of the catalytically dead Cas9 (dCas9) expanded this flexibility even further by acting as an anchoring system for fused effector proteins, structural scaffolds, and RNAs. Together, these advances have enabled robust, modular approaches for specific targeting and modification of the local chromatin environment at a single gene. This review highlights these advancements and how the combination of powerful modulatory tools paired with the versatility of CRISPR-Cas9-based systems offer great potential for understanding the underlying genetic and epigenetic contributions of neuronal function, behavior, and neurobiological diseases. PMID:29259522
Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomic Data.
Bolser, Dan M; Staines, Daniel M; Perry, Emily; Kersey, Paul J
2017-01-01
Ensembl Plants ( http://plants.ensembl.org ) is an integrative resource presenting genome-scale information for 39 sequenced plant species. Available data includes genome sequence, gene models, functional annotation, and polymorphic loci; for the latter, additional information including population structure, individual genotypes, linkage, and phenotype data is available for some species. Comparative data is also available, including genomic alignments and "gene trees," which show the inferred evolutionary history of each gene family represented in the resource. Access to the data is provided through a genome browser, which incorporates many specialist interfaces for different data types, through a variety of programmatic interfaces, and via a specialist data mining tool supporting rapid filtering and retrieval of bulk data. Genomic data from many non-plant species, including those of plant pathogens, pests, and pollinators, is also available via the same interfaces through other divisions of Ensembl.Ensembl Plants is updated 4-6 times a year and is developed in collaboration with our international partners in the Gramene ( http://www.gramene.org ) and transPLANT projects ( http://www.transplantdb.eu ).
Grabundzija, Ivana; Messing, Simon A; Thomas, Jainy; Cosby, Rachel L; Bilic, Ilija; Miskey, Csaba; Gogol-Döring, Andreas; Kapitonov, Vladimir; Diem, Tanja; Dalda, Anna; Jurka, Jerzy; Pritham, Ellen J; Dyda, Fred; Izsvák, Zsuzsanna; Ivics, Zoltán
2016-03-02
Helitron transposons capture and mobilize gene fragments in eukaryotes, but experimental evidence for their transposition is lacking in the absence of an isolated active element. Here we reconstruct Helraiser, an ancient element from the bat genome, and use this transposon as an experimental tool to unravel the mechanism of Helitron transposition. A hairpin close to the 3'-end of the transposon functions as a transposition terminator. However, the 3'-end can be bypassed by the transposase, resulting in transduction of flanking sequences to new genomic locations. Helraiser transposition generates covalently closed circular intermediates, suggestive of a replicative transposition mechanism, which provides a powerful means to disseminate captured transcriptional regulatory signals across the genome. Indeed, we document the generation of novel transcripts by Helitron promoter capture both experimentally and by transcriptome analysis in bats. Our results provide mechanistic insight into Helitron transposition, and its impact on diversification of gene function by genome shuffling.
Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken; ...
2016-11-29
Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publically available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic genemore » clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.« less
Genomic and molecular analysis of phage CMP1 from Clavibacter michiganensis subspecies michiganensis
Wittmann, Johannes; Gartemann, Karl-Heinz; Eichenlaub, Rudolf
2011-01-01
Bacteriophage CMP1 is a member of the Siphoviridae family that infects specifically the plant-pathogen Clavibacter michiganensis subsp. michiganensis. The linear double- stranded DNA is terminally redundant and not circularly permuted. The complete nucleotide sequence of the bacteriophage CMP1 genome consists of 58,652 bp including the terminal redundant ends of 791 bp. The G+C content of the phage (57%) is significantly lower than that of its host (72.66%). 74 potential open reading frames were identified and annotated by different bioinformatic tools. Two large clusters which encode the early and the late functions could be identified which are divergently transcribed. There are only a few hypothetical gene products with conserved domains and significant similarity to sequences from the databases. Functional analyses confirmed the activity of four gene products, an endonuclease, an exonuclease, a single-stranded DNA binding protein and a thymidylate synthase. Partial genomic sequences of CN77, a phage of Clavibacter michiganensis subsp. nebraskensis, revealed a similar genome structure and significant similarities on the level of deduced amino acid sequences. An endolysin with peptidase activity has been identified for both phages, which may be good tools for disease control of tomato plants against Clavibacter infections. PMID:21687530
Wittmann, Johannes; Gartemann, Karl-Heinz; Eichenlaub, Rudolf; Dreiseikelmann, Brigitte
2011-01-01
Bacteriophage CMP1 is a member of the Siphoviridae family that infects specifically the plant-pathogen Clavibacter michiganensis subsp. michiganensis. The linear double- stranded DNA is terminally redundant and not circularly permuted. The complete nucleotide sequence of the bacteriophage CMP1 genome consists of 58,652 bp including the terminal redundant ends of 791 bp. The G+C content of the phage (57%) is significantly lower than that of its host (72.66%). 74 potential open reading frames were identified and annotated by different bioinformatic tools. Two large clusters which encode the early and the late functions could be identified which are divergently transcribed. There are only a few hypothetical gene products with conserved domains and significant similarity to sequences from the databases. Functional analyses confirmed the activity of four gene products, an endonuclease, an exonuclease, a single-stranded DNA binding protein and a thymidylate synthase. Partial genomic sequences of CN77, a phage of Clavibacter michiganensis subsp. nebraskensis, revealed a similar genome structure and significant similarities on the level of deduced amino acid sequences. An endolysin with peptidase activity has been identified for both phages, which may be good tools for disease control of tomato plants against Clavibacter infections.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken
Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publically available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic genemore » clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.« less
Streamlining genomes: toward the generation of simplified and stabilized microbial systems.
Leprince, Audrey; van Passel, Mark W J; dos Santos, Vitor A P Martins
2012-10-01
At the junction between systems and synthetic biology, genome streamlining provides a solid foundation both for increased understanding of cellular circuitry, and for the tailoring of microbial chassis towards innovative biotechnological applications. Iterative genomic deletions (targeted and random) helps to generate simplified, stabilized and predictable genomes, whereas multiplexing genome engineering reveals a broad functional genetic diversity. The decrease in oligo and gene synthesis costs promises effective combinatorial tools for the generation of chassis based on streamlined and tractable genomes. Here we review recent progresses in streamlining genomes through recombineering techniques aiming to generate insights into cellular mechanisms and responses towards the design and assembly of streamlined genome chassis together with new cellular modules in diverse biotechnological applications. Copyright © 2012 Elsevier Ltd. All rights reserved.
Louis, Ed
2011-01-01
In the early days of the yeast genome sequencing project, gene annotation was in its infancy and suffered the problem of many false positive annotations as well as missed genes. The lack of other sequences for comparison also prevented the annotation of conserved, functional sequences that were not coding. We are now in an era of comparative genomics where many closely related as well as more distantly related genomes are available for direct sequence and synteny comparisons allowing for more probable predictions of genes and other functional sequences due to conservation. We also have a plethora of functional genomics data which helps inform gene annotation for previously uncharacterised open reading frames (ORFs)/genes. For Saccharomyces cerevisiae this has resulted in a continuous updating of the gene and functional sequence annotations in the reference genome helping it retain its position as the best characterized eukaryotic organism's genome. A single reference genome for a species does not accurately describe the species and this is quite clear in the case of S. cerevisiae where the reference strain is not ideal for brewing or baking due to missing genes. Recent surveys of numerous isolates, from a variety of sources, using a variety of technologies have revealed a great deal of variation amongst isolates with genome sequence surveys providing information on novel genes, undetectable by other means. We now have a better understanding of the extant variation in S. cerevisiae as a species as well as some idea of how much we are missing from this understanding. As with gene annotation, comparative genomics enhances the discovery and description of genome variation and is providing us with the tools for understanding genome evolution, adaptation and selection, and underlying genetics of complex traits.
SNPit: a federated data integration system for the purpose of functional SNP annotation
Shen, Terry H; Carlson, Christopher S; Tarczy-Hornoch, Peter
2009-01-01
Genome wide association studies can potentially identify the genetic causes behind the majority of human diseases. With the advent of more advanced genotyping techniques, there is now an explosion of data gathered on single nucleotide polymorphisms (SNPs). The need exists for an integrated system that can provide up-to-date functional annotation information on SNPs. We have developed the SNP Integration Tool (SNPit) system to address this need. Built upon a federated data integration system, SNPit provides current information on a comprehensive list of SNP data sources. Additional logical inference analysis was included through an inference engine plug in. The SNPit web servlet is available online for use. SNPit allows users to go to one source for up-to-date information on the functional annotation of SNPs. A tool that can help to integrate and analyze the potential functional significance of SNPs is important for understanding the results from genome wide association studies. PMID:19327864
GIANT API: an application programming interface for functional genomics
Roberts, Andrew M.; Wong, Aaron K.; Fisk, Ian; Troyanskaya, Olga G.
2016-01-01
GIANT API provides biomedical researchers programmatic access to tissue-specific and global networks in humans and model organisms, and associated tools, which includes functional re-prioritization of existing genome-wide association study (GWAS) data. Using tissue-specific interaction networks, researchers are able to predict relationships between genes specific to a tissue or cell lineage, identify the changing roles of genes across tissues and uncover disease-gene associations. Additionally, GIANT API enables computational tools like NetWAS, which leverages tissue-specific networks for re-prioritization of GWAS results. The web services covered by the API include 144 tissue-specific functional gene networks in human, global functional networks for human and six common model organisms and the NetWAS method. GIANT API conforms to the REST architecture, which makes it stateless, cacheable and highly scalable. It can be used by a diverse range of clients including web browsers, command terminals, programming languages and standalone apps for data analysis and visualization. The API is freely available for use at http://giant-api.princeton.edu. PMID:27098035
ITEP: an integrated toolkit for exploration of microbial pan-genomes.
Benedict, Matthew N; Henriksen, James R; Metcalf, William W; Whitaker, Rachel J; Price, Nathan D
2014-01-03
Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for their integrated analysis. In particular, accurate annotations and identification of gene presence and absence are critical for understanding and modeling the cellular physiology of newly sequenced genomes. Although many tools are available to compare the gene contents of related genomes, new tools are necessary to enable close examination and curation of protein families from large numbers of closely related organisms, to integrate curation with the analysis of gain and loss, and to generate metabolic networks linking the annotations to observed phenotypes. We have developed ITEP, an Integrated Toolkit for Exploration of microbial Pan-genomes, to curate protein families, compute similarities to externally-defined domains, analyze gene gain and loss, and generate draft metabolic networks from one or more curated reference network reconstructions in groups of related microbial species among which the combination of core and variable genes constitute the their "pan-genomes". The ITEP toolkit consists of: (1) a series of modular command-line scripts for identification, comparison, curation, and analysis of protein families and their distribution across many genomes; (2) a set of Python libraries for programmatic access to the same data; and (3) pre-packaged scripts to perform common analysis workflows on a collection of genomes. ITEP's capabilities include de novo protein family prediction, ortholog detection, analysis of functional domains, identification of core and variable genes and gene regions, sequence alignments and tree generation, annotation curation, and the integration of cross-genome analysis and metabolic networks for study of metabolic network evolution. ITEP is a powerful, flexible toolkit for generation and curation of protein families. ITEP's modular design allows for straightforward extension as analysis methods and tools evolve. By integrating comparative genomics with the development of draft metabolic networks, ITEP harnesses the power of comparative genomics to build confidence in links between genotype and phenotype and helps disambiguate gene annotations when they are evaluated in both evolutionary and metabolic network contexts.
Transformation-associated recombination (TAR) cloning for genomics studies and synthetic biology
Kouprina, Natalay; Larionov, Vladimir
2016-01-01
Transformation-associated recombination (TAR) cloning represents a unique tool for isolation and manipulation of large DNA molecules. The technique exploits a high level of homologous recombination in the yeast Sacharomyces cerevisiae. So far, TAR cloning is the only method available to selectively recover chromosomal segments up to 300 kb in length from complex and simple genomes. In addition, TAR cloning allows the assembly and cloning of entire microbe genomes up to several Mb as well as engineering of large metabolic pathways. In this review, we summarize applications of TAR cloning for functional/structural genomics and synthetic biology. PMID:27116033
CowPI: A Rumen Microbiome Focussed Version of the PICRUSt Functional Inference Software.
Wilkinson, Toby J; Huws, Sharon A; Edwards, Joan E; Kingston-Smith, Alison H; Siu-Ting, Karen; Hughes, Martin; Rubino, Francesco; Friedersdorff, Maximillian; Creevey, Christopher J
2018-01-01
Metataxonomic 16S rDNA based studies are a commonplace and useful tool in the research of the microbiome, but they do not provide the full investigative power of metagenomics and metatranscriptomics for revealing the functional potential of microbial communities. However, the use of metagenomic and metatranscriptomic technologies is hindered by high costs and skills barrier necessary to generate and interpret the data. To address this, a tool for Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) was developed for inferring the functional potential of an observed microbiome profile, based on 16S data. This allows functional inferences to be made from metataxonomic 16S rDNA studies with little extra work or cost, but its accuracy relies on the availability of completely sequenced genomes of representative organisms from the community being investigated. The rumen microbiome is an example of a community traditionally underrepresented in genome and sequence databases, but recent efforts by projects such as the Global Rumen Census and Hungate 1000 have resulted in a wide sampling of 16S rDNA profiles and almost 500 fully sequenced microbial genomes from this environment. Using this information, we have developed "CowPI," a focused version of the PICRUSt tool provided for use by the wider scientific community in the study of the rumen microbiome. We evaluated the accuracy of CowPI and PICRUSt using two 16S datasets from the rumen microbiome: one generated from rDNA and the other from rRNA where corresponding metagenomic and metatranscriptomic data was also available. We show that the functional profiles predicted by CowPI better match estimates for both the meta-genomic and transcriptomic datasets than PICRUSt, and capture the higher degree of genetic variation and larger pangenomes of rumen organisms. Nonetheless, whilst being closer in terms of predictive power for the rumen microbiome, there were differences when compared to both the metagenomic and metatranscriptome data and so we recommend, where possible, functional inferences from 16S data should not replace metagenomic and metatranscriptomic approaches. The tool can be accessed at http://www.cowpi.org and is provided to the wider scientific community for use in the study of the rumen microbiome.
Decoding transcriptional enhancers: Evolving from annotation to functional interpretation
Engel, Krysta L.; Mackiewicz, Mark; Hardigan, Andrew A.; Myers, Richard M.; Savic, Daniel
2016-01-01
Deciphering the intricate molecular processes that orchestrate the spatial and temporal regulation of genes has become an increasingly major focus of biological research. The differential expression of genes by diverse cell types with a common genome is a hallmark of complex cellular functions, as well as the basis for multicellular life. Importantly, a more coherent understanding of gene regulation is critical for defining developmental processes, evolutionary principles and disease etiologies. Here we present our current understanding of gene regulation by focusing on the role of enhancer elements in these complex processes. Although functional genomic methods have provided considerable advances to our understanding of gene regulation, these assays, which are usually performed on a genome-wide scale, typically provide correlative observations that lack functional interpretation. Recent innovations in genome editing technologies have placed gene regulatory studies at an exciting crossroads, as systematic, functional evaluation of enhancers and other transcriptional regulatory elements can now be performed in a coordinated, high-throughput manner across the entire genome. This review provides insights on transcriptional enhancer function, their role in development and disease, and catalogues experimental tools commonly used to study these elements. Additionally, we discuss the crucial role of novel techniques in deciphering the complex gene regulatory landscape and how these studies will shape future research. PMID:27224938
Decoding transcriptional enhancers: Evolving from annotation to functional interpretation.
Engel, Krysta L; Mackiewicz, Mark; Hardigan, Andrew A; Myers, Richard M; Savic, Daniel
2016-09-01
Deciphering the intricate molecular processes that orchestrate the spatial and temporal regulation of genes has become an increasingly major focus of biological research. The differential expression of genes by diverse cell types with a common genome is a hallmark of complex cellular functions, as well as the basis for multicellular life. Importantly, a more coherent understanding of gene regulation is critical for defining developmental processes, evolutionary principles and disease etiologies. Here we present our current understanding of gene regulation by focusing on the role of enhancer elements in these complex processes. Although functional genomic methods have provided considerable advances to our understanding of gene regulation, these assays, which are usually performed on a genome-wide scale, typically provide correlative observations that lack functional interpretation. Recent innovations in genome editing technologies have placed gene regulatory studies at an exciting crossroads, as systematic, functional evaluation of enhancers and other transcriptional regulatory elements can now be performed in a coordinated, high-throughput manner across the entire genome. This review provides insights on transcriptional enhancer function, their role in development and disease, and catalogues experimental tools commonly used to study these elements. Additionally, we discuss the crucial role of novel techniques in deciphering the complex gene regulatory landscape and how these studies will shape future research. Copyright © 2016 Elsevier Ltd. All rights reserved.
Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context
Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi
2007-01-01
Background Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aide to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. Results lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. Conclusion lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired. PMID:17877794
Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context.
Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi
2007-09-18
Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aide to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired.
Page, Grier P; Coulibaly, Issa
2008-01-01
Microarrays are a very powerful tool for quantifying the amount of RNA in samples; however, their ability to query essentially every gene in a genome, which can number in the tens of thousands, presents analytical and interpretative problems. As a result, a variety of software and web-based tools have been developed to help with these issues. This article highlights and reviews some of the tools for the first steps in the analysis of a microarray study. We have tried for a balance between free and commercial systems. We have organized the tools by topics including image processing tools (Section 2), power analysis tools (Section 3), image analysis tools (Section 4), database tools (Section 5), databases of functional information (Section 6), annotation tools (Section 7), statistical and data mining tools (Section 8), and dissemination tools (Section 9).
Tebel, Katrin; Boldt, Vivien; Steininger, Anne; Port, Matthias; Ebert, Grit; Ullmann, Reinhard
2017-01-06
The analysis of DNA copy number variants (CNV) has increasing impact in the field of genetic diagnostics and research. However, the interpretation of CNV data derived from high resolution array CGH or NGS platforms is complicated by the considerable variability of the human genome. Therefore, tools for multidimensional data analysis and comparison of patient cohorts are needed to assist in the discrimination of clinically relevant CNVs from others. We developed GenomeCAT, a standalone Java application for the analysis and integrative visualization of CNVs. GenomeCAT is composed of three modules dedicated to the inspection of single cases, comparative analysis of multidimensional data and group comparisons aiming at the identification of recurrent aberrations in patients sharing the same phenotype, respectively. Its flexible import options ease the comparative analysis of own results derived from microarray or NGS platforms with data from literature or public depositories. Multidimensional data obtained from different experiment types can be merged into a common data matrix to enable common visualization and analysis. All results are stored in the integrated MySQL database, but can also be exported as tab delimited files for further statistical calculations in external programs. GenomeCAT offers a broad spectrum of visualization and analysis tools that assist in the evaluation of CNVs in the context of other experiment data and annotations. The use of GenomeCAT does not require any specialized computer skills. The various R packages implemented for data analysis are fully integrated into GenomeCATs graphical user interface and the installation process is supported by a wizard. The flexibility in terms of data import and export in combination with the ability to create a common data matrix makes the program also well suited as an interface between genomic data from heterogeneous sources and external software tools. Due to the modular architecture the functionality of GenomeCAT can be easily extended by further R packages or customized plug-ins to meet future requirements.
Large-scale gene function analysis with the PANTHER classification system.
Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D
2013-08-01
The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.
Van Vooren, Steven; Coessens, Bert; De Moor, Bart; Moreau, Yves; Vermeesch, Joris R
2007-09-01
Genome-wide array comparative genomic hybridization screening is uncovering pathogenic submicroscopic chromosomal imbalances in patients with developmental disorders. In those patients, imbalances appear now to be scattered across the whole genome, and most patients carry different chromosomal anomalies. Screening patients with developmental disorders can be considered a forward functional genome screen. The imbalances pinpoint the location of genes that are involved in human development. Because most imbalances encompass regions harboring multiple genes, the challenge is to (1) identify those genes responsible for the specific phenotype and (2) disentangle the role of the different genes located in an imbalanced region. In this review, we discuss novel tools and relevant databases that have recently been developed to aid this gene discovery process. Identification of the functional relevance of genes will not only deepen our understanding of human development but will, in addition, aid in the data interpretation and improve genetic counseling.
MicroScope: a platform for microbial genome annotation and comparative genomics
Vallenet, D.; Engelen, S.; Mornico, D.; Cruveiller, S.; Fleury, L.; Lajus, A.; Rouy, Z.; Roche, D.; Salvignol, G.; Scarpelli, C.; Médigue, C.
2009-01-01
The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope’s rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone. Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc PMID:20157493
MicroScope: a platform for microbial genome annotation and comparative genomics.
Vallenet, D; Engelen, S; Mornico, D; Cruveiller, S; Fleury, L; Lajus, A; Rouy, Z; Roche, D; Salvignol, G; Scarpelli, C; Médigue, C
2009-01-01
The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope's rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone.Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc.
Effective gene delivery to Trypanosoma cruzi epimastigotes through nucleofection.
Pacheco-Lugo, Lisandro; Díaz-Olmos, Yirys; Sáenz-García, José; Probst, Christian Macagnan; DaRocha, Wanderson Duarte
2017-06-01
New opportunities have raised to study the gene function approaches of Trypanosoma cruzi after its genome sequencing in 2005. Functional genomic approaches in Trypanosoma cruzi are challenging due to the reduced tools available for genetic manipulation, as well as to the reduced efficiency of the transient transfection conducted through conventional methods. The Amaxa nucleofector device was systematically tested in the present study in order to improve the electroporation conditions in the epimastigote forms of T. cruzi. The transfection efficiency was quantified using the green fluorescent protein (GFP) as reporter gene followed by cell survival assessment. The herein used nucleofection parameters have increased the survival rates (>90%) and the transfection efficiency by approximately 35%. The small amount of epimastigotes and DNA required for the nucleofection can turn the method adopted here into an attractive tool for high throughput screening (HTS) applications, and for gene editing in parasites where genetic manipulation tools remain relatively scarce. Copyright © 2017 Elsevier B.V. All rights reserved.
Exploring FlyBase Data Using QuickSearch.
Marygold, Steven J; Antonazzo, Giulia; Attrill, Helen; Costa, Marta; Crosby, Madeline A; Dos Santos, Gilberto; Goodman, Joshua L; Gramates, L Sian; Matthews, Beverley B; Rey, Alix J; Thurmond, Jim
2016-12-08
FlyBase (flybase.org) is the primary online database of genetic, genomic, and functional information about Drosophila species, with a major focus on the model organism Drosophila melanogaster. The long and rich history of Drosophila research, combined with recent surges in genomic-scale and high-throughput technologies, mean that FlyBase now houses a huge quantity of data. Researchers need to be able to rapidly and intuitively query these data, and the QuickSearch tool has been designed to meet these needs. This tool is conveniently located on the FlyBase homepage and is organized into a series of simple tabbed interfaces that cover the major data and annotation classes within the database. This unit describes the functionality of all aspects of the QuickSearch tool. With this knowledge, FlyBase users will be equipped to take full advantage of all QuickSearch features and thereby gain improved access to data relevant to their research. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.
EuPathDB: the eukaryotic pathogen genomics database resource
Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y.; Brestelli, John; Brunk, Brian P.; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S.; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C.; Lawrence, Cris; Li, Wei; Pinney, Deborah F.; Pulman, Jane A.; Roos, David S.; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J.; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie
2017-01-01
The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host–pathogen interactions. PMID:27903906
Genome-based Modeling and Design of Metabolic Interactions in Microbial Communities
Mahadevan, Radhakrishnan; Henson, Michael A.
2012-01-01
Biotechnology research is traditionally focused on individual microbial strains that are perceived to have the necessary metabolic functions, or the capability to have these functions introduced, to achieve a particular task. For many important applications, the development of such omnipotent microbes is an extremely challenging if not impossible task. By contrast, nature employs a radically different strategy based on synergistic combinations of different microbial species that collectively achieve the desired task. These natural communities have evolved to exploit the native metabolic capabilities of each species and are highly adaptive to changes in their environments. However, microbial communities have proven difficult to study due to a lack of suitable experimental and computational tools. With the advent of genome sequencing, omics technologies, bioinformatics and genome-scale modeling, researchers now have unprecedented capabilities to analyze and engineer the metabolism of microbial communities. The goal of this review is to summarize recent applications of genome-scale metabolic modeling to microbial communities. A brief introduction to lumped community models is used to motivate the need for genome-level descriptions of individual species and their metabolic interactions. The review of genome-scale models begins with static modeling approaches, which are appropriate for communities where the extracellular environment can be assumed to be time invariant or slowly varying. Dynamic extensions of the static modeling approach are described, and then applications of genome-scale models for design of synthetic microbial communities are reviewed. The review concludes with a summary of metagenomic tools for analyzing community metabolism and an outlook for future research. PMID:24688668
Genome-based Modeling and Design of Metabolic Interactions in Microbial Communities.
Mahadevan, Radhakrishnan; Henson, Michael A
2012-01-01
Biotechnology research is traditionally focused on individual microbial strains that are perceived to have the necessary metabolic functions, or the capability to have these functions introduced, to achieve a particular task. For many important applications, the development of such omnipotent microbes is an extremely challenging if not impossible task. By contrast, nature employs a radically different strategy based on synergistic combinations of different microbial species that collectively achieve the desired task. These natural communities have evolved to exploit the native metabolic capabilities of each species and are highly adaptive to changes in their environments. However, microbial communities have proven difficult to study due to a lack of suitable experimental and computational tools. With the advent of genome sequencing, omics technologies, bioinformatics and genome-scale modeling, researchers now have unprecedented capabilities to analyze and engineer the metabolism of microbial communities. The goal of this review is to summarize recent applications of genome-scale metabolic modeling to microbial communities. A brief introduction to lumped community models is used to motivate the need for genome-level descriptions of individual species and their metabolic interactions. The review of genome-scale models begins with static modeling approaches, which are appropriate for communities where the extracellular environment can be assumed to be time invariant or slowly varying. Dynamic extensions of the static modeling approach are described, and then applications of genome-scale models for design of synthetic microbial communities are reviewed. The review concludes with a summary of metagenomic tools for analyzing community metabolism and an outlook for future research.
Application of genome editing technologies to the study and treatment of hematological disease.
Pellagatti, Andrea; Dolatshad, Hamid; Yip, Bon Ham; Valletta, Simona; Boultwood, Jacqueline
2016-01-01
Genome editing technologies have advanced significantly over the past few years, providing a fast and effective tool to precisely manipulate the genome at specific locations. The three commonly used genome editing technologies are Zinc Finger Nucleases (ZFNs), Transcription Activator-Like Effector Nucleases (TALENs), and the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated Cas9 (CRISPR/Cas9) system. ZFNs and TALENs consist of endonucleases fused to a DNA-binding domain, while the CRISPR/Cas9 system uses guide RNAs to target the bacterial Cas9 endonuclease to the desired genomic location. The double-strand breaks made by these endonucleases are repaired in the cells either by non-homologous end joining, resulting in the introduction of insertions/deletions, or, if a repair template is provided, by homology directed repair. The ZFNs, TALENs and CRISPR/Cas9 systems take advantage of these repair mechanisms for targeted genome modification and have been successfully used to manipulate the genome in human cells. These genome editing tools can be used to investigate gene function, to discover new therapeutic targets, and to develop disease models. Moreover, these genome editing technologies have great potential in gene therapy. Here, we review the latest advances in the application of genome editing technology to the study and treatment of hematological disorders. Copyright © 2015 Elsevier Ltd. All rights reserved.
Quanbeck, Stephanie M.; Brachova, Libuse; Campbell, Alexis A.; Guan, Xin; Perera, Ann; He, Kun; Rhee, Seung Y.; Bais, Preeti; Dickerson, Julie A.; Dixon, Philip; Wohlgemuth, Gert; Fiehn, Oliver; Barkan, Lenore; Lange, Iris; Lange, B. Markus; Lee, Insuk; Cortes, Diego; Salazar, Carolina; Shuman, Joel; Shulaev, Vladimir; Huhman, David V.; Sumner, Lloyd W.; Roth, Mary R.; Welti, Ruth; Ilarslan, Hilal; Wurtele, Eve S.; Nikolau, Basil J.
2012-01-01
Metabolomics is the methodology that identifies and measures global pools of small molecules (of less than about 1,000 Da) of a biological sample, which are collectively called the metabolome. Metabolomics can therefore reveal the metabolic outcome of a genetic or environmental perturbation of a metabolic regulatory network, and thus provide insights into the structure and regulation of that network. Because of the chemical complexity of the metabolome and limitations associated with individual analytical platforms for determining the metabolome, it is currently difficult to capture the complete metabolome of an organism or tissue, which is in contrast to genomics and transcriptomics. This paper describes the analysis of Arabidopsis metabolomics data sets acquired by a consortium that includes five analytical laboratories, bioinformaticists, and biostatisticians, which aims to develop and validate metabolomics as a hypothesis-generating functional genomics tool. The consortium is determining the metabolomes of Arabidopsis T-DNA mutant stocks, grown in standardized controlled environment optimized to minimize environmental impacts on the metabolomes. Metabolomics data were generated with seven analytical platforms, and the combined data is being provided to the research community to formulate initial hypotheses about genes of unknown function (GUFs). A public database (www.PlantMetabolomics.org) has been developed to provide the scientific community with access to the data along with tools to allow for its interactive analysis. Exemplary datasets are discussed to validate the approach, which illustrate how initial hypotheses can be generated from the consortium-produced metabolomics data, integrated with prior knowledge to provide a testable hypothesis concerning the functionality of GUFs. PMID:22645570
Breeding and Genetics Symposium: networks and pathways to guide genomic selection.
Snelling, W M; Cushman, R A; Keele, J W; Maltecca, C; Thomas, M G; Fortes, M R S; Reverter, A
2013-02-01
Many traits affecting profitability and sustainability of meat, milk, and fiber production are polygenic, with no single gene having an overwhelming influence on observed variation. No knowledge of the specific genes controlling these traits has been needed to make substantial improvement through selection. Significant gains have been made through phenotypic selection enhanced by pedigree relationships and continually improving statistical methodology. Genomic selection, recently enabled by assays for dense SNP located throughout the genome, promises to increase selection accuracy and accelerate genetic improvement by emphasizing the SNP most strongly correlated to phenotype although the genes and sequence variants affecting phenotype remain largely unknown. These genomic predictions theoretically rely on linkage disequilibrium (LD) between genotyped SNP and unknown functional variants, but familial linkage may increase effectiveness when predicting individuals related to those in the training data. Genomic selection with functional SNP genotypes should be less reliant on LD patterns shared by training and target populations, possibly allowing robust prediction across unrelated populations. Although the specific variants causing polygenic variation may never be known with certainty, a number of tools and resources can be used to identify those most likely to affect phenotype. Associations of dense SNP genotypes with phenotype provide a 1-dimensional approach for identifying genes affecting specific traits; in contrast, associations with multiple traits allow defining networks of genes interacting to affect correlated traits. Such networks are especially compelling when corroborated by existing functional annotation and established molecular pathways. The SNP occurring within network genes, obtained from public databases or derived from genome and transcriptome sequences, may be classified according to expected effects on gene products. As illustrated by functionally informed genomic predictions being more accurate than naive whole-genome predictions of beef tenderness, coupling evidence from livestock genotypes, phenotypes, gene expression, and genomic variants with existing knowledge of gene functions and interactions may provide greater insight into the genes and genomic mechanisms affecting polygenic traits and facilitate functional genomic selection for economically important traits.
Genome Editing Tools in Plants
Mohanta, Tapan Kumar; Bashir, Tufail; Hashem, Abeer; Bae, Hanhong
2017-01-01
Genome editing tools have the potential to change the genomic architecture of a genome at precise locations, with desired accuracy. These tools have been efficiently used for trait discovery and for the generation of plants with high crop yields and resistance to biotic and abiotic stresses. Due to complex genomic architecture, it is challenging to edit all of the genes/genomes using a particular genome editing tool. Therefore, to overcome this challenging task, several genome editing tools have been developed to facilitate efficient genome editing. Some of the major genome editing tools used to edit plant genomes are: Homologous recombination (HR), zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), pentatricopeptide repeat proteins (PPRs), the CRISPR/Cas9 system, RNA interference (RNAi), cisgenesis, and intragenesis. In addition, site-directed sequence editing and oligonucleotide-directed mutagenesis have the potential to edit the genome at the single-nucleotide level. Recently, adenine base editors (ABEs) have been developed to mutate A-T base pairs to G-C base pairs. ABEs use deoxyadeninedeaminase (TadA) with catalytically impaired Cas9 nickase to mutate A-T base pairs to G-C base pairs. PMID:29257124
Identification of genomic sites for CRISPR/Cas9-based genome editing in the Vitis vinifera genome.
Wang, Yi; Liu, Xianju; Ren, Chong; Zhong, Gan-Yuan; Yang, Long; Li, Shaohua; Liang, Zhenchang
2016-04-21
CRISPR/Cas9 has been recently demonstrated as an effective and popular genome editing tool for modifying genomes of humans, animals, microorganisms, and plants. Success of such genome editing is highly dependent on the availability of suitable target sites in the genomes to be edited. Many specific target sites for CRISPR/Cas9 have been computationally identified for several annual model and crop species, but such sites have not been reported for perennial, woody fruit species. In this study, we identified and characterized five types of CRISPR/Cas9 target sites in the widely cultivated grape species Vitis vinifera and developed a user-friendly database for editing grape genomes in the future. A total of 35,767,960 potential CRISPR/Cas9 target sites were identified from grape genomes in this study. Among them, 22,597,817 target sites were mapped to specific genomic locations and 7,269,788 were found to be highly specific. Protospacers and PAMs were found to distribute uniformly and abundantly in the grape genomes. They were present in all the structural elements of genes with the coding region having the highest abundance. Five PAM types, TGG, AGG, GGG, CGG and NGG, were observed. With the exception of the NGG type, they were abundantly present in the grape genomes. Synteny analysis of similar genes revealed that the synteny of protospacers matched the synteny of homologous genes. A user-friendly database containing protospacers and detailed information of the sites was developed and is available for public use at the Grape-CRISPR website ( http://biodb.sdau.edu.cn/gc/index.html ). Grape genomes harbour millions of potential CRISPR/Cas9 target sites. These sites are widely distributed among and within chromosomes with predominant abundance in the coding regions of genes. We developed a publicly-accessible Grape-CRISPR database for facilitating the use of the CRISPR/Cas9 system as a genome editing tool for functional studies and molecular breeding of grapes. Among other functions, the database allows users to identify and select multi-protospacers for editing similar sequences in grape genomes simultaneously.
Potential pitfalls of CRISPR/Cas9-mediated genome editing.
Peng, Rongxue; Lin, Guigao; Li, Jinming
2016-04-01
Recently, a novel technique named the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein (Cas)9 system has been rapidly developed. This genome editing tool has improved our ability tremendously with respect to exploring the pathogenesis of diseases and correcting disease mutations, as well as phenotypes. With a short guide RNA, Cas9 can be precisely directed to target sites, and functions as an endonuclease to efficiently produce breaks in DNA double strands. Over the past 30 years, CRISPR has evolved from the 'curious sequences of unknown biological function' into a promising genome editing tool. As a result of the incessant development in the CRISPR/Cas9 system, Cas9 co-expressed with custom guide RNAs has been successfully used in a variety of cells and organisms. This genome editing technology can also be applied to synthetic biology, functional genomic screening, transcriptional modulation and gene therapy. However, although CRISPR/Cas9 has a broad range of action in science, there are several aspects that affect its efficiency and specificity, including Cas9 activity, target site selection and short guide RNA design, delivery methods, off-target effects and the incidence of homology-directed repair. In the present review, we highlight the factors that affect the utilization of CRISPR/Cas9, as well as possible strategies for handling any problems. Addressing these issues will allow us to take better advantage of this technique. In addition, we also review the history and rapid development of the CRISPR/Cas system from the time of its initial discovery in 2012. © 2015 FEBS.
Scripps Genome ADVISER: Annotation and Distributed Variant Interpretation SERver
Pham, Phillip H.; Shipman, William J.; Erikson, Galina A.; Schork, Nicholas J.; Torkamani, Ali
2015-01-01
Interpretation of human genomes is a major challenge. We present the Scripps Genome ADVISER (SG-ADVISER) suite, which aims to fill the gap between data generation and genome interpretation by performing holistic, in-depth, annotations and functional predictions on all variant types and effects. The SG-ADVISER suite includes a de-identification tool, a variant annotation web-server, and a user interface for inheritance and annotation-based filtration. SG-ADVISER allows users with no bioinformatics expertise to manipulate large volumes of variant data with ease – without the need to download large reference databases, install software, or use a command line interface. SG-ADVISER is freely available at genomics.scripps.edu/ADVISER. PMID:25706643
GenomicTools: a computational platform for developing high-throughput analytics in genomics.
Tsirigos, Aristotelis; Haiminen, Niina; Bilal, Erhan; Utro, Filippo
2012-01-15
Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time, memory requirements as well as prototyping of computational pipelines. We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze large datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.
Conservative site-specific and single-copy transgenesis in human LINE-1 elements
Vijaya Chandra, Shree Harsha; Makhija, Harshyaa; Peter, Sabrina; Myint Wai, Cho Mar; Li, Jinming; Zhu, Jindong; Ren, Zhonglu; D'Alcontres, Martina Stagno; Siau, Jia Wei; Chee, Sharon; Ghadessy, Farid John; Dröge, Peter
2016-01-01
Genome engineering of human cells plays an important role in biotechnology and molecular medicine. In particular, insertions of functional multi-transgene cassettes into suitable endogenous sequences will lead to novel applications. Although several tools have been exploited in this context, safety issues such as cytotoxicity, insertional mutagenesis and off-target cleavage together with limitations in cargo size/expression often compromise utility. Phage λ integrase (Int) is a transgenesis tool that mediates conservative site-specific integration of 48 kb DNA into a safe harbor site of the bacterial genome. Here, we show that an Int variant precisely recombines large episomes into a sequence, termed attH4X, found in 1000 human Long INterspersed Elements-1 (LINE-1). We demonstrate single-copy transgenesis through attH4X-targeting in various cell lines including hESCs, with the flexibility of selecting clones according to transgene performance and downstream applications. This is exemplified with pluripotency reporter cassettes and constitutively expressed payloads that remain functional in LINE1-targeted hESCs and differentiated progenies. Furthermore, LINE-1 targeting does not induce DNA damage-response or chromosomal aberrations, and neither global nor localized endogenous gene expression is substantially affected. Hence, this simple transgene addition tool should become particularly useful for applications that require engineering of the human genome with multi-transgenes. PMID:26673710
iScreen: Image-Based High-Content RNAi Screening Analysis Tools.
Zhong, Rui; Dong, Xiaonan; Levine, Beth; Xie, Yang; Xiao, Guanghua
2015-09-01
High-throughput RNA interference (RNAi) screening has opened up a path to investigating functional genomics in a genome-wide pattern. However, such studies are often restricted to assays that have a single readout format. Recently, advanced image technologies have been coupled with high-throughput RNAi screening to develop high-content screening, in which one or more cell image(s), instead of a single readout, were generated from each well. This image-based high-content screening technology has led to genome-wide functional annotation in a wider spectrum of biological research studies, as well as in drug and target discovery, so that complex cellular phenotypes can be measured in a multiparametric format. Despite these advances, data analysis and visualization tools are still largely lacking for these types of experiments. Therefore, we developed iScreen (image-Based High-content RNAi Screening Analysis Tool), an R package for the statistical modeling and visualization of image-based high-content RNAi screening. Two case studies were used to demonstrate the capability and efficiency of the iScreen package. iScreen is available for download on CRAN (http://cran.cnr.berkeley.edu/web/packages/iScreen/index.html). The user manual is also available as a supplementary document. © 2014 Society for Laboratory Automation and Screening.
Decoding genes with coexpression networks and metabolomics - 'majority report by precogs'.
Saito, Kazuki; Hirai, Masami Y; Yonekura-Sakakibara, Keiko
2008-01-01
Following the sequencing of whole genomes of model plants, high-throughput decoding of gene function is a major challenge in modern plant biology. In view of remarkable technical advances in transcriptomics and metabolomics, integrated analysis of these 'omics' by data-mining informatics is an excellent tool for prediction and identification of gene function, particularly for genes involved in complicated metabolic pathways. The availability of Arabidopsis public transcriptome datasets containing data of >1000 microarrays reinforces the potential for prediction of gene function by transcriptome coexpression analysis. Here, we review the strategy of combining transcriptome and metabolome as a powerful technology for studying the functional genomics of model plants and also crop and medicinal plants.
The future is now: single-cell genomics of bacteria and archaea
Blainey, Paul C.
2013-01-01
Interest in the expanding catalog of uncultivated microorganisms, increasing recognition of heterogeneity among seemingly similar cells, and technological advances in whole-genome amplification and single-cell manipulation are driving considerable progress in single-cell genomics. Here, the spectrum of applications for single-cell genomics, key advances in the development of the field, and emerging methodology for single-cell genome sequencing are reviewed by example with attention to the diversity of approaches and their unique characteristics. Experimental strategies transcending specific methodologies are identified and organized as a road map for future studies in single-cell genomics of environmental microorganisms. Over the next decade, increasingly powerful tools for single-cell genome sequencing and analysis will play key roles in accessing the genomes of uncultivated organisms, determining the basis of microbial community functions, and fundamental aspects of microbial population biology. PMID:23298390
TreeQ-VISTA: An Interactive Tree Visualization Tool withFunctional Annotation Query Capabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gu, Shengyin; Anderson, Iain; Kunin, Victor
2007-05-07
Summary: We describe a general multiplatform exploratorytool called TreeQ-Vista, designed for presenting functional annotationsin a phylogenetic context. Traits, such as phenotypic and genomicproperties, are interactively queried from a relational database with auser-friendly interface which provides a set of tools for users with orwithout SQL knowledge. The query results are projected onto aphylogenetic tree and can be displayed in multiple color groups. A richset of browsing, grouping and query tools are provided to facilitatetrait exploration, comparison and analysis.Availability: The program,detailed tutorial and examples are available online athttp://genome-test.lbl.gov/vista/TreeQVista.
The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction
2012-01-01
Background Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. Results We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI’s BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. Conclusions We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the divergence of five other Solanaceae family members, S. lycopersicum, S. tuberosum, Capsicum spp, S. melongena and Petunia spp. PMID:22533342
The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction.
Garzón-Martínez, Gina A; Zhu, Z Iris; Landsman, David; Barrero, Luz S; Mariño-Ramírez, Leonardo
2012-04-25
Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI's BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the divergence of five other Solanaceae family members, S. lycopersicum, S. tuberosum, Capsicum spp, S. melongena and Petunia spp.
Technological advances and genomics in metazoan parasites.
Knox, D P
2004-02-01
Molecular biology has provided the means to identify parasite proteins, to define their function, patterns of expression and the means to produce them in quantity for subsequent functional analyses. Whole genome and expressed sequence tag programmes, and the parallel development of powerful bioinformatics tools, allow the execution of genome-wide between stage or species comparisons and meaningful gene-expression profiling. The latter can be undertaken with several new technologies such as DNA microarray and serial analysis of gene expression. Proteome analysis has come to the fore in recent years providing a crucial link between the gene and its protein product. RNA interference and ballistic gene transfer are exciting developments which can provide the means to precisely define the function of individual genes and, of importance in devising novel parasite control strategies, the effect that gene knockdown will have on parasite survival.
Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D
2017-01-04
The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
GCView: the genomic context viewer for protein homology searches
Grin, Iwan; Linke, Dirk
2011-01-01
Genomic neighborhood can provide important insights into evolution and function of a protein or gene. When looking at operons, changes in operon structure and composition can only be revealed by looking at the operon as a whole. To facilitate the analysis of the genomic context of a query in multiple organisms we have developed Genomic Context Viewer (GCView). GCView accepts results from one or multiple protein homology searches such as BLASTp as input. For each hit, the neighboring protein-coding genes are extracted, the regions of homology are labeled for each input and the results are presented as a clear, interactive graphical output. It is also possible to add more searches to iteratively refine the output. GCView groups outputs by the hits for different proteins. This allows for easy comparison of different operon compositions and structures. The tool is embedded in the framework of the Bioinformatics Toolkit of the Max-Planck Institute for Developmental Biology (MPI Toolkit). Job results from the homology search tools inside the MPI Toolkit can be forwarded to GCView and results can be subsequently analyzed by sequence analysis tools. Results are stored online, allowing for later reinspection. GCView is freely available at http://toolkit.tuebingen.mpg.de/gcview. PMID:21609955
Observing copepods through a genomic lens
2011-01-01
Background Copepods outnumber every other multicellular animal group. They are critical components of the world's freshwater and marine ecosystems, sensitive indicators of local and global climate change, key ecosystem service providers, parasites and predators of economically important aquatic animals and potential vectors of waterborne disease. Copepods sustain the world fisheries that nourish and support human populations. Although genomic tools have transformed many areas of biological and biomedical research, their power to elucidate aspects of the biology, behavior and ecology of copepods has only recently begun to be exploited. Discussion The extraordinary biological and ecological diversity of the subclass Copepoda provides both unique advantages for addressing key problems in aquatic systems and formidable challenges for developing a focused genomics strategy. This article provides an overview of genomic studies of copepods and discusses strategies for using genomics tools to address key questions at levels extending from individuals to ecosystems. Genomics can, for instance, help to decipher patterns of genome evolution such as those that occur during transitions from free living to symbiotic and parasitic lifestyles and can assist in the identification of genetic mechanisms and accompanying physiological changes associated with adaptation to new or physiologically challenging environments. The adaptive significance of the diversity in genome size and unique mechanisms of genome reorganization during development could similarly be explored. Genome-wide and EST studies of parasitic copepods of salmon and large EST studies of selected free-living copepods have demonstrated the potential utility of modern genomics approaches for the study of copepods and have generated resources such as EST libraries, shotgun genome sequences, BAC libraries, genome maps and inbred lines that will be invaluable in assisting further efforts to provide genomics tools for copepods. Summary Genomics research on copepods is needed to extend our exploration and characterization of their fundamental biological traits, so that we can better understand how copepods function and interact in diverse environments. Availability of large scale genomics resources will also open doors to a wide range of systems biology type studies that view the organism as the fundamental system in which to address key questions in ecology and evolution. PMID:21933388
Gene Ontology-Based Analysis of Zebrafish Omics Data Using the Web Tool Comparative Gene Ontology.
Ebrahimie, Esmaeil; Fruzangohar, Mario; Moussavi Nik, Seyyed Hani; Newman, Morgan
2017-10-01
Gene Ontology (GO) analysis is a powerful tool in systems biology, which uses a defined nomenclature to annotate genes/proteins within three categories: "Molecular Function," "Biological Process," and "Cellular Component." GO analysis can assist in revealing functional mechanisms underlying observed patterns in transcriptomic, genomic, and proteomic data. The already extensive and increasing use of zebrafish for modeling genetic and other diseases highlights the need to develop a GO analytical tool for this organism. The web tool Comparative GO was originally developed for GO analysis of bacterial data in 2013 ( www.comparativego.com ). We have now upgraded and elaborated this web tool for analysis of zebrafish genetic data using GOs and annotations from the Gene Ontology Consortium.
CRISPR/Cas9: From Genome Engineering to Cancer Drug Discovery
Luo, Ji
2016-01-01
Advances in translational research are often driven by new technologies. The advent of microarrays, next-generation sequencing, proteomics and RNA interference (RNAi) have led to breakthroughs in our understanding of the mechanisms of cancer and the discovery of new cancer drug targets. The discovery of the bacterial clustered regularly interspaced palindromic repeat (CRISPR) system and its subsequent adaptation as a tool for mammalian genome engineering has opened up new avenues for functional genomics studies. This review will focus on the utility of CRISPR in the context of cancer drug target discovery. PMID:28603775
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, C
2009-11-12
In FY09 they will (1) complete the implementation, verification, calibration, and sensitivity and scalability analysis of the in-cell virus replication model; (2) complete the design of the cell culture (cell-to-cell infection) model; (3) continue the research, design, and development of their bioinformatics tools: the Web-based structure-alignment-based sequence variability tool and the functional annotation of the genome database; (4) collaborate with the University of California at San Francisco on areas of common interest; and (5) submit journal articles that describe the in-cell model with simulations and the bioinformatics approaches to evaluation of genome variability and fitness.
Genome-wide protein-protein interactions and protein function exploration in cyanobacteria
Lv, Qi; Ma, Weimin; Liu, Hui; Li, Jiang; Wang, Huan; Lu, Fang; Zhao, Chen; Shi, Tieliu
2015-01-01
Genome-wide network analysis is well implemented to study proteins of unknown function. Here, we effectively explored protein functions and the biological mechanism based on inferred high confident protein-protein interaction (PPI) network in cyanobacteria. We integrated data from seven different sources and predicted 1,997 PPIs, which were evaluated by experiments in molecular mechanism, text mining of literatures in proved direct/indirect evidences, and “interologs” in conservation. Combined the predicted PPIs with known PPIs, we obtained 4,715 no-redundant PPIs (involving 3,231 proteins covering over 90% of genome) to generate the PPI network. Based on the PPI network, terms in Gene ontology (GO) were assigned to function-unknown proteins. Functional modules were identified by dissecting the PPI network into sub-networks and analyzing pathway enrichment, with which we investigated novel function of underlying proteins in protein complexes and pathways. Examples of photosynthesis and DNA repair indicate that the network approach is a powerful tool in protein function analysis. Overall, this systems biology approach provides a new insight into posterior functional analysis of PPIs in cyanobacteria. PMID:26490033
Wang, Harris H; Church, George M
2011-01-01
Engineering at the scale of whole genomes requires fundamentally new molecular biology tools. Recent advances in recombineering using synthetic oligonucleotides enable the rapid generation of mutants at high efficiency and specificity and can be implemented at the genome scale. With these techniques, libraries of mutants can be generated, from which individuals with functionally useful phenotypes can be isolated. Furthermore, populations of cells can be evolved in situ by directed evolution using complex pools of oligonucleotides. Here, we discuss ways to utilize these multiplexed genome engineering methods, with special emphasis on experimental design and implementation. Copyright © 2011 Elsevier Inc. All rights reserved.
The Comprehensive Microbial Resource.
Peterson, J D; Umayam, L A; Dickinson, T; Hickey, E K; White, O
2001-01-01
One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes.
Homogeneous versus heterogeneous probes for microbial ecological microarrays.
Bae, Jin-Woo; Park, Yong-Ha
2006-07-01
Microbial ecological microarrays have been developed for investigating the composition and functions of microorganism communities in environmental niches. These arrays include microbial identification microarrays, which use oligonucleotides, gene fragments or microbial genomes as probes. In this article, the advantages and disadvantages of each type of probe are reviewed. Oligonucleotide probes are currently useful for probing uncultivated bacteria that are not amenable to gene fragment probing, whereas the functional gene fragments amplified randomly from microbial genomes require phylogenetic and hierarchical categorization before use as microbial identification probes, despite their high resolution for both specificity and sensitivity. Until more bacteria are sequenced and gene fragment probes are thoroughly validated, heterogeneous bacterial genome probes will provide a simple, sensitive and quantitative tool for exploring the ecosystem structure.
Using Galaxy to Perform Large-Scale Interactive Data Analyses
Hillman-Jackson, Jennifer; Clements, Dave; Blankenberg, Daniel; Taylor, James; Nekrutenko, Anton
2012-01-01
Innovations in biomedical research technologies continue to provide experimental biologists with novel and increasingly large genomic and high-throughput data resources to be analyzed. As creating and obtaining data has become easier, the key decision faced by many researchers is a practical one: where and how should an analysis be performed? Datasets are large and analysis tool set-up and use is riddled with complexities outside of the scope of core research activities. The authors believe that Galaxy (galaxyproject.org) provides a powerful solution that simplifies data acquisition and analysis in an intuitive web-application, granting all researchers access to key informatics tools previously only available to computational specialists working in Unix-based environments. We will demonstrate through a series of biomedically relevant protocols how Galaxy specifically brings together 1) data retrieval from public and private sources, for example, UCSC’s Eukaryote and Microbial Genome Browsers (genome.ucsc.edu), 2) custom tools (wrapped Unix functions, format standardization/conversions, interval operations) and 3rd party analysis tools, for example, Bowtie/Tuxedo Suite (bowtie-bio.sourceforge.net), Lastz (www.bx.psu.edu/~rsharris/lastz/), SAMTools (samtools.sourceforge.net), FASTX-toolkit (hannonlab.cshl.edu/fastx_toolkit), and MACS (liulab.dfci.harvard.edu/MACS), and creates results formatted for visualization in tools such as the Galaxy Track Browser (GTB, galaxyproject.org/wiki/Learn/Visualization), UCSC Genome Browser (genome.ucsc.edu), Ensembl (www.ensembl.org), and GeneTrack (genetrack.bx.psu.edu). Galaxy rapidly has become the most popular choice for integrated next generation sequencing (NGS) analytics and collaboration, where users can perform, document, and share complex analysis within a single interface in an unprecedented number of ways. PMID:18428782
Genome-wide characterization of centromeric satellites from multiple mammalian genomes.
Alkan, Can; Cardone, Maria Francesca; Catacchio, Claudia Rita; Antonacci, Francesca; O'Brien, Stephen J; Ryder, Oliver A; Purgato, Stefania; Zoli, Monica; Della Valle, Giuliano; Eichler, Evan E; Ventura, Mario
2011-01-01
Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog in contrast to elephant and armadillo, which showed high-centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.
Huang, Ying; Chen, Shi-Yi; Deng, Feilong
2016-01-01
In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have largely facilitated our understanding on a variety of biological questions. Although the computational prediction of protein-coding genes has already been well-established, we are also facing challenges to robustly find the non-coding RNA genes, such as miRNA and lncRNA. Two main aspects of ab initio gene prediction include the computed values for describing sequence features and used algorithm for training the discriminant function, and by which different combinations are employed into various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and applications to ab initio gene prediction. The main purpose of this article is to provide an overview to beginners who aim to develop the related bioinformatic tools.
DOE Office of Scientific and Technical Information (OSTI.GOV)
White, Richard A.; Brown, Joseph M.; Colby, Sean M.
ATLAS (Automatic Tool for Local Assembly Structures) is a comprehensive multiomics data analysis pipeline that is massively parallel and scalable. ATLAS contains a modular analysis pipeline for assembly, annotation, quantification and genome binning of metagenomics and metatranscriptomics data and a framework for reference metaproteomic database construction. ATLAS transforms raw sequence data into functional and taxonomic data at the microbial population level and provides genome-centric resolution through genome binning. ATLAS provides robust taxonomy based on majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS provides robust taxonomy based onmore » majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS is user-friendly, easy install through bioconda maintained as open-source on GitHub, and is implemented in Snakemake for modular customizable workflows.« less
Rondon, Michelle R.; Raffel, Sandra J.; Goodman, Robert M.; Handelsman, Jo
1999-01-01
As the study of microbes moves into the era of functional genomics, there is an increasing need for molecular tools for analysis of a wide diversity of microorganisms. Currently, biological study of many prokaryotes of agricultural, medical, and fundamental scientific interest is limited by the lack of adequate genetic tools. We report the application of the bacterial artificial chromosome (BAC) vector to prokaryotic biology as a powerful approach to address this need. We constructed a BAC library in Escherichia coli from genomic DNA of the Gram-positive bacterium Bacillus cereus. This library provides 5.75-fold coverage of the B. cereus genome, with an average insert size of 98 kb. To determine the extent of heterologous expression of B. cereus genes in the library, we screened it for expression of several B. cereus activities in the E. coli host. Clones expressing 6 of 10 activities tested were identified in the library, namely, ampicillin resistance, zwittermicin A resistance, esculin hydrolysis, hemolysis, orange pigment production, and lecithinase activity. We analyzed selected BAC clones genetically to identify rapidly specific B. cereus loci. These results suggest that BAC libraries will provide a powerful approach for studying gene expression from diverse prokaryotes. PMID:10339608
Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary
2016-08-01
Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality. Copyright © 2016 the American Physiological Society.
Cas9 versus Cas12a/Cpf1: Structure-function comparisons and implications for genome editing.
Swarts, Daan C; Jinek, Martin
2018-05-22
Cas9 and Cas12a are multidomain CRISPR-associated nucleases that can be programmed with a guide RNA to bind and cleave complementary DNA targets. The guide RNA sequence can be varied, making these effector enzymes versatile tools for genome editing and gene regulation applications. While Cas9 is currently the best-characterized and most widely used nuclease for such purposes, Cas12a (previously named Cpf1) has recently emerged as an alternative for Cas9. Cas9 and Cas12a have distinct evolutionary origins and exhibit different structural architectures, resulting in distinct molecular mechanisms. Here we compare the structural and mechanistic features that distinguish Cas9 and Cas12a, and describe how these features modulate their activity. We discuss implications for genome editing, and how they may influence the choice of Cas9 or Cas12a for specific applications. Finally, we review recent studies in which Cas12a has been utilized as a genome editing tool. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications Regulatory RNAs/RNAi/Riboswitches > Biogenesis of Effector Small RNAs RNA Interactions with Proteins and Other Molecules > RNA-Protein Complexes. © 2018 Wiley Periodicals, Inc.
Zhou, Jindan; Rudd, Kenneth E.
2013-01-01
EcoGene (http://ecogene.org) is a database and website devoted to continuously improving the structural and functional annotation of Escherichia coli K-12, one of the most well understood model organisms, represented by the MG1655(Seq) genome sequence and annotations. Major improvements to EcoGene in the past decade include (i) graphic presentations of genome map features; (ii) ability to design Boolean queries and Venn diagrams from EcoArray, EcoTopics or user-provided GeneSets; (iii) the genome-wide clone and deletion primer design tool, PrimerPairs; (iv) sequence searches using a customized EcoBLAST; (v) a Cross Reference table of synonymous gene and protein identifiers; (vi) proteome-wide indexing with GO terms; (vii) EcoTools access to >2000 complete bacterial genomes in EcoGene-RefSeq; (viii) establishment of a MySql relational database; and (ix) use of web content management systems. The biomedical literature is surveyed daily to provide citation and gene function updates. As of September 2012, the review of 37 397 abstracts and articles led to creation of 98 425 PubMed-Gene links and 5415 PubMed-Topic links. Annotation updates to Genbank U00096 are transmitted from EcoGene to NCBI. Experimental verifications include confirmation of a CTG start codon, pseudogene restoration and quality assurance of the Keio strain collection. PMID:23197660
Hadjithomas, Michalis; Chen, I-Min A; Chu, Ken; Huang, Jinghua; Ratner, Anna; Palaniappan, Krishna; Andersen, Evan; Markowitz, Victor; Kyrpides, Nikos C; Ivanova, Natalia N
2017-01-04
Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publically available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Zhou, Jindan; Rudd, Kenneth E
2013-01-01
EcoGene (http://ecogene.org) is a database and website devoted to continuously improving the structural and functional annotation of Escherichia coli K-12, one of the most well understood model organisms, represented by the MG1655(Seq) genome sequence and annotations. Major improvements to EcoGene in the past decade include (i) graphic presentations of genome map features; (ii) ability to design Boolean queries and Venn diagrams from EcoArray, EcoTopics or user-provided GeneSets; (iii) the genome-wide clone and deletion primer design tool, PrimerPairs; (iv) sequence searches using a customized EcoBLAST; (v) a Cross Reference table of synonymous gene and protein identifiers; (vi) proteome-wide indexing with GO terms; (vii) EcoTools access to >2000 complete bacterial genomes in EcoGene-RefSeq; (viii) establishment of a MySql relational database; and (ix) use of web content management systems. The biomedical literature is surveyed daily to provide citation and gene function updates. As of September 2012, the review of 37 397 abstracts and articles led to creation of 98 425 PubMed-Gene links and 5415 PubMed-Topic links. Annotation updates to Genbank U00096 are transmitted from EcoGene to NCBI. Experimental verifications include confirmation of a CTG start codon, pseudogene restoration and quality assurance of the Keio strain collection.
Harnessing CRISPR-Cas systems for bacterial genome editing.
Selle, Kurt; Barrangou, Rodolphe
2015-04-01
Manipulation of genomic sequences facilitates the identification and characterization of key genetic determinants in the investigation of biological processes. Genome editing via clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) constitutes a next-generation method for programmable and high-throughput functional genomics. CRISPR-Cas systems are readily reprogrammed to induce sequence-specific DNA breaks at target loci, resulting in fixed mutations via host-dependent DNA repair mechanisms. Although bacterial genome editing is a relatively unexplored and underrepresented application of CRISPR-Cas systems, recent studies provide valuable insights for the widespread future implementation of this technology. This review summarizes recent progress in bacterial genome editing and identifies fundamental genetic and phenotypic outcomes of CRISPR targeting in bacteria, in the context of tool development, genome homeostasis, and DNA repair. Copyright © 2015 Elsevier Ltd. All rights reserved.
Automated multiplex genome-scale engineering in yeast
Si, Tong; Chao, Ran; Min, Yuhao; Wu, Yuying; Ren, Wen; Zhao, Huimin
2017-01-01
Genome-scale engineering is indispensable in understanding and engineering microorganisms, but the current tools are mainly limited to bacterial systems. Here we report an automated platform for multiplex genome-scale engineering in Saccharomyces cerevisiae, an important eukaryotic model and widely used microbial cell factory. Standardized genetic parts encoding overexpression and knockdown mutations of >90% yeast genes are created in a single step from a full-length cDNA library. With the aid of CRISPR-Cas, these genetic parts are iteratively integrated into the repetitive genomic sequences in a modular manner using robotic automation. This system allows functional mapping and multiplex optimization on a genome scale for diverse phenotypes including cellulase expression, isobutanol production, glycerol utilization and acetic acid tolerance, and may greatly accelerate future genome-scale engineering endeavours in yeast. PMID:28469255
MVisAGe Identifies Concordant and Discordant Genomic Alterations of Driver Genes in Squamous Tumors.
Walter, Vonn; Du, Ying; Danilova, Ludmila; Hayward, Michele C; Hayes, D Neil
2018-06-15
Integrated analyses of multiple genomic datatypes are now common in cancer profiling studies. Such data present opportunities for numerous computational experiments, yet analytic pipelines are limited. Tools such as the cBioPortal and Regulome Explorer, although useful, are not easy to access programmatically or to implement locally. Here, we introduce the MVisAGe R package, which allows users to quantify gene-level associations between two genomic datatypes to investigate the effect of genomic alterations (e.g., DNA copy number changes on gene expression). Visualizing Pearson/Spearman correlation coefficients according to the genomic positions of the underlying genes provides a powerful yet novel tool for conducting exploratory analyses. We demonstrate its utility by analyzing three publicly available cancer datasets. Our approach highlights canonical oncogenes in chr11q13 that displayed the strongest associations between expression and copy number, including CCND1 and CTTN , genes not identified by copy number analysis in the primary reports. We demonstrate highly concordant usage of shared oncogenes on chr3q, yet strikingly diverse oncogene usage on chr11q as a function of HPV infection status. Regions of chr19 that display remarkable associations between methylation and gene expression were identified, as were previously unreported miRNA-gene expression associations that may contribute to the epithelial-to-mesenchymal transition. Significance: This study presents an important bioinformatics tool that will enable integrated analyses of multiple genomic datatypes. Cancer Res; 78(12); 3375-85. ©2018 AACR . ©2018 American Association for Cancer Research.
IMG/M: integrated genome and metagenome comparative data analysis system
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; ...
2016-10-13
The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support formore » examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review(ER) companion system (IMG/M ER: https://img.jgi.doe.gov/ mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.« less
IMG/M: integrated genome and metagenome comparative data analysis system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken
The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support formore » examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review(ER) companion system (IMG/M ER: https://img.jgi.doe.gov/ mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.« less
Advances in Maize Genomics and Their Value for Enhancing Genetic Gains from Breeding
Xu, Yunbi; Skinner, Debra J.; Wu, Huixia; Palacios-Rojas, Natalia; Araus, Jose Luis; Yan, Jianbing; Gao, Shibin; Warburton, Marilyn L.; Crouch, Jonathan H.
2009-01-01
Maize is an important crop for food, feed, forage, and fuel across tropical and temperate areas of the world. Diversity studies at genetic, molecular, and functional levels have revealed that, tropical maize germplasm, landraces, and wild relatives harbor a significantly wider range of genetic variation. Among all types of markers, SNP markers are increasingly the marker-of-choice for all genomics applications in maize breeding. Genetic mapping has been developed through conventional linkage mapping and more recently through linkage disequilibrium-based association analyses. Maize genome sequencing, initially focused on gene-rich regions, now aims for the availability of complete genome sequence. Conventional insertion mutation-based cloning has been complemented recently by EST- and map-based cloning. Transgenics and nutritional genomics are rapidly advancing fields targeting important agronomic traits including pest resistance and grain quality. Substantial advances have been made in methodologies for genomics-assisted breeding, enhancing progress in yield as well as abiotic and biotic stress resistances. Various genomic databases and informatics tools have been developed, among which MaizeGDB is the most developed and widely used by the maize research community. In the future, more emphasis should be given to the development of tools and strategic germplasm resources for more effective molecular breeding of tropical maize products. PMID:19688107
IMG/M: integrated genome and metagenome comparative data analysis system
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Palaniappan, Krishna; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Andersen, Evan; Huntemann, Marcel; Varghese, Neha; Hadjithomas, Michalis; Tennessen, Kristin; Nielsen, Torben; Ivanova, Natalia N.; Kyrpides, Nikos C.
2017-01-01
The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system. PMID:27738135
Three-Dimensional Genome Organization and Function in Drosophila
Schwartz, Yuri B.; Cavalli, Giacomo
2017-01-01
Understanding how the metazoan genome is used during development and cell differentiation is one of the major challenges in the postgenomic era. Early studies in Drosophila suggested that three-dimensional (3D) chromosome organization plays important regulatory roles in this process and recent technological advances started to reveal connections at the molecular level. Here we will consider general features of the architectural organization of the Drosophila genome, providing historical perspective and insights from recent work. We will compare the linear and spatial segmentation of the fly genome and focus on the two key regulators of genome architecture: insulator components and Polycomb group proteins. With its unique set of genetic tools and a compact, well annotated genome, Drosophila is poised to remain a model system of choice for rapid progress in understanding principles of genome organization and to serve as a proving ground for development of 3D genome-engineering techniques. PMID:28049701
Application of Genomic In Situ Hybridization in Horticultural Science
Ramzan, Fahad; Lim, Ki-Byung
2017-01-01
Molecular cytogenetic techniques, such as in situ hybridization methods, are admirable tools to analyze the genomic structure and function, chromosome constituents, recombination patterns, alien gene introgression, genome evolution, aneuploidy, and polyploidy and also genome constitution visualization and chromosome discrimination from different genomes in allopolyploids of various horticultural crops. Using GISH advancement as multicolor detection is a significant approach to analyze the small and numerous chromosomes in fruit species, for example, Diospyros hybrids. This analytical technique has proved to be the most exact and effective way for hybrid status confirmation and helps remarkably to distinguish donor parental genomes in hybrids such as Clivia, Rhododendron, and Lycoris ornamental hybrids. The genome characterization facilitates in hybrid selection having potential desirable characteristics during the early hybridization breeding, as this technique expedites to detect introgressed sequence chromosomes. This review study epitomizes applications and advancements of genomic in situ hybridization (GISH) techniques in horticultural plants. PMID:28459054
CRISPR/Cas9-loxP-Mediated Gene Editing as a Novel Site-Specific Genetic Manipulation Tool.
Yang, Fayu; Liu, Changbao; Chen, Ding; Tu, Mengjun; Xie, Haihua; Sun, Huihui; Ge, Xianglian; Tang, Lianchao; Li, Jin; Zheng, Jiayong; Song, Zongming; Qu, Jia; Gu, Feng
2017-06-16
Cre-loxP, as one of the site-specific genetic manipulation tools, offers a method to study the spatial and temporal regulation of gene expression/inactivation in order to decipher gene function. CRISPR/Cas9-mediated targeted genome engineering technologies are sparking a new revolution in biological research. Whether the traditional site-specific genetic manipulation tool and CRISPR/Cas9 could be combined to create a novel genetic tool for highly specific gene editing is not clear. Here, we successfully generated a CRISPR/Cas9-loxP system to perform gene editing in human cells, providing the proof of principle that these two technologies can be used together for the first time. We also showed that distinct non-homologous end-joining (NHEJ) patterns from CRISPR/Cas9-mediated gene editing of the targeting sequence locates at the level of plasmids (episomal) and chromosomes. Specially, the CRISPR/Cas9-mediated NHEJ pattern in the nuclear genome favors deletions (64%-68% at the human AAVS1 locus versus 4%-28% plasmid DNA). CRISPR/Cas9-loxP, a novel site-specific genetic manipulation tool, offers a platform for the dissection of gene function and molecular insights into DNA-repair pathways. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
WebArray: an online platform for microarray data analysis
Xia, Xiaoqin; McClelland, Michael; Wang, Yipeng
2005-01-01
Background Many cutting-edge microarray analysis tools and algorithms, including commonly used limma and affy packages in Bioconductor, need sophisticated knowledge of mathematics, statistics and computer skills for implementation. Commercially available software can provide a user-friendly interface at considerable cost. To facilitate the use of these tools for microarray data analysis on an open platform we developed an online microarray data analysis platform, WebArray, for bench biologists to utilize these tools to explore data from single/dual color microarray experiments. Results The currently implemented functions were based on limma and affy package from Bioconductor, the spacings LOESS histogram (SPLOSH) method, PCA-assisted normalization method and genome mapping method. WebArray incorporates these packages and provides a user-friendly interface for accessing a wide range of key functions of limma and others, such as spot quality weight, background correction, graphical plotting, normalization, linear modeling, empirical bayes statistical analysis, false discovery rate (FDR) estimation, chromosomal mapping for genome comparison. Conclusion WebArray offers a convenient platform for bench biologists to access several cutting-edge microarray data analysis tools. The website is freely available at . It runs on a Linux server with Apache and MySQL. PMID:16371165
Fang, Hai; Knezevic, Bogdan; Burnham, Katie L; Knight, Julian C
2016-12-13
Biological interpretation of genomic summary data such as those resulting from genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is one of the major bottlenecks in medical genomics research, calling for efficient and integrative tools to resolve this problem. We introduce eXploring Genomic Relations (XGR), an open source tool designed for enhanced interpretation of genomic summary data enabling downstream knowledge discovery. Targeting users of varying computational skills, XGR utilises prior biological knowledge and relationships in a highly integrated but easily accessible way to make user-input genomic summary datasets more interpretable. We show how by incorporating ontology, annotation, and systems biology network-driven approaches, XGR generates more informative results than conventional analyses. We apply XGR to GWAS and eQTL summary data to explore the genomic landscape of the activated innate immune response and common immunological diseases. We provide genomic evidence for a disease taxonomy supporting the concept of a disease spectrum from autoimmune to autoinflammatory disorders. We also show how XGR can define SNP-modulated gene networks and pathways that are shared and distinct between diseases, how it achieves functional, phenotypic and epigenomic annotations of genes and variants, and how it enables exploring annotation-based relationships between genetic variants. XGR provides a single integrated solution to enhance interpretation of genomic summary data for downstream biological discovery. XGR is released as both an R package and a web-app, freely available at http://galahad.well.ox.ac.uk/XGR .
Genome-wide inference of regulatory networks in Streptomyces coelicolor.
Castro-Melchor, Marlene; Charaniya, Salim; Karypis, George; Takano, Eriko; Hu, Wei-Shou
2010-10-18
The onset of antibiotics production in Streptomyces species is co-ordinated with differentiation events. An understanding of the genetic circuits that regulate these coupled biological phenomena is essential to discover and engineer the pharmacologically important natural products made by these species. The availability of genomic tools and access to a large warehouse of transcriptome data for the model organism, Streptomyces coelicolor, provides incentive to decipher the intricacies of the regulatory cascades and develop biologically meaningful hypotheses. In this study, more than 500 samples of genome-wide temporal transcriptome data, comprising wild-type and more than 25 regulatory gene mutants of Streptomyces coelicolor probed across multiple stress and medium conditions, were investigated. Information based on transcript and functional similarity was used to update a previously-predicted whole-genome operon map and further applied to predict transcriptional networks constituting modules enriched in diverse functions such as secondary metabolism, and sigma factor. The predicted network displays a scale-free architecture with a small-world property observed in many biological networks. The networks were further investigated to identify functionally-relevant modules that exhibit functional coherence and a consensus motif in the promoter elements indicative of DNA-binding elements. Despite the enormous experimental as well as computational challenges, a systems approach for integrating diverse genome-scale datasets to elucidate complex regulatory networks is beginning to emerge. We present an integrated analysis of transcriptome data and genomic features to refine a whole-genome operon map and to construct regulatory networks at the cistron level in Streptomyces coelicolor. The functionally-relevant modules identified in this study pose as potential targets for further studies and verification.
Bolbase: a comprehensive genomics database for Brassica oleracea.
Yu, Jingyin; Zhao, Meixia; Wang, Xiaowu; Tong, Chaobo; Huang, Shunmou; Tehrim, Sadia; Liu, Yumei; Hua, Wei; Liu, Shengyi
2013-09-30
Brassica oleracea is a morphologically diverse species in the family Brassicaceae and contains a group of nutrition-rich vegetable crops, including common heading cabbage, cauliflower, broccoli, kohlrabi, kale, Brussels sprouts. This diversity along with its phylogenetic membership in a group of three diploid and three tetraploid species, and the recent availability of genome sequences within Brassica provide an unprecedented opportunity to study intra- and inter-species divergence and evolution in this species and its close relatives. We have developed a comprehensive database, Bolbase, which provides access to the B. oleracea genome data and comparative genomics information. The whole genome of B. oleracea is available, including nine fully assembled chromosomes and 1,848 scaffolds, with 45,758 predicted genes, 13,382 transposable elements, and 3,581 non-coding RNAs. Comparative genomics information is available, including syntenic regions among B. oleracea, Brassica rapa and Arabidopsis thaliana, synonymous (Ks) and non-synonymous (Ka) substitution rates between orthologous gene pairs, gene families or clusters, and differences in quantity, category, and distribution of transposable elements on chromosomes. Bolbase provides useful search and data mining tools, including a keyword search, a local BLAST server, and a customized GBrowse tool, which can be used to extract annotations of genome components, identify similar sequences and visualize syntenic regions among species. Users can download all genomic data and explore comparative genomics in a highly visual setting. Bolbase is the first resource platform for the B. oleracea genome and for genomic comparisons with its relatives, and thus it will help the research community to better study the function and evolution of Brassica genomes as well as enhance molecular breeding research. This database will be updated regularly with new features, improvements to genome annotation, and new genomic sequences as they become available. Bolbase is freely available at http://ocri-genomics.org/bolbase.
GIANT API: an application programming interface for functional genomics.
Roberts, Andrew M; Wong, Aaron K; Fisk, Ian; Troyanskaya, Olga G
2016-07-08
GIANT API provides biomedical researchers programmatic access to tissue-specific and global networks in humans and model organisms, and associated tools, which includes functional re-prioritization of existing genome-wide association study (GWAS) data. Using tissue-specific interaction networks, researchers are able to predict relationships between genes specific to a tissue or cell lineage, identify the changing roles of genes across tissues and uncover disease-gene associations. Additionally, GIANT API enables computational tools like NetWAS, which leverages tissue-specific networks for re-prioritization of GWAS results. The web services covered by the API include 144 tissue-specific functional gene networks in human, global functional networks for human and six common model organisms and the NetWAS method. GIANT API conforms to the REST architecture, which makes it stateless, cacheable and highly scalable. It can be used by a diverse range of clients including web browsers, command terminals, programming languages and standalone apps for data analysis and visualization. The API is freely available for use at http://giant-api.princeton.edu. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
The goal of this project is to use siRNA screens to identify NSCLC-selective siRNAs from two genome-wide libraries that will allow us to functionally define genetic dependencies of subtypes of NSCLC. Using bioinformatics tools, the CTD2 center at the University of Texas Southwestern Medical Center are discovering associations between this functional data (siRNAs) and NSCLC mutational status, methylation arrays, gene expression arrays, and copy number variation data that will help us identify new targets and enrollment biomarkers.
The goal of this project is to use siRNA screens to identify NSCLC-selective siRNAs from two genome-wide libraries that will allow us to functionally define genetic dependencies of subtypes of NSCLC. Using bioinformatics tools, the CTD2 center at the University of Texas Southwestern Medical Center are discovering associations between this functional data (siRNAs) and NSCLC mutational status, methylation arrays, gene expression arrays, and copy number variation data that will help us identify new targets and enrollment biomarkers.
Chen, I-Min A; Markowitz, Victor M; Palaniappan, Krishna; Szeto, Ernest; Chu, Ken; Huang, Jinghua; Ratner, Anna; Pillay, Manoj; Hadjithomas, Michalis; Huntemann, Marcel; Mikhailova, Natalia; Ovchinnikova, Galina; Ivanova, Natalia N; Kyrpides, Nikos C
2016-04-26
The exponential growth of genomic data from next generation technologies renders traditional manual expert curation effort unsustainable. Many genomic systems have included community annotation tools to address the problem. Most of these systems adopted a "Wiki-based" approach to take advantage of existing wiki technologies, but encountered obstacles in issues such as usability, authorship recognition, information reliability and incentive for community participation. Here, we present a different approach, relying on tightly integrated method rather than "Wiki-based" method, to support community annotation and user collaboration in the Integrated Microbial Genomes (IMG) system. The IMG approach allows users to use existing IMG data warehouse and analysis tools to add gene, pathway and biosynthetic cluster annotations, to analyze/reorganize contigs, genes and functions using workspace datasets, and to share private user annotations and workspace datasets with collaborators. We show that the annotation effort using IMG can be part of the research process to overcome the user incentive and authorship recognition problems thus fostering collaboration among domain experts. The usability and reliability issues are addressed by the integration of curated information and analysis tools in IMG, together with DOE Joint Genome Institute (JGI) expert review. By incorporating annotation operations into IMG, we provide an integrated environment for users to perform deeper and extended data analysis and annotation in a single system that can lead to publications and community knowledge sharing as shown in the case studies.
New technologies accelerate the exploration of non-coding RNAs in horticultural plants
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Degao; Mewalal, Ritesh; Hu, Rongbin
Non-coding RNAs (ncRNAs), that is, RNAs not translated into proteins, are crucial regulators of a variety of biological processes in plants. While protein-encoding genes have been relatively well-annotated in sequenced genomes, accounting for a small portion of the genome space in plants, the universe of plant ncRNAs is rapidly expanding. Recent advances in experimental and computational technologies have generated a great momentum for discovery and functional characterization of ncRNAs. Here we summarize the classification and known biological functions of plant ncRNAs, review the application of next-generation sequencing (NGS) technology and ribosome profiling technology to ncRNA discovery in horticultural plants andmore » discuss the application of new technologies, especially the new genome-editing tool clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) systems, to functional characterization of plant ncRNAs.« less
New technologies accelerate the exploration of non-coding RNAs in horticultural plants
Liu, Degao; Mewalal, Ritesh; Hu, Rongbin; Tuskan, Gerald A; Yang, Xiaohan
2017-01-01
Non-coding RNAs (ncRNAs), that is, RNAs not translated into proteins, are crucial regulators of a variety of biological processes in plants. While protein-encoding genes have been relatively well-annotated in sequenced genomes, accounting for a small portion of the genome space in plants, the universe of plant ncRNAs is rapidly expanding. Recent advances in experimental and computational technologies have generated a great momentum for discovery and functional characterization of ncRNAs. Here we summarize the classification and known biological functions of plant ncRNAs, review the application of next-generation sequencing (NGS) technology and ribosome profiling technology to ncRNA discovery in horticultural plants and discuss the application of new technologies, especially the new genome-editing tool clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) systems, to functional characterization of plant ncRNAs. PMID:28698797
Genome-Enabled Molecular Tools for Reductive Dehalogenation
2011-11-01
Genome-Enabled Molecular Tools for Reductive Dehalogenation - A Shift in Paradigm for Bioremediation - Alfred M. Spormann Departments of Chemical...Genome-Enabled Molecular Tools for Reductive Dehalogenation 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d...Applications Technical Session No. 3D C-77 GENOME-ENABLED MOLECULAR TOOLS FOR REDUCTIVE DEHALOGENATION PROFESSOR ALFRED SPORMANN Stanford
Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T; van Oven, Mannis; Wallace, Douglas C; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J; Gai, Xiaowu
2016-06-01
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse genome browser supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and mitochondrial disease. MSeqDR-LSDB is a locus-specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar compliant variant annotations. PhenoTips will be used for phenotypic data submission on deidentified patients using human phenotype ontology terminology. The development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. © 2016 WILEY PERIODICALS, INC.
Finding approximate gene clusters with Gecko 3.
Winter, Sascha; Jahn, Katharina; Wehner, Stefanie; Kuchenbecker, Leon; Marz, Manja; Stoye, Jens; Böcker, Sebastian
2016-11-16
Gene-order-based comparison of multiple genomes provides signals for functional analysis of genes and the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to few (in many cases, only two) genomes, and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, an open-source software for finding gene clusters in hundreds of bacterial genomes, that comes with an easy-to-use graphical user interface. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters reviewing the literature and comparing them to a database of operons; we detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in <40 min. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Therapeutic applications of CRISPR/Cas9 system in gene therapy.
Mollanoori, Hasan; Teimourian, Shahram
2018-06-01
Gene therapy is based on the principle of the genetic manipulation of DNA or RNA for treating and preventing human diseases. The clustered regularly interspaced short palindromic repeats/CRISPR associated nuclease9 (CRISPR/Cas9) system, derived from the acquired immune system in bacteria and archaea, has provided a new tool for accurate manipulation of genomic sequence to attain a therapeutic result. The advantage of CRISPR which made it an easy and flexible tool for diverse genome editing purposes is that a single protein (Cas9) complex with 2 short RNA sequences, function as a site-specific endonuclease. Recently, application of CRISPR/Cas9 system has become popular for therapeutic aims such as gene therapy. In this article, we review the fundamental mechanisms of CRISPR-Cas9 function and summarize preclinical CRISPR-mediated gene therapy reports on a wide variety of disorders.
Greenwald, William W; Li, He; Smith, Erin N; Benaglio, Paola; Nariai, Naoki; Frazer, Kelly A
2017-04-07
Genomic interaction studies use next-generation sequencing (NGS) to examine the interactions between two loci on the genome, with subsequent bioinformatics analyses typically including annotation, intersection, and merging of data from multiple experiments. While many file types and analysis tools exist for storing and manipulating single locus NGS data, there is currently no file standard or analysis tool suite for manipulating and storing paired-genomic-loci: the data type resulting from "genomic interaction" studies. As genomic interaction sequencing data are becoming prevalent, a standard file format and tools for working with these data conveniently and efficiently are needed. This article details a file standard and novel software tool suite for working with paired-genomic-loci data. We present the paired-genomic-loci (PGL) file standard for genomic-interactions data, and the accompanying analysis tool suite "pgltools": a cross platform, pypy compatible python package available both as an easy-to-use UNIX package, and as a python module, for integration into pipelines of paired-genomic-loci analyses. Pgltools is a freely available, open source tool suite for manipulating paired-genomic-loci data. Source code, an in-depth manual, and a tutorial are available publicly at www.github.com/billgreenwald/pgltools , and a python module of the operations can be installed from PyPI via the PyGLtools module.
Wragg, J; Müller, F
2016-01-01
Embryo development commences with the fusion of two terminally differentiated haploid gametes into the totipotent fertilized egg, which through a series of major cellular and molecular transitions generate a pluripotent cell mass. The activation of the zygotic genome occurs during the so-called maternal to zygotic transition and prepares the embryo for zygotic takeover from maternal factors, in the control of the development of cellular lineages during differentiation. Recent advances in next generation sequencing technologies have allowed the dissection of the genomic and epigenomic processes mediating this transition. These processes include reorganization of the chromatin structure to a transcriptionally permissive state, changes in composition and function of structural and regulatory DNA-binding proteins, and changeover of the transcriptome as it is overhauled from that deposited by the mother in the oocyte to a zygotically transcribed complement. Zygotic genome activation in zebrafish occurs 10 cell cycles after fertilization and provides an ideal experimental platform for elucidating the temporal sequence and dynamics of establishment of a transcriptionally active chromatin state and helps in identifying the determinants of transcription activation at polymerase II transcribed gene promoters. The relatively large number of pluripotent cells generated by the fast cell divisions before zygotic transcription provides sufficient biomass for next generation sequencing technology approaches to establish the temporal dynamics of events and suggest causative relationship between them. However, genomic and genetic technologies need to be improved further to capture the earliest events in development, where cell number is a limiting factor. These technologies need to be complemented with precise, inducible genetic interference studies using the latest genome editing tools to reveal the function of candidate determinants and to confirm the predictions made by classic embryological tools and genome-wide assays. In this review we summarize recent advances in the characterization of epigenetic regulation, transcription control, and gene promoter function during zygotic genome activation and how they fit with old models for the mechanisms of the maternal to zygotic transition. This review will focus on the zebrafish embryo but draw comparisons with other vertebrate model systems and refer to invertebrate models where informative. Copyright © 2016 Elsevier Inc. All rights reserved.
GenomeHubs: simple containerized setup of a custom Ensembl database and web server for any species
Kumar, Sujai; Stevens, Lewis; Blaxter, Mark
2017-01-01
Abstract As the generation and use of genomic datasets is becoming increasingly common in all areas of biology, the need for resources to collate, analyse and present data from one or more genome projects is becoming more pressing. The Ensembl platform is a powerful tool to make genome data and cross-species analyses easily accessible through a web interface and a comprehensive application programming interface. Here we introduce GenomeHubs, which provide a containerized environment to facilitate the setup and hosting of custom Ensembl genome browsers. This simplifies mirroring of existing content and import of new genomic data into the Ensembl database schema. GenomeHubs also provide a set of analysis containers to decorate imported genomes with results of standard analyses and functional annotations and support export to flat files, including EMBL format for submission of assemblies and annotations to International Nucleotide Sequence Database Collaboration. Database URL: http://GenomeHubs.org PMID:28605774
Advances in Setaria genomics for genetic improvement of cereals and bioenergy grasses.
Muthamilarasan, Mehanathan; Prasad, Manoj
2015-01-01
Recent advances in Setaria genomics appear promising for genetic improvement of cereals and biofuel crops towards providing multiple securities to the steadily increasing global population. The prominent attributes of foxtail millet (Setaria italica, cultivated) and green foxtail (S. viridis, wild) including small genome size, short life-cycle, in-breeding nature, genetic close-relatedness to several cereals, millets and bioenergy grasses, and potential abiotic stress tolerance have accentuated these two Setaria species as novel model system for studying C4 photosynthesis, stress biology and biofuel traits. Considering this, studies have been performed on structural and functional genomics of these plants to develop genetic and genomic resources, and to delineate the physiology and molecular biology of stress tolerance, for the improvement of millets, cereals and bioenergy grasses. The release of foxtail millet genome sequence has provided a new dimension to Setaria genomics, resulting in large-scale development of genetic and genomic tools, construction of informative databases, and genome-wide association and functional genomic studies. In this context, this review discusses the advancements made in Setaria genomics, which have generated a considerable knowledge that could be used for the improvement of millets, cereals and biofuel crops. Further, this review also shows the nutritional potential of foxtail millet in providing health benefits to global population and provides a preliminary information on introgressing the nutritional properties in graminaceous species through molecular breeding and transgene-based approaches.
Pathway Tools version 19.0 update: software for pathway/genome informatics and systems biology
Latendresse, Mario; Paley, Suzanne M.; Krummenacker, Markus; Ong, Quang D.; Billington, Richard; Kothari, Anamika; Weaver, Daniel; Lee, Thomas; Subhraveti, Pallavi; Spaulding, Aaron; Fulcher, Carol; Keseler, Ingrid M.; Caspi, Ron
2016-01-01
Pathway Tools is a bioinformatics software environment with a broad set of capabilities. The software provides genome-informatics tools such as a genome browser, sequence alignments, a genome-variant analyzer and comparative-genomics operations. It offers metabolic-informatics tools, such as metabolic reconstruction, quantitative metabolic modeling, prediction of reaction atom mappings and metabolic route search. Pathway Tools also provides regulatory-informatics tools, such as the ability to represent and visualize a wide range of regulatory interactions. This article outlines the advances in Pathway Tools in the past 5 years. Major additions include components for metabolic modeling, metabolic route search, computation of atom mappings and estimation of compound Gibbs free energies of formation; addition of editors for signaling pathways, for genome sequences and for cellular architecture; storage of gene essentiality data and phenotype data; display of multiple alignments, and of signaling and electron-transport pathways; and development of Python and web-services application programming interfaces. Scientists around the world have created more than 9800 Pathway/Genome Databases by using Pathway Tools, many of which are curated databases for important model organisms. PMID:26454094
Schmid, Jonas; Zehe, Anja; Vogel, Rudi F.
2016-01-01
As the number of bacterial genomes increases dramatically, the demand for easy to use tools with transparent functionality and comprehensible output for applied comparative genomics grows as well. We present BlAst Diagnostic Gene findEr (BADGE), a tool for the rapid prediction of diagnostic marker genes (DMGs) for the differentiation of bacterial groups (e.g. pathogenic / nonpathogenic). DMG identification settings can be modified easily and installing and running BADGE does not require specific bioinformatics skills. During the BADGE run the user is informed step by step about the DMG finding process, thus making it easy to evaluate the impact of chosen settings and options. On the basis of an example with relevance for beer brewing, being one of the oldest biotechnological processes known, we show a straightforward procedure, from phenotyping, genome sequencing, assembly and annotation, up to a discriminant marker gene PCR assay, making comparative genomics a means to an end. The value and the functionality of BADGE were thoroughly examined, resulting in the successful identification and validation of an outstanding novel DMG (fabZ) for the discrimination of harmless and harmful contaminations of Pediococcus damnosus, which can be applied for spoilage risk determination in breweries. Concomitantly, we present and compare five complete P. damnosus genomes sequenced in this study, finding that the ability to produce the unwanted, spoilage associated off-flavor diacetyl is a plasmid encoded trait in this important beer spoiling species. PMID:27028007
Shimoda, Yoshikazu; Mitsui, Hisayuki; Kamimatsuse, Hiroko; Minamisawa, Kiwamu; Nishiyama, Eri; Ohtsubo, Yoshiyuki; Nagata, Yuji; Tsuda, Masataka; Shinpo, Sayaka; Watanabe, Akiko; Kohara, Mitsuyo; Yamada, Manabu; Nakamura, Yasukazu; Tabata, Satoshi; Sato, Shusei
2008-01-01
Rhizobia are nitrogen-fixing soil bacteria that establish endosymbiosis with some leguminous plants. The completion of several rhizobial genome sequences provides opportunities for genome-wide functional studies of the physiological roles of many rhizobial genes. In order to carry out genome-wide phenotypic screenings, we have constructed a large mutant library of the nitrogen-fixing symbiotic bacterium, Mesorhizobium loti, by transposon mutagenesis. Transposon insertion mutants were generated using the signature-tagged mutagenesis (STM) technique and a total of 29 330 independent mutants were obtained. Along with the collection of transposon mutants, we have determined the transposon insertion sites for 7892 clones, and confirmed insertions in 3680 non-redundant M. loti genes (50.5% of the total number of M. loti genes). Transposon insertions were randomly distributed throughout the M. loti genome without any bias toward G+C contents of insertion target sites and transposon plasmids used for the mutagenesis. We also show the utility of STM mutants by examining the specificity of signature tags and test screenings for growth- and nodulation-deficient mutants. This defined mutant library allows for genome-wide forward- and reverse-genetic functional studies of M. loti and will serve as an invaluable resource for researchers to further our understanding of rhizobial biology. PMID:18658183
Aligning the unalignable: bacteriophage whole genome alignments.
Bérard, Sèverine; Chateau, Annie; Pompidor, Nicolas; Guertin, Paul; Bergeron, Anne; Swenson, Krister M
2016-01-13
In recent years, many studies focused on the description and comparison of large sets of related bacteriophage genomes. Due to the peculiar mosaic structure of these genomes, few informative approaches for comparing whole genomes exist: dot plots diagrams give a mostly qualitative assessment of the similarity/dissimilarity between two or more genomes, and clustering techniques are used to classify genomes. Multiple alignments are conspicuously absent from this scene. Indeed, whole genome aligners interpret lack of similarity between sequences as an indication of rearrangements, insertions, or losses. This behavior makes them ill-prepared to align bacteriophage genomes, where even closely related strains can accomplish the same biological function with highly dissimilar sequences. In this paper, we propose a multiple alignment strategy that exploits functional collinearity shared by related strains of bacteriophages, and uses partial orders to capture mosaicism of sets of genomes. As classical alignments do, the computed alignments can be used to predict that genes have the same biological function, even in the absence of detectable similarity. The Alpha aligner implements these ideas in visual interactive displays, and is used to compute several examples of alignments of Staphylococcus aureus and Mycobacterium bacteriophages, involving up to 29 genomes. Using these datasets, we prove that Alpha alignments are at least as good as those computed by standard aligners. Comparison with the progressive Mauve aligner - which implements a partial order strategy, but whose alignments are linearized - shows a greatly improved interactive graphic display, while avoiding misalignments. Multiple alignments of whole bacteriophage genomes work, and will become an important conceptual and visual tool in comparative genomics of sets of related strains. A python implementation of Alpha, along with installation instructions for Ubuntu and OSX, is available on bitbucket (https://bitbucket.org/thekswenson/alpha).
Gramene 2013: comparative plant genomics resources.
Monaco, Marcela K; Stein, Joshua; Naithani, Sushma; Wei, Sharon; Dharmawardhana, Palitha; Kumari, Sunita; Amarasinghe, Vindhya; Youens-Clark, Ken; Thomason, James; Preece, Justin; Pasternak, Shiran; Olson, Andrew; Jiao, Yinping; Lu, Zhenyuan; Bolser, Dan; Kerhornou, Arnaud; Staines, Dan; Walts, Brandon; Wu, Guanming; D'Eustachio, Peter; Haw, Robin; Croft, David; Kersey, Paul J; Stein, Lincoln; Jaiswal, Pankaj; Ware, Doreen
2014-01-01
Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.
Mammalian Synthetic Biology: Engineering Biological Systems.
Black, Joshua B; Perez-Pinera, Pablo; Gersbach, Charles A
2017-06-21
The programming of new functions into mammalian cells has tremendous application in research and medicine. Continued improvements in the capacity to sequence and synthesize DNA have rapidly increased our understanding of mechanisms of gene function and regulation on a genome-wide scale and have expanded the set of genetic components available for programming cell biology. The invention of new research tools, including targetable DNA-binding systems such as CRISPR/Cas9 and sensor-actuator devices that can recognize and respond to diverse chemical, mechanical, and optical inputs, has enabled precise control of complex cellular behaviors at unprecedented spatial and temporal resolution. These tools have been critical for the expansion of synthetic biology techniques from prokaryotic and lower eukaryotic hosts to mammalian systems. Recent progress in the development of genome and epigenome editing tools and in the engineering of designer cells with programmable genetic circuits is expanding approaches to prevent, diagnose, and treat disease and to establish personalized theranostic strategies for next-generation medicines. This review summarizes the development of these enabling technologies and their application to transforming mammalian synthetic biology into a distinct field in research and medicine.
Molecular Neuroanatomy: A Generation of Progress
Pollock, Jonathan D.; Wu, Da-Yu; Satterlee, John
2014-01-01
The neuroscience research landscape has changed dramatically over the past decade. An impressive array of neuroscience tools and technologies have been generated, including brain gene expression atlases, genetically encoded proteins to monitor and manipulate neuronal activity and function, cost effective genome sequencing, new technologies enabling genome manipulation, new imaging methods and new tools for mapping neuronal circuits. However, despite these technological advances, several significant scientific challenges must be overcome in the coming decade to enable a better understanding of brain function and to develop next generation cell type-targeted therapeutics to treat brain disorders. For example, we do not have an inventory of the different types of cells that exist in the brain, nor do we know how to molecularly phenotype them. We also lack robust technologies to map connections between cells. This review will provide an overview of some of the tools and technologies neuroscientists are currently using to move the field of molecular neuroanatomy forward and also discuss emerging technologies that may enable neuroscientists to address these critical scientific challenges over the coming decade. PMID:24388609
CRISPR mediated somatic cell genome engineering in the chicken.
Véron, Nadège; Qu, Zhengdong; Kipen, Phoebe A S; Hirst, Claire E; Marcelle, Christophe
2015-11-01
Gene-targeted knockout technologies are invaluable tools for understanding the functions of genes in vivo. CRISPR/Cas9 system of RNA-guided genome editing is revolutionizing genetics research in a wide spectrum of organisms. Here, we combined CRISPR with in vivo electroporation in the chicken embryo to efficiently target the transcription factor PAX7 in tissues of the developing embryo. This approach generated mosaic genetic mutations within a wild-type cellular background. This series of proof-of-principle experiments indicate that in vivo CRISPR-mediated cell genome engineering is an effective method to achieve gene loss-of-function in the tissues of the chicken embryo and it completes the growing genetic toolbox to study the molecular mechanisms regulating development in this important animal model. Copyright © 2015 Elsevier Inc. All rights reserved.
Image Analysis of DNA Fiber and Nucleus in Plants.
Ohmido, Nobuko; Wako, Toshiyuki; Kato, Seiji; Fukui, Kiichi
2016-01-01
Advances in cytology have led to the application of a wide range of visualization methods in plant genome studies. Image analysis methods are indispensable tools where morphology, density, and color play important roles in the biological systems. Visualization and image analysis methods are useful techniques in the analyses of the detailed structure and function of extended DNA fibers (EDFs) and interphase nuclei. The EDF is the highest in the spatial resolving power to reveal genome structure and it can be used for physical mapping, especially for closely located genes and tandemly repeated sequences. One the other hand, analyzing nuclear DNA and proteins would reveal nuclear structure and functions. In this chapter, we describe the image analysis protocol for quantitatively analyzing different types of plant genome, EDFs and interphase nuclei.
CRISPR-Cas Genome Surgery in Ophthalmology
DiCarlo, James E.; Sengillo, Jesse D.; Justus, Sally; Cabral, Thiago; Tsang, Stephen H.; Mahajan, Vinit B.
2017-01-01
Genetic disease affecting vision can significantly impact patient quality of life. Gene therapy seeks to slow the progression of these diseases by treating the underlying etiology at the level of the genome. Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated systems (Cas) represent powerful tools for studying diseases through the creation of model organisms generated by targeted modification and by the correction of disease mutations for therapeutic purposes. CRISPR-Cas systems have been applied successfully to the visual sciences and study of ophthalmic disease – from the modification of zebrafish and mammalian models of eye development and disease, to the correction of pathogenic mutations in patient-derived stem cells. Recent advances in CRISPR-Cas delivery and optimization boast improved functionality that continues to enhance genome-engineering applications in the eye. This review provides a synopsis of the recent implementations of CRISPR-Cas tools in the field of ophthalmology. PMID:28573077
Recent Advances in Microbial Single Cell Genomics Technology and Applications
NASA Astrophysics Data System (ADS)
Stepanauskas, R.
2016-02-01
Single cell genomics is increasingly utilized as a powerful tool to decipher the metabolic potential, evolutionary histories and in situ interactions of environmental microorganisms. This transformative technology recovers extensive information from cultivation-unbiased samples of individual, unicellular organisms. Thus, it does not require data binning into arbitrary phylogenetic or functional groups and therefore is highly compatible with agent-based modeling approaches. I will present several technological advances in this field, which significantly improve genomic data recovery from individual cells and provide direct linkages between cell's genomic and phenotypic properties. I will also demonstrate how these new technical capabilities help understanding the metabolic potential and viral infections of the "microbial dark matter" inhabiting aquatic and subsurface environments.
The Comprehensive Microbial Resource
Peterson, Jeremy D.; Umayam, Lowell A.; Dickinson, Tanja; Hickey, Erin K.; White, Owen
2001-01-01
One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes. PMID:11125067
Reconstruction of Ancestral Genomes in Presence of Gene Gain and Loss.
Avdeyev, Pavel; Jiang, Shuai; Aganezov, Sergey; Hu, Fei; Alekseyev, Max A
2016-03-01
Since most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it becomes crucial to understand their mechanisms and reconstruct ancestral genomes of the given genomes. This problem was shown to be NP-complete even in the "simplest" case of three genomes, thus calling for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice as it was earlier illustrated with MGRA, a state-of-the-art software tool for reconstruction of ancestral genomes of multiple genomes. One of the key obstacles for MGRA and other similar tools is presence of breakpoint reuses when the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes with each gene present in a single copy in every genome. This limitation makes these tools inapplicable for many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene contents. The developed next-generation tool MGRA2 can handle gene gain/loss events and shares the ability of MGRA to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and process datasets inaccessible for MGRA. In practical experiments, MGRA2 shows superior performance for simulated and real genomes as compared to other ancestral genome reconstruction tools.
Tol2 transposon-mediated transgenesis in Xenopus tropicalis.
Hamlet, Michelle R Johnson; Yergeau, Donald A; Kuliyev, Emin; Takeda, Masatoshi; Taira, Masanori; Kawakami, Koichi; Mead, Paul E
2006-09-01
The diploid frog Xenopus tropicalis is becoming a powerful developmental genetic model system. Sequencing of the X. tropicalis genome is nearing completion and several labs are embarking on mutagenesis screens. We are interested in developing insertional mutagenesis strategies in X. tropicalis. Transposon-mediated insertional mutagenesis, once used exclusively in plants and invertebrate systems, is now more widely applicable to vertebrates. The first step in developing transposons as tools for mutagenesis is to demonstrate that these mobile elements function efficiently in the target organism. Here, we show that the Medaka fish transposon, Tol2, is able to stably integrate into the X. tropicalis genome and will serve as a powerful tool for insertional mutagenesis strategies in the frog.
Boussaha, Mekki; Michot, Pauline; Letaief, Rabia; Hozé, Chris; Fritz, Sébastien; Grohs, Cécile; Esquerré, Diane; Duchesne, Amandine; Philippe, Romain; Blanquet, Véronique; Phocas, Florence; Floriot, Sandrine; Rocha, Dominique; Klopp, Christophe; Capitan, Aurélien; Boichard, Didier
2016-11-15
In recent years, several bovine genome sequencing projects were carried out with the aim of developing genomic tools to improve dairy and beef production efficiency and sustainability. In this study, we describe the first French cattle genome variation dataset obtained by sequencing 274 whole genomes representing several major dairy and beef breeds. This dataset contains over 28 million single nucleotide polymorphisms (SNPs) and small insertions and deletions. Comparisons between sequencing results and SNP array genotypes revealed a very high genotype concordance rate, which indicates the good quality of our data. To our knowledge, this is the first large-scale catalog of small genomic variations in French dairy and beef cattle. This resource will contribute to the study of gene functions and population structure and also help to improve traits through genotype-guided selection.
Pandey, Ram Vinay; Kofler, Robert; Orozco-terWengel, Pablo; Nolte, Viola; Schlötterer, Christian
2011-03-02
The enormous potential of natural variation for the functional characterization of genes has been neglected for a long time. Only since recently, functional geneticists are starting to account for natural variation in their analyses. With the new sequencing technologies it has become feasible to collect sequence information for multiple individuals on a genomic scale. In particular sequencing pooled DNA samples has been shown to provide a cost-effective approach for characterizing variation in natural populations. While a range of software tools have been developed for mapping these reads onto a reference genome and extracting SNPs, linking this information to population genetic estimators and functional information still poses a major challenge to many researchers. We developed PoPoolation DB a user-friendly integrated database. Popoolation DB links variation in natural populations with functional information, allowing a wide range of researchers to take advantage of population genetic data. PoPoolation DB provides the user with population genetic parameters (Watterson's θ or Tajima's π), Tajima's D, SNPs, allele frequencies and indels in regions of interest. The database can be queried by gene name, chromosomal position, or a user-provided query sequence or GTF file. We anticipate that PoPoolation DB will be a highly versatile tool for functional geneticists as well as evolutionary biologists. PoPoolation DB, available at http://www.popoolation.at/pgt, provides an integrated platform for researchers to investigate natural polymorphism and associated functional annotations from UCSC and Flybase genome browsers, population genetic estimators and RNA-seq information.
Genomics screens for metastasis genes
Yan, Jinchun; Huang, Qihong
2014-01-01
Metastasis is responsible for most cancer mortality. The process of metastasis is complex, requiring the coordinated expression and fine regulation of many genes in multiple pathways in both the tumor and host tissues. Identification and characterization of the genetic programs that regulate metastasis is critical to understanding the metastatic process and discovering molecular targets for the prevention and treatment of metastasis. Genomic approaches and functional genomic analyses can systemically discover metastasis genes. In this review, we summarize the genetic tools and methods that have been used to identify and characterize the genes that play critical roles in metastasis. PMID:22684367
miRNAtools: Advanced Training Using the miRNA Web of Knowledge.
Stępień, Ewa Ł; Costa, Marina C; Enguita, Francisco J
2018-02-16
Micro-RNAs (miRNAs) are small non-coding RNAs that act as negative regulators of the genomic output. Their intrinsic importance within cell biology and human disease is well known. Their mechanism of action based on the base pairing binding to their cognate targets have helped the development not only of many computer applications for the prediction of miRNA target recognition but also of specific applications for functional assessment and analysis. Learning about miRNA function requires practical training in the use of specific computer and web-based applications that are complementary to wet-lab studies. In order to guide the learning process about miRNAs, we have created miRNAtools (http://mirnatools.eu), a web repository of miRNA tools and tutorials. This article compiles tools with which miRNAs and their regulatory action can be analyzed and that function to collect and organize information dispersed on the web. The miRNAtools website contains a collection of tutorials that can be used by students and tutors engaged in advanced training courses. The tutorials engage in analyses of the functions of selected miRNAs, starting with their nomenclature and genomic localization and finishing with their involvement in specific cellular functions.
Approaches to integrating germline and tumor genomic data in cancer research
Feigelson, Heather Spencer; Goddard, Katrina A.B.; Hollombe, Celine; Tingle, Sharna R.; Gillanders, Elizabeth M.; Mechanic, Leah E.; Nelson, Stefanie A.
2014-01-01
Cancer is characterized by a diversity of genetic and epigenetic alterations occurring in both the germline and somatic (tumor) genomes. Hundreds of germline variants associated with cancer risk have been identified, and large amounts of data identifying mutations in the tumor genome that participate in tumorigenesis have been generated. Increasingly, these two genomes are being explored jointly to better understand how cancer risk alleles contribute to carcinogenesis and whether they influence development of specific tumor types or mutation profiles. To understand how data from germline risk studies and tumor genome profiling is being integrated, we reviewed 160 articles describing research that incorporated data from both genomes, published between January 2009 and December 2012, and summarized the current state of the field. We identified three principle types of research questions being addressed using these data: (i) use of tumor data to determine the putative function of germline risk variants; (ii) identification and analysis of relationships between host genetic background and particular tumor mutations or types; and (iii) use of tumor molecular profiling data to reduce genetic heterogeneity or refine phenotypes for germline association studies. We also found descriptive studies that compared germline and tumor genomic variation in a gene or gene family, and papers describing research methods, data sources, or analytical tools. We identified a large set of tools and data resources that can be used to analyze and integrate data from both genomes. Finally, we discuss opportunities and challenges for cancer research that integrates germline and tumor genomics data. PMID:25115441
Antimicrobial resistance surveillance in the genomic age.
McArthur, Andrew G; Tsang, Kara K
2017-01-01
The loss of effective antimicrobials is reducing our ability to protect the global population from infectious disease. However, the field of antibiotic drug discovery and the public health monitoring of antimicrobial resistance (AMR) is beginning to exploit the power of genome and metagenome sequencing. The creation of novel AMR bioinformatics tools and databases and their continued development will advance our understanding of the molecular mechanisms and threat severity of antibiotic resistance, while simultaneously improving our ability to accurately predict and screen for antibiotic resistance genes within environmental, agricultural, and clinical settings. To do so, efforts must be focused toward exploiting the advancements of genome sequencing and information technology. Currently, AMR bioinformatics software and databases reflect different scopes and functions, each with its own strengths and weaknesses. A review of the available tools reveals common approaches and reference data but also reveals gaps in our curated data, models, algorithms, and data-sharing tools that must be addressed to conquer the limitations and areas of unmet need within the AMR research field before DNA sequencing can be fully exploited for AMR surveillance and improved clinical outcomes. © 2016 New York Academy of Sciences.
VitisExpDB: a database resource for grape functional genomics.
Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L
2008-02-28
The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores approximately 320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of approximately 20,000 non-redundant set of ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. The developed database provides genomic resource to grape community for functional analysis of genes in the collection and for the grape genome annotation and gene function identification. The VitisExpDB database is available through our website http://cropdisease.ars.usda.gov/vitis_at/main-page.htm.
VitisExpDB: A database resource for grape functional genomics
Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L
2008-01-01
Background The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. Description VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores ~320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of ~20,000 non-redundant set of ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. Conclusion The developed database provides genomic resource to grape community for functional analysis of genes in the collection and for the grape genome annotation and gene function identification. The VitisExpDB database is available through our website . PMID:18307813
Bacterial CRISPR: Accomplishments and Prospects
Peters, Jason M.; Silvis, Melanie R.; Zhao, Dehua; Hawkins, John S.; Gross, Carol A.; Qi, Lei S.
2015-01-01
In this review we briefly describe the development of CRISPR tools for genome editing and control of transcription in bacteria. We focus on the Type II CRISPR/Cas9 system, provide specific examples for use of the system, and highlight the advantages and disadvantages of CRISPR versus other techniques. We suggest potential strategies for combining CRISPR tools with high-throughput approaches to elucidate gene function in bacteria. PMID:26363124
Navigating yeast genome maintenance with functional genomics.
Measday, Vivien; Stirling, Peter C
2016-03-01
Maintenance of genome integrity is a fundamental requirement of all organisms. To address this, organisms have evolved extremely faithful modes of replication, DNA repair and chromosome segregation to combat the deleterious effects of an unstable genome. Nonetheless, a small amount of genome instability is the driver of evolutionary change and adaptation, and thus a low level of instability is permitted in populations. While defects in genome maintenance almost invariably reduce fitness in the short term, they can create an environment where beneficial mutations are more likely to occur. The importance of this fact is clearest in the development of human cancer, where genome instability is a well-established enabling characteristic of carcinogenesis. This raises the crucial question: what are the cellular pathways that promote genome maintenance and what are their mechanisms? Work in model organisms, in particular the yeast Saccharomyces cerevisiae, has provided the global foundations of genome maintenance mechanisms in eukaryotes. The development of pioneering genomic tools inS. cerevisiae, such as the systematic creation of mutants in all nonessential and essential genes, has enabled whole-genome approaches to identifying genes with roles in genome maintenance. Here, we review the extensive whole-genome approaches taken in yeast, with an emphasis on functional genomic screens, to understand the genetic basis of genome instability, highlighting a range of genetic and cytological screening modalities. By revealing the biological pathways and processes regulating genome integrity, these analyses contribute to the systems-level map of the yeast cell and inform studies of human disease, especially cancer. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data
Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie
2008-01-01
The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org. PMID:17932055
Choosing a genome browser for a Model Organism Database: surveying the Maize community
Sen, Taner Z.; Harper, Lisa C.; Schaeffer, Mary L.; Andorf, Carson M.; Seigfried, Trent E.; Campbell, Darwin A.; Lawrence, Carolyn J.
2010-01-01
As the B73 maize genome sequencing project neared completion, MaizeGDB began to integrate a graphical genome browser with its existing web interface and database. To ensure that maize researchers would optimally benefit from the potential addition of a genome browser to the existing MaizeGDB resource, personnel at MaizeGDB surveyed researchers’ needs. Collected data indicate that existing genome browsers for maize were inadequate and suggest implementation of a browser with quick interface and intuitive tools would meet most researchers’ needs. Here, we document the survey’s outcomes, review functionalities of available genome browser software platforms and offer our rationale for choosing the GBrowse software suite for MaizeGDB. Because the genome as represented within the MaizeGDB Genome Browser is tied to detailed phenotypic data, molecular marker information, available stocks, etc., the MaizeGDB Genome Browser represents a novel mechanism by which the researchers can leverage maize sequence information toward crop improvement directly. Database URL: http://gbrowse.maizegdb.org/ PMID:20627860
The Yak genome database: an integrative database for studying yak biology and high-altitude adaption
2012-01-01
Background The yak (Bos grunniens) is a long-haired bovine that lives at high altitudes and is an important source of milk, meat, fiber and fuel. The recent sequencing, assembly and annotation of its genome are expected to further our understanding of the means by which it has adapted to life at high altitudes and its ecologically important traits. Description The Yak Genome Database (YGD) is an internet-based resource that provides access to genomic sequence data and predicted functional information concerning the genes and proteins of Bos grunniens. The curated data stored in the YGD includes genome sequences, predicted genes and associated annotations, non-coding RNA sequences, transposable elements, single nucleotide variants, and three-way whole-genome alignments between human, cattle and yak. YGD offers useful searching and data mining tools, including the ability to search for genes by name or using function keywords as well as GBrowse genome browsers and/or BLAST servers, which can be used to visualize genome regions and identify similar sequences. Sequence data from the YGD can also be downloaded to perform local searches. Conclusions A new yak genome database (YGD) has been developed to facilitate studies on high-altitude adaption and bovine genomics. The database will be continuously updated to incorporate new information such as transcriptome data and population resequencing data. The YGD can be accessed at http://me.lzu.edu.cn/yak. PMID:23134687
Hilson, Pierre; Allemeersch, Joke; Altmann, Thomas; Aubourg, Sébastien; Avon, Alexandra; Beynon, Jim; Bhalerao, Rishikesh P.; Bitton, Frédérique; Caboche, Michel; Cannoot, Bernard; Chardakov, Vasil; Cognet-Holliger, Cécile; Colot, Vincent; Crowe, Mark; Darimont, Caroline; Durinck, Steffen; Eickhoff, Holger; de Longevialle, Andéol Falcon; Farmer, Edward E.; Grant, Murray; Kuiper, Martin T.R.; Lehrach, Hans; Léon, Céline; Leyva, Antonio; Lundeberg, Joakim; Lurin, Claire; Moreau, Yves; Nietfeld, Wilfried; Paz-Ares, Javier; Reymond, Philippe; Rouzé, Pierre; Sandberg, Goran; Segura, Maria Dolores; Serizet, Carine; Tabrett, Alexandra; Taconnat, Ludivine; Thareau, Vincent; Van Hummelen, Paul; Vercruysse, Steven; Vuylsteke, Marnik; Weingartner, Magdalena; Weisbeek, Peter J.; Wirta, Valtteri; Wittink, Floyd R.A.; Zabeau, Marc; Small, Ian
2004-01-01
Microarray transcript profiling and RNA interference are two new technologies crucial for large-scale gene function studies in multicellular eukaryotes. Both rely on sequence-specific hybridization between complementary nucleic acid strands, inciting us to create a collection of gene-specific sequence tags (GSTs) representing at least 21,500 Arabidopsis genes and which are compatible with both approaches. The GSTs were carefully selected to ensure that each of them shared no significant similarity with any other region in the Arabidopsis genome. They were synthesized by PCR amplification from genomic DNA. Spotted microarrays fabricated from the GSTs show good dynamic range, specificity, and sensitivity in transcript profiling experiments. The GSTs have also been transferred to bacterial plasmid vectors via recombinational cloning protocols. These cloned GSTs constitute the ideal starting point for a variety of functional approaches, including reverse genetics. We have subcloned GSTs on a large scale into vectors designed for gene silencing in plant cells. We show that in planta expression of GST hairpin RNA results in the expected phenotypes in silenced Arabidopsis lines. These versatile GST resources provide novel and powerful tools for functional genomics. PMID:15489341
Diversity and function of prevalent symbiotic marine bacteria in the genus Endozoicomonas.
Neave, Matthew J; Apprill, Amy; Ferrier-Pagès, Christine; Voolstra, Christian R
2016-10-01
Endozoicomonas bacteria are emerging as extremely diverse and flexible symbionts of numerous marine hosts inhabiting oceans worldwide. Their hosts range from simple invertebrate species, such as sponges and corals, to complex vertebrates, such as fish. Although widely distributed, the functional role of Endozoicomonas within their host microenvironment is not well understood. In this review, we provide a summary of the currently recognized hosts of Endozoicomonas and their global distribution. Next, the potential functional roles of Endozoicomonas, particularly in light of recent microscopic, genomic, and genetic analyses, are discussed. These analyses suggest that Endozoicomonas typically reside in aggregates within host tissues, have a free-living stage due to their large genome sizes, show signs of host and local adaptation, participate in host-associated protein and carbohydrate transport and cycling, and harbour a high degree of genomic plasticity due to the large proportion of transposable elements residing in their genomes. This review will finish with a discussion on the methodological tools currently employed to study Endozoicomonas and host interactions and review future avenues for studying complex host-microbial symbioses.
Comparative analysis and visualization of multiple collinear genomes
2012-01-01
Background Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897
Identification of cis-suppression of human disease mutations by comparative genomics.
Jordan, Daniel M; Frangakis, Stephan G; Golzio, Christelle; Cassa, Christopher A; Kurtzberg, Joanne; Davis, Erica E; Sunyaev, Shamil R; Katsanis, Nicholas
2015-08-13
Patterns of amino acid conservation have served as a tool for understanding protein evolution. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity.
Visualizing conserved gene location across microbe genomes
NASA Astrophysics Data System (ADS)
Shaw, Chris D.
2009-01-01
This paper introduces an analysis-based zoomable visualization technique for displaying the location of genes across many related species of microbes. The purpose of this visualizatiuon is to enable a biologist to examine the layout of genes in the organism of interest with respect to the gene organization of related organisms. During the genomic annotation process, the ability to observe gene organization in common with previously annotated genomes can help a biologist better confirm the structure and function of newly analyzed microbe DNA sequences. We have developed a visualization and analysis tool that enables the biologist to observe and examine gene organization among genomes, in the context of the primary sequence of interest. This paper describes the visualization and analysis steps, and presents a case study using a number of Rickettsia genomes.
From sequencing to annotating: extending the metaphor of the book of life from genetics to genomics.
Hellsten, Iina
2005-12-01
The article discusses how the metaphor of the Book of Life was extended over time to cover the life cycle of the Human Genome Project from genetics to genomics. In particular, the focus is on the role of extendable metaphors in the debate on the Human Genome Project in three European newspapers, popular scientific journals and scientific and scholarly articles from 1990 to 2002. In these different domains of use, various parts of the metaphor were highlighted. The metaphor of Book of Life was mainly used to justify the continuation of the gene research from gene sequencing to comparative genomics. Readily extendable metaphors, such as the Book of Life, function as useful communicative tools both over time and across domains of use.
Variation resources at UC Santa Cruz.
Thomas, Daryl J; Trumbower, Heather; Kern, Andrew D; Rhead, Brooke L; Kuhn, Robert M; Haussler, David; Kent, W James
2007-01-01
The variation resources within the University of California Santa Cruz Genome Browser include polymorphism data drawn from public collections and analyses of these data, along with their display in the context of other genomic annotations. Primary data from dbSNP is included for many organisms, with added information including genomic alleles and orthologous alleles for closely related organisms. Display filtering and coloring is available by variant type, functional class or other annotations. Annotation of potential errors is highlighted and a genomic alignment of the variant's flanking sequence is displayed. HapMap allele frequencies and linkage disequilibrium (LD) are available for each HapMap population, along with non-human primate alleles. The browsing and analysis tools, downloadable data files and links to documentation and other information can be found at http://genome.ucsc.edu/.
CAR: contig assembly of prokaryotic draft genomes using rearrangements.
Lu, Chin Lung; Chen, Kun-Tze; Huang, Shih-Yuan; Chiu, Hsien-Tai
2014-11-28
Next generation sequencing technology has allowed efficient production of draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs, whose relative positions and orientations along the genome being sequenced are unknown. Although several tools have been developed to order and orient the contigs of draft genomes, more accurate tools are still needed. In this study, we present a novel reference-based contig assembly (or scaffolding) tool, named as CAR, that can efficiently and more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome of a related organism. Given a set of contigs in multi-FASTA format and a reference genome in FASTA format, CAR can output a list of scaffolds, each of which is a set of ordered and oriented contigs. For validation, we have tested CAR on a real dataset composed of several prokaryotic genomes and also compared its performance with several other reference-based contig assembly tools. Consequently, our experimental results have shown that CAR indeed performs better than all these other reference-based contig assembly tools in terms of sensitivity, precision and genome coverage. CAR serves as an efficient tool that can more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome. The web server of CAR is freely available at http://genome.cs.nthu.edu.tw/CAR/ and its stand-alone program can also be downloaded from the same website.
Visualization of RNA structure models within the Integrative Genomics Viewer.
Busan, Steven; Weeks, Kevin M
2017-07-01
Analyses of the interrelationships between RNA structure and function are increasingly important components of genomic studies. The SHAPE-MaP strategy enables accurate RNA structure probing and realistic structure modeling of kilobase-length noncoding RNAs and mRNAs. Existing tools for visualizing RNA structure models are not suitable for efficient analysis of long, structurally heterogeneous RNAs. In addition, structure models are often advantageously interpreted in the context of other experimental data and gene annotation information, for which few tools currently exist. We have developed a module within the widely used and well supported open-source Integrative Genomics Viewer (IGV) that allows visualization of SHAPE and other chemical probing data, including raw reactivities, data-driven structural entropies, and data-constrained base-pair secondary structure models, in context with linear genomic data tracks. We illustrate the usefulness of visualizing RNA structure in the IGV by exploring structure models for a large viral RNA genome, comparing bacterial mRNA structure in cells with its structure under cell- and protein-free conditions, and comparing a noncoding RNA structure modeled using SHAPE data with a base-pairing model inferred through sequence covariation analysis. © 2017 Busan and Weeks; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
SmedGD 2.0: The Schmidtea mediterranea genome database
Robb, Sofia M.C.; Gotting, Kirsten; Ross, Eric; Sánchez Alvarado, Alejandro
2016-01-01
Planarians have emerged as excellent models for the study of key biological processes such as stem cell function and regulation, axial polarity specification, regeneration, and tissue homeostasis among others. The most widely used organism for these studies is the free-living flatworm Schmidtea mediterranea. In 2007, the Schmidtea mediterranea Genome Database (SmedGD) was first released to provide a much needed resource for the small, but growing planarian community. SmedGD 1.0 has been a depository for genome sequence, a draft assembly, and related experimental data (e.g., RNAi phenotypes, in situ hybridization images, and differential gene expression results). We report here a comprehensive update to SmedGD (SmedGD 2.0) that aims to expand its role as an interactive community resource. The new database includes more recent, and up-to-date transcription data, provides tools that enhance interconnectivity between different genome assemblies and transcriptomes, including next generation assemblies for both the sexual and asexual biotypes of S. mediterranea. SmedGD 2.0 (http://smedgd.stowers.org) not only provides significantly improved gene annotations, but also tools for data sharing, attributes that will help both the planarian and biomedical communities to more efficiently mine the genomics and transcriptomics of S. mediterranea. PMID:26138588
Shi, Wuxian; Chance, Mark R.
2010-01-01
About one-third of all proteins are associated with a metal. Metalloproteomics is defined as the structural and functional characterization of metalloproteins on a genome-wide scale. The methodologies utilized in metalloproteomics, including both forward (bottom-up) and reverse (top-down) technologies, to provide information on the identity, quantity and function of metalloproteins are discussed. Important techniques frequently employed in metalloproteomics include classical proteomics tools such as mass spectrometry and 2-D gels, immobilized-metal affinity chromatography, bioinformatics sequence analysis and homology modeling, X-ray absorption spectroscopy and other synchrotron radiation based tools. Combinative applications of these techniques provide a powerful approach to understand the function of metalloproteins. PMID:21130021
Pathway Tools version 19.0 update: software for pathway/genome informatics and systems biology.
Karp, Peter D; Latendresse, Mario; Paley, Suzanne M; Krummenacker, Markus; Ong, Quang D; Billington, Richard; Kothari, Anamika; Weaver, Daniel; Lee, Thomas; Subhraveti, Pallavi; Spaulding, Aaron; Fulcher, Carol; Keseler, Ingrid M; Caspi, Ron
2016-09-01
Pathway Tools is a bioinformatics software environment with a broad set of capabilities. The software provides genome-informatics tools such as a genome browser, sequence alignments, a genome-variant analyzer and comparative-genomics operations. It offers metabolic-informatics tools, such as metabolic reconstruction, quantitative metabolic modeling, prediction of reaction atom mappings and metabolic route search. Pathway Tools also provides regulatory-informatics tools, such as the ability to represent and visualize a wide range of regulatory interactions. This article outlines the advances in Pathway Tools in the past 5 years. Major additions include components for metabolic modeling, metabolic route search, computation of atom mappings and estimation of compound Gibbs free energies of formation; addition of editors for signaling pathways, for genome sequences and for cellular architecture; storage of gene essentiality data and phenotype data; display of multiple alignments, and of signaling and electron-transport pathways; and development of Python and web-services application programming interfaces. Scientists around the world have created more than 9800 Pathway/Genome Databases by using Pathway Tools, many of which are curated databases for important model organisms. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Applying Genomic and Genetic Tools to Understand and Mitigate Damage from Exposure to Toxins
2013-10-01
sequences to the human genome . Genome Biol 10, R25 (2009). 26 Award number: W81XWH-09-1-0715 Title: Applying Genomic and Genetic Tools to Understand...utilizing the high-throughput technology of mRNA-seq. BODY The goal of our research program (W81XWH-09-1-0715) was to utilize genetic and genomic ...also acquired the achetf222a * * * * * 5 Award number: W81XWH-09-1-0715 Title: Applying Genomic and Genetic Tools to Understand and Mitigate
The ENCODE Project at UC Santa Cruz.
Thomas, Daryl J; Rosenbloom, Kate R; Clawson, Hiram; Hinrichs, Angie S; Trumbower, Heather; Raney, Brian J; Karolchik, Donna; Barber, Galt P; Harte, Rachel A; Hillman-Jackson, Jennifer; Kuhn, Robert M; Rhead, Brooke L; Smith, Kayla E; Thakkapallayil, Archana; Zweig, Ann S; Haussler, David; Kent, W James
2007-01-01
The goal of the Encyclopedia Of DNA Elements (ENCODE) Project is to identify all functional elements in the human genome. The pilot phase is for comparison of existing methods and for the development of new methods to rigorously analyze a defined 1% of the human genome sequence. Experimental datasets are focused on the origin of replication, DNase I hypersensitivity, chromatin immunoprecipitation, promoter function, gene structure, pseudogenes, non-protein-coding RNAs, transcribed RNAs, multiple sequence alignment and evolutionarily constrained elements. The ENCODE project at UCSC website (http://genome.ucsc.edu/ENCODE) is the primary portal for the sequence-based data produced as part of the ENCODE project. In the pilot phase of the project, over 30 labs provided experimental results for a total of 56 browser tracks supported by 385 database tables. The site provides researchers with a number of tools that allow them to visualize and analyze the data as well as download data for local analyses. This paper describes the portal to the data, highlights the data that has been made available, and presents the tools that have been developed within the ENCODE project. Access to the data and types of interactive analysis that are possible are illustrated through supplemental examples.
AnnotCompute: annotation-based exploration and meta-analysis of genomics experiments
Zheng, Jie; Stoyanovich, Julia; Manduchi, Elisabetta; Liu, Junmin; Stoeckert, Christian J.
2011-01-01
The ever-increasing scale of biological data sets, particularly those arising in the context of high-throughput technologies, requires the development of rich data exploration tools. In this article, we present AnnotCompute, an information discovery platform for repositories of functional genomics experiments such as ArrayExpress. Our system leverages semantic annotations of functional genomics experiments with controlled vocabulary and ontology terms, such as those from the MGED Ontology, to compute conceptual dissimilarities between pairs of experiments. These dissimilarities are then used to support two types of exploratory analysis—clustering and query-by-example. We show that our proposed dissimilarity measures correspond to a user's intuition about conceptual dissimilarity, and can be used to support effective query-by-example. We also evaluate the quality of clustering based on these measures. While AnnotCompute can support a richer data exploration experience, its effectiveness is limited in some cases, due to the quality of available annotations. Nonetheless, tools such as AnnotCompute may provide an incentive for richer annotations of experiments. Code is available for download at http://www.cbil.upenn.edu/downloads/AnnotCompute. Database URL: http://www.cbil.upenn.edu/annotCompute/ PMID:22190598
Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T.; van Oven, Mannis; Wallace, Douglas C.; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F.; Attimonelli, Marcella; Zuchner, Stephan
2016-01-01
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and disease. MSeqDR-LSDB is a locus specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar-compliant variant annotations. PhenoTips is used for phenotypic data submission on de-identified patients using human phenotype ontology terminology. Development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. PMID:26919060
Genovar: a detection and visualization tool for genomic variants.
Jung, Kwang Su; Moon, Sanghoon; Kim, Young Jin; Kim, Bong-Jo; Park, Kiejung
2012-05-08
Along with single nucleotide polymorphisms (SNPs), copy number variation (CNV) is considered an important source of genetic variation associated with disease susceptibility. Despite the importance of CNV, the tools currently available for its analysis often produce false positive results due to limitations such as low resolution of array platforms, platform specificity, and the type of CNV. To resolve this problem, spurious signals must be separated from true signals by visual inspection. None of the previously reported CNV analysis tools support this function and the simultaneous visualization of comparative genomic hybridization arrays (aCGH) and sequence alignment. The purpose of the present study was to develop a useful program for the efficient detection and visualization of CNV regions that enables the manual exclusion of erroneous signals. A JAVA-based stand-alone program called Genovar was developed. To ascertain whether a detected CNV region is a novel variant, Genovar compares the detected CNV regions with previously reported CNV regions using the Database of Genomic Variants (DGV, http://projects.tcag.ca/variation) and the Single Nucleotide Polymorphism Database (dbSNP). The current version of Genovar is capable of visualizing genomic data from sources such as the aCGH data file and sequence alignment format files. Genovar is freely accessible and provides a user-friendly graphic user interface (GUI) to facilitate the detection of CNV regions. The program also provides comprehensive information to help in the elimination of spurious signals by visual inspection, making Genovar a valuable tool for reducing false positive CNV results. http://genovar.sourceforge.net/.
Liachko, Ivan; Youngblood, Rachel A.; Keich, Uri; Dunham, Maitreya J.
2013-01-01
DNA replication origins are necessary for the duplication of genomes. In addition, plasmid-based expression systems require DNA replication origins to maintain plasmids efficiently. The yeast autonomously replicating sequence (ARS) assay has been a valuable tool in dissecting replication origin structure and function. However, the dearth of information on origins in diverse yeasts limits the availability of efficient replication origin modules to only a handful of species and restricts our understanding of origin function and evolution. To enable rapid study of origins, we have developed a sequencing-based suite of methods for comprehensively mapping and characterizing ARSs within a yeast genome. Our approach finely maps genomic inserts capable of supporting plasmid replication and uses massively parallel deep mutational scanning to define molecular determinants of ARS function with single-nucleotide resolution. In addition to providing unprecedented detail into origin structure, our data have allowed us to design short, synthetic DNA sequences that retain maximal ARS function. These methods can be readily applied to understand and modulate ARS function in diverse systems. PMID:23241746
ePIANNO: ePIgenomics ANNOtation tool.
Liu, Chia-Hsin; Ho, Bing-Ching; Chen, Chun-Ling; Chang, Ya-Hsuan; Hsu, Yi-Chiung; Li, Yu-Cheng; Yuan, Shin-Sheng; Huang, Yi-Huan; Chang, Chi-Sheng; Li, Ker-Chau; Chen, Hsuan-Yu
2016-01-01
Recently, with the development of next generation sequencing (NGS), the combination of chromatin immunoprecipitation (ChIP) and NGS, namely ChIP-seq, has become a powerful technique to capture potential genomic binding sites of regulatory factors, histone modifications and chromatin accessible regions. For most researchers, additional information including genomic variations on the TF binding site, allele frequency of variation between different populations, variation associated disease, and other neighbour TF binding sites are essential to generate a proper hypothesis or a meaningful conclusion. Many ChIP-seq datasets had been deposited on the public domain to help researchers make new discoveries. However, researches are often intimidated by the complexity of data structure and largeness of data volume. Such information would be more useful if they could be combined or downloaded with ChIP-seq data. To meet such demands, we built a webtool: ePIgenomic ANNOtation tool (ePIANNO, http://epianno.stat.sinica.edu.tw/index.html). ePIANNO is a web server that combines SNP information of populations (1000 Genomes Project) and gene-disease association information of GWAS (NHGRI) with ChIP-seq (hmChIP, ENCODE, and ROADMAP epigenomics) data. ePIANNO has a user-friendly website interface allowing researchers to explore, navigate, and extract data quickly. We use two examples to demonstrate how users could use functions of ePIANNO webserver to explore useful information about TF related genomic variants. Users could use our query functions to search target regions, transcription factors, or annotations. ePIANNO may help users to generate hypothesis or explore potential biological functions for their studies.
CRISPR-Cas9 and CRISPR-Cpf1 mediated targeting of a stomatal developmental gene EPFL9 in rice.
Yin, Xiaojia; Biswal, Akshaya K; Dionora, Jacqueline; Perdigon, Kristel M; Balahadia, Christian P; Mazumdar, Shamik; Chater, Caspar; Lin, Hsiang-Chun; Coe, Robert A; Kretzschmar, Tobias; Gray, Julie E; Quick, Paul W; Bandyopadhyay, Anindya
2017-05-01
CRISPR-Cas9/Cpf1 system with its unique gene targeting efficiency, could be an important tool for functional study of early developmental genes through the generation of successful knockout plants. The introduction and utilization of systems biology approaches have identified several genes that are involved in early development of a plant and with such knowledge a robust tool is required for the functional validation of putative candidate genes thus obtained. The development of the CRISPR-Cas9/Cpf1 genome editing system has provided a convenient tool for creating loss of function mutants for genes of interest. The present study utilized CRISPR/Cas9 and CRISPR-Cpf1 technology to knock out an early developmental gene EPFL9 (Epidermal Patterning Factor like-9, a positive regulator of stomatal development in Arabidopsis) orthologue in rice. Germ-line mutants that were generated showed edits that were carried forward into the T2 generation when Cas9-free homozygous mutants were obtained. The homozygous mutant plants showed more than an eightfold reduction in stomatal density on the abaxial leaf surface of the edited rice plants. Potential off-target analysis showed no significant off-target effects. This study also utilized the CRISPR-LbCpf1 (Lachnospiracae bacterium Cpf1) to target the same OsEPFL9 gene to test the activity of this class-2 CRISPR system in rice and found that Cpf1 is also capable of genome editing and edits get transmitted through generations with similar phenotypic changes seen with CRISPR-Cas9. This study demonstrates the application of CRISPR-Cas9/Cpf1 to precisely target genomic locations and develop transgene-free homozygous heritable gene edits and confirms that the loss of function analysis of the candidate genes emerging from different systems biology based approaches, could be performed, and therefore, this system adds value in the validation of gene function studies.
Bible, Paul W; Kanno, Yuka; Wei, Lai; Brooks, Stephen R; O'Shea, John J; Morasso, Maria I; Loganantharaj, Rasiah; Sun, Hong-Wei
2015-01-01
Comparative co-localization analysis of transcription factors (TFs) and epigenetic marks (EMs) in specific biological contexts is one of the most critical areas of ChIP-Seq data analysis beyond peak calling. Yet there is a significant lack of user-friendly and powerful tools geared towards co-localization analysis based exploratory research. Most tools currently used for co-localization analysis are command line only and require extensive installation procedures and Linux expertise. Online tools partially address the usability issues of command line tools, but slow response times and few customization features make them unsuitable for rapid data-driven interactive exploratory research. We have developed PAPST: Peak Assignment and Profile Search Tool, a user-friendly yet powerful platform with a unique design, which integrates both gene-centric and peak-centric co-localization analysis into a single package. Most of PAPST's functions can be completed in less than five seconds, allowing quick cycles of data-driven hypothesis generation and testing. With PAPST, a researcher with or without computational expertise can perform sophisticated co-localization pattern analysis of multiple TFs and EMs, either against all known genes or a set of genomic regions obtained from public repositories or prior analysis. PAPST is a versatile, efficient, and customizable tool for genome-wide data-driven exploratory research. Creatively used, PAPST can be quickly applied to any genomic data analysis that involves a comparison of two or more sets of genomic coordinate intervals, making it a powerful tool for a wide range of exploratory genomic research. We first present PAPST's general purpose features then apply it to several public ChIP-Seq data sets to demonstrate its rapid execution and potential for cutting-edge research with a case study in enhancer analysis. To our knowledge, PAPST is the first software of its kind to provide efficient and sophisticated post peak-calling ChIP-Seq data analysis as an easy-to-use interactive application. PAPST is available at https://github.com/paulbible/papst and is a public domain work.
Bible, Paul W.; Kanno, Yuka; Wei, Lai; Brooks, Stephen R.; O’Shea, John J.; Morasso, Maria I.; Loganantharaj, Rasiah; Sun, Hong-Wei
2015-01-01
Comparative co-localization analysis of transcription factors (TFs) and epigenetic marks (EMs) in specific biological contexts is one of the most critical areas of ChIP-Seq data analysis beyond peak calling. Yet there is a significant lack of user-friendly and powerful tools geared towards co-localization analysis based exploratory research. Most tools currently used for co-localization analysis are command line only and require extensive installation procedures and Linux expertise. Online tools partially address the usability issues of command line tools, but slow response times and few customization features make them unsuitable for rapid data-driven interactive exploratory research. We have developed PAPST: Peak Assignment and Profile Search Tool, a user-friendly yet powerful platform with a unique design, which integrates both gene-centric and peak-centric co-localization analysis into a single package. Most of PAPST’s functions can be completed in less than five seconds, allowing quick cycles of data-driven hypothesis generation and testing. With PAPST, a researcher with or without computational expertise can perform sophisticated co-localization pattern analysis of multiple TFs and EMs, either against all known genes or a set of genomic regions obtained from public repositories or prior analysis. PAPST is a versatile, efficient, and customizable tool for genome-wide data-driven exploratory research. Creatively used, PAPST can be quickly applied to any genomic data analysis that involves a comparison of two or more sets of genomic coordinate intervals, making it a powerful tool for a wide range of exploratory genomic research. We first present PAPST’s general purpose features then apply it to several public ChIP-Seq data sets to demonstrate its rapid execution and potential for cutting-edge research with a case study in enhancer analysis. To our knowledge, PAPST is the first software of its kind to provide efficient and sophisticated post peak-calling ChIP-Seq data analysis as an easy-to-use interactive application. PAPST is available at https://github.com/paulbible/papst and is a public domain work. PMID:25970601
Solutions for data integration in functional genomics: a critical assessment and case study.
Smedley, Damian; Swertz, Morris A; Wolstencroft, Katy; Proctor, Glenn; Zouberakis, Michael; Bard, Jonathan; Hancock, John M; Schofield, Paul
2008-11-01
The torrent of data emerging from the application of new technologies to functional genomics and systems biology can no longer be contained within the traditional modes of data sharing and publication with the consequence that data is being deposited in, distributed across and disseminated through an increasing number of databases. The resulting fragmentation poses serious problems for the model organism community which increasingly rely on data mining and computational approaches that require gathering of data from a range of sources. In the light of these problems, the European Commission has funded a coordination action, CASIMIR (coordination and sustainability of international mouse informatics resources), with a remit to assess the technical and social aspects of database interoperability that currently prevent the full realization of the potential of data integration in mouse functional genomics. In this article, we assess the current problems with interoperability, with particular reference to mouse functional genomics, and critically review the technologies that can be deployed to overcome them. We describe a typical use-case where an investigator wishes to gather data on variation, genomic context and metabolic pathway involvement for genes discovered in a genome-wide screen. We go on to develop an automated approach involving an in silico experimental workflow tool, Taverna, using web services, BioMart and MOLGENIS technologies for data retrieval. Finally, we focus on the current impediments to adopting such an approach in a wider context, and strategies to overcome them.
GenomeD3Plot: a library for rich, interactive visualizations of genomic data in web applications.
Laird, Matthew R; Langille, Morgan G I; Brinkman, Fiona S L
2015-10-15
A simple static image of genomes and associated metadata is very limiting, as researchers expect rich, interactive tools similar to the web applications found in the post-Web 2.0 world. GenomeD3Plot is a light weight visualization library written in javascript using the D3 library. GenomeD3Plot provides a rich API to allow the rapid visualization of complex genomic data using a convenient standards based JSON configuration file. When integrated into existing web services GenomeD3Plot allows researchers to interact with data, dynamically alter the view, or even resize or reposition the visualization in their browser window. In addition GenomeD3Plot has built in functionality to export any resulting genome visualization in PNG or SVG format for easy inclusion in manuscripts or presentations. GenomeD3Plot is being utilized in the recently released Islandviewer 3 (www.pathogenomics.sfu.ca/islandviewer/) to visualize predicted genomic islands with other genome annotation data. However, its features enable it to be more widely applicable for dynamic visualization of genomic data in general. GenomeD3Plot is licensed under the GNU-GPL v3 at https://github.com/brinkmanlab/GenomeD3Plot/. brinkman@sfu.ca. © The Author 2015. Published by Oxford University Press.
NCBI GEO: archive for functional genomics data sets--update.
Barrett, Tanya; Wilhite, Stephen E; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Holko, Michelle; Yefanov, Andrey; Lee, Hyeseung; Zhang, Naigong; Robertson, Cynthia L; Serova, Nadezhda; Davis, Sean; Soboleva, Alexandra
2013-01-01
The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.
Arneson, Douglas; Bhattacharya, Anindya; Shu, Le; Mäkinen, Ville-Petteri; Yang, Xia
2016-09-09
Human diseases are commonly the result of multidimensional changes at molecular, cellular, and systemic levels. Recent advances in genomic technologies have enabled an outpour of omics datasets that capture these changes. However, separate analyses of these various data only provide fragmented understanding and do not capture the holistic view of disease mechanisms. To meet the urgent needs for tools that effectively integrate multiple types of omics data to derive biological insights, we have developed Mergeomics, a computational pipeline that integrates multidimensional disease association data with functional genomics and molecular networks to retrieve biological pathways, gene networks, and central regulators critical for disease development. To make the Mergeomics pipeline available to a wider research community, we have implemented an online, user-friendly web server ( http://mergeomics. idre.ucla.edu/ ). The web server features a modular implementation of the Mergeomics pipeline with detailed tutorials. Additionally, it provides curated genomic resources including tissue-specific expression quantitative trait loci, ENCODE functional annotations, biological pathways, and molecular networks, and offers interactive visualization of analytical results. Multiple computational tools including Marker Dependency Filtering (MDF), Marker Set Enrichment Analysis (MSEA), Meta-MSEA, and Weighted Key Driver Analysis (wKDA) can be used separately or in flexible combinations. User-defined summary-level genomic association datasets (e.g., genetic, transcriptomic, epigenomic) related to a particular disease or phenotype can be uploaded and computed real-time to yield biologically interpretable results, which can be viewed online and downloaded for later use. Our Mergeomics web server offers researchers flexible and user-friendly tools to facilitate integration of multidimensional data into holistic views of disease mechanisms in the form of tissue-specific key regulators, biological pathways, and gene networks.
Jung, Ki-Hong; Dardick, Christopher; Bartley, Laura E; Cao, Peijian; Phetsom, Jirapa; Canlas, Patrick; Seo, Young-Su; Shultz, Michael; Ouyang, Shu; Yuan, Qiaoping; Frank, Bryan C; Ly, Eugene; Zheng, Li; Jia, Yi; Hsia, An-Ping; An, Kyungsook; Chou, Hui-Hsien; Rocke, David; Lee, Geun Cheol; Schnable, Patrick S; An, Gynheung; Buell, C Robin; Ronald, Pamela C
2008-10-06
Studies of gene function are often hampered by gene-redundancy, especially in organisms with large genomes such as rice (Oryza sativa). We present an approach for using transcriptomics data to focus functional studies and address redundancy. To this end, we have constructed and validated an inexpensive and publicly available rice oligonucleotide near-whole genome array, called the rice NSF45K array. We generated expression profiles for light- vs. dark-grown rice leaf tissue and validated the biological significance of the data by analyzing sources of variation and confirming expression trends with reverse transcription polymerase chain reaction. We examined trends in the data by evaluating enrichment of gene ontology terms at multiple false discovery rate thresholds. To compare data generated with the NSF45K array with published results, we developed publicly available, web-based tools (www.ricearray.org). The Oligo and EST Anatomy Viewer enables visualization of EST-based expression profiling data for all genes on the array. The Rice Multi-platform Microarray Search Tool facilitates comparison of gene expression profiles across multiple rice microarray platforms. Finally, we incorporated gene expression and biochemical pathway data to reduce the number of candidate gene products putatively participating in the eight steps of the photorespiration pathway from 52 to 10, based on expression levels of putatively functionally redundant genes. We confirmed the efficacy of this method to cope with redundancy by correctly predicting participation in photorespiration of a gene with five paralogs. Applying these methods will accelerate rice functional genomics.
Approaches to Fungal Genome Annotation
Haas, Brian J.; Zeng, Qiandong; Pearson, Matthew D.; Cuomo, Christina A.; Wortman, Jennifer R.
2011-01-01
Fungal genome annotation is the starting point for analysis of genome content. This generally involves the application of diverse methods to identify features on a genome assembly such as protein-coding and non-coding genes, repeats and transposable elements, and pseudogenes. Here we describe tools and methods leveraged for eukaryotic genome annotation with a focus on the annotation of fungal nuclear and mitochondrial genomes. We highlight the application of the latest technologies and tools to improve the quality of predicted gene sets. The Broad Institute eukaryotic genome annotation pipeline is described as one example of how such methods and tools are integrated into a sequencing center’s production genome annotation environment. PMID:22059117
MBGD update 2013: the microbial genome database for exploring the diversity of microbial world.
Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu
2013-01-01
The microbial genome database for comparative analysis (MBGD, available at http://mbgd.genome.ad.jp/) is a platform for microbial genome comparison based on orthology analysis. As its unique feature, MBGD allows users to conduct orthology analysis among any specified set of organisms; this flexibility allows MBGD to adapt to a variety of microbial genomic study. Reflecting the huge diversity of microbial world, the number of microbial genome projects now becomes several thousands. To efficiently explore the diversity of the entire microbial genomic data, MBGD now provides summary pages for pre-calculated ortholog tables among various taxonomic groups. For some closely related taxa, MBGD also provides the conserved synteny information (core genome alignment) pre-calculated using the CoreAligner program. In addition, efficient incremental updating procedure can create extended ortholog table by adding additional genomes to the default ortholog table generated from the representative set of genomes. Combining with the functionalities of the dynamic orthology calculation of any specified set of organisms, MBGD is an efficient and flexible tool for exploring the microbial genome diversity.
CRISPR/Cas9 in Genome Editing and Beyond.
Wang, Haifeng; La Russa, Marie; Qi, Lei S
2016-06-02
The Cas9 protein (CRISPR-associated protein 9), derived from type II CRISPR (clustered regularly interspaced short palindromic repeats) bacterial immune systems, is emerging as a powerful tool for engineering the genome in diverse organisms. As an RNA-guided DNA endonuclease, Cas9 can be easily programmed to target new sites by altering its guide RNA sequence, and its development as a tool has made sequence-specific gene editing several magnitudes easier. The nuclease-deactivated form of Cas9 further provides a versatile RNA-guided DNA-targeting platform for regulating and imaging the genome, as well as for rewriting the epigenetic status, all in a sequence-specific manner. With all of these advances, we have just begun to explore the possible applications of Cas9 in biomedical research and therapeutics. In this review, we describe the current models of Cas9 function and the structural and biochemical studies that support it. We focus on the applications of Cas9 for genome editing, regulation, and imaging, discuss other possible applications and some technical considerations, and highlight the many advantages that CRISPR/Cas9 technology offers.
PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.
Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X
2016-01-01
PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels.The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.
Using Galaxy to Perform Large-Scale Interactive Data Analyses
Hillman-Jackson, Jennifer; Clements, Dave; Blankenberg, Daniel; Taylor, James; Nekrutenko, Anton
2014-01-01
Innovations in biomedical research technologies continue to provide experimental biologists with novel and increasingly large genomic and high-throughput data resources to be analyzed. As creating and obtaining data has become easier, the key decision faced by many researchers is a practical one: where and how should an analysis be performed? Datasets are large and analysis tool set-up and use is riddled with complexities outside of the scope of core research activities. The authors believe that Galaxy provides a powerful solution that simplifies data acquisition and analysis in an intuitive Web application, granting all researchers access to key informatics tools previously only available to computational specialists working in Unix-based environments. We will demonstrate through a series of biomedically relevant protocols how Galaxy specifically brings together (1) data retrieval from public and private sources, for example, UCSC's Eukaryote and Microbial Genome Browsers, (2) custom tools (wrapped Unix functions, format standardization/conversions, interval operations), and 3rd-party analysis tools. PMID:22700312
Chen, I-Min A.; Markowitz, Victor M.; Palaniappan, Krishna; ...
2016-04-26
Background: The exponential growth of genomic data from next generation technologies renders traditional manual expert curation effort unsustainable. Many genomic systems have included community annotation tools to address the problem. Most of these systems adopted a "Wiki-based" approach to take advantage of existing wiki technologies, but encountered obstacles in issues such as usability, authorship recognition, information reliability and incentive for community participation. Results: Here, we present a different approach, relying on tightly integrated method rather than "Wiki-based" method, to support community annotation and user collaboration in the Integrated Microbial Genomes (IMG) system. The IMG approach allows users to use existingmore » IMG data warehouse and analysis tools to add gene, pathway and biosynthetic cluster annotations, to analyze/reorganize contigs, genes and functions using workspace datasets, and to share private user annotations and workspace datasets with collaborators. We show that the annotation effort using IMG can be part of the research process to overcome the user incentive and authorship recognition problems thus fostering collaboration among domain experts. The usability and reliability issues are addressed by the integration of curated information and analysis tools in IMG, together with DOE Joint Genome Institute (JGI) expert review. Conclusion: By incorporating annotation operations into IMG, we provide an integrated environment for users to perform deeper and extended data analysis and annotation in a single system that can lead to publications and community knowledge sharing as shown in the case studies.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, I-Min A.; Markowitz, Victor M.; Palaniappan, Krishna
Background: The exponential growth of genomic data from next generation technologies renders traditional manual expert curation effort unsustainable. Many genomic systems have included community annotation tools to address the problem. Most of these systems adopted a "Wiki-based" approach to take advantage of existing wiki technologies, but encountered obstacles in issues such as usability, authorship recognition, information reliability and incentive for community participation. Results: Here, we present a different approach, relying on tightly integrated method rather than "Wiki-based" method, to support community annotation and user collaboration in the Integrated Microbial Genomes (IMG) system. The IMG approach allows users to use existingmore » IMG data warehouse and analysis tools to add gene, pathway and biosynthetic cluster annotations, to analyze/reorganize contigs, genes and functions using workspace datasets, and to share private user annotations and workspace datasets with collaborators. We show that the annotation effort using IMG can be part of the research process to overcome the user incentive and authorship recognition problems thus fostering collaboration among domain experts. The usability and reliability issues are addressed by the integration of curated information and analysis tools in IMG, together with DOE Joint Genome Institute (JGI) expert review. Conclusion: By incorporating annotation operations into IMG, we provide an integrated environment for users to perform deeper and extended data analysis and annotation in a single system that can lead to publications and community knowledge sharing as shown in the case studies.« less
Damienikan, Aliaksandr U.
2016-01-01
The majority of bacterial genome annotations are currently automated and based on a ‘gene by gene’ approach. Regulatory signals and operon structures are rarely taken into account which often results in incomplete and even incorrect gene function assignments. Here we present SigmoID, a cross-platform (OS X, Linux and Windows) open-source application aiming at simplifying the identification of transcription regulatory sites (promoters, transcription factor binding sites and terminators) in bacterial genomes and providing assistance in correcting annotations in accordance with regulatory information. SigmoID combines a user-friendly graphical interface to well known command line tools with a genome browser for visualising regulatory elements in genomic context. Integrated access to online databases with regulatory information (RegPrecise and RegulonDB) and web-based search engines speeds up genome analysis and simplifies correction of genome annotation. We demonstrate some features of SigmoID by constructing a series of regulatory protein binding site profiles for two groups of bacteria: Soft Rot Enterobacteriaceae (Pectobacterium and Dickeya spp.) and Pseudomonas spp. Furthermore, we inferred over 900 transcription factor binding sites and alternative sigma factor promoters in the annotated genome of Pectobacterium atrosepticum. These regulatory signals control putative transcription units covering about 40% of the P. atrosepticum chromosome. Reviewing the annotation in cases where it didn’t fit with regulatory information allowed us to correct product and gene names for over 300 loci. PMID:27257541
GreenPhylDB v2.0: comparative and functional genomics in plants.
Rouard, Mathieu; Guignon, Valentin; Aluome, Christelle; Laporte, Marie-Angélique; Droc, Gaëtan; Walde, Christian; Zmasek, Christian M; Périn, Christophe; Conte, Matthieu G
2011-01-01
GreenPhylDB is a database designed for comparative and functional genomics based on complete genomes. Version 2 now contains sixteen full genomes of members of the plantae kingdom, ranging from algae to angiosperms, automatically clustered into gene families. Gene families are manually annotated and then analyzed phylogenetically in order to elucidate orthologous and paralogous relationships. The database offers various lists of gene families including plant, phylum and species specific gene families. For each gene cluster or gene family, easy access to gene composition, protein domains, publications, external links and orthologous gene predictions is provided. Web interfaces have been further developed to improve the navigation through information related to gene families. New analysis tools are also available, such as a gene family ontology browser that facilitates exploration. GreenPhylDB is a component of the South Green Bioinformatics Platform (http://southgreen.cirad.fr/) and is accessible at http://greenphyl.cirad.fr. It enables comparative genomics in a broad taxonomy context to enhance the understanding of evolutionary processes and thus tends to speed up gene discovery.
Complete nucleotide sequence and annotation of the temperate corynephage ϕ16 genome.
Lobanova, Juliya S; Gak, Evgueni R; Andreeva, Irina G; Rybak, Konstantin V; Krylov, Alexander A; Mashko, Sergey V
2017-08-01
The complete genome of ϕ16, a temperate corynephage from Corynebacterium glutamicum ATCC 21792, was sequenced and annotated (GenBank: KY250482). The electron microscopy study of ϕ16 virion confirmed that it belongs to the family Siphoviridae. The ϕ16 genome consists of a linear double-stranded DNA molecule of 58,200 bp (G+C = 52.2%) with protruding cohesive 3'-ends of 14 nt. Four major structural proteins were separated by SDS-PAGE and identified by peptide mass fingerprinting technique. Using bioinformatics analysis, 101 putative ORFs and 5 tRNA genes were predicted. Only 27 putative gene products could be assigned to known biological functions. The ϕ16 genome was divided into functional modules. Seven putative promoters and eight putative unidirectional intrinsic terminators were predicted. One site of putative «-1» programmed ribosomal frameshifting was proposed in the phage tail assembly genome region. C. glutamicum genetic tools could be broadened by exploiting the known integrase gene (gp33) and the newly identified excisionase gene (gp47), participating in site-specific recombination between ϕ16-attP/attB.
GenColors: annotation and comparative genomics of prokaryotes made easy.
Romualdi, Alessandro; Felder, Marius; Rose, Dominic; Gausmann, Ulrike; Schilhabel, Markus; Glöckner, Gernot; Platzer, Matthias; Sühnel, Jürgen
2007-01-01
GenColors (gencolors.fli-leibniz.de) is a new web-based software/database system aimed at an improved and accelerated annotation of prokaryotic genomes considering information on related genomes and making extensive use of genome comparison. It offers a seamless integration of data from ongoing sequencing projects and annotated genomic sequences obtained from GenBank. A variety of export/import filters manages an effective data flow from sequence assembly and manipulation programs (e.g., GAP4) to GenColors and back as well as to standard GenBank file(s). The genome comparison tools include best bidirectional hits, gene conservation, syntenies, and gene core sets. Precomputed UniProt matches allow annotation and analysis in an effective manner. In addition to these analysis options, base-specific quality data (coverage and confidence) can also be handled if available. The GenColors system can be used both for annotation purposes in ongoing genome projects and as an analysis tool for finished genomes. GenColors comes in two types, as dedicated genome browsers and as the Jena Prokaryotic Genome Viewer (JPGV). Dedicated genome browsers contain genomic information on a set of related genomes and offer a large number of options for genome comparison. The system has been efficiently used in the genomic sequencing of Borrelia garinii and is currently applied to various ongoing genome projects on Borrelia, Legionella, Escherichia, and Pseudomonas genomes. One of these dedicated browsers, the Spirochetes Genome Browser (sgb.fli-leibniz.de) with Borrelia, Leptospira, and Treponema genomes, is freely accessible. The others will be released after finalization of the corresponding genome projects. JPGV (jpgv.fli-leibniz.de) offers information on almost all finished bacterial genomes, as compared to the dedicated browsers with reduced genome comparison functionality, however. As of January 2006, this viewer includes 632 genomic elements (e.g., chromosomes and plasmids) of 293 species. The system provides versatile quick and advanced search options for all currently known prokaryotic genomes and generates circular and linear genome plots. Gene information sheets contain basic gene information, database search options, and links to external databases. GenColors is also available on request for local installation.
Sun, Eric I; Leyn, Semen A; Kazanov, Marat D; Saier, Milton H; Novichkov, Pavel S; Rodionov, Dmitry A
2013-09-02
In silico comparative genomics approaches have been efficiently used for functional prediction and reconstruction of metabolic and regulatory networks. Riboswitches are metabolite-sensing structures often found in bacterial mRNA leaders controlling gene expression on transcriptional or translational levels.An increasing number of riboswitches and other cis-regulatory RNAs have been recently classified into numerous RNA families in the Rfam database. High conservation of these RNA motifs provides a unique advantage for their genomic identification and comparative analysis. A comparative genomics approach implemented in the RegPredict tool was used for reconstruction and functional annotation of regulons controlled by RNAs from 43 Rfam families in diverse taxonomic groups of Bacteria. The inferred regulons include ~5200 cis-regulatory RNAs and more than 12000 target genes in 255 microbial genomes. All predicted RNA-regulated genes were classified into specific and overall functional categories. Analysis of taxonomic distribution of these categories allowed us to establish major functional preferences for each analyzed cis-regulatory RNA motif family. Overall, most RNA motif regulons showed predictable functional content in accordance with their experimentally established effector ligands. Our results suggest that some RNA motifs (including thiamin pyrophosphate and cobalamin riboswitches that control the cofactor metabolism) are widespread and likely originated from the last common ancestor of all bacteria. However, many more analyzed RNA motifs are restricted to a narrow taxonomic group of bacteria and likely represent more recent evolutionary innovations. The reconstructed regulatory networks for major known RNA motifs substantially expand the existing knowledge of transcriptional regulation in bacteria. The inferred regulons can be used for genetic experiments, functional annotations of genes, metabolic reconstruction and evolutionary analysis. The obtained genome-wide collection of reference RNA motif regulons is available in the RegPrecise database (http://regprecise.lbl.gov/).
Evaluation method for the potential functionome harbored in the genome and metagenome.
Takami, Hideto; Taniguchi, Takeaki; Moriya, Yuki; Kuwahara, Tomomi; Kanehisa, Minoru; Goto, Susumu
2012-12-12
One of the main goals of genomic analysis is to elucidate the comprehensive functions (functionome) in individual organisms or a whole community in various environments. However, a standard evaluation method for discerning the functional potentials harbored within the genome or metagenome has not yet been established. We have developed a new evaluation method for the potential functionome, based on the completion ratio of Kyoto Encyclopedia of Genes and Genomes (KEGG) functional modules. Distribution of the completion ratio of the KEGG functional modules in 768 prokaryotic species varied greatly with the kind of module, and all modules primarily fell into 4 patterns (universal, restricted, diversified and non-prokaryotic modules), indicating the universal and unique nature of each module, and also the versatility of the KEGG Orthology (KO) identifiers mapped to each one. The module completion ratio in 8 phenotypically different bacilli revealed that some modules were shared only in phenotypically similar species. Metagenomes of human gut microbiomes from 13 healthy individuals previously determined by the Sanger method were analyzed based on the module completion ratio. Results led to new discoveries in the nutritional preferences of gut microbes, believed to be one of the mutualistic representations of gut microbiomes to avoid nutritional competition with the host. The method developed in this study could characterize the functionome harbored in genomes and metagenomes. As this method also provided taxonomical information from KEGG modules as well as the gene hosts constructing the modules, interpretation of completion profiles was simplified and we could identify the complementarity between biochemical functions in human hosts and the nutritional preferences in human gut microbiomes. Thus, our method has the potential to be a powerful tool for comparative functional analysis in genomics and metagenomics, able to target unknown environments containing various uncultivable microbes within unidentified phyla.
Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R
2017-07-05
Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell.
High quality de novo sequencing and assembly of the Saccharomyces arboricolus genome
2013-01-01
Background Comparative genomics is a formidable tool to identify functional elements throughout a genome. In the past ten years, studies in the budding yeast Saccharomyces cerevisiae and a set of closely related species have been instrumental in showing the benefit of analyzing patterns of sequence conservation. Increasing the number of closely related genome sequences makes the comparative genomics approach more powerful and accurate. Results Here, we report the genome sequence and analysis of Saccharomyces arboricolus, a yeast species recently isolated in China, that is closely related to S. cerevisiae. We obtained high quality de novo sequence and assemblies using a combination of next generation sequencing technologies, established the phylogenetic position of this species and considered its phenotypic profile under multiple environmental conditions in the light of its gene content and phylogeny. Conclusions We suggest that the genome of S. arboricolus will be useful in future comparative genomics analysis of the Saccharomyces sensu stricto yeasts. PMID:23368932
Rueda, Manuel; Torkamani, Ali
2017-08-18
Whole genome and exome sequencing usually include reads containing mitochondrial DNA (mtDNA). Yet, state-of-the-art pipelines and services for human nuclear genome variant calling and annotation do not handle mitochondrial genome data appropriately. As a consequence, any researcher desiring to add mtDNA variant analysis to their investigations is forced to explore the literature for mtDNA pipelines, evaluate them, and implement their own instance of the desired tool. This task is far from trivial, and can be prohibitive for non-bioinformaticians. We have developed SG-ADVISER mtDNA, a web server to facilitate the analysis and interpretation of mtDNA genomic data coming from next generation sequencing (NGS) experiments. The server was built in the context of our SG-ADVISER framework and on top of the MtoolBox platform (Calabrese et al., Bioinformatics 30(21):3115-3117, 2014), and includes most of its functionalities (i.e., assembly of mitochondrial genomes, heteroplasmic fractions, haplogroup assignment, functional and prioritization analysis of mitochondrial variants) as well as a back-end and a front-end interface. The server has been tested with unpublished data from 200 individuals of a healthy aging cohort (Erikson et al., Cell 165(4):1002-1011, 2016) and their data is made publicly available here along with a preliminary analysis of the variants. We observed that individuals over ~90 years old carried low levels of heteroplasmic variants in their genomes. SG-ADVISER mtDNA is a fast and functional tool that allows for variant calling and annotation of human mtDNA data coming from NGS experiments. The server was built with simplicity in mind, and builds on our own experience in interpreting mtDNA variants in the context of sudden death and rare diseases. Our objective is to provide an interface for non-bioinformaticians aiming to acquire (or contrast) mtDNA annotations via MToolBox. SG-ADVISER web server is freely available to all users at https://genomics.scripps.edu/mtdna .
A domain-centric solution to functional genomics via dcGO Predictor
2013-01-01
Background Computational/manual annotations of protein functions are one of the first routes to making sense of a newly sequenced genome. Protein domain predictions form an essential part of this annotation process. This is due to the natural modularity of proteins with domains as structural, evolutionary and functional units. Sometimes two, three, or more adjacent domains (called supra-domains) are the operational unit responsible for a function, e.g. via a binding site at the interface. These supra-domains have contributed to functional diversification in higher organisms. Traditionally functional ontologies have been applied to individual proteins, rather than families of related domains and supra-domains. We expect, however, to some extent functional signals can be carried by protein domains and supra-domains, and consequently used in function prediction and functional genomics. Results Here we present a domain-centric Gene Ontology (dcGO) perspective. We generalize a framework for automatically inferring ontological terms associated with domains and supra-domains from full-length sequence annotations. This general framework has been applied specifically to primary protein-level annotations from UniProtKB-GOA, generating GO term associations with SCOP domains and supra-domains. The resulting 'dcGO Predictor', can be used to provide functional annotation to protein sequences. The functional annotation of sequences in the Critical Assessment of Function Annotation (CAFA) has been used as a valuable opportunity to validate our method and to be assessed by the community. The functional annotation of all completely sequenced genomes has demonstrated the potential for domain-centric GO enrichment analysis to yield functional insights into newly sequenced or yet-to-be-annotated genomes. This generalized framework we have presented has also been applied to other domain classifications such as InterPro and Pfam, and other ontologies such as mammalian phenotype and disease ontology. The dcGO and its predictor are available at http://supfam.org/SUPERFAMILY/dcGO including an enrichment analysis tool. Conclusions As functional units, domains offer a unique perspective on function prediction regardless of whether proteins are multi-domain or single-domain. The 'dcGO Predictor' holds great promise for contributing to a domain-centric functional understanding of genomes in the next generation sequencing era. PMID:23514627
Argout, X; Martin, G; Droc, G; Fouet, O; Labadie, K; Rivals, E; Aury, J M; Lanaud, C
2017-09-15
Theobroma cacao L., native to the Amazonian basin of South America, is an economically important fruit tree crop for tropical countries as a source of chocolate. The first draft genome of the species, from a Criollo cultivar, was published in 2011. Although a useful resource, some improvements are possible, including identifying misassemblies, reducing the number of scaffolds and gaps, and anchoring un-anchored sequences to the 10 chromosomes. We used a NGS-based approach to significantly improve the assembly of the Belizian Criollo B97-61/B2 genome. We combined four Illumina large insert size mate paired libraries with 52x of Pacific Biosciences long reads to correct misassembled regions and reduced the number of scaffolds. We then used genotyping by sequencing (GBS) methods to increase the proportion of the assembly anchored to chromosomes. The scaffold number decreased from 4,792 in assembly V1 to 554 in V2 while the scaffold N50 size has increased from 0.47 Mb in V1 to 6.5 Mb in V2. A total of 96.7% of the assembly was anchored to the 10 chromosomes compared to 66.8% in the previous version. Unknown sites (Ns) were reduced from 10.8% to 5.7%. In addition, we updated the functional annotations and performed a new RefSeq structural annotation based on RNAseq evidence. Theobroma cacao Criollo genome version 2 will be a valuable resource for the investigation of complex traits at the genomic level and for future comparative genomics and genetics studies in cacao tree. New functional tools and annotations are available on the Cocoa Genome Hub ( http://cocoa-genome-hub.southgreen.fr ).
SHAO, Ming; XU, Tian-Rui; CHEN, Ce-Shi
2016-01-01
Targeted genome editing technology has been widely used in biomedical studies. The CRISPR-associated RNA-guided endonuclease Cas9 has become a versatile genome editing tool. The CRISPR/Cas9 system is useful for studying gene function through efficient knock-out, knock-in or chromatin modification of the targeted gene loci in various cell types and organisms. It can be applied in a number of fields, such as genetic breeding, disease treatment and gene functional investigation. In this review, we introduce the most recent developments and applications, the challenges, and future directions of Cas9 in generating disease animal model. Derived from the CRISPR adaptive immune system of bacteria, the development trend of Cas9 will inevitably fuel the vital applications from basic research to biotechnology and biomedicine. PMID:27469250
Shao, Ming; Xu, Tian-Rui; Chen, Ce-Shi
2016-07-18
Targeted genome editing technology has been widely used in biomedical studies. The CRISPR-associated RNA-guided endonuclease Cas9 has become a versatile genome editing tool. The CRISPR/Cas9 system is useful for studying gene function through efficient knock-out, knock-in or chromatin modification of the targeted gene loci in various cell types and organisms. It can be applied in a number of fields, such as genetic breeding, disease treatment and gene functional investigation. In this review, we introduce the most recent developments and applications, the challenges, and future directions of Cas9 in generating disease animal model. Derived from the CRISPR adaptive immune system of bacteria, the development trend of Cas9 will inevitably fuel the vital applications from basic research to biotechnology and bio-medicine.
YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia.
Tan, Shi Yang; Dutta, Avirup; Jakubovics, Nicholas S; Ang, Mia Yang; Siow, Cheuk Chuen; Mutha, Naresh Vr; Heydari, Hamed; Wee, Wei Yee; Wong, Guat Jah; Choo, Siew Woh
2015-01-16
Yersinia is a Gram-negative bacteria that includes serious pathogens such as the Yersinia pestis, which causes plague, Yersinia pseudotuberculosis, Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using distinct mechanisms from the more pathogenic species. With the advances in sequencing technologies, many genomes of Yersinia have been sequenced. However, there is currently no specialized platform to hold the rapidly-growing Yersinia genomic data and to provide analysis tools particularly for comparative analyses, which are required to provide improved insights into their biology, evolution and pathogenicity. To facilitate the ongoing and future research of Yersinia, especially those generally considered non-pathogenic species, a well-defined repository and analysis platform is needed to hold the Yersinia genomic data and analysis tools for the Yersinia research community. Hence, we have developed the YersiniaBase, a robust and user-friendly Yersinia resource and analysis platform for the analysis of Yersinia genomic data. YersiniaBase has a total of twelve species and 232 genome sequences, of which the majority are Yersinia pestis. In order to smooth the process of searching genomic data in a large database, we implemented an Asynchronous JavaScript and XML (AJAX)-based real-time searching system in YersiniaBase. Besides incorporating existing tools, which include JavaScript-based genome browser (JBrowse) and Basic Local Alignment Search Tool (BLAST), YersiniaBase also has in-house developed tools: (1) Pairwise Genome Comparison tool (PGC) for comparing two user-selected genomes; (2) Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomics analysis of Yersinia genomes; (3) YersiniaTree for constructing phylogenetic tree of Yersinia. We ran analyses based on the tools and genomic data in YersiniaBase and the preliminary results showed differences in virulence genes found in Yersinia pestis and Yersinia pseudotuberculosis compared to other Yersinia species, and differences between Yersinia enterocolitica subsp. enterocolitica and Yersinia enterocolitica subsp. palearctica. YersiniaBase offers free access to wide range of genomic data and analysis tools for the analysis of Yersinia. YersiniaBase can be accessed at http://yersinia.um.edu.my .
Genome Editing for the Study of Cardiovascular Diseases.
Chadwick, Alexandra C; Musunuru, Kiran
2017-03-01
The opportunities afforded through the recent advent of genome-editing technologies have allowed investigators to more easily study a number of diseases. The advantages and limitations of the most prominent genome-editing technologies are described in this review, along with potential applications specifically focused on cardiovascular diseases. The recent genome-editing tools using programmable nucleases, such as zinc-finger nucleases, transcription activator-like effector nucleases, and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated 9 (Cas9), have rapidly been adapted to manipulate genes in a variety of cellular and animal models. A number of recent cardiovascular disease-related publications report cases in which specific mutations are introduced into disease models for functional characterization and for testing of therapeutic strategies. Recent advances in genome-editing technologies offer new approaches to understand and treat diseases. Here, we discuss genome editing strategies to easily characterize naturally occurring mutations and offer strategies with potential clinical relevance.
GPU Accelerated Browser for Neuroimaging Genomics.
Zigon, Bob; Li, Huang; Yao, Xiaohui; Fang, Shiaofen; Hasan, Mohammad Al; Yan, Jingwen; Moore, Jason H; Saykin, Andrew J; Shen, Li
2018-04-25
Neuroimaging genomics is an emerging field that provides exciting opportunities to understand the genetic basis of brain structure and function. The unprecedented scale and complexity of the imaging and genomics data, however, have presented critical computational bottlenecks. In this work we present our initial efforts towards building an interactive visual exploratory system for mining big data in neuroimaging genomics. A GPU accelerated browsing tool for neuroimaging genomics is created that implements the ANOVA algorithm for single nucleotide polymorphism (SNP) based analysis and the VEGAS algorithm for gene-based analysis, and executes them at interactive rates. The ANOVA algorithm is 110 times faster than the 4-core OpenMP version, while the VEGAS algorithm is 375 times faster than its 4-core OpenMP counter part. This approach lays a solid foundation for researchers to address the challenges of mining large-scale imaging genomics datasets via interactive visual exploration.
Analyzing and interpreting genome data at the network level with ConsensusPathDB.
Herwig, Ralf; Hardt, Christopher; Lienhard, Matthias; Kamburov, Atanas
2016-10-01
ConsensusPathDB consists of a comprehensive collection of human (as well as mouse and yeast) molecular interaction data integrated from 32 different public repositories and a web interface featuring a set of computational methods and visualization tools to explore these data. This protocol describes the use of ConsensusPathDB (http://consensuspathdb.org) with respect to the functional and network-based characterization of biomolecules (genes, proteins and metabolites) that are submitted to the system either as a priority list or together with associated experimental data such as RNA-seq. The tool reports interaction network modules, biochemical pathways and functional information that are significantly enriched by the user's input, applying computational methods for statistical over-representation, enrichment and graph analysis. The results of this protocol can be observed within a few minutes, even with genome-wide data. The resulting network associations can be used to interpret high-throughput data mechanistically, to characterize and prioritize biomarkers, to integrate different omics levels, to design follow-up functional assay experiments and to generate topology for kinetic models at different scales.
blastjs: a BLAST+ wrapper for Node.js.
Page, Martin; MacLean, Dan; Schudoma, Christian
2016-02-27
To cope with the ever-increasing amount of sequence data generated in the field of genomics, the demand for efficient and fast database searches that drive functional and structural annotation in both large- and small-scale genome projects is on the rise. The tools of the BLAST+ suite are the most widely employed bioinformatic method for these database searches. Recent trends in bioinformatics application development show an increasing number of JavaScript apps that are based on modern frameworks such as Node.js. Until now, there is no way of using database searches with the BLAST+ suite from a Node.js codebase. We developed blastjs, a Node.js library that wraps the search tools of the BLAST+ suite and thus allows to easily add significant functionality to any Node.js-based application. blastjs is a library that allows the incorporation of BLAST+ functionality into bioinformatics applications based on JavaScript and Node.js. The library was designed to be as user-friendly as possible and therefore requires only a minimal amount of code in the client application. The library is freely available under the MIT license at https://github.com/teammaclean/blastjs.
A call for benchmarking transposable element annotation methods.
Hoen, Douglas R; Hickey, Glenn; Bourque, Guillaume; Casacuberta, Josep; Cordaux, Richard; Feschotte, Cédric; Fiston-Lavier, Anna-Sophie; Hua-Van, Aurélie; Hubley, Robert; Kapusta, Aurélie; Lerat, Emmanuelle; Maumus, Florian; Pollock, David D; Quesneville, Hadi; Smit, Arian; Wheeler, Travis J; Bureau, Thomas E; Blanchette, Mathieu
2015-01-01
DNA derived from transposable elements (TEs) constitutes large parts of the genomes of complex eukaryotes, with major impacts not only on genomic research but also on how organisms evolve and function. Although a variety of methods and tools have been developed to detect and annotate TEs, there are as yet no standard benchmarks-that is, no standard way to measure or compare their accuracy. This lack of accuracy assessment calls into question conclusions from a wide range of research that depends explicitly or implicitly on TE annotation. In the absence of standard benchmarks, toolmakers are impeded in improving their tools, annotators cannot properly assess which tools might best suit their needs, and downstream researchers cannot judge how accuracy limitations might impact their studies. We therefore propose that the TE research community create and adopt standard TE annotation benchmarks, and we call for other researchers to join the authors in making this long-overdue effort a success.
Li, Lixin; Piatek, Marek J; Atef, Ahmed; Piatek, Agnieszka; Wibowo, Anjar; Fang, Xiaoyun; Sabir, J S M; Zhu, Jian-Kang; Mahfouz, Magdy M
2012-03-01
Transcription activator-like effectors (TALEs) can be used as DNA-targeting modules by engineering their repeat domains to dictate user-selected sequence specificity. TALEs have been shown to function as site-specific transcriptional activators in a variety of cell types and organisms. TALE nucleases (TALENs), generated by fusing the FokI cleavage domain to TALE, have been used to create genomic double-strand breaks. The identity of the TALE repeat variable di-residues, their number, and their order dictate the DNA sequence specificity. Because TALE repeats are nearly identical, their assembly by cloning or even by synthesis is challenging and time consuming. Here, we report the development and use of a rapid and straightforward approach for the construction of designer TALE (dTALE) activators and nucleases with user-selected DNA target specificity. Using our plasmid set of 100 repeat modules, researchers can assemble repeat domains for any 14-nucleotide target sequence in one sequential restriction-ligation cloning step and in only 24 h. We generated several custom dTALEs and dTALENs with new target sequence specificities and validated their function by transient expression in tobacco leaves and in vitro DNA cleavage assays, respectively. Moreover, we developed a web tool, called idTALE, to facilitate the design of dTALENs and the identification of their genomic targets and potential off-targets in the genomes of several model species. Our dTALE repeat assembly approach along with the web tool idTALE will expedite genome-engineering applications in a variety of cell types and organisms including plants.
Yeast Genomics for Bread, Beer, Biology, Bucks and Breath
NASA Astrophysics Data System (ADS)
Sakharkar, Kishore R.; Sakharkar, Meena K.
The rapid advances and scale up of projects in DNA sequencing dur ing the past two decades have produced complete genome sequences of several eukaryotic species. The versatile genetic malleability of the yeast, and the high degree of conservation between its cellular processes and those of human cells have made it a model of choice for pioneering research in molecular and cell biology. The complete sequence of yeast genome has proven to be extremely useful as a reference towards the sequences of human and for providing systems to explore key gene functions. Yeast has been a ‘legendary model’ for new technologies and gaining new biological insights into basic biological sciences and biotechnology. This chapter describes the awesome power of yeast genetics, genomics and proteomics in understanding of biological function. The applications of yeast as a screening tool to the field of drug discovery and development are highlighted and the traditional importance of yeast for bakers and brewers is discussed.
Computational Analysis of Uncharacterized Proteins of Environmental Bacterial Genome
NASA Astrophysics Data System (ADS)
Coxe, K. J.; Kumar, M.
2017-12-01
Betaproteobacteria strain CB is a gram-negative bacterium in the phylum Proteobacteria and are found naturally in soil and water. In this complex environment, bacteria play a key role in efficiently eliminating the organic material and other pollutants from wastewater. To investigate the process of pollutant removal from wastewater using bacteria, it is important to characterize the proteins encoded by the bacterial genome. Our study combines a number of bioinformatics tools to predict the function of unassigned proteins in the bacterial genome. The genome of Betaproteobacteria strain CB contains 2,112 proteins in which function of 508 proteins are unknown, termed as uncharacterized proteins (UPs). The localization of the UPs with in the cell was determined and the structure of 38 UPs was accurately predicted. These UPs were predicted to belong to various classes of proteins such as enzymes, transporters, binding proteins, signal peptides, transmembrane proteins and other proteins. The outcome of this work will help better understand wastewater treatment mechanism.
Passport, a native Tc1 transposon from flatfish, is functionally active in vertebrate cells
Clark, Karl J.; Carlson, Daniel F.; Leaver, Michael J.; Foster, Linda K.; Fahrenkrug, Scott C.
2009-01-01
The Tc1/mariner family of DNA transposons is widespread across fungal, plant and animal kingdoms, and thought to contribute to the evolution of their host genomes. To date, an active Tc1 transposon has not been identified within the native genome of a vertebrate. We demonstrate that Passport, a native transposon isolated from a fish (Pleuronectes platessa), is active in a variety of vertebrate cells. In transposition assays, we found that the Passport transposon system improved stable cellular transgenesis by 40-fold, has an apparent preference for insertion into genes, and is subject to overproduction inhibition like other Tc1 elements. Passport represents the first vertebrate Tc1 element described as both natively intact and functionally active, and given its restricted phylogenetic distribution, may be contemporaneously active. The Passport transposon system thus complements the available genetic tools for the manipulation of vertebrate genomes, and may provide a unique system for studying the infiltration of vertebrate genomes by Tc1 elements. PMID:19136468
Passport, a native Tc1 transposon from flatfish, is functionally active in vertebrate cells.
Clark, Karl J; Carlson, Daniel F; Leaver, Michael J; Foster, Linda K; Fahrenkrug, Scott C
2009-03-01
The Tc1/mariner family of DNA transposons is widespread across fungal, plant and animal kingdoms, and thought to contribute to the evolution of their host genomes. To date, an active Tc1 transposon has not been identified within the native genome of a vertebrate. We demonstrate that Passport, a native transposon isolated from a fish (Pleuronectes platessa), is active in a variety of vertebrate cells. In transposition assays, we found that the Passport transposon system improved stable cellular transgenesis by 40-fold, has an apparent preference for insertion into genes, and is subject to overproduction inhibition like other Tc1 elements. Passport represents the first vertebrate Tc1 element described as both natively intact and functionally active, and given its restricted phylogenetic distribution, may be contemporaneously active. The Passport transposon system thus complements the available genetic tools for the manipulation of vertebrate genomes, and may provide a unique system for studying the infiltration of vertebrate genomes by Tc1 elements.
High-resolution DNA melting analysis in plant research
USDA-ARS?s Scientific Manuscript database
Genetic and genomic studies provide valuable insight into the inheritance, structure, organization, and function of genes. The knowledge gained from the analysis of plant genes is beneficial to all aspects of plant research, including crop improvement. New methods and tools are continually developed...
The Importance of Normalization on Large and Heterogeneous Microarray Datasets
DNA microarray technology is a powerful functional genomics tool increasingly used for investigating global gene expression in environmental studies. Microarrays can also be used in identifying biological networks, as they give insight on the complex gene-to-gene interactions, ne...
Xu, Zheng; Zhang, Guosheng; Duan, Qing; Chai, Shengjie; Zhang, Baqun; Wu, Cong; Jin, Fulai; Yue, Feng; Li, Yun; Hu, Ming
2016-03-11
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits and diseases. However, most of them are located in the non-protein coding regions, and therefore it is challenging to hypothesize the functions of these non-coding GWAS variants. Recent large efforts such as the ENCODE and Roadmap Epigenomics projects have predicted a large number of regulatory elements. However, the target genes of these regulatory elements remain largely unknown. Chromatin conformation capture based technologies such as Hi-C can directly measure the chromatin interactions and have generated an increasingly comprehensive catalog of the interactome between the distal regulatory elements and their potential target genes. Leveraging such information revealed by Hi-C holds the promise of elucidating the functions of genetic variants in human diseases. In this work, we present HiView, the first integrative genome browser to leverage Hi-C results for the interpretation of GWAS variants. HiView is able to display Hi-C data and statistical evidence for chromatin interactions in genomic regions surrounding any given GWAS variant, enabling straightforward visualization and interpretation. We believe that as the first GWAS variants-centered Hi-C genome browser, HiView is a useful tool guiding post-GWAS functional genomics studies. HiView is freely accessible at: http://www.unc.edu/~yunmli/HiView .
Combining Induced Pluripotent Stem Cells and Genome Editing Technologies for Clinical Applications.
Chang, Chia-Yu; Ting, Hsiao-Chien; Su, Hong-Lin; Jeng, Jing-Ren
2018-01-01
In this review, we introduce current developments in induced pluripotent stem cells (iPSCs), site-specific nuclease (SSN)-mediated genome editing tools, and the combined application of these two novel technologies in biomedical research and therapeutic trials. The sustainable pluripotent property of iPSCs in vitro not only provides unlimited cell sources for basic research but also benefits precision medicines for human diseases. In addition, rapidly evolving SSN tools efficiently tailor genetic manipulations for exploring gene functions and can be utilized to correct genetic defects of congenital diseases in the near future. Combining iPSC and SSN technologies will create new reliable human disease models with isogenic backgrounds in vitro and provide new solutions for cell replacement and precise therapies.
Perspectives: Gene Expression in Fisheries Management
Nielsen, Jennifer L.; Pavey, Scott A.
2010-01-01
Functional genes and gene expression have been connected to physiological traits linked to effective production and broodstock selection in aquaculture, selective implications of commercial fish harvest, and adaptive changes reflected in non-commercial fish populations subject to human disturbance and climate change. Gene mapping using single nucleotide polymorphisms (SNPs) to identify functional genes, gene expression (analogue microarrays and real-time PCR), and digital sequencing technologies looking at RNA transcripts present new concepts and opportunities in support of effective and sustainable fisheries. Genomic tools have been rapidly growing in aquaculture research addressing aspects of fish health, toxicology, and early development. Genomic technologies linking effects in functional genes involved in growth, maturation and life history development have been tied to selection resulting from harvest practices. Incorporating new and ever-increasing knowledge of fish genomes is opening a different perspective on local adaptation that will prove invaluable in wild fish conservation and management. Conservation of fish stocks is rapidly incorporating research on critical adaptive responses directed at the effects of human disturbance and climate change through gene expression studies. Genomic studies of fish populations can be generally grouped into three broad categories: 1) evolutionary genomics and biodiversity; 2) adaptive physiological responses to a changing environment; and 3) adaptive behavioral genomics and life history diversity. We review current genomic research in fisheries focusing on those that use microarrays to explore differences in gene expression among phenotypes and within or across populations, information that is critically important to the conservation of fish and their relationship to humans.
Smith, Andrew J P; Deloukas, Panos; Munroe, Patricia B
2018-04-13
Over the last decade, genome-wide association studies (GWAS) have propelled the discovery of thousands of loci associated with complex diseases. The focus is now turning towards the function of these association signals, determining the causal variant(s) amongst those in strong linkage disequilibrium, and identifying their underlying mechanisms, such as long-range gene regulation. Genome-editing techniques utilising zinc-finger nucleases (ZFN), transcription activator-like effector nucleases (TALENs) and clustered regularly-interspaced short palindromic repeats with Cas9 nuclease (CRISPR-Cas9), are becoming the tools of choice to establish functionality for these variants, due to the ability to assess effects of single variants in vivo. This review will discuss examples of how these technologies have begun to aid functional analysis of GWAS loci for complex traits such as cardiovascular disease, type 2 diabetes, cancer, obesity and autoimmune disease. We focus on analysis of variants occurring within non-coding genomic regions, as these comprise the majority of GWAS variants, providing the greatest challenges to determining functionality, and compare editing strategies that provide different levels of evidence for variant functionality. The review describes molecular insights into some of these potentially causal variants, and how these may relate to the pathology of the trait, and look towards future directions for these technologies in post-GWAS analysis, such as base-editing.
[Tale nucleases--new tool for genome editing].
Glazkova, D V; Shipulin, G A
2014-01-01
The ability to introduce targeted changes in the genome of living cells or entire organisms enables researchers to meet the challenges of basic life sciences, biotechnology and medicine. Knockdown of target genes in the zygotes gives the opportunity to investigate the functions of these genes in different organisms. Replacement of single nucleotide in the DNA sequence allows to correct mutations in genes and thus to cure hereditary diseases. Adding transgene to specific genomic.loci can be used in biotechnology for generation of organisms with certain properties or cell lines for biopharmaceutical production. Such manipulations of gene sequences in their natural chromosomal context became possible after the emergence of the technology called "genome editing". This technology is based on the induction of a double-strand break in a specific genomic target DNA using endonucleases that recognize the unique sequences in the genome and on subsequent recovery of DNA integrity through the use of cellular repair mechanisms. A necessary tool for the genome editing is a custom-designed endonuclease which is able to recognize selected sequences. The emergence of a new type of programmable endonucleases, which were constructed on the basis of bacterial proteins--TAL-effectors (Transcription activators like effector), has become an important stage in the development of technology and promoted wide spread of the genome editing. This article reviews the history of the discovery of TAL effectors and creation of TALE nucleases, and describes their advantages over zinc finger endonucleases that appeared earlier. A large section is devoted to description of genetic modifications that can be performed using the genome editing.
Improved Genome Assembly and Annotation for the Rock Pigeon (Columba livia)
Holt, Carson; Campbell, Michael; Keays, David A.; Edelman, Nathaniel; Kapusta, Aurélie; Maclary, Emily; T. Domyan, Eric; Suh, Alexander; Warren, Wesley C.; Yandell, Mark; Gilbert, M. Thomas P.; Shapiro, Michael D.
2018-01-01
The domestic rock pigeon (Columba livia) is among the most widely distributed and phenotypically diverse avian species. C. livia is broadly studied in ecology, genetics, physiology, behavior, and evolutionary biology, and has recently emerged as a model for understanding the molecular basis of anatomical diversity, the magnetic sense, and other key aspects of avian biology. Here we report an update to the C. livia genome reference assembly and gene annotation dataset. Greatly increased scaffold lengths in the updated reference assembly, along with an updated annotation set, provide improved tools for evolutionary and functional genetic studies of the pigeon, and for comparative avian genomics in general. PMID:29519939
The genetic engineering system, clustered regularly interspaced short palindromic repeats (CRISPR), has conventionally been used to inactivate genes by making targeted double stranded cuts in DNA. While CRISPR is a useful tool, it can only be used to create loss-of-function modifications and often causes off-target effects due to the disruptive mechanism by which it works. CTD2 researchers at the University of California, San Francisco recently addressed these shortcomings in a publication in Cell.
WormBase ParaSite - a comprehensive resource for helminth genomics.
Howe, Kevin L; Bolt, Bruce J; Shafie, Myriam; Kersey, Paul; Berriman, Matthew
2017-07-01
The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite.wormbase.org) for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Rapid diversification of five Oryza AA genomes associated with rice adaptation.
Zhang, Qun-Jie; Zhu, Ting; Xia, En-Hua; Shi, Chao; Liu, Yun-Long; Zhang, Yun; Liu, Yuan; Jiang, Wen-Kai; Zhao, You-Jie; Mao, Shu-Yan; Zhang, Li-Ping; Huang, Hui; Jiao, Jun-Ying; Xu, Ping-Zhen; Yao, Qiu-Yang; Zeng, Fan-Chun; Yang, Li-Li; Gao, Ju; Tao, Da-Yun; Wang, Yue-Ju; Bennetzen, Jeffrey L; Gao, Li-Zhi
2014-11-18
Comparative genomic analyses among closely related species can greatly enhance our understanding of plant gene and genome evolution. We report de novo-assembled AA-genome sequences for Oryza nivara, Oryza glaberrima, Oryza barthii, Oryza glumaepatula, and Oryza meridionalis. Our analyses reveal massive levels of genomic structural variation, including segmental duplication and rapid gene family turnover, with particularly high instability in defense-related genes. We show, on a genomic scale, how lineage-specific expansion or contraction of gene families has led to their morphological and reproductive diversification, thus enlightening the evolutionary process of speciation and adaptation. Despite strong purifying selective pressures on most Oryza genes, we documented a large number of positively selected genes, especially those genes involved in flower development, reproduction, and resistance-related processes. These diversifying genes are expected to have played key roles in adaptations to their ecological niches in Asia, South America, Africa and Australia. Extensive variation in noncoding RNA gene numbers, function enrichment, and rates of sequence divergence might also help account for the different genetic adaptations of these rice species. Collectively, these resources provide new opportunities for evolutionary genomics, numerous insights into recent speciation, a valuable database of functional variation for crop improvement, and tools for efficient conservation of wild rice germplasm.
Rapid diversification of five Oryza AA genomes associated with rice adaptation
Zhang, Qun-Jie; Zhu, Ting; Xia, En-Hua; Shi, Chao; Liu, Yun-Long; Zhang, Yun; Liu, Yuan; Jiang, Wen-Kai; Zhao, You-Jie; Mao, Shu-Yan; Zhang, Li-Ping; Huang, Hui; Jiao, Jun-Ying; Xu, Ping-Zhen; Yao, Qiu-Yang; Zeng, Fan-Chun; Yang, Li-Li; Gao, Ju; Tao, Da-Yun; Wang, Yue-Ju; Bennetzen, Jeffrey L.; Gao, Li-Zhi
2014-01-01
Comparative genomic analyses among closely related species can greatly enhance our understanding of plant gene and genome evolution. We report de novo-assembled AA-genome sequences for Oryza nivara, Oryza glaberrima, Oryza barthii, Oryza glumaepatula, and Oryza meridionalis. Our analyses reveal massive levels of genomic structural variation, including segmental duplication and rapid gene family turnover, with particularly high instability in defense-related genes. We show, on a genomic scale, how lineage-specific expansion or contraction of gene families has led to their morphological and reproductive diversification, thus enlightening the evolutionary process of speciation and adaptation. Despite strong purifying selective pressures on most Oryza genes, we documented a large number of positively selected genes, especially those genes involved in flower development, reproduction, and resistance-related processes. These diversifying genes are expected to have played key roles in adaptations to their ecological niches in Asia, South America, Africa and Australia. Extensive variation in noncoding RNA gene numbers, function enrichment, and rates of sequence divergence might also help account for the different genetic adaptations of these rice species. Collectively, these resources provide new opportunities for evolutionary genomics, numerous insights into recent speciation, a valuable database of functional variation for crop improvement, and tools for efficient conservation of wild rice germplasm. PMID:25368197
Deshmukh, Rupesh K; Sonah, Humira; Bélanger, Richard R
2016-01-01
Aquaporins (AQPs) are channel-forming integral membrane proteins that facilitate the movement of water and many other small molecules. Compared to animals, plants contain a much higher number of AQPs in their genome. Homology-based identification of AQPs in sequenced species is feasible because of the high level of conservation of protein sequences across plant species. Genome-wide characterization of AQPs has highlighted several important aspects such as distribution, genetic organization, evolution and conserved features governing solute specificity. From a functional point of view, the understanding of AQP transport system has expanded rapidly with the help of transcriptomics and proteomics data. The efficient analysis of enormous amounts of data generated through omic scale studies has been facilitated through computational advancements. Prediction of protein tertiary structures, pore architecture, cavities, phosphorylation sites, heterodimerization, and co-expression networks has become more sophisticated and accurate with increasing computational tools and pipelines. However, the effectiveness of computational approaches is based on the understanding of physiological and biochemical properties, transport kinetics, solute specificity, molecular interactions, sequence variations, phylogeny and evolution of aquaporins. For this purpose, tools like Xenopus oocyte assays, yeast expression systems, artificial proteoliposomes, and lipid membranes have been efficiently exploited to study the many facets that influence solute transport by AQPs. In the present review, we discuss genome-wide identification of AQPs in plants in relation with recent advancements in analytical tools, and their availability and technological challenges as they apply to AQPs. An exhaustive review of omics resources available for AQP research is also provided in order to optimize their efficient utilization. Finally, a detailed catalog of computational tools and analytical pipelines is offered as a resource for AQP research.
Deshmukh, Rupesh K.; Sonah, Humira; Bélanger, Richard R.
2016-01-01
Aquaporins (AQPs) are channel-forming integral membrane proteins that facilitate the movement of water and many other small molecules. Compared to animals, plants contain a much higher number of AQPs in their genome. Homology-based identification of AQPs in sequenced species is feasible because of the high level of conservation of protein sequences across plant species. Genome-wide characterization of AQPs has highlighted several important aspects such as distribution, genetic organization, evolution and conserved features governing solute specificity. From a functional point of view, the understanding of AQP transport system has expanded rapidly with the help of transcriptomics and proteomics data. The efficient analysis of enormous amounts of data generated through omic scale studies has been facilitated through computational advancements. Prediction of protein tertiary structures, pore architecture, cavities, phosphorylation sites, heterodimerization, and co-expression networks has become more sophisticated and accurate with increasing computational tools and pipelines. However, the effectiveness of computational approaches is based on the understanding of physiological and biochemical properties, transport kinetics, solute specificity, molecular interactions, sequence variations, phylogeny and evolution of aquaporins. For this purpose, tools like Xenopus oocyte assays, yeast expression systems, artificial proteoliposomes, and lipid membranes have been efficiently exploited to study the many facets that influence solute transport by AQPs. In the present review, we discuss genome-wide identification of AQPs in plants in relation with recent advancements in analytical tools, and their availability and technological challenges as they apply to AQPs. An exhaustive review of omics resources available for AQP research is also provided in order to optimize their efficient utilization. Finally, a detailed catalog of computational tools and analytical pipelines is offered as a resource for AQP research. PMID:28066459
2011-01-01
Background Understanding the genetic elements that contribute to key aspects of coffee biology will have an impact on future agronomical improvements for this economically important tree. During the past years, EST collections were generated in Coffee, opening the possibility to create new tools for functional genomics. Results The "PUCE CAFE" Project, organized by the scientific consortium NESTLE/IRD/CIRAD, has developed an oligo-based microarray using 15,721 unigenes derived from published coffee EST sequences mostly obtained from different stages of fruit development and leaves in Coffea Canephora (Robusta). Hybridizations for two independent experiments served to compare global gene expression profiles in three types of tissue matter (mature beans, leaves and flowers) in C. canephora as well as in the leaves of three different coffee species (C. canephora, C. eugenoides and C. arabica). Microarray construction, statistical analyses and validation by Q-PCR analysis are presented in this study. Conclusion We have generated the first 15 K coffee array during this PUCE CAFE project, granted by Génoplante (the French consortium for plant genomics). This new tool will help study functional genomics in a wide range of experiments on various plant tissues, such as analyzing bean maturation or resistance to pathogens or drought. Furthermore, the use of this array has proven to be valid in different coffee species (diploid or tetraploid), drastically enlarging its impact for high-throughput gene expression in the community of coffee research. PMID:21208403
Purdue ionomics information management system. An integrated functional genomics platform.
Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S; Salt, David E
2007-02-01
The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics.
Privat, Isabelle; Bardil, Amélie; Gomez, Aureliano Bombarely; Severac, Dany; Dantec, Christelle; Fuentes, Ivanna; Mueller, Lukas; Joët, Thierry; Pot, David; Foucrier, Séverine; Dussert, Stéphane; Leroy, Thierry; Journot, Laurent; de Kochko, Alexandre; Campa, Claudine; Combes, Marie-Christine; Lashermes, Philippe; Bertrand, Benoit
2011-01-05
Understanding the genetic elements that contribute to key aspects of coffee biology will have an impact on future agronomical improvements for this economically important tree. During the past years, EST collections were generated in Coffee, opening the possibility to create new tools for functional genomics. The "PUCE CAFE" Project, organized by the scientific consortium NESTLE/IRD/CIRAD, has developed an oligo-based microarray using 15,721 unigenes derived from published coffee EST sequences mostly obtained from different stages of fruit development and leaves in Coffea Canephora (Robusta). Hybridizations for two independent experiments served to compare global gene expression profiles in three types of tissue matter (mature beans, leaves and flowers) in C. canephora as well as in the leaves of three different coffee species (C. canephora, C. eugenoides and C. arabica). Microarray construction, statistical analyses and validation by Q-PCR analysis are presented in this study. We have generated the first 15 K coffee array during this PUCE CAFE project, granted by Génoplante (the French consortium for plant genomics). This new tool will help study functional genomics in a wide range of experiments on various plant tissues, such as analyzing bean maturation or resistance to pathogens or drought. Furthermore, the use of this array has proven to be valid in different coffee species (diploid or tetraploid), drastically enlarging its impact for high-throughput gene expression in the community of coffee research.
Functional genetics for all: engineered nucleases, CRISPR and the gene editing revolution.
Gilles, Anna F; Averof, Michalis
2014-01-01
Developmental biology, as all experimental science, is empowered by technological advances. The availability of genetic tools in some species - designated as model organisms - has driven their use as major platforms for understanding development, physiology and behavior. Extending these tools to a wider range of species determines whether (and how) we can experimentally approach developmental diversity and evolution. During the last two decades, comparative developmental biology (evo-devo) was marked by the introduction of gene knockdown and deep sequencing technologies that are applicable to a wide range of species. These approaches allowed us to test the developmental role of specific genes in diverse species, to study biological processes that are not accessible in established models and, in some cases, to conduct genome-wide screens that overcome the limitations of the candidate gene approach. The recent discovery of CRISPR/Cas as a means of precise alterations into the genome promises to revolutionize developmental genetics. In this review we describe the development of gene editing tools, from zinc-finger nucleases to TALENs and CRISPR, and examine their application in gene targeting, their limitations and the opportunities they present for evo-devo. We outline their use in gene knock-out and knock-in approaches, and in manipulating gene functions by directing molecular effectors to specific sites in the genome. The ease-of-use and efficiency of CRISPR in diverse species provide an opportunity to close the technology gap that exists between established model organisms and emerging genetically-tractable species.
Bolbase: a comprehensive genomics database for Brassica oleracea
2013-01-01
Background Brassica oleracea is a morphologically diverse species in the family Brassicaceae and contains a group of nutrition-rich vegetable crops, including common heading cabbage, cauliflower, broccoli, kohlrabi, kale, Brussels sprouts. This diversity along with its phylogenetic membership in a group of three diploid and three tetraploid species, and the recent availability of genome sequences within Brassica provide an unprecedented opportunity to study intra- and inter-species divergence and evolution in this species and its close relatives. Description We have developed a comprehensive database, Bolbase, which provides access to the B. oleracea genome data and comparative genomics information. The whole genome of B. oleracea is available, including nine fully assembled chromosomes and 1,848 scaffolds, with 45,758 predicted genes, 13,382 transposable elements, and 3,581 non-coding RNAs. Comparative genomics information is available, including syntenic regions among B. oleracea, Brassica rapa and Arabidopsis thaliana, synonymous (Ks) and non-synonymous (Ka) substitution rates between orthologous gene pairs, gene families or clusters, and differences in quantity, category, and distribution of transposable elements on chromosomes. Bolbase provides useful search and data mining tools, including a keyword search, a local BLAST server, and a customized GBrowse tool, which can be used to extract annotations of genome components, identify similar sequences and visualize syntenic regions among species. Users can download all genomic data and explore comparative genomics in a highly visual setting. Conclusions Bolbase is the first resource platform for the B. oleracea genome and for genomic comparisons with its relatives, and thus it will help the research community to better study the function and evolution of Brassica genomes as well as enhance molecular breeding research. This database will be updated regularly with new features, improvements to genome annotation, and new genomic sequences as they become available. Bolbase is freely available at http://ocri-genomics.org/bolbase. PMID:24079801
Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics.
Li, Sanshu; Breaker, Ronald R
2017-10-13
With the development of rapid and inexpensive DNA sequencing, the genome sequences of more than 100 fungal species have been made available. This dataset provides an excellent resource for comparative genomics analyses, which can be used to discover genetic elements, including noncoding RNAs (ncRNAs). Bioinformatics tools similar to those used to uncover novel ncRNAs in bacteria, likewise, should be useful for searching fungal genomic sequences, and the relative ease of genetic experiments with some model fungal species could facilitate experimental validation studies. We have adapted a bioinformatics pipeline for discovering bacterial ncRNAs to systematically analyze many fungal genomes. This comparative genomics pipeline integrates information on conserved RNA sequence and structural features with alternative splicing information to reveal fungal RNA motifs that are candidate regulatory domains, or that might have other possible functions. A total of 15 prominent classes of structured ncRNA candidates were identified, including variant HDV self-cleaving ribozyme representatives, atypical snoRNA candidates, and possible structured antisense RNA motifs. Candidate regulatory motifs were also found associated with genes for ribosomal proteins, S-adenosylmethionine decarboxylase (SDC), amidase, and HexA protein involved in Woronin body formation. We experimentally confirm that the variant HDV ribozymes undergo rapid self-cleavage, and we demonstrate that the SDC RNA motif reduces the expression of SAM decarboxylase by translational repression. Furthermore, we provide evidence that several other motifs discovered in this study are likely to be functional ncRNA elements. Systematic screening of fungal genomes using a computational discovery pipeline has revealed the existence of a variety of novel structured ncRNAs. Genome contexts and similarities to known ncRNA motifs provide strong evidence for the biological and biochemical functions of some newly found ncRNA motifs. Although initial examinations of several motifs provide evidence for their likely functions, other motifs will require more in-depth analysis to reveal their functions.
Ginseng Genome Database: an open-access platform for genomics of Panax ginseng.
Jayakodi, Murukarthick; Choi, Beom-Soon; Lee, Sang-Choon; Kim, Nam-Hoon; Park, Jee Young; Jang, Woojong; Lakshmanan, Meiyappan; Mohan, Shobhana V G; Lee, Dong-Yup; Yang, Tae-Jin
2018-04-12
The ginseng (Panax ginseng C.A. Meyer) is a perennial herbaceous plant that has been used in traditional oriental medicine for thousands of years. Ginsenosides, which have significant pharmacological effects on human health, are the foremost bioactive constituents in this plant. Having realized the importance of this plant to humans, an integrated omics resource becomes indispensable to facilitate genomic research, molecular breeding and pharmacological study of this herb. The first draft genome sequences of P. ginseng cultivar "Chunpoong" were reported recently. Here, using the draft genome, transcriptome, and functional annotation datasets of P. ginseng, we have constructed the Ginseng Genome Database http://ginsengdb.snu.ac.kr /, the first open-access platform to provide comprehensive genomic resources of P. ginseng. The current version of this database provides the most up-to-date draft genome sequence (of approximately 3000 Mbp of scaffold sequences) along with the structural and functional annotations for 59,352 genes and digital expression of genes based on transcriptome data from different tissues, growth stages and treatments. In addition, tools for visualization and the genomic data from various analyses are provided. All data in the database were manually curated and integrated within a user-friendly query page. This database provides valuable resources for a range of research fields related to P. ginseng and other species belonging to the Apiales order as well as for plant research communities in general. Ginseng genome database can be accessed at http://ginsengdb.snu.ac.kr /.
Oncogenic gene fusions drive many human cancers, but tools to more quickly unravel their functional contributions are needed. Here we describe methodology permitting fusion gene construction for functional evaluation. Using this strategy, we engineered the known fusion oncogenes, BCR-ABL1, EML4-ALK, and ETV6-NTRK3, as well as 20 previously uncharacterized fusion genes identified in TCGA datasets.
CoCoNUT: an efficient system for the comparison and analysis of genomes
2008-01-01
Background Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and heavily relies on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system CoCoNUT (Computational Comparative geNomics Utility Toolkit) that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state of the art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large scale studies in comparative genomics. PMID:19014477
Tabor, Holly K.; Jamal, Seema M.; Yu, Joon-Ho; Crouch, Julia M.; Shankar, Aditi G.; Dent, Karin M.; Anderson, Nick; Miller, Damon A.; Futral, Brett T.; Bamshad, Michael J.
2016-01-01
A major challenge to implementing precision medicine is the need for an efficient and cost-effective strategy for returning individual genomic test results that is easily scalable and can be incorporated into multiple models of clinical practice. My46 is a web-based tool for managing the return of genetic results that was designed and developed to support a wide range of approaches to results disclosure, ranging from traditional face-to-face disclosure to self-guided models. My46 has five key functions: set and modify results return preferences, return results, educate, manage return of results, and assess return of results. These key functions are supported by six distinct modules and a suite of features that enhance the user experience, ease site navigation, facilitate knowledge sharing, and enable results return tracking. My46 is a potentially effective solution for returning results and supports current trends toward shared decision-making between patient and provider and patient-driven health management. PMID:27632689
RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries.
Habegger, Lukas; Sboner, Andrea; Gianoulis, Tara A; Rozowsky, Joel; Agarwal, Ashish; Snyder, Michael; Gerstein, Mark
2011-01-15
The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses. RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.
Gene editing tools: state-of-the-art and the road ahead for the model and non-model fishes.
Barman, Hirak Kumar; Rasal, Kiran Dashrath; Chakrapani, Vemulawada; Ninawe, A S; Vengayil, Doyil T; Asrafuzzaman, Syed; Sundaray, Jitendra K; Jayasankar, Pallipuram
2017-10-01
Advancements in the DNA sequencing technologies and computational biology have revolutionized genome/transcriptome sequencing of non-model fishes at an affordable cost. This has led to a paradigm shift with regard to our heightened understandings of structure-functional relationships of genes at a global level, from model animals/fishes to non-model large animals/fishes. Whole genome/transcriptome sequencing technologies were supplemented with the series of discoveries in gene editing tools, which are being used to modify genes at pre-determined positions using programmable nucleases to explore their respective in vivo functions. For a long time, targeted gene disruption experiments were mostly restricted to embryonic stem cells, advances in gene editing technologies such as zinc finger nuclease, transcriptional activator-like effector nucleases and CRISPR (clustered regulatory interspaced short palindromic repeats)/CRISPR-associated nucleases have facilitated targeted genetic modifications beyond stem cells to a wide range of somatic cell lines across species from laboratory animals to farmed animals/fishes. In this review, we discuss use of different gene editing tools and the strategic implications in fish species for basic and applied biology research.
compendiumdb: an R package for retrieval and storage of functional genomics data.
Nandal, Umesh K; van Kampen, Antoine H C; Moerland, Perry D
2016-09-15
Currently, the Gene Expression Omnibus (GEO) contains public data of over 1 million samples from more than 40 000 microarray-based functional genomics experiments. This provides a rich source of information for novel biological discoveries. However, unlocking this potential often requires retrieving and storing a large number of expression profiles from a wide range of different studies and platforms. The compendiumdb R package provides an environment for downloading functional genomics data from GEO, parsing the information into a local or remote database and interacting with the database using dedicated R functions, thus enabling seamless integration with other tools available in R/Bioconductor. The compendiumdb package is written in R, MySQL and Perl. Source code and binaries are available from CRAN (http://cran.r-project.org/web/packages/compendiumdb/) for all major platforms (Linux, MS Windows and OS X) under the GPLv3 license. p.d.moerland@amc.uva.nl Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
RNA Interference for Functional Genomics and Improvement of Cotton (Gossypium sp.)
Abdurakhmonov, Ibrokhim Y.; Ayubov, Mirzakamol S.; Ubaydullaeva, Khurshida A.; Buriev, Zabardast T.; Shermatov, Shukhrat E.; Ruziboev, Haydarali S.; Shapulatov, Umid M.; Saha, Sukumar; Ulloa, Mauricio; Yu, John Z.; Percy, Richard G.; Devor, Eric J.; Sharma, Govind C.; Sripathi, Venkateswara R.; Kumpatla, Siva P.; van der Krol, Alexander; Kater, Hake D.; Khamidov, Khakimdjan; Salikhov, Shavkat I.; Jenkins, Johnie N.; Abdukarimov, Abdusattor; Pepper, Alan E.
2016-01-01
RNA interference (RNAi), is a powerful new technology in the discovery of genetic sequence functions, and has become a valuable tool for functional genomics of cotton (Gossypium sp.). The rapid adoption of RNAi has replaced previous antisense technology. RNAi has aided in the discovery of function and biological roles of many key cotton genes involved in fiber development, fertility and somatic embryogenesis, resistance to important biotic and abiotic stresses, and oil and seed quality improvements as well as the key agronomic traits including yield and maturity. Here, we have comparatively reviewed seminal research efforts in previously used antisense approaches and currently applied breakthrough RNAi studies in cotton, analyzing developed RNAi methodologies, achievements, limitations, and future needs in functional characterizations of cotton genes. We also highlighted needed efforts in the development of RNAi-based cotton cultivars, and their safety and risk assessment, small and large-scale field trials, and commercialization. PMID:26941765
RNA Interference for Functional Genomics and Improvement of Cotton (Gossypium sp.).
Abdurakhmonov, Ibrokhim Y; Ayubov, Mirzakamol S; Ubaydullaeva, Khurshida A; Buriev, Zabardast T; Shermatov, Shukhrat E; Ruziboev, Haydarali S; Shapulatov, Umid M; Saha, Sukumar; Ulloa, Mauricio; Yu, John Z; Percy, Richard G; Devor, Eric J; Sharma, Govind C; Sripathi, Venkateswara R; Kumpatla, Siva P; van der Krol, Alexander; Kater, Hake D; Khamidov, Khakimdjan; Salikhov, Shavkat I; Jenkins, Johnie N; Abdukarimov, Abdusattor; Pepper, Alan E
2016-01-01
RNA interference (RNAi), is a powerful new technology in the discovery of genetic sequence functions, and has become a valuable tool for functional genomics of cotton (Gossypium sp.). The rapid adoption of RNAi has replaced previous antisense technology. RNAi has aided in the discovery of function and biological roles of many key cotton genes involved in fiber development, fertility and somatic embryogenesis, resistance to important biotic and abiotic stresses, and oil and seed quality improvements as well as the key agronomic traits including yield and maturity. Here, we have comparatively reviewed seminal research efforts in previously used antisense approaches and currently applied breakthrough RNAi studies in cotton, analyzing developed RNAi methodologies, achievements, limitations, and future needs in functional characterizations of cotton genes. We also highlighted needed efforts in the development of RNAi-based cotton cultivars, and their safety and risk assessment, small and large-scale field trials, and commercialization.
Aubrey, Wayne; Riley, Michael C; Young, Michael; King, Ross D; Oliver, Stephen G; Clare, Amanda
2015-01-01
Many advances in synthetic biology require the removal of a large number of genomic elements from a genome. Most existing deletion methods leave behind markers, and as there are a limited number of markers, such methods can only be applied a fixed number of times. Deletion methods that recycle markers generally are either imprecise (remove untargeted sequences), or leave scar sequences which can cause genome instability and rearrangements. No existing marker recycling method is automation-friendly. We have developed a novel openly available deletion tool that consists of: 1) a method for deleting genomic elements that can be repeatedly used without limit, is precise, scar-free, and suitable for automation; and 2) software to design the method's primers. Our tool is sequence agnostic and could be used to delete large numbers of coding sequences, promoter regions, transcription factor binding sites, terminators, etc in a single genome. We have validated our tool on the deletion of non-essential open reading frames (ORFs) from S. cerevisiae. The tool is applicable to arbitrary genomes, and we provide primer sequences for the deletion of: 90% of the ORFs from the S. cerevisiae genome, 88% of the ORFs from S. pombe genome, and 85% of the ORFs from the L. lactis genome.
Aubrey, Wayne; Riley, Michael C.; Young, Michael; King, Ross D.; Oliver, Stephen G.; Clare, Amanda
2015-01-01
Many advances in synthetic biology require the removal of a large number of genomic elements from a genome. Most existing deletion methods leave behind markers, and as there are a limited number of markers, such methods can only be applied a fixed number of times. Deletion methods that recycle markers generally are either imprecise (remove untargeted sequences), or leave scar sequences which can cause genome instability and rearrangements. No existing marker recycling method is automation-friendly. We have developed a novel openly available deletion tool that consists of: 1) a method for deleting genomic elements that can be repeatedly used without limit, is precise, scar-free, and suitable for automation; and 2) software to design the method’s primers. Our tool is sequence agnostic and could be used to delete large numbers of coding sequences, promoter regions, transcription factor binding sites, terminators, etc in a single genome. We have validated our tool on the deletion of non-essential open reading frames (ORFs) from S. cerevisiae. The tool is applicable to arbitrary genomes, and we provide primer sequences for the deletion of: 90% of the ORFs from the S. cerevisiae genome, 88% of the ORFs from S. pombe genome, and 85% of the ORFs from the L. lactis genome. PMID:26630677
E-TALEN: a web tool to design TALENs for genome engineering.
Heigwer, Florian; Kerr, Grainne; Walther, Nike; Glaeser, Kathrin; Pelz, Oliver; Breinig, Marco; Boutros, Michael
2013-11-01
Use of transcription activator-like effector nucleases (TALENs) is a promising new technique in the field of targeted genome engineering, editing and reverse genetics. Its applications span from introducing knockout mutations to endogenous tagging of proteins and targeted excision repair. Owing to this wide range of possible applications, there is a need for fast and user-friendly TALEN design tools. We developed E-TALEN (http://www.e-talen.org), a web-based tool to design TALENs for experiments of varying scale. E-TALEN enables the design of TALENs against a single target or a large number of target genes. We significantly extended previously published design concepts to consider genomic context and different applications. E-TALEN guides the user through an end-to-end design process of de novo TALEN pairs, which are specific to a certain sequence or genomic locus. Furthermore, E-TALEN offers a functionality to predict targeting and specificity for existing TALENs. Owing to the computational complexity of many of the steps in the design of TALENs, particular emphasis has been put on the implementation of fast yet accurate algorithms. We implemented a user-friendly interface, from the input parameters to the presentation of results. An additional feature of E-TALEN is the in-built sequence and annotation database available for many organisms, including human, mouse, zebrafish, Drosophila and Arabidopsis, which can be extended in the future.
Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.
Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun
2014-01-01
While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread--a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics
Ragothaman, Anjani; Feinstein, Wei; Jha, Shantenu; Kim, Joohyun
2014-01-01
While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread—a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure. PMID:24995285
Regulatory variation: an emerging vantage point for cancer biology.
Li, Luolan; Lorzadeh, Alireza; Hirst, Martin
2014-01-01
Transcriptional regulation involves complex and interdependent interactions of noncoding and coding regions of the genome with proteins that interact and modify them. Genetic variation/mutation in coding and noncoding regions of the genome can drive aberrant transcription and disease. In spite of accounting for nearly 98% of the genome comparatively little is known about the contribution of noncoding DNA elements to disease. Genome-wide association studies of complex human diseases including cancer have revealed enrichment for variants in the noncoding genome. A striking finding of recent cancer genome re-sequencing efforts has been the previously underappreciated frequency of mutations in epigenetic modifiers across a wide range of cancer types. Taken together these results point to the importance of dysregulation in transcriptional regulatory control in genesis of cancer. Powered by recent technological advancements in functional genomic profiling, exploration of normal and transformed regulatory networks will provide novel insight into the initiation and progression of cancer and open new windows to future prognostic and diagnostic tools. © 2013 Wiley Periodicals, Inc.
Annotation of gene function in citrus using gene expression information and co-expression networks
2014-01-01
Background The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world’s most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a “guilt-by-association” principle whereby genes encoding proteins involved in similar and/or related biological processes may exhibit similar expression patterns across diverse sets of experimental conditions. While bioinformatics resources such as GCN analysis are widely available for efficient gene function prediction in model plant species including Arabidopsis, soybean and rice, in citrus these tools are not yet developed. Results We have constructed a comprehensive GCN for citrus inferred from 297 publicly available Affymetrix Genechip Citrus Genome microarray datasets, providing gene co-expression relationships at a genome-wide scale (33,000 transcripts). The comprehensive citrus GCN consists of a global GCN (condition-independent) and four condition-dependent GCNs that survey the sweet orange species only, all citrus fruit tissues, all citrus leaf tissues, or stress-exposed plants. All of these GCNs are clustered using genome-wide, gene-centric (guide) and graph clustering algorithms for flexibility of gene function prediction. For each putative cluster, gene ontology (GO) enrichment and gene expression specificity analyses were performed to enhance gene function, expression and regulation pattern prediction. The guide-gene approach was used to infer novel roles of genes involved in disease susceptibility and vitamin C metabolism, and graph-clustering approaches were used to investigate isoprenoid/phenylpropanoid metabolism in citrus peel, and citric acid catabolism via the GABA shunt in citrus fruit. Conclusions Integration of citrus gene co-expression networks, functional enrichment analysis and gene expression information provide opportunities to infer gene function in citrus. We present a publicly accessible tool, Network Inference for Citrus Co-Expression (NICCE, http://citrus.adelaide.edu.au/nicce/home.aspx), for the gene co-expression analysis in citrus. PMID:25023870
Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues
2015-01-01
The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genome with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) propose three new kinds of representation and a more intuitive menu. These new developments have been implemented for Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. PMID:25378326
Identification of cis-suppression of human disease mutations by comparative genomics
Jordan, Daniel M.; Frangakis, Stephan G.; Golzio, Christelle; Cassa, Christopher A.; Kurtzberg, Joanne; Davis, Erica E.; Sunyaev, Shamil R.; Katsanis, Nicholas
2015-01-01
Patterns of amino acid conservation have served as a tool for understanding protein evolution1. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients2. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes3,4 revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity5,6. PMID:26123021
Rozman, Vita; Kunej, Tanja
2018-05-10
Harnessing the genomics big data requires innovation in how we extract and interpret biologically relevant variants. Currently, there is no established catalog of prioritized missense variants associated with deleterious protein function phenotypes. We report in this study, to the best of our knowledge, the first genome-wide prioritization of sequence variants with the most deleterious effect on protein function (potentially deleterious variants [pDelVars]) in nine vertebrate species: human, cattle, horse, sheep, pig, dog, rat, mouse, and zebrafish. The analysis was conducted using the Ensembl/BioMart tool. Genes comprising pDelVars in the highest number of examined species were identified using a Python script. Multiple genomic alignments of the selected genes were built to identify interspecies orthologous potentially deleterious variants, which we defined as the "ortho-pDelVars." Genome-wide prioritization revealed that in humans, 0.12% of the known variants are predicted to be deleterious. In seven out of nine examined vertebrate species, the genes encoding the multiple PDZ domain crumbs cell polarity complex component (MPDZ) and the transforming acidic coiled-coil containing protein 2 (TACC2) comprise pDelVars. Five interspecies ortho-pDelVars were identified in three genes. These findings offer new ways to harness genomics big data by facilitating the identification of functional polymorphisms in humans and animal models and thus provide a future basis for optimization of protocols for whole genome prioritization of pDelVars and screening of orthologous sequence variants. The approach presented here can inform various postgenomic applications such as personalized medicine and multiomics study of health interventions (iatromics).
von Grotthuss, Marcin; Plewczynski, Dariusz; Ginalski, Krzysztof; Rychlewski, Leszek; Shakhnovich, Eugene I
2006-02-06
The number of protein structures from structural genomics centers dramatically increases in the Protein Data Bank (PDB). Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity. Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties using structure-function relationship. The assignments were conducted for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes. We suggest that the structure-based prediction of an EC number should be conducted having the different similarity score cutoff for different protein folds. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe, that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file. http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF.
Amar, David; Frades, Itziar; Danek, Agnieszka; Goldberg, Tatyana; Sharma, Sanjeev K; Hedley, Pete E; Proux-Wera, Estelle; Andreasson, Erik; Shamir, Ron; Tzfadia, Oren; Alexandersson, Erik
2014-12-05
For most organisms, even if their genome sequence is available, little functional information about individual genes or proteins exists. Several annotation pipelines have been developed for functional analysis based on sequence, 'omics', and literature data. However, researchers encounter little guidance on how well they perform. Here, we used the recently sequenced potato genome as a case study. The potato genome was selected since its genome is newly sequenced and it is a non-model plant even if there is relatively ample information on individual potato genes, and multiple gene expression profiles are available. We show that the automatic gene annotations of potato have low accuracy when compared to a "gold standard" based on experimentally validated potato genes. Furthermore, we evaluate six state-of-the-art annotation pipelines and show that their predictions are markedly dissimilar (Jaccard similarity coefficient of 0.27 between pipelines on average). To overcome this discrepancy, we introduce a simple GO structure-based algorithm that reconciles the predictions of the different pipelines. We show that the integrated annotation covers more genes, increases by over 50% the number of highly co-expressed GO processes, and obtains much higher agreement with the gold standard. We find that different annotation pipelines produce different results, and show how to integrate them into a unified annotation that is of higher quality than each single pipeline. We offer an improved functional annotation of both PGSC and ITAG potato gene models, as well as tools that can be applied to additional pipelines and improve annotation in other organisms. This will greatly aid future functional analysis of '-omics' datasets from potato and other organisms with newly sequenced genomes. The new potato annotations are available with this paper.
SeqWare Query Engine: storing and searching sequence data in the cloud.
O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F
2010-12-21
Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets.
SeqWare Query Engine: storing and searching sequence data in the cloud
2010-01-01
Background Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets. PMID:21210981
Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A
2015-01-01
A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
AutoFACT: An Automatic Functional Annotation and Classification Tool
Koski, Liisa B; Gray, Michael W; Lang, B Franz; Burger, Gertraud
2005-01-01
Background Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1) analyzes nucleotide and protein sequence data; (2) determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3) assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4) generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1–2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at . PMID:15960857
Defrance, Matthieu; Janky, Rekin's; Sand, Olivier; van Helden, Jacques
2008-01-01
This protocol explains how to discover functional signals in genomic sequences by detecting over- or under-represented oligonucleotides (words) or spaced pairs thereof (dyads) with the Regulatory Sequence Analysis Tools (http://rsat.ulb.ac.be/rsat/). Two typical applications are presented: (i) predicting transcription factor-binding motifs in promoters of coregulated genes and (ii) discovering phylogenetic footprints in promoters of orthologous genes. The steps of this protocol include purging genomic sequences to discard redundant fragments, discovering over-represented patterns and assembling them to obtain degenerate motifs, scanning sequences and drawing feature maps. The main strength of the method is its statistical ground: the binomial significance provides an efficient control on the rate of false positives. In contrast with optimization-based pattern discovery algorithms, the method supports the detection of under- as well as over-represented motifs. Computation times vary from seconds (gene clusters) to minutes (whole genomes). The execution of the whole protocol should take approximately 1 h.
The Gene Expression Omnibus Database.
Clough, Emily; Barrett, Tanya
2016-01-01
The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome-protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/.
Habegger, Lukas; Balasubramanian, Suganthi; Chen, David Z; Khurana, Ekta; Sboner, Andrea; Harmanci, Arif; Rozowsky, Joel; Clarke, Declan; Snyder, Michael; Gerstein, Mark
2012-09-01
The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.
Modulating signaling networks by CRISPR/Cas9-mediated transposable element insertion.
Vaschetto, Luis María
2018-04-01
In a recent past, transposable elements (TEs) were referred to as selfish genetic components only capable of copying themselves with the aim of increasing the odds of being inherited. Nonetheless, TEs have been initially proposed as positive control elements acting in synergy with the host. Nowadays, it is well known that TE movement into host genome comprises an important evolutionary mechanism capable of increasing the adaptive fitness. As insights into TE functioning are increasing day to day, the manipulation of transposition has raised an interesting possibility of setting the host functions, although the lack of appropriate genome engineering tools has unpaved it. Fortunately, the emergence of genome editing technologies based on programmable nucleases, and especially the arrival of a multipurpose RNA-guided Cas9 endonuclease system, has made it possible to reconsider this challenge. For such purpose, a particular type of transposons referred to as miniature inverted-repeat transposable elements (MITEs) has shown a series of interesting characteristics for designing functional drivers. Here, recent insights into MITE elements and versatile RNA-guided CRISPR/Cas9 genome engineering system are given to understand how to deploy the potential of TEs for control of the host transcriptional activity.
Gene Function Analysis in the Ubiquitous Human Commensal and Pathogen Malassezia Genus.
Ianiri, Giuseppe; Averette, Anna F; Kingsbury, Joanne M; Heitman, Joseph; Idnurm, Alexander
2016-11-29
The genus Malassezia includes 14 species that are found on the skin of humans and animals and are associated with a number of diseases. Recent genome sequencing projects have defined the gene content of all 14 species; however, to date, genetic manipulation has not been possible for any species within this genus. Here, we develop and then optimize molecular tools for the transformation of Malassezia furfur and Malassezia sympodialis using Agrobacterium tumefaciens delivery of transfer DNA (T-DNA) molecules. These T-DNAs can insert randomly into the genome. In the case of M. furfur, targeted gene replacements were also achieved via homologous recombination, enabling deletion of the ADE2 gene for purine biosynthesis and of the LAC2 gene predicted to be involved in melanin biosynthesis. Hence, the introduction of exogenous DNA and direct gene manipulation are feasible in Malassezia species. Species in the genus Malassezia are a defining component of the microbiome of the surface of mammals. They are also associated with a wide range of skin disease symptoms. Many species are difficult to culture in vitro, and although genome sequences are available for the species in this genus, it has not been possible to assess gene function to date. In this study, we pursued a series of possible transformation methods and identified one that allows the introduction of DNA into two species of Malassezia, including the ability to make targeted integrations into the genome such that genes can be deleted. This research opens a new direction in terms of now being able to analyze gene functions in this little understood genus. These tools will contribute to define the mechanisms that lead to the commensalism and pathogenicity in this group of obligate fungi that are predominant on the skin of mammals. Copyright © 2016 Ianiri et al.
'PACLIMS': a component LIM system for high-throughput functional genomic analysis.
Donofrio, Nicole; Rajagopalon, Ravi; Brown, Douglas; Diener, Stephen; Windham, Donald; Nolin, Shelly; Floyd, Anna; Mitchell, Thomas; Galadima, Natalia; Tucker, Sara; Orbach, Marc J; Patel, Gayatri; Farman, Mark; Pampanwar, Vishal; Soderlund, Cari; Lee, Yong-Hwan; Dean, Ralph A
2005-04-12
Recent advances in sequencing techniques leading to cost reduction have resulted in the generation of a growing number of sequenced eukaryotic genomes. Computational tools greatly assist in defining open reading frames and assigning tentative annotations. However, gene functions cannot be asserted without biological support through, among other things, mutational analysis. In taking a genome-wide approach to functionally annotate an entire organism, in this application the approximately 11,000 predicted genes in the rice blast fungus (Magnaporthe grisea), an effective platform for tracking and storing both the biological materials created and the data produced across several participating institutions was required. The platform designed, named PACLIMS, was built to support our high throughput pipeline for generating 50,000 random insertion mutants of Magnaporthe grisea. To be a useful tool for materials and data tracking and storage, PACLIMS was designed to be simple to use, modifiable to accommodate refinement of research protocols, and cost-efficient. Data entry into PACLIMS was simplified through the use of barcodes and scanners, thus reducing the potential human error, time constraints, and labor. This platform was designed in concert with our experimental protocol so that it leads the researchers through each step of the process from mutant generation through phenotypic assays, thus ensuring that every mutant produced is handled in an identical manner and all necessary data is captured. Many sequenced eukaryotes have reached the point where computational analyses are no longer sufficient and require biological support for their predicted genes. Consequently, there is an increasing need for platforms that support high throughput genome-wide mutational analyses. While PACLIMS was designed specifically for this project, the source and ideas present in its implementation can be used as a model for other high throughput mutational endeavors.
'PACLIMS': A component LIM system for high-throughput functional genomic analysis
Donofrio, Nicole; Rajagopalon, Ravi; Brown, Douglas; Diener, Stephen; Windham, Donald; Nolin, Shelly; Floyd, Anna; Mitchell, Thomas; Galadima, Natalia; Tucker, Sara; Orbach, Marc J; Patel, Gayatri; Farman, Mark; Pampanwar, Vishal; Soderlund, Cari; Lee, Yong-Hwan; Dean, Ralph A
2005-01-01
Background Recent advances in sequencing techniques leading to cost reduction have resulted in the generation of a growing number of sequenced eukaryotic genomes. Computational tools greatly assist in defining open reading frames and assigning tentative annotations. However, gene functions cannot be asserted without biological support through, among other things, mutational analysis. In taking a genome-wide approach to functionally annotate an entire organism, in this application the ~11,000 predicted genes in the rice blast fungus (Magnaporthe grisea), an effective platform for tracking and storing both the biological materials created and the data produced across several participating institutions was required. Results The platform designed, named PACLIMS, was built to support our high throughput pipeline for generating 50,000 random insertion mutants of Magnaporthe grisea. To be a useful tool for materials and data tracking and storage, PACLIMS was designed to be simple to use, modifiable to accommodate refinement of research protocols, and cost-efficient. Data entry into PACLIMS was simplified through the use of barcodes and scanners, thus reducing the potential human error, time constraints, and labor. This platform was designed in concert with our experimental protocol so that it leads the researchers through each step of the process from mutant generation through phenotypic assays, thus ensuring that every mutant produced is handled in an identical manner and all necessary data is captured. Conclusion Many sequenced eukaryotes have reached the point where computational analyses are no longer sufficient and require biological support for their predicted genes. Consequently, there is an increasing need for platforms that support high throughput genome-wide mutational analyses. While PACLIMS was designed specifically for this project, the source and ideas present in its implementation can be used as a model for other high throughput mutational endeavors. PMID:15826298
Castellana, Stefano; Fusilli, Caterina; Mazzoccoli, Gianluigi; Biagini, Tommaso; Capocefalo, Daniele; Carella, Massimo; Vescovi, Angelo Luigi; Mazza, Tommaso
2017-06-01
24,189 are all the possible non-synonymous amino acid changes potentially affecting the human mitochondrial DNA. Only a tiny subset was functionally evaluated with certainty so far, while the pathogenicity of the vast majority was only assessed in-silico by software predictors. Since these tools proved to be rather incongruent, we have designed and implemented APOGEE, a machine-learning algorithm that outperforms all existing prediction methods in estimating the harmfulness of mitochondrial non-synonymous genome variations. We provide a detailed description of the underlying algorithm, of the selected and manually curated training and test sets of variants, as well as of its classification ability.
NCBI GEO: archive for functional genomics data sets—update
Barrett, Tanya; Wilhite, Stephen E.; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F.; Tomashevsky, Maxim; Marshall, Kimberly A.; Phillippy, Katherine H.; Sherman, Patti M.; Holko, Michelle; Yefanov, Andrey; Lee, Hyeseung; Zhang, Naigong; Robertson, Cynthia L.; Serova, Nadezhda; Davis, Sean; Soboleva, Alexandra
2013-01-01
The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data. PMID:23193258
Mykles, Donald L.; Burnett, Karen G.; Durica, David S.; Joyce, Blake L.; McCarthy, Fiona M.; Schmidt, Carl J.; Stillman, Jonathon H.
2016-01-01
High-throughput RNA sequencing (RNA-seq) technology has become an important tool for studying physiological responses of organisms to changes in their environment. De novo assembly of RNA-seq data has allowed researchers to create a comprehensive catalog of genes expressed in a tissue and to quantify their expression without a complete genome sequence. The contributions from the “Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology” symposium in this issue show the successes and limitations of using RNA-seq in the study of crustaceans. In conjunction with the symposium, the Animal Genome to Phenome Research Coordination Network collated comments from participants at the meeting regarding the challenges encountered when using transcriptomics in their research. Input came from novices and experts ranging from graduate students to principal investigators. Many were unaware of the bioinformatics analysis resources currently available on the CyVerse platform. Our analysis of community responses led to three recommendations for advancing the field: (1) integration of genomic and RNA-seq sequence assemblies for crustacean gene annotation and comparative expression; (2) development of methodologies for the functional analysis of genes; and (3) information and training exchange among laboratories for transmission of best practices. The field lacks the methods for manipulating tissue-specific gene expression. The decapod crustacean research community should consider the cherry shrimp, Neocaridina denticulata, as a decapod model for the application of transgenic tools for functional genomics. This would require a multi-investigator effort. PMID:27639274
Mazandu, Gaston K; Mulder, Nicola J
2012-07-01
Despite ever-increasing amounts of sequence and functional genomics data, there is still a deficiency of functional annotation for many newly sequenced proteins. For Mycobacterium tuberculosis (MTB), more than half of its genome is still uncharacterized, which hampers the search for new drug targets within the bacterial pathogen and limits our understanding of its pathogenicity. As for many other genomes, the annotations of proteins in the MTB proteome were generally inferred from sequence homology, which is effective but its applicability has limitations. We have carried out large-scale biological data integration to produce an MTB protein functional interaction network. Protein functional relationships were extracted from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database, and additional functional interactions from microarray, sequence and protein signature data. The confidence level of protein relationships in the additional functional interaction data was evaluated using a dynamic data-driven scoring system. This functional network has been used to predict functions of uncharacterized proteins using Gene Ontology (GO) terms, and the semantic similarity between these terms measured using a state-of-the-art GO similarity metric. To achieve better trade-off between improvement of quality, genomic coverage and scalability, this prediction is done by observing the key principles driving the biological organization of the functional network. This study yields a new functionally characterized MTB strain CDC1551 proteome, consisting of 3804 and 3698 proteins out of 4195 with annotations in terms of the biological process and molecular function ontologies, respectively. These data can contribute to research into the Development of effective anti-tubercular drugs with novel biological mechanisms of action. Copyright © 2011 Elsevier B.V. All rights reserved.
Development of constraint-based system-level models of microbial metabolism.
Navid, Ali
2012-01-01
Genome-scale models of metabolism are valuable tools for using genomic information to predict microbial phenotypes. System-level mathematical models of metabolic networks have been developed for a number of microbes and have been used to gain new insights into the biochemical conversions that occur within organisms and permit their survival and proliferation. Utilizing these models, computational biologists can (1) examine network structures, (2) predict metabolic capabilities and resolve unexplained experimental observations, (3) generate and test new hypotheses, (4) assess the nutritional requirements of the organism and approximate its environmental niche, (5) identify missing enzymatic functions in the annotated genome, and (6) engineer desired metabolic capabilities in model organisms. This chapter details the protocol for developing genome-scale models of metabolism in microbes as well as tips for accelerating the model building process.
Combining genomic and proteomic approaches for epigenetics research
Han, Yumiao; Garcia, Benjamin A
2014-01-01
Epigenetics is the study of changes in gene expression or cellular phenotype that do not change the DNA sequence. In this review, current methods, both genomic and proteomic, associated with epigenetics research are discussed. Among them, chromatin immunoprecipitation (ChIP) followed by sequencing and other ChIP-based techniques are powerful techniques for genome-wide profiling of DNA-binding proteins, histone post-translational modifications or nucleosome positions. However, mass spectrometry-based proteomics is increasingly being used in functional biological studies and has proved to be an indispensable tool to characterize histone modifications, as well as DNA–protein and protein–protein interactions. With the development of genomic and proteomic approaches, combination of ChIP and mass spectrometry has the potential to expand our knowledge of epigenetics research to a higher level. PMID:23895656
CnidBase: The Cnidarian Evolutionary Genomics Database
Ryan, Joseph F.; Finnerty, John R.
2003-01-01
CnidBase, the Cnidarian Evolutionary Genomics Database, is a tool for investigating the evolutionary, developmental and ecological factors that affect gene expression and gene function in cnidarians. In turn, CnidBase will help to illuminate the role of specific genes in shaping cnidarian biodiversity in the present day and in the distant past. CnidBase highlights evolutionary changes between species within the phylum Cnidaria and structures genomic and expression data to facilitate comparisons to non-cnidarian metazoans. CnidBase aims to further the progress that has already been made in the realm of cnidarian evolutionary genomics by creating a central community resource which will help drive future research and facilitate more accurate classification and comparison of new experimental data with existing data. CnidBase is available at http://cnidbase.bu.edu/. PMID:12519972
Segmenting the human genome based on states of neutral genetic divergence.
Kuruppumullage Don, Prabhani; Ananda, Guruprasad; Chiaromonte, Francesca; Makova, Kateryna D
2013-09-03
Many studies have demonstrated that divergence levels generated by different mutation types vary and covary across the human genome. To improve our still-incomplete understanding of the mechanistic basis of this phenomenon, we analyze several mutation types simultaneously, anchoring their variation to specific regions of the genome. Using hidden Markov models on insertion, deletion, nucleotide substitution, and microsatellite divergence estimates inferred from human-orangutan alignments of neutrally evolving genomic sequences, we segment the human genome into regions corresponding to different divergence states--each uniquely characterized by specific combinations of divergence levels. We then parsed the mutagenic contributions of various biochemical processes associating divergence states with a broad range of genomic landscape features. We find that high divergence states inhabit guanine- and cytosine (GC)-rich, highly recombining subtelomeric regions; low divergence states cover inner parts of autosomes; chromosome X forms its own state with lowest divergence; and a state of elevated microsatellite mutability is interspersed across the genome. These general trends are mirrored in human diversity data from the 1000 Genomes Project, and departures from them highlight the evolutionary history of primate chromosomes. We also find that genes and noncoding functional marks [annotations from the Encyclopedia of DNA Elements (ENCODE)] are concentrated in high divergence states. Our results provide a powerful tool for biomedical data analysis: segmentations can be used to screen personal genome variants--including those associated with cancer and other diseases--and to improve computational predictions of noncoding functional elements.
htsint: a Python library for sequencing pipelines that combines data through gene set generation.
Richards, Adam J; Herrel, Anthony; Bonneaud, Camille
2015-09-24
Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses. We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for 'enrichment' or conditional differences using one of a number of commonly available packages. The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint.
Transposons As Tools for Functional Genomics in Vertebrate Models.
Kawakami, Koichi; Largaespada, David A; Ivics, Zoltán
2017-11-01
Genetic tools and mutagenesis strategies based on transposable elements are currently under development with a vision to link primary DNA sequence information to gene functions in vertebrate models. By virtue of their inherent capacity to insert into DNA, transposons can be developed into powerful tools for chromosomal manipulations. Transposon-based forward mutagenesis screens have numerous advantages including high throughput, easy identification of mutated alleles, and providing insight into genetic networks and pathways based on phenotypes. For example, the Sleeping Beauty transposon has become highly instrumental to induce tumors in experimental animals in a tissue-specific manner with the aim of uncovering the genetic basis of diverse cancers. Here, we describe a battery of mutagenic cassettes that can be applied in conjunction with transposon vectors to mutagenize genes, and highlight versatile experimental strategies for the generation of engineered chromosomes for loss-of-function as well as gain-of-function mutagenesis for functional gene annotation in vertebrate models, including zebrafish, mice, and rats. Copyright © 2017 Elsevier Ltd. All rights reserved.
PinAPL-Py: A comprehensive web-application for the analysis of CRISPR/Cas9 screens.
Spahn, Philipp N; Bath, Tyler; Weiss, Ryan J; Kim, Jihoon; Esko, Jeffrey D; Lewis, Nathan E; Harismendy, Olivier
2017-11-20
Large-scale genetic screens using CRISPR/Cas9 technology have emerged as a major tool for functional genomics. With its increased popularity, experimental biologists frequently acquire large sequencing datasets for which they often do not have an easy analysis option. While a few bioinformatic tools have been developed for this purpose, their utility is still hindered either due to limited functionality or the requirement of bioinformatic expertise. To make sequencing data analysis of CRISPR/Cas9 screens more accessible to a wide range of scientists, we developed a Platform-independent Analysis of Pooled Screens using Python (PinAPL-Py), which is operated as an intuitive web-service. PinAPL-Py implements state-of-the-art tools and statistical models, assembled in a comprehensive workflow covering sequence quality control, automated sgRNA sequence extraction, alignment, sgRNA enrichment/depletion analysis and gene ranking. The workflow is set up to use a variety of popular sgRNA libraries as well as custom libraries that can be easily uploaded. Various analysis options are offered, suitable to analyze a large variety of CRISPR/Cas9 screening experiments. Analysis output includes ranked lists of sgRNAs and genes, and publication-ready plots. PinAPL-Py helps to advance genome-wide screening efforts by combining comprehensive functionality with user-friendly implementation. PinAPL-Py is freely accessible at http://pinapl-py.ucsd.edu with instructions and test datasets.
2011-01-01
Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of common carp genome. The first survey of common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding region of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs have genes on both ends. To evaluate the similarity to the genome of closely related zebrafish, BES of common carp were aligned against zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on zebrafish genome which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once we have physical map of common carp. Conclusion BAC end sequences are great resources for the first genome wide survey of common carp. The repetitive DNA was estimated to be approximate 28% of common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3,100 microsyntenies, covering over 50% of the zebrafish genome. BES of common carp are tremendous tools for comparative mapping between the two closely related species, zebrafish and common carp, which should facilitate both structural and functional genome analysis in common carp. PMID:21492448
Transcriptome analysis and related databases of Lactococcus lactis.
Kuipers, Oscar P; de Jong, Anne; Baerends, Richard J S; van Hijum, Sacha A F T; Zomer, Aldert L; Karsens, Harma A; den Hengst, Chris D; Kramer, Naomi E; Buist, Girbe; Kok, Jan
2002-08-01
Several complete genome sequences of Lactococcus lactis and their annotations will become available in the near future, next to the already published genome sequence of L. lactis ssp. lactis IL 1403. This will allow intraspecies comparative genomics studies as well as functional genomics studies aimed at a better understanding of physiological processes and regulatory networks operating in lactococci. This paper describes the initial set-up of a DNA-microarray facility in our group, to enable transcriptome analysis of various Gram-positive bacteria, including a ssp. lactis and a ssp. cremoris strain of Lactococcus lactis. Moreover a global description will be given of the hardware and software requirements for such a set-up, highlighting the crucial integration of relevant bioinformatics tools and methods. This includes the development of MolGenIS, an information system for transcriptome data storage and retrieval, and LactococCye, a metabolic pathway/genome database of Lactococcus lactis.
The Genome Portal of the Department of Energy Joint Genome Institute
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nordberg, Henrik; Cantor, Michael; Dushekyo, Serge
2014-03-14
The JGI Genome Portal (http://genome.jgi.doe.gov) provides unified access to all JGI genomic databases and analytical tools. A user can search, download and explore multiple data sets available for all DOE JGI sequencing projects including their status, assemblies and annotations of sequenced genomes. Genome Portal in the past 2 years was significantly updated, with a specific emphasis on efficient handling of the rapidly growing amount of diverse genomic data accumulated in JGI. A critical aspect of handling big data in genomics is the development of visualization and analysis tools that allow scientists to derive meaning from what are otherwise terrabases ofmore » inert sequence. An interactive visualization tool developed in the group allows us to explore contigs resulting from a single metagenome assembly. Implemented with modern web technologies that take advantage of the power of the computer's graphical processing unit (gpu), the tool allows the user to easily navigate over a 100,000 data points in multiple dimensions, among many biologically meaningful parameters of a dataset such as relative abundance, contig length, and G+C content.« less
Harb, Omar S; Roos, David S
2015-01-01
Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.
Developing molecular tools for Chlamydomonas reinhardtii
NASA Astrophysics Data System (ADS)
Noor-Mohammadi, Samaneh
Microalgae have garnered increasing interest over the years for their ability to produce compounds ranging from biofuels to neutraceuticals. A main focus of researchers has been to use microalgae as a natural bioreactor for the production of valuable and complex compounds. Recombinant protein expression in the chloroplasts of green algae has recently become more routine; however, the heterologous expression of multiple proteins or complete biosynthetic pathways remains a significant challenge. To take full advantage of these organisms' natural abilities, sophisticated molecular tools are needed to be able to introduce and functionally express multiple gene biosynthetic pathways in its genome. To achieve the above objective, we have sought to establish a method to construct, integrate and express multigene operons in the chloroplast and nuclear genome of the model microalgae Chlamydomonas reinhardtii. Here we show that a modified DNA Assembler approach can be used to rapidly assemble multiple-gene biosynthetic pathways in yeast and then integrate these assembled pathways at a site-specific location in the chloroplast, or by random integration in the nuclear genome of C. reinhardtii. As a proof of concept, this method was used to successfully integrate and functionally express up to three reporter proteins (AphA6, AadA, and GFP) in the chloroplast of C. reinhardtii and up to three reporter proteins (Ble, AphVIII, and GFP) in its nuclear genome. An analysis of the relative gene expression of the engineered strains showed significant differences in the mRNA expression levels of the reporter genes and thus highlights the importance of proper promoter/untranslated-region selection when constructing a target pathway. In addition, this work focuses on expressing the cofactor regeneration enzyme phosphite dehydrogenase (PTDH) in the chloroplast and nuclear genomes of C. reinhardtii. The PTDH enzyme converts phosphite into phosphate and NAD(P)+ into NAD(P)H. The reduced nicotinamide cofactor NAD(P)H plays a pivotal role in many biochemical oxidation and reduction reactions, thus this enzyme would allow regeneration of NAD(P)H in a microalgae strain over-expressing a NAD(P)H-dependent oxidoreductase. A phosphite dehydrogenase gene was introduced into the chloroplast genome (codon optimized) and nuclear genome of C. reinhardtii by biolistic transformation and electroporation in separate events, respectively. Successful expression of the heterologous protein was confirmed by transcript analysis and protein analysis. In conclusion, this new method represents a useful genetic tool in the construction and integration of complex biochemical pathways into the chloroplast or nuclear genome of microalgae, and this should aid current efforts to engineer algae for recombinant protein expression, biofuels production and production of other desirable natural products.
Biological interpretation of genome-wide association studies using predicted gene functions.
Pers, Tune H; Karjalainen, Juha M; Chan, Yingleong; Westra, Harm-Jan; Wood, Andrew R; Yang, Jian; Lui, Julian C; Vedantam, Sailaja; Gustafsson, Stefan; Esko, Tonu; Frayling, Tim; Speliotes, Elizabeth K; Boehnke, Michael; Raychaudhuri, Soumya; Fehrmann, Rudolf S N; Hirschhorn, Joel N; Franke, Lude
2015-01-19
The main challenge for gaining biological insights from genetic associations is identifying which genes and pathways explain the associations. Here we present DEPICT, an integrative tool that employs predicted gene functions to systematically prioritize the most likely causal genes at associated loci, highlight enriched pathways and identify tissues/cell types where genes from associated loci are highly expressed. DEPICT is not limited to genes with established functions and prioritizes relevant gene sets for many phenotypes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Manning, Ruth Ann
Recent advances in DNA sequencing and genome mapping technologies are making it possible, for the first time in history, to find genes in plants and animals and to elucidate their function. This means that diagnostics and therapeutics can be developed for human diseases such as cancer, obesity, hypertension, and cardiovascular problems. Crop and animal strains can be developed that are hardier, resistant to diseases, and produce higher yields. The challenge is to develop tools that will find the nucleotides in the DNA of a living organism that comprise a particular gene. In the human genome alone it is estimated thatmore » only about 51% of the approximately 3 billion pairs of nucleotides code for some 100,000 human genes. In this search for nucleotides within a genome which are active in the actual coding of proteins, efficient tools to locate and identify their function can be of significant value to mankind. Software tools such as ApoCom GRAIL{trademark} have assisted in this search. It can be used to analyze genome information, to identify exons (coding regions) and to construct gene models. Using a neural network approach, this software can ''learn'' sequence patterns and refine its ability to recognize a pattern as it is exposed to more and more examples of it. Since 1992 versions of GRAIL{trademark} have been publicly available over the Internet from Oak Ridge National Laboratory. Because of the potential for security and patent compromise, these Internet versions are not available to many researchers in pharmaceutical and biotechnology companies who cannot send proprietary sequences past their data-secure firewalls. ApoCom is making available commercial versions of the GRAIL{trademark} software to run self-contained over local area networks. As part of the commercialization effort, ApoCom has developed a new Java{trademark}-based graphical user interface, the ApoCom Client Tool for Genomics (ACTG){trademark}. Two products, ApoCom GRAIL{trademark} Network Edition and ApoCom GRAIL{trademark} Personal Edition, have been developed to reach two diverse niche markets in the Phase III commercialization of this software. As a result of this project ApoCom GRAIL{trademark} can now be made available to the desktop (UNIX{reg_sign}, Windows{reg_sign} 95 and Windows NT{reg_sign}, or Mac{trademark} 0S) of any researcher who needs it.« less
Functional genomics of bio-energy plants and related patent activities.
Jiang, Shu-Ye; Ramachandran, Srinivasan
2013-04-01
With dwindling fossil oil resources and increased economic growth of many developing countries due to globalization, energy driven from an alternative source such as bio-energy in a sustainable fashion is the need of the hour. However, production of energy from biological source is relatively expensive due to low starch and sugar contents of bioenergy plants leading to lower oil yield and reduced quality along with lower conversion efficiency of feedstock. In this context genetic improvement of bio-energy plants offers a viable solution. In this manuscript, we reviewed the current status of functional genomics studies and related patent activities in bio-energy plants. Currently, genomes of considerable bio-energy plants have been sequenced or are in progress and also large amount of expression sequence tags (EST) or cDNA sequences are available from them. These studies provide fundamental data for more reliable genome annotation and as a result, several genomes have been annotated in a genome-wide level. In addition to this effort, various mutagenesis tools have also been employed to develop mutant populations for characterization of genes that are involved in bioenergy quantitative traits. With the progress made on functional genomics of important bio-energy plants, more patents were filed with a significant number of them focusing on genes and DNA sequences which may involve in improvement of bio-energy traits including higher yield and quality of starch, sugar and oil. We also believe that these studies will lead to the generation of genetically altered plants with improved tolerance to various abiotic and biotic stresses.
A Genomic Resource for the Development, Improvement, and Exploitation of Sorghum for Bioenergy
Brenton, Zachary W.; Cooper, Elizabeth A.; Myers, Mathew T.; Boyles, Richard E.; Shakoor, Nadia; Zielinski, Kelsey J.; Rauh, Bradley L.; Bridges, William C.; Morris, Geoffrey P.; Kresovich, Stephen
2016-01-01
With high productivity and stress tolerance, numerous grass genera of the Andropogoneae have emerged as candidates for bioenergy production. To optimize these candidates, research examining the genetic architecture of yield, carbon partitioning, and composition is required to advance breeding objectives. Significant progress has been made developing genetic and genomic resources for Andropogoneae, and advances in comparative and computational genomics have enabled research examining the genetic basis of photosynthesis, carbon partitioning, composition, and sink strength. To provide a pivotal resource aimed at developing a comparative understanding of key bioenergy traits in the Andropogoneae, we have established and characterized an association panel of 390 racially, geographically, and phenotypically diverse Sorghum bicolor accessions with 232,303 genetic markers. Sorghum bicolor was selected because of its genomic simplicity, phenotypic diversity, significant genomic tools, and its agricultural productivity and resilience. We have demonstrated the value of sorghum as a functional model for candidate gene discovery for bioenergy Andropogoneae by performing genome-wide association analysis for two contrasting phenotypes representing key components of structural and non-structural carbohydrates. We identified potential genes, including a cellulase enzyme and a vacuolar transporter, associated with increased non-structural carbohydrates that could lead to bioenergy sorghum improvement. Although our analysis identified genes with potentially clear functions, other candidates did not have assigned functions, suggesting novel molecular mechanisms for carbon partitioning traits. These results, combined with our characterization of phenotypic and genetic diversity and the public accessibility of each accession and genomic data, demonstrate the value of this resource and provide a foundation for future improvement of sorghum and related grasses for bioenergy production. PMID:27356613
Improved Genome Assembly and Annotation for the Rock Pigeon (Columba livia).
Holt, Carson; Campbell, Michael; Keays, David A; Edelman, Nathaniel; Kapusta, Aurélie; Maclary, Emily; T Domyan, Eric; Suh, Alexander; Warren, Wesley C; Yandell, Mark; Gilbert, M Thomas P; Shapiro, Michael D
2018-05-04
The domestic rock pigeon ( Columba livia ) is among the most widely distributed and phenotypically diverse avian species. C. livia is broadly studied in ecology, genetics, physiology, behavior, and evolutionary biology, and has recently emerged as a model for understanding the molecular basis of anatomical diversity, the magnetic sense, and other key aspects of avian biology. Here we report an update to the C. livia genome reference assembly and gene annotation dataset. Greatly increased scaffold lengths in the updated reference assembly, along with an updated annotation set, provide improved tools for evolutionary and functional genetic studies of the pigeon, and for comparative avian genomics in general. Copyright © 2018 Holt et al.
Repetitive sequences in plant nuclear DNA: types, distribution, evolution and function.
Mehrotra, Shweta; Goyal, Vinod
2014-08-01
Repetitive DNA sequences are a major component of eukaryotic genomes and may account for up to 90% of the genome size. They can be divided into minisatellite, microsatellite and satellite sequences. Satellite DNA sequences are considered to be a fast-evolving component of eukaryotic genomes, comprising tandemly-arrayed, highly-repetitive and highly-conserved monomer sequences. The monomer unit of satellite DNA is 150-400 base pairs (bp) in length. Repetitive sequences may be species- or genus-specific, and may be centromeric or subtelomeric in nature. They exhibit cohesive and concerted evolution caused by molecular drive, leading to high sequence homogeneity. Repetitive sequences accumulate variations in sequence and copy number during evolution, hence they are important tools for taxonomic and phylogenetic studies, and are known as "tuning knobs" in the evolution. Therefore, knowledge of repetitive sequences assists our understanding of the organization, evolution and behavior of eukaryotic genomes. Repetitive sequences have cytoplasmic, cellular and developmental effects and play a role in chromosomal recombination. In the post-genomics era, with the introduction of next-generation sequencing technology, it is possible to evaluate complex genomes for analyzing repetitive sequences and deciphering the yet unknown functional potential of repetitive sequences. Copyright © 2014 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.
Geib, Scott M; Hall, Brian; Derego, Theodore; Bremer, Forest T; Cannoles, Kyle; Sim, Sheina B
2018-04-01
One of the most overlooked, yet critical, components of a whole genome sequencing (WGS) project is the submission and curation of the data to a genomic repository, most commonly the National Center for Biotechnology Information (NCBI). While large genome centers or genome groups have developed software tools for post-annotation assembly filtering, annotation, and conversion into the NCBI's annotation table format, these tools typically require back-end setup and connection to an Structured Query Language (SQL) database and/or some knowledge of programming (Perl, Python) to implement. With WGS becoming commonplace, genome sequencing projects are moving away from the genome centers and into the ecology or biology lab, where fewer resources are present to support the process of genome assembly curation. To fill this gap, we developed software to assess, filter, and transfer annotation and convert a draft genome assembly and annotation set into the NCBI annotation table (.tbl) format, facilitating submission to the NCBI Genome Assembly database. This software has no dependencies, is compatible across platforms, and utilizes a simple command to perform a variety of simple and complex post-analysis, pre-NCBI submission WGS project tasks. The Genome Annotation Generator is a consistent and user-friendly bioinformatics tool that can be used to generate a .tbl file that is consistent with the NCBI submission pipeline. The Genome Annotation Generator achieves the goal of providing a publicly available tool that will facilitate the submission of annotated genome assemblies to the NCBI. It is useful for any individual researcher or research group that wishes to submit a genome assembly of their study system to the NCBI.
Hall, Brian; Derego, Theodore; Bremer, Forest T; Cannoles, Kyle
2018-01-01
Abstract Background One of the most overlooked, yet critical, components of a whole genome sequencing (WGS) project is the submission and curation of the data to a genomic repository, most commonly the National Center for Biotechnology Information (NCBI). While large genome centers or genome groups have developed software tools for post-annotation assembly filtering, annotation, and conversion into the NCBI’s annotation table format, these tools typically require back-end setup and connection to an Structured Query Language (SQL) database and/or some knowledge of programming (Perl, Python) to implement. With WGS becoming commonplace, genome sequencing projects are moving away from the genome centers and into the ecology or biology lab, where fewer resources are present to support the process of genome assembly curation. To fill this gap, we developed software to assess, filter, and transfer annotation and convert a draft genome assembly and annotation set into the NCBI annotation table (.tbl) format, facilitating submission to the NCBI Genome Assembly database. This software has no dependencies, is compatible across platforms, and utilizes a simple command to perform a variety of simple and complex post-analysis, pre-NCBI submission WGS project tasks. Findings The Genome Annotation Generator is a consistent and user-friendly bioinformatics tool that can be used to generate a .tbl file that is consistent with the NCBI submission pipeline Conclusions The Genome Annotation Generator achieves the goal of providing a publicly available tool that will facilitate the submission of annotated genome assemblies to the NCBI. It is useful for any individual researcher or research group that wishes to submit a genome assembly of their study system to the NCBI. PMID:29635297
Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R
2017-01-01
Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell. DOI: http://dx.doi.org/10.7554/eLife.26580.001 PMID:28678007
Solving the Problem: Genome Annotation Standards before the Data Deluge.
Klimke, William; O'Donovan, Claire; White, Owen; Brister, J Rodney; Clark, Karen; Fedorov, Boris; Mizrachi, Ilene; Pruitt, Kim D; Tatusova, Tatiana
2011-10-15
The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries.
Solving the Problem: Genome Annotation Standards before the Data Deluge
Klimke, William; O'Donovan, Claire; White, Owen; Brister, J. Rodney; Clark, Karen; Fedorov, Boris; Mizrachi, Ilene; Pruitt, Kim D.; Tatusova, Tatiana
2011-01-01
The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries. PMID:22180819
GenomePeek—an online tool for prokaryotic genome and metagenome analysis
McNair, Katelyn; Edwards, Robert A.
2015-06-16
As increases in prokaryotic sequencing take place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations; such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping errormore » rates low, as well as offering unique data visualization options.« less
Kim, Mara; Cooper, Brian A.; Venkat, Rohit; Phillips, Julie B.; Eidem, Haley R.; Hirbo, Jibril; Nutakki, Sashank; Williams, Scott M.; Muglia, Louis J.; Capra, J. Anthony; Petren, Kenneth; Abbot, Patrick; Rokas, Antonis; McGary, Kriston L.
2016-01-01
Mammalian gestation and pregnancy are fast evolving processes that involve the interaction of the fetal, maternal and paternal genomes. Version 1.0 of the GEneSTATION database (http://genestation.org) integrates diverse types of omics data across mammals to advance understanding of the genetic basis of gestation and pregnancy-associated phenotypes and to accelerate the translation of discoveries from model organisms to humans. GEneSTATION is built using tools from the Generic Model Organism Database project, including the biology-aware database CHADO, new tools for rapid data integration, and algorithms that streamline synthesis and user access. GEneSTATION contains curated life history information on pregnancy and reproduction from 23 high-quality mammalian genomes. For every human gene, GEneSTATION contains diverse evolutionary (e.g. gene age, population genetic and molecular evolutionary statistics), organismal (e.g. tissue-specific gene and protein expression, differential gene expression, disease phenotype), and molecular data types (e.g. Gene Ontology Annotation, protein interactions), as well as links to many general (e.g. Entrez, PubMed) and pregnancy disease-specific (e.g. PTBgene, dbPTB) databases. By facilitating the synthesis of diverse functional and evolutionary data in pregnancy-associated tissues and phenotypes and enabling their quick, intuitive, accurate and customized meta-analysis, GEneSTATION provides a novel platform for comprehensive investigation of the function and evolution of mammalian pregnancy. PMID:26567549
Silencing GhNDR1 and GhMKK2 compromised cotton resistance to Verticillium wilt
Gao, Xiquan; Wheeler, Terry; Li, Zhaohu; Kenerley, Charles M.; He, Ping; Shan, Libo
2011-01-01
SUMMARY Cotton is an important cash crop worldwide and serves as a significant source of fiber, feed, foodstuff, oil and biofuel products. Considerable effort in genetics and genomics has been expended to increase sustainable yield and quality through molecular breeding and genetic engineering of new cotton cultivars. With the effort of whole genome sequencing of cotton, it is essential to develop molecular tools and resources for large-scale analysis of gene functions at the genome-wide level. We have successfully established an Agrobacterium-mediated virus-induced gene silencing (VIGS) assay in several cotton cultivars with different genetic backgrounds. The genes of interest were potently and readily silenced within 2 weeks after inoculation at the seedling stage. Importantly, we showed that silencing GhNDR1 and GhMKK2 compromised cotton resistance to the infection by Verticillium dahliae, a fungal pathogen causing Verticillium wilt. Furthermore, we established a cotton protoplast system for transient gene expression to study gene functions by a gain-of-function approach. The viable protoplasts were isolated from green cotyledons, etiolated cotyledons, and true leaves, and responded to a wide range of pathogen elicitors and phytohormones. Remarkably, cotton plants possess conserved, but also distinct MAP kinase activation with Arabidopsis upon bacterial elicitor flagellin perception. Thus, we demonstrated that GhNDR1 and GhMKK2 are required for Verticillium resistance in cotton using gene silencing assays, and established the high throughput loss-of-function and gain-of-function assays for functional genomic studies in cotton. PMID:21219508
A knowledge base for Vitis vinifera functional analysis.
Pulvirenti, Alfredo; Giugno, Rosalba; Distefano, Rosario; Pigola, Giuseppe; Mongiovi, Misael; Giudice, Girolamo; Vendramin, Vera; Lombardo, Alessandro; Cattonaro, Federica; Ferro, Alfredo
2015-01-01
Vitis vinifera (Grapevine) is the most important fruit species in the modern world. Wine and table grapes sales contribute significantly to the economy of major wine producing countries. The most relevant goals in wine production concern quality and safety. In order to significantly improve the achievement of these objectives and to gain biological knowledge about cultivars, a genomic approach is the most reliable strategy. The recent grapevine genome sequencing offers the opportunity to study the potential roles of genes and microRNAs in fruit maturation and other physiological and pathological processes. Although several systems allowing the analysis of plant genomes have been reported, none of them has been designed specifically for the functional analysis of grapevine genomes of cultivars under environmental stress in connection with microRNA data. Here we introduce a novel knowledge base, called BIOWINE, designed for the functional analysis of Vitis vinifera genomes of cultivars present in Sicily. The system allows the analysis of RNA-seq experiments of two different cultivars, namely Nero d'Avola and Nerello Mascalese. Samples were taken under different climatic conditions of phenological phases, diseases, and geographic locations. The BIOWINE web interface is equipped with data analysis modules for grapevine genomes. In particular users may analyze the current genome assembly together with the RNA-seq data through a customized version of GBrowse. The web interface allows users to perform gene set enrichment by exploiting third-party databases. BIOWINE is a knowledge base implementing a set of bioinformatics tools for the analysis of grapevine genomes. The system aims to increase our understanding of the grapevine varieties and species of Sicilian products focusing on adaptability to different climatic conditions, phenological phases, diseases, and geographic locations.
Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues
2015-01-01
The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genome with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) propose three new kinds of representation and a more intuitive menu. These new developments have been implemented for Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Li, Edward B; Truong, Dawn; Hallett, Shawn A; Mukherjee, Kusumika; Schutte, Brian C; Liao, Eric C
2017-09-01
Large-scale sequencing efforts have captured a rapidly growing catalogue of genetic variations. However, the accurate establishment of gene variant pathogenicity remains a central challenge in translating personal genomics information to clinical decisions. Interferon Regulatory Factor 6 (IRF6) gene variants are significant genetic contributors to orofacial clefts. Although approximately three hundred IRF6 gene variants have been documented, their effects on protein functions remain difficult to interpret. Here, we demonstrate the protein functions of human IRF6 missense gene variants could be rapidly assessed in detail by their abilities to rescue the irf6 -/- phenotype in zebrafish through variant mRNA microinjections at the one-cell stage. The results revealed many missense variants previously predicted by traditional statistical and computational tools to be loss-of-function and pathogenic retained partial or full protein function and rescued the zebrafish irf6 -/- periderm rupture phenotype. Through mRNA dosage titration and analysis of the Exome Aggregation Consortium (ExAC) database, IRF6 missense variants were grouped by their abilities to rescue at various dosages into three functional categories: wild type function, reduced function, and complete loss-of-function. This sensitive and specific biological assay was able to address the nuanced functional significances of IRF6 missense gene variants and overcome many limitations faced by current statistical and computational tools in assigning variant protein function and pathogenicity. Furthermore, it unlocked the possibility for characterizing yet undiscovered human IRF6 missense gene variants from orofacial cleft patients, and illustrated a generalizable functional genomics paradigm in personalized medicine.
USDA-ARS?s Scientific Manuscript database
Dual luciferase reporter systems are valuable tools for functional genomic studies, but have not previously been developed for use in tick cell culture. We evaluated expression of available luciferase constructs in tick cell cultures derived from Rhipicephalus (Boophilus) microplus, an important vec...
USDA-ARS?s Scientific Manuscript database
The western corn rootworm (WCR), Diabrotica virgifera virgifera, is an insect pest of corn, and population suppression with chemical insecticides is an important management tool. Traits conferring organophosphate insecticide resistance have increased in frequency among WCR populations, resulting in...
Dhanyalakshmi, K H; Naika, Mahantesha B N; Sajeevan, R S; Mathew, Oommen K; Shafi, K Mohamed; Sowdhamini, Ramanathan; N Nataraja, Karaba
2016-01-01
The modern sequencing technologies are generating large volumes of information at the transcriptome and genome level. Translation of this information into a biological meaning is far behind the race due to which a significant portion of proteins discovered remain as proteins of unknown function (PUFs). Attempts to uncover the functional significance of PUFs are limited due to lack of easy and high throughput functional annotation tools. Here, we report an approach to assign putative functions to PUFs, identified in the transcriptome of mulberry, a perennial tree commonly cultivated as host of silkworm. We utilized the mulberry PUFs generated from leaf tissues exposed to drought stress at whole plant level. A sequence and structure based computational analysis predicted the probable function of the PUFs. For rapid and easy annotation of PUFs, we developed an automated pipeline by integrating diverse bioinformatics tools, designated as PUFs Annotation Server (PUFAS), which also provides a web service API (Application Programming Interface) for a large-scale analysis up to a genome. The expression analysis of three selected PUFs annotated by the pipeline revealed abiotic stress responsiveness of the genes, and hence their potential role in stress acclimation pathways. The automated pipeline developed here could be extended to assign functions to PUFs from any organism in general. PUFAS web server is available at http://caps.ncbs.res.in/pufas/ and the web service is accessible at http://capservices.ncbs.res.in/help/pufas.
DNA assembler, an in vivo genetic method for rapid construction of biochemical pathways
Shao, Zengyi; Zhao, Hua; Zhao, Huimin
2009-01-01
The assembly of large recombinant DNA encoding a whole biochemical pathway or genome represents a significant challenge. Here, we report a new method, DNA assembler, which allows the assembly of an entire biochemical pathway in a single step via in vivo homologous recombination in Saccharomyces cerevisiae. We show that DNA assembler can rapidly assemble a functional d-xylose utilization pathway (∼9 kb DNA consisting of three genes), a functional zeaxanthin biosynthesis pathway (∼11 kb DNA consisting of five genes) and a functional combined d-xylose utilization and zeaxanthin biosynthesis pathway (∼19 kb consisting of eight genes) with high efficiencies (70–100%) either on a plasmid or on a yeast chromosome. As this new method only requires simple DNA preparation and one-step yeast transformation, it represents a powerful tool in the construction of biochemical pathways for synthetic biology, metabolic engineering and functional genomics studies. PMID:19074487
An integrative model for in-silico clinical-genomics discovery science.
Lussier, Yves A; Sarkar, Indra Nell; Cantor, Michael
2002-01-01
Human Genome discovery research has set the pace for Post-Genomic Discovery Research. While post-genomic fields focused at the molecular level are intensively pursued, little effort is being deployed in the later stages of molecular medicine discovery research, such as clinical-genomics. The objective of this study is to demonstrate the relevance and significance of integrating mainstream clinical informatics decision support systems to current bioinformatics genomic discovery science. This paper is a feasibility study of an original model enabling novel "in-silico" clinical-genomic discovery science and that demonstrates its feasibility. This model is designed to mediate queries among clinical and genomic knowledge bases with relevant bioinformatic analytic tools (e.g. gene clustering). Briefly, trait-disease-gene relationships were successfully illustrated using QMR, OMIM, SNOMED-RT, GeneCluster and TreeView. The analyses were visualized as two-dimensional dendrograms of clinical observations clustered around genes. To our knowledge, this is the first study using knowledge bases of clinical decision support systems for genomic discovery. Although this study is a proof of principle, it provides a framework for the development of clinical decision-support-system driven, high-throughput clinical-genomic technologies which could potentially unveil significant high-level functions of genes.
Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world
Koonin, Eugene V.; Wolf, Yuri I.
2008-01-01
The first bacterial genome was sequenced in 1995, and the first archaeal genome in 1996. Soon after these breakthroughs, an exponential rate of genome sequencing was established, with a doubling time of approximately 20 months for bacteria and approximately 34 months for archaea. Comparative analysis of the hundreds of sequenced bacterial and dozens of archaeal genomes leads to several generalizations on the principles of genome organization and evolution. A crucial finding that enables functional characterization of the sequenced genomes and evolutionary reconstruction is that the majority of archaeal and bacterial genes have conserved orthologs in other, often, distant organisms. However, comparative genomics also shows that horizontal gene transfer (HGT) is a dominant force of prokaryotic evolution, along with the loss of genetic material resulting in genome contraction. A crucial component of the prokaryotic world is the mobilome, the enormous collection of viruses, plasmids and other selfish elements, which are in constant exchange with more stable chromosomes and serve as HGT vehicles. Thus, the prokaryotic genome space is a tightly connected, although compartmentalized, network, a novel notion that undermines the ‘Tree of Life’ model of evolution and requires a new conceptual framework and tools for the study of prokaryotic evolution. PMID:18948295
Chowdhary, Nupoor; Selvaraj, Ashok; KrishnaKumaar, Lakshmi; Kumar, Gopal Ramesh
2015-01-01
Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2) production, but still it has major drawbacks like sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole genome re-annotation work has been carried out as an attempt to update the incomplete genome information that causes gap in the knowledge especially in the area of metabolic engineering, to improve the H2 producing capabilities of C. saccharolyticus. Whole genome re-annotation was performed through manual means for 2,682 Coding Sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology could successfully add functions for 409 hypothetical proteins (HPs), 46 proteins previously annotated as putative and assigned more accurate functions for the known protein sequences. Homology based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology based methods such as genomic context approaches for protein function prediction have been developed. Using non-homology based functional prediction methods, we were able to assign cellular processes or physical complexes for 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs generated from MicroScope Platform, to the original genome with functional prediction for 49 of them. The re-annotation of HPs and new CDSs is stored in the relational database that is available on the MicroScope web-based platform. In parallel, a comparative genome analyses were performed among the members of genus Caldicellulosiruptor to understand the function and evolutionary processes. Further, with results from integrated re-annotation studies (homology and genomic context approach), we strongly suggest that Csac_0437 and Csac_0424 encode for glycoside hydrolases (GH) and are proposed to be involved in the decomposition of recalcitrant plant polysaccharides. Similarly, HPs: Csac_0732, Csac_1862, Csac_1294 and Csac_0668 are suggested to play a significant role in biohydrogen production. Function prediction of these HPs by using our integrated approach will considerably enhance the interpretation of large-scale experiments targeting this industrially important organism. PMID:26196387
Chowdhary, Nupoor; Selvaraj, Ashok; KrishnaKumaar, Lakshmi; Kumar, Gopal Ramesh
2015-01-01
Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2) production, but still it has major drawbacks like sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole genome re-annotation work has been carried out as an attempt to update the incomplete genome information that causes gap in the knowledge especially in the area of metabolic engineering, to improve the H2 producing capabilities of C. saccharolyticus. Whole genome re-annotation was performed through manual means for 2,682 Coding Sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology could successfully add functions for 409 hypothetical proteins (HPs), 46 proteins previously annotated as putative and assigned more accurate functions for the known protein sequences. Homology based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology based methods such as genomic context approaches for protein function prediction have been developed. Using non-homology based functional prediction methods, we were able to assign cellular processes or physical complexes for 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs generated from MicroScope Platform, to the original genome with functional prediction for 49 of them. The re-annotation of HPs and new CDSs is stored in the relational database that is available on the MicroScope web-based platform. In parallel, a comparative genome analyses were performed among the members of genus Caldicellulosiruptor to understand the function and evolutionary processes. Further, with results from integrated re-annotation studies (homology and genomic context approach), we strongly suggest that Csac_0437 and Csac_0424 encode for glycoside hydrolases (GH) and are proposed to be involved in the decomposition of recalcitrant plant polysaccharides. Similarly, HPs: Csac_0732, Csac_1862, Csac_1294 and Csac_0668 are suggested to play a significant role in biohydrogen production. Function prediction of these HPs by using our integrated approach will considerably enhance the interpretation of large-scale experiments targeting this industrially important organism.
The coffee genome hub: a resource for coffee genomes
Dereeper, Alexis; Bocs, Stéphanie; Rouard, Mathieu; Guignon, Valentin; Ravel, Sébastien; Tranchant-Dubreuil, Christine; Poncet, Valérie; Garsmeur, Olivier; Lashermes, Philippe; Droc, Gaëtan
2015-01-01
The whole genome sequence of Coffea canephora, the perennial diploid species known as Robusta, has been recently released. In the context of the C. canephora genome sequencing project and to support post-genomics efforts, we developed the Coffee Genome Hub (http://coffee-genome.org/), an integrative genome information system that allows centralized access to genomics and genetics data and analysis tools to facilitate translational and applied research in coffee. We provide the complete genome sequence of C. canephora along with gene structure, gene product information, metabolism, gene families, transcriptomics, syntenic blocks, genetic markers and genetic maps. The hub relies on generic software (e.g. GMOD tools) for easy querying, visualizing and downloading research data. It includes a Genome Browser enhanced by a Community Annotation System, enabling the improvement of automatic gene annotation through an annotation editor. In addition, the hub aims at developing interoperability among other existing South Green tools managing coffee data (phylogenomics resources, SNPs) and/or supporting data analyses with the Galaxy workflow manager. PMID:25392413
PanWeb: A web interface for pan-genomic analysis.
Pantoja, Yan; Pinheiro, Kenny; Veras, Allan; Araújo, Fabrício; Lopes de Sousa, Ailton; Guimarães, Luis Carlos; Silva, Artur; Ramos, Rommel T J
2017-01-01
With increased production of genomic data since the advent of next-generation sequencing (NGS), there has been a need to develop new bioinformatics tools and areas, such as comparative genomics. In comparative genomics, the genetic material of an organism is directly compared to that of another organism to better understand biological species. Moreover, the exponentially growing number of deposited prokaryote genomes has enabled the investigation of several genomic characteristics that are intrinsic to certain species. Thus, a new approach to comparative genomics, termed pan-genomics, was developed. In pan-genomics, various organisms of the same species or genus are compared. Currently, there are many tools that can perform pan-genomic analyses, such as PGAP (Pan-Genome Analysis Pipeline), Panseq (Pan-Genome Sequence Analysis Program) and PGAT (Prokaryotic Genome Analysis Tool). Among these software tools, PGAP was developed in the Perl scripting language and its reliance on UNIX platform terminals and its requirement for an extensive parameterized command line can become a problem for users without previous computational knowledge. Thus, the aim of this study was to develop a web application, known as PanWeb, that serves as a graphical interface for PGAP. In addition, using the output files of the PGAP pipeline, the application generates graphics using custom-developed scripts in the R programming language. PanWeb is freely available at http://www.computationalbiology.ufpa.br/panweb.
Strategies and tools for whole genome alignments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Couronne, Olivier; Poliakov, Alexander; Bray, Nicolas
2002-11-25
The availability of the assembled mouse genome makespossible, for the first time, an alignment and comparison of two largevertebrate genomes. We have investigated different strategies ofalignment for the subsequent analysis of conservation of genomes that areeffective for different quality assemblies. These strategies were appliedto the comparison of the working draft of the human genome with the MouseGenome Sequencing Consortium assembly, as well as other intermediatemouse assemblies. Our methods are fast and the resulting alignmentsexhibit a high degree of sensitivity, covering more than 90 percent ofknown coding exons in the human genome. We have obtained such coveragewhile preserving specificity. With amore » view towards the end user, we havedeveloped a suite of tools and websites for automatically aligning, andsubsequently browsing and working with whole genome comparisons. Wedescribe the use of these tools to identify conserved non-coding regionsbetween the human and mouse genomes, some of which have not beenidentified by other methods.« less
Human Mitochondrial Protein Database
National Institute of Standards and Technology Data Gateway
SRD 131 Human Mitochondrial Protein Database (Web, free access) The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.
Better Smelling Through Genetics: Mammalian Odor Perception
Keller, Andreas; Vosshall, Leslie B.
2008-01-01
SUMMARY The increasing availability of genomic and genetic tools to study olfaction—the sense of smell—has brought important new insights into how this chemosensory modality functions in different species. Newly sequenced mammalian genomes—from platypus to dog—have made it possible to infer how smell has evolved to suit the needs of a given species and how variation within a species may affect individual olfactory perception. This review will focus on recent advances in the genetics and genomics of mammalian smell, with a primary focus on rodents and humans. PMID:18938244
Comparison and correlation of Simple Sequence Repeats distribution in genomes of Brucella species
Kiran, Jangampalli Adi Pradeep; Chakravarthi, Veeraraghavulu Praveen; Kumar, Yellapu Nanda; Rekha, Somesula Swapna; Kruti, Srinivasan Shanthi; Bhaskar, Matcha
2011-01-01
Computational genomics is one of the important tools to understand the distribution of closely related genomes including simple sequence repeats (SSRs) in an organism, which gives valuable information regarding genetic variations. The central objective of the present study was to screen the SSRs distributed in coding and non-coding regions among different human Brucella species which are involved in a range of pathological disorders. Computational analysis of the SSRs in the Brucella indicates few deviations from expected random models. Statistical analysis also reveals that tri-nucleotide SSRs are overrepresented and tetranucleotide SSRs underrepresented in Brucella genomes. From the data, it can be suggested that over expressed tri-nucleotide SSRs in genomic and coding regions might be responsible in the generation of functional variation of proteins expressed which in turn may lead to different pathogenicity, virulence determinants, stress response genes, transcription regulators and host adaptation proteins of Brucella genomes. Abbreviations SSRs - Simple Sequence Repeats, ORFs - Open Reading Frames. PMID:21738309
What the Aspergillus genomes have told us.
Nierman, W C; May, G; Kim, H S; Anderson, M J; Chen, D; Denning, D W
2005-05-01
The sequencing and annotation of the genomes of the first strains of Aspergillus nidulans, Aspergillus oryzae, and Aspergillus fumigatus will be seen in retrospect as a transformational event in Aspergillus biology. With this event the entire genetic composition of A. nidulans, the sexual experimental model organism of the genus Aspergillus, A. oryzae, the food biotechnology organism which is the product of centuries of cultivation, and A. fumigatus, the most common causative agent of invasive aspergillosis is now revealed to the extent that we are at present able to understand. Each genome exhibits a large set of genes common to the three as well as a much smaller set of genes unique to each. Moreover, these sequences serve as resources providing the major tool to expanding our understanding of the biology of each. Transcription profiling of A. fumigatus at high temperatures and comparative genomic hybridization between A. fumigatus and a closely related Aspergillus species provides microarray based examples of the beginning of functional analysis of the genomes of these organisms going forward from the genome sequence.
Khan, Muhammad Hafeez Ullah; Khan, Shahid U; Muhammad, Ali; Hu, Limin; Yang, Yang; Fan, Chuchuan
2018-06-01
Clustered regularly interspaced palindromic repeats associated protein Cas9 (CRISPR-Cas9), originally an adaptive immunity system of prokaryotes, is revolutionizing genome editing technologies with minimal off-targets in the present era. The CRISPR/Cas9 is now highly emergent, advanced, and highly specific tool for genome engineering. The technology is widely used to animal and plant genomes to achieve desirable results. The present review will encompass how CRISPR-Cas9 is revealing its beneficial role in characterizing plant genetic functions, genomic rearrangement, how it advances the site-specific mutagenesis, and epigenetics modification in plants to improve the yield of field crops with minimal side-effects. The possible pitfalls of using and designing CRISPR-Cas9 for plant genome editing are also discussed for its more appropriate applications in plant biology. Therefore, CRISPR/Cas9 system has multiple benefits that mostly scientists select for genome editing in several biological systems. © 2017 Wiley Periodicals, Inc.
Genome Editing for the Study of Cardiovascular Diseases
Chadwick, Alexandra C.
2018-01-01
Purpose of Review The opportunities afforded through the recent advent of genome-editing technologies have allowed investigators to more easily study a number of diseases. The advantages and limitations of the most prominent genome-editing technologies are described in this review, along with potential applications specifically focused on cardiovascular diseases. Recent Findings The recent genome-editing tools using programmable nucleases, such as zinc-finger nucleases, transcription activator-like effector nucleases, and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated 9 (Cas9), have rapidly been adapted to manipulate genes in a variety of cellular and animal models. A number of recent cardiovascular disease-related publications report cases in which specific mutations are introduced into disease models for functional characterization and for testing of therapeutic strategies. Summary Recent advances in genome-editing technologies offer new approaches to understand and treat diseases. Here, we discuss genome editing strategies to easily characterize naturally occurring mutations and offer strategies with potential clinical relevance. PMID:28220462
A genome-wide 3C-method for characterizing the three-dimensional architectures of genomes.
Duan, Zhijun; Andronescu, Mirela; Schutz, Kevin; Lee, Choli; Shendure, Jay; Fields, Stanley; Noble, William S; Anthony Blau, C
2012-11-01
Accumulating evidence demonstrates that the three-dimensional (3D) organization of chromosomes within the eukaryotic nucleus reflects and influences genomic activities, including transcription, DNA replication, recombination and DNA repair. In order to uncover structure-function relationships, it is necessary first to understand the principles underlying the folding and the 3D arrangement of chromosomes. Chromosome conformation capture (3C) provides a powerful tool for detecting interactions within and between chromosomes. A high throughput derivative of 3C, chromosome conformation capture on chip (4C), executes a genome-wide interrogation of interaction partners for a given locus. We recently developed a new method, a derivative of 3C and 4C, which, similar to Hi-C, is capable of comprehensively identifying long-range chromosome interactions throughout a genome in an unbiased fashion. Hence, our method can be applied to decipher the 3D architectures of genomes. Here, we provide a detailed protocol for this method. Published by Elsevier Inc.
Purdue Ionomics Information Management System. An Integrated Functional Genomics Platform1[C][W][OA
Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S.; Salt, David E.
2007-01-01
The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics. PMID:17189337
Chen, Jingjing; Lai, Yiling; Wang, Lili; Zhai, Suzhen; Zou, Gen; Zhou, Zhihua; Cui, Chunlai; Wang, Sibao
2017-04-03
Beauveria bassiana is an environmentally friendly alternative to chemical insecticides against various agricultural insect pests and vectors of human diseases. However, its application has been limited due to slow kill and sensitivity to abiotic stresses. Understanding of the molecular pathogenesis and physiological characteristics would facilitate improvement of the fungal performance. Loss-of-function mutagenesis is the most powerful tool to characterize gene functions, but it is hampered by the low rate of homologous recombination and the limited availability of selectable markers. Here, by combining the use of uridine auxotrophy as recipient and donor DNAs harboring auxotrophic complementation gene ura5 as a selectable marker with the blastospore-based transformation system, we established a highly efficient, low false-positive background and cost-effective CRISPR/Cas9-mediated gene editing system in B. bassiana. This system has been demonstrated as a simple and powerful tool for targeted gene knock-out and/or knock-in in B. bassiana in a single gene disruption. We further demonstrated that our system allows simultaneous disruption of multiple genes via homology-directed repair in a single transformation. This technology will allow us to study functionally redundant genes and holds significant potential to greatly accelerate functional genomics studies of B. bassiana.
Chen, Jingjing; Lai, Yiling; Wang, Lili; Zhai, Suzhen; Zou, Gen; Zhou, Zhihua; Cui, Chunlai; Wang, Sibao
2017-01-01
Beauveria bassiana is an environmentally friendly alternative to chemical insecticides against various agricultural insect pests and vectors of human diseases. However, its application has been limited due to slow kill and sensitivity to abiotic stresses. Understanding of the molecular pathogenesis and physiological characteristics would facilitate improvement of the fungal performance. Loss-of-function mutagenesis is the most powerful tool to characterize gene functions, but it is hampered by the low rate of homologous recombination and the limited availability of selectable markers. Here, by combining the use of uridine auxotrophy as recipient and donor DNAs harboring auxotrophic complementation gene ura5 as a selectable marker with the blastospore-based transformation system, we established a highly efficient, low false-positive background and cost-effective CRISPR/Cas9-mediated gene editing system in B. bassiana. This system has been demonstrated as a simple and powerful tool for targeted gene knock-out and/or knock-in in B. bassiana in a single gene disruption. We further demonstrated that our system allows simultaneous disruption of multiple genes via homology-directed repair in a single transformation. This technology will allow us to study functionally redundant genes and holds significant potential to greatly accelerate functional genomics studies of B. bassiana. PMID:28368054
A computational platform to maintain and migrate manual functional annotations for BioCyc databases.
Walsh, Jesse R; Sen, Taner Z; Dickerson, Julie A
2014-10-12
BioCyc databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integration as new knowledge is discovered. As functional annotations are improved, scalable methods are needed for curators to manage annotations without detailed knowledge of the specific design of the BioCyc database. We have developed CycTools, a software tool which allows curators to maintain functional annotations in a model organism database. This tool builds on existing software to improve and simplify annotation data imports of user provided data into BioCyc databases. Additionally, CycTools automatically resolves synonyms and alternate identifiers contained within the database into the appropriate internal identifiers. Automating steps in the manual data entry process can improve curation efforts for major biological databases. The functionality of CycTools is demonstrated by transferring GO term annotations from MaizeCyc to matching proteins in CornCyc, both maize metabolic pathway databases available at MaizeGDB, and by creating strain specific databases for metabolic engineering.
2009-01-01
Background The diploid woodland strawberry (Fragaria vesca) is an attractive system for functional genomics studies. Its small stature, fast regeneration time, efficient transformability and small genome size, together with substantial EST and genomic sequence resources make it an ideal reference plant for Fragaria and other herbaceous perennials. Most importantly, this species shares gene sequence similarity and genomic microcolinearity with other members of the Rosaceae family, including large-statured tree crops (such as apple, peach and cherry), and brambles and roses as well as with the cultivated octoploid strawberry, F. ×ananassa. F. vesca may be used to quickly address questions of gene function relevant to these valuable crop species. Although some F. vesca lines have been shown to be substantially homozygous, in our hands plants in purportedly homozygous populations exhibited a range of morphological and physiological variation, confounding phenotypic analyses. We also found the genotype of a named variety, thought to be well-characterized and even sold commercially, to be in question. An easy to grow, standardized, inbred diploid Fragaria line with documented genotype that is available to all members of the research community will facilitate comparison of results among laboratories and provide the research community with a necessary tool for functionally testing the large amount of sequence data that will soon be available for peach, apple, and strawberry. Results A highly inbred line, YW5AF7, of a diploid strawberry Fragaria vesca f. semperflorens line called "Yellow Wonder" (Y2) was developed and examined. Botanical descriptors were assessed for morphological characterization of this genotype. The plant line was found to be rapidly transformable using established techniques and media formulations. Conclusion The development of the documented YW5AF7 line provides an important tool for Rosaceae functional genomic analyses. These day-neutral plants have a small genome, a seed to seed cycle of 3.0 - 3.5 months, and produce fruit in 7.5 cm pots in a growth chamber. YW5AF7 is runnerless and therefore easy to maintain in the greenhouse, forms abundant branch crowns for vegetative propagation, and produces highly aromatic yellow fruit throughout the year in the greenhouse. F. vesca can be transformed with Agrobacterium tumefaciens, making these plants suitable for insertional mutagenesis, RNAi and overexpression studies that can be compared against a stable baseline of phenotypic descriptors and can be readily genetically substantiated. PMID:19878589
Targeting vector construction through recombineering.
Malureanu, Liviu A
2011-01-01
Gene targeting in mouse embryonic stem cells is an essential, yet still very expensive and highly time-consuming, tool and method to study gene function at the organismal level or to create mouse models of human diseases. Conventional cloning-based methods have been largely used for generating targeting vectors, but are hampered by a number of limiting factors, including the variety and location of restriction enzymes in the gene locus of interest, the specific PCR amplification of repetitive DNA sequences, and cloning of large DNA fragments. Recombineering is a technique that exploits the highly efficient homologous recombination function encoded by λ phage in Escherichia coli. Bacteriophage-based recombination can recombine homologous sequences as short as 30-50 bases, allowing manipulations such as insertion, deletion, or mutation of virtually any genomic region. The large availability of mouse genomic bacterial artificial chromosome (BAC) libraries covering most of the genome facilitates the retrieval of genomic DNA sequences from the bacterial chromosomes through recombineering. This chapter describes a successfully applied protocol and aims to be a detailed guide through the steps of generation of targeting vectors through recombineering.
Shen, Li; Shao, Ningyi; Liu, Xiaochuan; Nestler, Eric
2014-04-15
Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot - a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.
2014-01-01
Background Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. Results We have developed ngs.plot – a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. Conclusions We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data. PMID:24735413
Inducible CRISPR genome-editing tool: classifications and future trends.
Dai, Xiaofeng; Chen, Xiao; Fang, Qiuwu; Li, Jia; Bai, Zhonghu
2018-06-01
The discovery of CRISPR-Cas9/dCas9 system has reinforced our ability and revolutionized our history in genome engineering. While Cas9 and dCas9 are programed to modulate gene expression by introducing DNA breaks, blocking transcription factor recruitment or dragging functional groups towards the targeted sites, sgRNAs determine the genomic loci where the modulation occurs. The off-target problem, due to limited sgRNA specificity and genome complexity of many species, has posed concerns for the wide application of this revolutionary technique. To solve this problem and, more importantly, gain power over gene functionality and cell fate control, inducible strategies have been continuously evolved to offer tailored solutions to address specific biological questions. By reviewing recent advances in inducible CRISPR system design and critical elements potentially adding values to such systems, we classify current approaches in this domain into four mechanically distinct categories, namely, "split system", "allosteric system", "combinatorial system", and "transient delivery system", discuss the pros and cons of each system, and point out the under-explored areas and future directions, with the aim of enriching our toolbox of delicate life engineering.
Fichant, Gwennaele; Basse, Marie-Jeanne; Quentin, Yves
2006-03-01
The ATP-binding cassette (ABC) transporters are one of the major classes of active transporters. They are widespread in archaea, bacteria, and eukaryota, indicating that they have arisen early in evolution. They are involved in many essential physiological processes, but the majority import or export a wide variety of compounds across cellular membranes. These systems share a common architecture composed of four (exporters) or five (importers) domains. To identify and reconstruct functional ABC transporters encoded by archaeal and bacterial genomes, we have developed a bioinformatic strategy. Cross-reference to the transport classification system is used to predict the type of compound transported. A high quality of annotation is achieved by manual verification of the predictions. However, in order to face the rapid increase in the number of published genomes, we also include analyses of genomes issuing directly from the automated strategy. Querying the database (http://www-abcdb.biotoul.fr) allows to easily retrieve ABC transporter repertories and related data. Additional query tools have been developed for the analysis of the ABC family from both functional and evolutionary perspectives.
A Plant-Associated Microbe Genome Initiative
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jan E. Leach; Scott Gold; Sue Tolin
2003-03-06
Plant-associated microorganisms are critical to agricultural and food security and are key components in maintaining the balance of our ecosystems. Some of these diverse microbes, which include viruses, bacteria, oomycetes, fungi, and nematodes, cause plant diseases, whereas others prevent diseases or enhance plant growth. Despite their importance, we know little about them on a genomic level. To intervene in disease and understand the basis of biological control or symbiotic relationships, a concerted and coordinated genomic analysis of these microbes is essential. Genome analysis, in this context, refers to the structural and functional analysis of the microbe DNA including the genes,more » the proteins encoded by those genes, as well as noncoding sequences involved in genome dynamics and function. The ultimate emphasis is on understanding genomic functions involved in plant associations. Members of The American Phytopathological Society (APS) developed a prioritized list of plant-associated microbes for genome analysis. With this list as a foundation for discussions, a Workshop on Genomic Analysis of Plant-Associated Microorganisms was held in Washington, D.C., on 9 to 11 April 2002. The workshop was organized by the Public Policy Board of APS, and was funded by the Department of Energy (DOE), the National Science Foundation (NSF), U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), and USDA-National Research Initiatives (USDA-NRI). The workshop included academic, industrial, and governmental experts from the genomics and microbial research communities and observers from the federal funding agencies. After reviewing current and near-term technologies, workshop participants proposed a comprehensive, international initiative to obtain the genomic information needed to understand these important microbes and their interactions with host plants and the environment. Specifically, the recommendations call for a 5-year, $500 million international public effort for genome analysis of plant-associated microbes. The goals are to (i) obtain genome sequence information for several representative groups of microbes; (ii) identify and determine function for the genes/proteins and other genomic elements involved in plant-microbe interactions; (iii) develop and implement standardized bioinformatic tools and a database system that is applicable across all microbes; and (iv) educate and train scientists with skills and knowledge of biological and computational sciences who will apply the information to the protection of our food sources and environment.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chain, Patrick
Genomics — the genetic mapping and DNA sequencing of sets of genes or the complete genomes of organisms, along with related genome analysis and database work — is emerging as one of the transformative sciences of the 21st century. But current bioinformatics tools are not accessible to most biological researchers. Now, a new computational and web-based tool called EDGE Bioinformatics is working to fulfill the promise of democratizing genomics.
Paridaens, Tom; Van Wallendael, Glenn; De Neve, Wesley; Lambert, Peter
2017-05-15
The past decade has seen the introduction of new technologies that lowered the cost of genomic sequencing increasingly. We can even observe that the cost of sequencing is dropping significantly faster than the cost of storage and transmission. The latter motivates a need for continuous improvements in the area of genomic data compression, not only at the level of effectiveness (compression rate), but also at the level of functionality (e.g. random access), configurability (effectiveness versus complexity, coding tool set …) and versatility (support for both sequenced reads and assembled sequences). In that regard, we can point out that current approaches mostly do not support random access, requiring full files to be transmitted, and that current approaches are restricted to either read or sequence compression. We propose AFRESh, an adaptive framework for no-reference compression of genomic data with random access functionality, targeting the effective representation of the raw genomic symbol streams of both reads and assembled sequences. AFRESh makes use of a configurable set of prediction and encoding tools, extended by a Context-Adaptive Binary Arithmetic Coding scheme (CABAC), to compress raw genetic codes. To the best of our knowledge, our paper is the first to describe an effective implementation CABAC outside of its' original application. By applying CABAC, the compression effectiveness improves by up to 19% for assembled sequences and up to 62% for reads. By applying AFRESh to the genomic symbols of the MPEG genomic compression test set for reads, a compression gain is achieved of up to 51% compared to SCALCE, 42% compared to LFQC and 44% compared to ORCOM. When comparing to generic compression approaches, a compression gain is achieved of up to 41% compared to GNU Gzip and 22% compared to 7-Zip at the Ultra setting. Additionaly, when compressing assembled sequences of the Human Genome, a compression gain is achieved up to 34% compared to GNU Gzip and 16% compared to 7-Zip at the Ultra setting. A Windows executable version can be downloaded at https://github.com/tparidae/AFresh . tom.paridaens@ugent.be. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Karp, Peter D; Paley, Suzanne; Romero, Pedro
2002-01-01
Bioinformatics requires reusable software tools for creating model-organism databases (MODs). The Pathway Tools is a reusable, production-quality software environment for creating a type of MOD called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc (see http://ecocyc.org) integrates our evolving understanding of the genes, proteins, metabolic network, and genetic network of an organism. This paper provides an overview of the four main components of the Pathway Tools: The PathoLogic component supports creation of new PGDBs from the annotated genome of an organism. The Pathway/Genome Navigator provides query, visualization, and Web-publishing services for PGDBs. The Pathway/Genome Editors support interactive updating of PGDBs. The Pathway Tools ontology defines the schema of PGDBs. The Pathway Tools makes use of the Ocelot object database system for data management services for PGDBs. The Pathway Tools has been used to build PGDBs for 13 organisms within SRI and by external users.