Science.gov

Sample records for aspergillus genome database

  1. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community.

    PubMed

    Arnaud, Martha B; Chibucos, Marcus C; Costanzo, Maria C; Crabtree, Jonathan; Inglis, Diane O; Lotia, Adil; Orvis, Joshua; Shah, Prachi; Skrzypek, Marek S; Binkley, Gail; Miyasato, Stuart R; Wortman, Jennifer R; Sherlock, Gavin

    2010-01-01

    The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at aspergillus-curator@genome.stanford.edu.

  2. The Aspergillus Genome Database (AspGD): recent developments in comprehensive multispecies curation, comparative genomics and community resources.

    PubMed

    Arnaud, Martha B; Cerqueira, Gustavo C; Inglis, Diane O; Skrzypek, Marek S; Binkley, Jonathan; Chibucos, Marcus C; Crabtree, Jonathan; Howarth, Clinton; Orvis, Joshua; Shah, Prachi; Wymore, Farrell; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin; Wortman, Jennifer R

    2012-01-01

    The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available, web-based resource for researchers studying fungi of the genus Aspergillus, which includes organisms of clinical, agricultural and industrial importance. AspGD curators have now completed comprehensive review of the entire published literature about Aspergillus nidulans and Aspergillus fumigatus, and this annotation is provided with streamlined, ortholog-based navigation of the multispecies information. AspGD facilitates comparative genomics by providing a full-featured genomics viewer, as well as matched and standardized sets of genomic information for the sequenced aspergilli. AspGD also provides resources to foster interaction and dissemination of community information and resources. We welcome and encourage feedback at aspergillus-curator@lists.stanford.edu.

  3. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  4. Aspergillus flavus Blast2GO gene ontology database: elevated growth temperature alters amino acid metabolism

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The availability of a representative gene ontology (GO) database is a prerequisite for a successful functional genomics study. Using online Blast2GO resources we constructed a GO database of Aspergillus flavus. Of the predicted total 13,485 A. flavus genes 8,987 were annotated with GO terms. The mea...

  5. Querying genomic databases

    SciTech Connect

    Baehr, A.; Hagstrom, R.; Joerg, D.; Overbeek, R.

    1991-09-01

    A natural-language interface has been developed that retrieves genomic information by using a simple subset of English. The interface spares the biologist from the task of learning database-specific query languages and computer programming. Currently, the interface deals with the E. coli genome. It can, however, be readily extended and shows promise as a means of easy access to other sequenced genomic databases as well.

  6. Comparative Reannotation of 21 Aspergillus Genomes

    SciTech Connect

    Salamov, Asaf; Riley, Robert; Kuo, Alan; Grigoriev, Igor

    2013-03-08

    We used comparative gene modeling to reannotate 21 Aspergillus genomes. Initial automatic annotation of individual genomes may contain errors of different kinds, e.g. missing genes, incorrect exon-intron structures, 'chimeras' that fuse two or more real genes, or, alternatively, models that split a real gene into two or more. The main premise behind the comparative modeling approach is that for closely related genomes most orthologous families have the same conserved gene structure. The algorithm maps all gene models predicted in each individual Aspergillus genome to the other genomes and, for each locus, selects from potentially many competing models the one that most closely resembles the orthologous genes from other genomes. This procedure is iterated until no further change in gene models is observed. For the Aspergillus genomes we predicted in total 4503 new gene models (~2% per genome), supported by comparative analysis, and additionally corrected ~18% of the old gene models. This resulted in a total of 4065 more genes with annotated PFAM domains (~3% increase per genome). Analysis of a few genomes with EST/transcriptomics data shows that the new annotation sets also have a higher number of EST-supported splice sites at exon-intron boundaries.
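
The select-and-iterate loop described in this record can be illustrated with a toy sketch. It is not the authors' implementation: the representation of a gene model as a tuple of exon lengths, the `similarity` score, and the names `refine`, `candidates` and `current` are all assumptions made for illustration. For one orthologous locus, each genome repeatedly switches to the competing model that best resembles the models currently selected in the other genomes, until nothing changes.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a gene model is a tuple of exon lengths.
Model = Tuple[int, ...]


def similarity(a: Model, b: Model) -> float:
    """Toy structural similarity: 1.0 for identical exon structure,
    0.0 when the exon counts differ, otherwise decreasing with length drift."""
    if len(a) != len(b):
        return 0.0
    drift = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 / (1.0 + drift)


def score(model: Model, others: List[Model]) -> float:
    """Total similarity of one model to the models chosen in the other genomes."""
    return sum(similarity(model, other) for other in others)


def refine(candidates: Dict[str, List[Model]],
           current: Dict[str, Model]) -> Dict[str, Model]:
    """Iterate model selection for one locus until no selection changes.

    candidates: genome name -> competing models at the locus
    current:    genome name -> model selected by the initial annotation
    """
    selected = dict(current)
    changed = True
    while changed:
        changed = False
        for genome, models in candidates.items():
            others = [m for g, m in selected.items() if g != genome]
            best = max(models, key=lambda m: score(m, others))
            # Switch only on strict improvement, so the loop must terminate.
            if score(best, others) > score(selected[genome], others):
                selected[genome] = best
                changed = True
    return selected


if __name__ == "__main__":
    candidates = {
        "A": [(100, 200), (300,)],   # (300,) stands in for a fused model
        "B": [(100, 200)],
        "C": [(100, 210), (310,)],
    }
    current = {"A": (300,), "B": (100, 200), "C": (100, 210)}
    print(refine(candidates, current))
```

In this made-up example genome A starts with a single-exon model and is switched to the two-exon model that agrees with B and C. A real pipeline would score alignments of mapped models rather than exon lengths.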

  7. Mouse genome database 2016.

    PubMed

    Bult, Carol J; Eppig, Janan T; Blake, Judith A; Kadin, James A; Richardson, Joel E

    2016-01-01

    The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the primary community model organism database for the laboratory mouse and serves as the source for key biological reference data related to mouse genes, gene functions, phenotypes and disease models with a strong emphasis on the relationship of these data to human biology and disease. As the cost of genome-scale sequencing continues to decrease and new technologies for genome editing become widely adopted, the laboratory mouse is more important than ever as a model system for understanding the biological significance of human genetic variation and for advancing the basic research needed to support the emergence of genome-guided precision medicine. Recent enhancements to MGD include new graphical summaries of biological annotations for mouse genes, support for mobile access to the database, tools to support the annotation and analysis of sets of genes, and expanded support for comparative biology through the expansion of homology data.

  8. Mouse genome database 2016

    PubMed Central

    Bult, Carol J.; Eppig, Janan T.; Blake, Judith A.; Kadin, James A.; Richardson, Joel E.

    2016-01-01

    The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the primary community model organism database for the laboratory mouse and serves as the source for key biological reference data related to mouse genes, gene functions, phenotypes and disease models with a strong emphasis on the relationship of these data to human biology and disease. As the cost of genome-scale sequencing continues to decrease and new technologies for genome editing become widely adopted, the laboratory mouse is more important than ever as a model system for understanding the biological significance of human genetic variation and for advancing the basic research needed to support the emergence of genome-guided precision medicine. Recent enhancements to MGD include new graphical summaries of biological annotations for mouse genes, support for mobile access to the database, tools to support the annotation and analysis of sets of genes, and expanded support for comparative biology through the expansion of homology data. PMID:26578600

  10. Clinical Genomic Database

    PubMed Central

    Solomon, Benjamin D.; Nguyen, Anh-Dao; Bear, Kelly A.; Wolfsberg, Tyra G.

    2013-01-01

    Technological advances have greatly increased the availability of human genomic sequencing. However, the capacity to analyze genomic data in a clinically meaningful way lags behind the ability to generate such data. To help address this obstacle, we reviewed all conditions with genetic causes and constructed the Clinical Genomic Database (CGD) (http://research.nhgri.nih.gov/CGD/), a searchable, freely Web-accessible database of conditions based on the clinical utility of genetic diagnosis and the availability of specific medical interventions. The CGD currently includes a total of 2,616 genes organized clinically by affected organ systems and interventions (including preventive measures, disease surveillance, and medical or surgical interventions) that could be reasonably warranted by the identification of pathogenic mutations. To aid independent analysis and optimize new data incorporation, the CGD also includes all genetic conditions for which genetic knowledge may affect the selection of supportive care, informed medical decision-making, prognostic considerations, reproductive decisions, and allow avoidance of unnecessary testing, but for which specific interventions are not otherwise currently available. For each entry, the CGD includes the gene symbol, conditions, allelic conditions, clinical categorization (for both manifestations and interventions), mode of inheritance, affected age group, description of interventions/rationale, links to other complementary databases, including databases of variants and presumed pathogenic mutations, and links to PubMed references (>20,000). The CGD will be regularly maintained and updated to keep pace with scientific discovery. Further content-based expert opinions are actively solicited. Eventually, the CGD may assist the rapid curation of individual genomes as part of active medical care. PMID:23696674

  11. Genomic Islands in Pathogenic Filamentous Fungus Aspergillus fumigatus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We present the genome sequences of a new clinical isolate, CEA10, of an important human pathogen, Aspergillus fumigatus, and two closely related, but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of CEA10 with the recently sequen...

  12. The Giardia genome project database.

    PubMed

    McArthur, A G; Morrison, H G; Nixon, J E; Passamaneck, N Q; Kim, U; Hinkle, G; Crocker, M K; Holder, M E; Farr, R; Reich, C I; Olsen, G E; Aley, S B; Adam, R D; Gillin, F D; Sogin, M L

    2000-08-15

    The Giardia genome project database provides an online resource for Giardia lamblia (WB strain, clone C6) genome sequence information. The database includes edited single-pass reads, the results of BLASTX searches, and details of progress towards sequencing the entire 12 million-bp Giardia genome. Pre-sorted BLASTX results can be retrieved based on keyword searches and BLAST searches of the high throughput Giardia data can be initiated from the web site or through NCBI. Descriptions of the genomic DNA libraries, project protocols and summary statistics are also available. Although the Giardia genome project is ongoing, new sequences are made available on a bi-monthly basis to ensure that researchers have access to information that may assist them in the search for genes and their biological function. The current URL of the Giardia genome project database is www.mbl.edu/Giardia.

  13. Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...

  14. HeteroGenome: database of genome periodicity

    PubMed Central

    Chaley, Maria; Kutyrkin, Vladimir; Tulbasheva, Gayane; Teplukhina, Elena; Nazipova, Nafisa

    2014-01-01

    We present the first release of the HeteroGenome database collecting latent periodicity regions in genomes. Tandem repeats and highly divergent tandem repeats along with the regions of a new type of periodicity, known as profile periodicity, have been collected for the genomes of Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans and Drosophila melanogaster. We obtained data with the aid of a spectral-statistical approach to search for reliable latent periodicity regions (with periods up to 2000 bp) in DNA sequences. The original two-level mode of data presentation (a broad view of the region of latent periodicity and a second level indicating conservative fragments of its structure) was further developed to enable us to obtain the estimate, without redundancy, that latent periodicity regions make up ∼10% of the analyzed genomes. Analysis of the quantitative and qualitative content of located periodicity regions on all chromosomes of the analyzed organisms revealed dominant characteristic types of periodicity in the genomes. The pattern of density distribution of latent periodicity regions on a chromosome unambiguously characterizes each chromosome in the genome. Database URL: http://www.jcbi.ru/lp_baze/ PMID:24857969

  15. GOLD: The Genomes Online Database

    DOE Data Explorer

    Kyrpides, Nikos; Liolios, Dinos; Chen, Amy; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor; Bernal, Alex

    Since its inception in 1997, GOLD has continuously monitored genome sequencing projects worldwide and has provided the community with a unique centralized resource that integrates diverse information related to Archaea, Bacteria, Eukaryotic and more recently Metagenomic sequencing projects. As of September 2007, GOLD recorded 639 completed genome projects. These projects have their complete sequence deposited into the public archival sequence databases such as GenBank, EMBL, and DDBJ. From the total of 639 complete and published genome projects as of 9/2007, 527 were bacterial, 47 were archaeal and 65 were eukaryotic. In addition to the complete projects, there were 2158 ongoing sequencing projects. 1328 of those were bacterial, 59 archaeal and 771 eukaryotic projects. Two types of metadata are provided by GOLD: (i) project metadata and (ii) organism/environment metadata. GOLD CARD pages for every project are available from the link of every GOLD_STAMP ID. The information in every one of these pages is organized into three tables: (a) Organism information, (b) Genome project information and (c) External links. [The Genomes On Line Database (GOLD) in 2007: Status of genomic and metagenomic projects and their associated metadata, Konstantinos Liolios, Konstantinos Mavromatis, Nektarios Tavernarakis and Nikos C. Kyrpides, Nucleic Acids Research Advance Access published online on November 2, 2007, Nucleic Acids Research, doi:10.1093/nar/gkm884]

    The basic tables in the GOLD database that can be browsed or searched include the following information:

    • Gold Stamp ID
    • Organism name
    • Domain
    • Links to information sources
    • Size and link to a map, when available
    • Chromosome number, plasmid number, and GC content
    • A link for downloading the actual genome data
    • Institution that did the sequencing
    • Funding source
    • Database where information resides
    • Publication status and information
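
As a quick consistency check on the September 2007 counts quoted in the GOLD record above, the per-domain figures can be tallied against the stated totals. The sketch below uses only numbers from the abstract; the dictionary and function names are illustrative.

```python
# Per-domain project counts quoted in the GOLD abstract (as of 9/2007).
completed = {"bacterial": 527, "archaeal": 47, "eukaryotic": 65}
ongoing = {"bacterial": 1328, "archaeal": 59, "eukaryotic": 771}


def shares(counts: dict) -> dict:
    """Return each domain's share of the total, as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {domain: round(100 * n / total, 1) for domain, n in counts.items()}


if __name__ == "__main__":
    print("completed:", sum(completed.values()), shares(completed))  # total 639
    print("ongoing:  ", sum(ongoing.values()), shares(ongoing))      # total 2158
```

Both tallies match the totals stated in the abstract (639 completed and 2158 ongoing projects).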

  16. What can comparative genomics tell us about species concepts in the genus Aspergillus?

      SciTech Connect

      Rokas, Antonis; Payne, Gary; Fedorova, Natalie D.; Baker, Scott E.; Machida, Masa; Yu, Jiujiang; Georgianna, D. R.; Dean, Ralph A.; Bhatnagar, Deepak; Cleveland, T. E.; Wortman, Jennifer R.; Maiti, R.; Joardar, V.; Amedeo, Paolo; Denning, David W.; Nierman, William C.

      2007-12-15

      Understanding the nature of species' boundaries is a fundamental question in evolutionary biology. The availability of genomes from several species of the genus Aspergillus allows us for the first time to examine the demarcation of fungal species at the whole-genome level. Here, we examine four case studies, two of which involve intraspecific comparisons, whereas the other two deal with interspecific genomic comparisons between closely related species. These four comparisons reveal significant variation in the nature of species boundaries across Aspergillus. For example, comparisons between A. fumigatus and Neosartorya fischeri (the teleomorph of A. fischerianus) and between A. oryzae and A. flavus suggest that measures of sequence similarity and species-specific genes are significantly higher for the A. fumigatus - N. fischeri pair. Importantly, the values obtained from the comparison between A. oryzae and A. flavus are remarkably similar to those obtained from an intra-specific comparison of A. fumigatus strains, giving support to the proposal that A. oryzae represents a distinct ecotype of A. flavus and not a distinct species. We argue that genomic data can aid Aspergillus taxonomy by serving as a source of novel and unprecedented amounts of comparative data, as a resource for the development of additional diagnostic tools, and finally as a knowledge database about the biological differences between strains and species.

  17. The 2008 update of the Aspergillus nidulans genome annotation: a community effort.

      PubMed

      Wortman, Jennifer Russo; Gilsenan, Jane Mabey; Joardar, Vinita; Deegan, Jennifer; Clutterbuck, John; Andersen, Mikael R; Archer, David; Bencina, Mojca; Braus, Gerhard; Coutinho, Pedro; von Döhren, Hans; Doonan, John; Driessen, Arnold J M; Durek, Pawel; Espeso, Eduardo; Fekete, Erzsébet; Flipphi, Michel; Estrada, Carlos Garcia; Geysens, Steven; Goldman, Gustavo; de Groot, Piet W J; Hansen, Kim; Harris, Steven D; Heinekamp, Thorsten; Helmstaedt, Kerstin; Henrissat, Bernard; Hofmann, Gerald; Homan, Tim; Horio, Tetsuya; Horiuchi, Hiroyuki; James, Steve; Jones, Meriel; Karaffa, Levente; Karányi, Zsolt; Kato, Masashi; Keller, Nancy; Kelly, Diane E; Kiel, Jan A K W; Kim, Jung-Mi; van der Klei, Ida J; Klis, Frans M; Kovalchuk, Andriy; Krasevec, Nada; Kubicek, Christian P; Liu, Bo; Maccabe, Andrew; Meyer, Vera; Mirabito, Pete; Miskei, Márton; Mos, Magdalena; Mullins, Jonathan; Nelson, David R; Nielsen, Jens; Oakley, Berl R; Osmani, Stephen A; Pakula, Tiina; Paszewski, Andrzej; Paulsen, Ian; Pilsyk, Sebastian; Pócsi, István; Punt, Peter J; Ram, Arthur F J; Ren, Qinghu; Robellet, Xavier; Robson, Geoff; Seiboth, Bernhard; van Solingen, Piet; Specht, Thomas; Sun, Jibin; Taheri-Talesh, Naimeh; Takeshita, Norio; Ussery, Dave; vanKuyk, Patricia A; Visser, Hans; van de Vondervoort, Peter J I; de Vries, Ronald P; Walton, Jonathan; Xiang, Xin; Xiong, Yi; Zeng, An Ping; Brandt, Bernd W; Cornell, Michael J; van den Hondel, Cees A M J J; Visser, Jacob; Oliver, Stephen G; Turner, Geoffrey

      2009-03-01

      The identification and annotation of protein-coding genes is one of the primary goals of whole-genome sequencing projects, and the accuracy of predicting the primary protein products of gene expression is vital to the interpretation of the available data and the design of downstream functional applications. Nevertheless, the comprehensive annotation of eukaryotic genomes remains a considerable challenge. Many genomes submitted to public databases, including those of major model organisms, contain significant numbers of wrong and incomplete gene predictions. We present a community-based reannotation of the Aspergillus nidulans genome with the primary goal of increasing the number and quality of protein functional assignments through the careful review of experts in the field of fungal biology.

  18. The 2008 update of the Aspergillus nidulans genome annotation: a community effort

      PubMed Central

      Wortman, Jennifer Russo; Gilsenan, Jane Mabey; Joardar, Vinita; Deegan, Jennifer; Clutterbuck, John; Andersen, Mikael R.; Archer, David; Bencina, Mojca; Braus, Gerhard; Coutinho, Pedro; von Döhren, Hans; Doonan, John; Driessen, Arnold J.M.; Durek, Pawel; Espeso, Eduardo; Fekete, Erzsébet; Flipphi, Michel; Estrada, Carlos Garcia; Geysens, Steven; Goldman, Gustavo; de Groot, Piet W.J.; Hansen, Kim; Harris, Steven D.; Heinekamp, Thorsten; Helmstaedt, Kerstin; Henrissat, Bernard; Hofmann, Gerald; Homan, Tim; Horio, Tetsuya; Horiuchi, Hiroyuki; James, Steve; Jones, Meriel; Karaffa, Levente; Karányi, Zsolt; Kato, Masashi; Keller, Nancy; Kelly, Diane E.; Kiel, Jan A.K.W.; Kim, Jung-Mi; van der Klei, Ida J.; Klis, Frans M.; Kovalchuk, Andriy; Kraševec, Nada; Kubicek, Christian P.; Liu, Bo; MacCabe, Andrew; Meyer, Vera; Mirabito, Pete; Miskei, Márton; Mos, Magdalena; Mullins, Jonathan; Nelson, David R.; Nielsen, Jens; Oakley, Berl R.; Osmani, Stephen A.; Pakula, Tiina; Paszewski, Andrzej; Paulsen, Ian; Pilsyk, Sebastian; Pócsi, István; Punt, Peter J.; Ram, Arthur F.J.; Ren, Qinghu; Robellet, Xavier; Robson, Geoff; Seiboth, Bernhard; Solingen, Piet van; Specht, Thomas; Sun, Jibin; Taheri-Talesh, Naimeh; Takeshita, Norio; Ussery, Dave; vanKuyk, Patricia A.; Visser, Hans; van de Vondervoort, Peter J.I.; de Vries, Ronald P.; Walton, Jonathan; Xiang, Xin; Xiong, Yi; Zeng, An Ping; Brandt, Bernd W.; Cornell, Michael J.; van den Hondel, Cees A.M.J.J.; Visser, Jacob; Oliver, Stephen G.; Turner, Geoffrey

      2010-01-01

      The identification and annotation of protein-coding genes is one of the primary goals of whole-genome sequencing projects, and the accuracy of predicting the primary protein products of gene expression is vital to the interpretation of the available data and the design of downstream functional applications. Nevertheless, the comprehensive annotation of eukaryotic genomes remains a considerable challenge. Many genomes submitted to public databases, including those of major model organisms, contain significant numbers of wrong and incomplete gene predictions. We present a community-based reannotation of the Aspergillus nidulans genome with the primary goal of increasing the number and quality of protein functional assignments through the careful review of experts in the field of fungal biology. PMID:19146970

  19. The YH database: the first Asian diploid genome database.

      PubMed

      Li, Guoqing; Ma, Lijia; Song, Chao; Yang, Zhentao; Wang, Xiulan; Huang, Hui; Li, Yingrui; Li, Ruiqiang; Zhang, Xiuqing; Yang, Huanming; Wang, Jian; Wang, Jun

      2009-01-01

      The YH database is a server that allows the user to easily browse and download data from the first Asian diploid genome. The aim of this platform is to facilitate the study of this Asian genome and to enable improved organization and presentation of large-scale personal genome data. Powered by GBrowse, we illustrate here the genome sequences, SNPs, and sequencing reads in the MapView. The relationships between phenotype and genotype can be searched by location, dbSNP ID, HGMD ID, gene symbol and disease name. A BLAST web service is also provided for the purpose of aligning a query sequence against the YH genome consensus. The YH database is currently one of the three personal genome databases, organizing the original data and analysis results in a user-friendly interface, which is an endeavor to achieve fundamental goals for establishing personal medicine. The database is available at http://yh.genomics.org.cn.

  20. Searching and Indexing Genomic Databases via Kernelization

      PubMed Central

      Gagie, Travis; Puglisi, Simon J.

      2015-01-01

      The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper, we survey the 20-year history of this idea and discuss its relation to kernelization in parameterized complexity. PMID:25710001
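
The idea summarized in this record, searching one reference genome plus only the parts of the other genomes where they differ, can be sketched in a few lines. This is a simplified illustration, not the survey's algorithms: it assumes each additional genome differs from the reference by single-base substitutions only, stored as a hypothetical `diffs` mapping from position to base, and it uses plain substring search where a real system would use a compressed index.

```python
from typing import Dict, List


def find_all(text: str, pattern: str) -> List[int]:
    """All start positions of pattern in text (overlapping matches included)."""
    hits = []
    i = text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def search_variant(reference: str, diffs: Dict[int, str], pattern: str) -> List[int]:
    """Find a non-empty pattern in a genome stored as reference + substitutions.

    Only the reference and short windows around each difference are scanned;
    the variant genome is never materialized in full.
    """
    m = len(pattern)
    # Reference hits that do not overlap any substituted position stay valid.
    hits = {h for h in find_all(reference, pattern)
            if not any(h <= p < h + m for p in diffs)}
    # Re-scan a window of up to 2m - 1 bases around each difference.
    for p in diffs:
        lo = max(0, p - m + 1)
        hi = min(len(reference), p + m)
        window = "".join(diffs.get(i, reference[i]) for i in range(lo, hi))
        hits.update(lo + h for h in find_all(window, pattern))
    return sorted(hits)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    diffs = {5: "T"}  # the variant genome reads ACGTATGTAC
    for pattern in ("GTA", "TAT", "TAC"):
        print(pattern, search_variant(reference, diffs, pattern))
```

The windows around the differences play the role of the small "kernel" that has to be searched in addition to the reference. Handling insertions and deletions would require coordinate mapping, which this sketch leaves out.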

  1. Aspergillus niger Genomics: Past, Present and into the Future

    SciTech Connect

    Baker, Scott E.

    2006-09-01

    Aspergillus niger is a filamentous ascomycete fungus that is ubiquitous in the environment and has been implicated in opportunistic infections of humans. In addition to its role as an opportunistic human pathogen, A. niger is economically important as a fermentation organism used for the production of citric acid. Industrial citric acid production by A. niger represents one of the most efficient, highest yield bioprocesses in use currently by industry. The genome size of A. niger is estimated to be between 35.5 and 38.5 megabases (Mb) divided among eight chromosomes/linkage groups that vary in size from 3.5 - 6.6 Mb. Currently, there are three independent A. niger genome projects, an indication of the economic importance of this organism. The rich amount of data resulting from these multiple A. niger genome sequences will be used for basic and applied research programs applicable to fermentation process development, morphology and pathogenicity.

  2. Genomic Islands in the Pathogenic Filamentous Fungus Aspergillus fumigatus

    PubMed Central

    Fedorova, Natalie D.; Khaldi, Nora; Joardar, Vinita S.; Maiti, Rama; Amedeo, Paolo; Anderson, Michael J.; Crabtree, Jonathan; Silva, Joana C.; Badger, Jonathan H.; Albarraq, Ahmed; Angiuoli, Sam; Bussey, Howard; Bowyer, Paul; Cotty, Peter J.; Dyer, Paul S.; Egan, Amy; Galens, Kevin; Fraser-Liggett, Claire M.; Haas, Brian J.; Inman, Jason M.; Kent, Richard; Lemieux, Sebastien; Malavazi, Iran; Orvis, Joshua; Roemer, Terry; Ronning, Catherine M.; Sundaram, Jaideep P.; Sutton, Granger; Turner, Geoff; Venter, J. Craig; White, Owen R.; Whitty, Brett R.; Youngman, Phil; Wolfe, Kenneth H.; Goldman, Gustavo H.; Wortman, Jennifer R.; Jiang, Bo; Denning, David W.; Nierman, William C.

    2008-01-01

    We present the genome sequences of a new clinical isolate of the important human pathogen, Aspergillus fumigatus, A1163, and two closely related but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of A1163 with the recently sequenced A. fumigatus isolate Af293 has identified core, variable and up to 2% unique genes in each genome. While the core genes are 99.8% identical at the nucleotide level, identity for variable genes can be as low as 40%. The most divergent loci appear to contain heterokaryon incompatibility (het) genes associated with fungal programmed cell death such as developmental regulator rosA. Cross-species comparison has revealed that 8.5%, 13.5% and 12.6%, respectively, of A. fumigatus, N. fischeri and A. clavatus genes are species-specific. These genes are significantly smaller in size than core genes, contain fewer exons and exhibit a subtelomeric bias. Most of them cluster together in 13 chromosomal islands, which are enriched for pseudogenes, transposons and other repetitive elements. At least 20% of A. fumigatus-specific genes appear to be functional and involved in carbohydrate and chitin catabolism, transport, detoxification, secondary metabolism and other functions that may facilitate the adaptation to heterogeneous environments such as soil or a mammalian host. Contrary to what was suggested previously, their origin cannot be attributed to horizontal gene transfer (HGT), but instead is likely to involve duplication, diversification and differential gene loss (DDL). The role of duplication in the origin of lineage-specific genes is further underlined by the discovery of genomic islands that seem to function as designated “gene dumps” and, perhaps, simultaneously, as “gene factories”. PMID:18404212

  3. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine

    PubMed Central

    Elsik, Christine G.; Tayal, Aditi; Diesh, Colin M.; Unni, Deepak R.; Emery, Marianne L.; Nguyen, Hung N.; Hagen, Darren E.

    2016-01-01

    We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. PMID:26578564

  6. Metabolic peculiarities of Aspergillus niger disclosed by comparative metabolic genomics

    PubMed Central

    Sun, Jibin; Lu, Xin; Rinas, Ursula; Zeng, An Ping

    2007-01-01

    Background Aspergillus niger is an important industrial microorganism for the production of both metabolites, such as citric acid, and proteins, such as fungal enzymes or heterologous proteins. Despite its extensive industrial applications, the genetic inventory of this fungus is only partially understood. The recently released genome sequence opens a new horizon for both scientific studies and biotechnological applications. Results Here, we present the first genome-scale metabolic network for A. niger and an in-depth genomic comparison of this species to seven other fungi to disclose its metabolic peculiarities. The raw genomic sequences of A. niger ATCC 9029 were first annotated. The reconstructed metabolic network is based on the annotation of two A. niger genomes, CBS 513.88 and ATCC 9029, including enzymes with 988 unique EC numbers, 2,443 reactions and 2,349 metabolites. More than 1,100 enzyme-coding genes are unique to A. niger in comparison to the other seven fungi. For example, we identified additional copies of genes such as those encoding alternative mitochondrial oxidoreductase and citrate synthase in A. niger, which might contribute to the high citric acid production efficiency of this species. Moreover, nine genes were identified as encoding enzymes with EC numbers exclusively found in A. niger, mostly involved in the biosynthesis of complex secondary metabolites and degradation of aromatic compounds. Conclusion The genome-level reconstruction of the metabolic network and genome-based metabolic comparison disclose peculiarities of A. niger highly relevant to its biotechnological applications and should contribute to future rational metabolic design and systems biology studies of this black mold and related species. PMID:17784953

  7. Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

    DOE PAGES

    Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma; Bragulat, M. Rosa; Cigliano, Riccardo Aiese; Sánchez, Armand

    2015-03-13

    In microorganisms, Ion Torrent sequencing technology has been proved to be useful in whole-genome sequencing of bacterial genomes (5 Mbp). In our study, for the first time we used this technology to perform a resequencing approach in a whole fungal genome (36 Mbp), a non-ochratoxin A producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin which is found mainly in cereals and their products, but it also occurs in a variety of common foods and beverages. Due to the fact that this strain does not produce OTA, we focused some of the bioinformatics analyses in genes involved in OTA biosynthesis, using a reference genome of an OTA producing strain of the same species. This study revealed that in the atoxigenic strain there is a high accumulation of nonsense and missense mutations in several genes. Importantly, a two fold increase in gene mutation ratio was observed in PKS and NRPS encoding genes which are suggested to be involved in OTA biosynthesis.

  8. Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

    PubMed Central

    Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma; Bragulat, M. Rosa; Cigliano, Riccardo Aiese; Sánchez, Armand

    2015-01-01

    In microorganisms, Ion Torrent sequencing technology has been proved to be useful in whole-genome sequencing of bacterial genomes (5 Mbp). In our study, for the first time we used this technology to perform a resequencing approach in a whole fungal genome (36 Mbp), a non-ochratoxin A producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin which is found mainly in cereals and their products, but it also occurs in a variety of common foods and beverages. Due to the fact that this strain does not produce OTA, we focused some of the bioinformatics analyses in genes involved in OTA biosynthesis, using a reference genome of an OTA producing strain of the same species. This study revealed that in the atoxigenic strain there is a high accumulation of nonsense and missense mutations in several genes. Importantly, a two fold increase in gene mutation ratio was observed in PKS and NRPS encoding genes which are suggested to be involved in OTA biosynthesis. PMID:25765923

  9. The UCSC Genome Browser database: 2015 update.

    PubMed

    Rosenbloom, Kate R; Armstrong, Joel; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R; Fujita, Pauline A; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T; Li, Chin H; Miga, Karen H; Nguyen, Ngan; Paten, Benedict; Raney, Brian J; Smit, Arian F A; Speir, Matthew L; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James

    2015-01-01

    Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.

  10. ChloroplastDB: the Chloroplast Genome Database.

    PubMed

    Cui, Liying; Veeraraghavan, Narayanan; Richter, Alexander; Wall, Kerr; Jansen, Robert K; Leebens-Mack, Jim; Makalowska, Izabela; dePamphilis, Claude W

    2006-01-01

    The Chloroplast Genome Database (ChloroplastDB) is an interactive, web-based database for fully sequenced plastid genomes, containing genomic, protein, DNA and RNA sequences, gene locations, RNA-editing sites, putative protein families and alignments (http://chloroplast.cbio.psu.edu/). With recent technical advances, the rate of generating new organelle genomes has increased dramatically. However, the established ontology for chloroplast genes and gene features has not been uniformly applied to all chloroplast genomes available in the sequence databases. For example, annotations for some published genome sequences have not evolved with gene naming conventions. ChloroplastDB provides unified annotations, gene name search, BLAST and download functions for chloroplast encoded genes and genomic sequences. A user can retrieve all orthologous sequences with one search regardless of gene names in GenBank. This feature alone greatly facilitates comparative research on sequence evolution including changes in gene content, codon usage, gene structure and post-transcriptional modifications such as RNA editing. Orthologous protein sets are classified by TribeMCL and each set is assigned a standard gene name. Over the next few years, as the number of sequenced chloroplast genomes increases rapidly, the tools available in ChloroplastDB will allow researchers to easily identify and compile target data for comparative analysis of chloroplast genes and genomes.

  11. Genome Statute and Legislation Database

    MedlinePlus

    Last Reviewed: February 29, 2016

  12. The Organelle Genome Database Project (GOBASE).

    PubMed Central

    Korab-Laskowska, M; Rioux, P; Brossard, N; Littlejohn, T G; Gray, M W; Lang, B F; Burger, G

    1998-01-01

    The taxonomically broad organelle genome database (GOBASE) organizes and integrates diverse data related to organelles (mitochondria and chloroplasts). The current version of GOBASE focuses on the mitochondrial subset of data and contains molecular sequences, RNA secondary structures and genetic maps, as well as taxonomic information for all eukaryotic species represented. The database has been designed so that complex biological queries, especially ones posed in a comparative genomics context, are supported. GOBASE has been implemented as a relational database with a web-based user interface (http://megasun.bch.umontreal.ca/gobase/gobase.html). Custom software tools have been written in house to assist in the population of the database, data validation, nomenclature standardization and front-end design. The database is fully operational and publicly accessible via the World Wide Web, allowing interactive browsing, sophisticated searching and easy downloading of data. PMID:9399818

  13. BGD: A Database of Bat Genomes

    PubMed Central

    Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

    2015-01-01

    Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. Although some whole genome sequences of bats have been assembled and annotated, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and BGD offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is provided to facilitate online phylogenetic study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and benefit the analysis of bat sequences. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/. PMID:26110276

  14. BGD: a database of bat genomes.

    PubMed

    Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

    2015-01-01

    Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. Although some whole genome sequences of bats have been assembled and annotated, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and BGD offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is provided to facilitate online phylogenetic study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and benefit the analysis of bat sequences. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.

  15. Complete mitochondrial genome of an Amynthas earthworm, Amynthas aspergillus (Oligochaeta: Megascolecidae).

    PubMed

    Zhang, Liangliang; Jiang, Jibao; Dong, Yan; Qiu, Jiangping

    2016-05-01

    We have determined the mitochondrial genome of the first Amynthas earthworm, Amynthas aspergillus (Perrier, 1872), which is a natural medicinal resource in traditional Chinese medicine. Its mitogenome is 15,115 bp in length and contains 37 genes with the same content and order as other sequenced earthworms. All genes are encoded on the same strand, and all 13 PCGs use ATG as the start codon. The A + T content is 63.04% for A. aspergillus (33.41% A, 29.63% T, 14.56% G and 22.41% C). The complete mitochondrial genome of A. aspergillus should be useful for reconstructing Oligochaeta phylogenetic relationships.
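
The base composition quoted in this record can be checked with simple arithmetic: the reported A and T percentages should sum to the reported A + T content. The following is a minimal Python sketch of that check and of how such percentages might be computed from a sequence. The function names and the toy sequence are illustrative assumptions, not part of the cited work.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return the percentage of A, C, G and T in seq (case-insensitive).

    Characters other than A, C, G, T (e.g. N) are ignored in the total.
    """
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq: str) -> float:
    """Return the combined A + T percentage of seq."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


# Percentages as reported in the abstract above.
reported = {"A": 33.41, "T": 29.63, "G": 14.56, "C": 22.41}

# A + T from the individual values; this should match the stated 63.04%.
reported_at = round(reported["A"] + reported["T"], 2)

# Toy sequence (hypothetical, for illustration only): 4 of 6 bases are A or T.
toy_at = at_content("AATTGC")
```

The four reported percentages sum to 100.01 rather than 100.00, which is consistent with each value having been rounded independently.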

  17. The UCSC Genome Browser database: 2016 update.

    PubMed

    Speir, Matthew L; Zweig, Ann S; Rosenbloom, Kate R; Raney, Brian J; Paten, Benedict; Nejad, Parisa; Lee, Brian T; Learned, Katrina; Karolchik, Donna; Hinrichs, Angie S; Heitner, Steve; Harte, Rachel A; Haeussler, Maximilian; Guruvadoo, Luvina; Fujita, Pauline A; Eisenhart, Christopher; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Barber, Galt P; Haussler, David; Kuhn, Robert M; Kent, W James

    2016-01-01

    For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the "Data Integrator", for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment. PMID:26590259

  18. The Saccharomyces Genome Database Variant Viewer

    PubMed Central

    Sheppard, Travis K.; Hitz, Benjamin C.; Engel, Stacia R.; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla S.; Demeter, Janos; Hellerstedt, Sage T.; Karra, Kalpana; Nash, Robert S.; Paskov, Kelley M.; Skrzypek, Marek S.; Weng, Shuai; Wong, Edith D.; Cherry, J. Michael

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. PMID:26578556

  19. The UCSC Genome Browser database: 2014 update.

    PubMed

    Karolchik, Donna; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Cline, Melissa S; Diekhans, Mark; Dreszer, Timothy R; Fujita, Pauline A; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A; Heitner, Steve; Hinrichs, Angie S; Learned, Katrina; Lee, Brian T; Li, Chin H; Raney, Brian J; Rhead, Brooke; Rosenbloom, Kate R; Sloan, Cricket A; Speir, Matthew L; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James

    2014-01-01

    The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser's web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation 'tracks' for ∼90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany.

  20. The UCSC Genome Browser database: 2014 update

    PubMed Central

    Karolchik, Donna; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Cline, Melissa S.; Diekhans, Mark; Dreszer, Timothy R.; Fujita, Pauline A.; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A.; Heitner, Steve; Hinrichs, Angie S.; Learned, Katrina; Lee, Brian T.; Li, Chin H.; Raney, Brian J.; Rhead, Brooke; Rosenbloom, Kate R.; Sloan, Cricket A.; Speir, Matthew L.; Zweig, Ann S.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2014-01-01

    The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser’s web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation ‘tracks’ for ∼90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany. PMID:24270787

  1. Saccharomyces genome database: Underlying principles and organisation

    PubMed Central

    Dwight, Selina S.; Balakrishnan, Rama; Christie, Karen R.; Costanzo, Maria C.; Dolinski, Kara; Engel, Stacia R.; Feierbach, Becket; Fisk, Dianna G.; Hirschman, Jodi; Hong, Eurie L.; Issel-Tarver, Laurie; Nash, Robert S.; Sethuraman, Anand; Starr, Barry; Theesfeld, Chandra L.; Andrada, Rey; Binkley, Gail; Dong, Qing; Lane, Christopher; Schroeder, Mark; Weng, Shuai; Botstein, David; Cherry, J. Michael

    2011-01-01

    A scientific database can be a powerful tool for biologists in an era where large-scale genomic analysis, combined with smaller-scale scientific results, provides new insights into the roles of genes and their products in the cell. However, the collection and assimilation of data is, in itself, not enough to make a database useful. The data must be incorporated into the database and presented to the user in an intuitive and biologically significant manner. Most importantly, this presentation must be driven by the user’s point of view; that is, from a biological perspective. The success of a scientific database can therefore be measured by the response of its users – statistically, by usage numbers and, in a less quantifiable way, by its relationship with the community it serves and its ability to serve as a model for similar projects. Since its inception ten years ago, the Saccharomyces Genome Database (SGD) has seen a dramatic increase in its usage, has developed and maintained a positive working relationship with the yeast research community, and has served as a template for at least one other database. The success of SGD, as measured by these criteria, is due in large part to philosophies that have guided its mission and organisation since it was established in 1993. This paper aims to detail these philosophies and how they shape the organisation and presentation of the database. PMID:15153302

  2. Sequence resources at the Candida Genome Database.

    PubMed

    Arnaud, Martha B; Costanzo, Maria C; Skrzypek, Marek S; Shah, Prachi; Binkley, Gail; Lane, Christopher; Miyasato, Stuart R; Sherlock, Gavin

    2007-01-01

    The Candida Genome Database (CGD, http://www.candidagenome.org/) contains a curated collection of genomic information and community resources for researchers who are interested in the molecular biology of the opportunistic pathogen Candida albicans. With the recent release of a new assembly of the C.albicans genome, Assembly 20, C.albicans genomics has entered a new era. Although the C.albicans genome assembly continues to undergo refinement, multiple assemblies and gene nomenclatures will remain in widespread use by the research community. CGD has now taken on the responsibility of maintaining the most up-to-date version of the genome sequence by providing the data from this new assembly alongside the data from the previous assemblies, as well as any future corrections and refinements. In this database update, we describe the sequence information available for C.albicans, the sequence information contained in CGD, and the tools for sequence retrieval, analysis and comparison that CGD provides. CGD is freely accessible at http://www.candidagenome.org/ and CGD curators may be contacted by email at candida-curator@genome.stanford.edu.

  3. The UCSC Genome Browser database: 2015 update

    PubMed Central

    Rosenbloom, Kate R.; Armstrong, Joel; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R.; Fujita, Pauline A.; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A.; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S.; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T.; Li, Chin H.; Miga, Karen H.; Nguyen, Ngan; Paten, Benedict; Raney, Brian J.; Smit, Arian F. A.; Speir, Matthew L.; Zweig, Ann S.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2015-01-01

    Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), ‘mined the web’ for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled. PMID:25428374

  4. The phenotypic and genomic diversity of Aspergillus strains producing glucose dehydrogenase.

    PubMed

    Rola, Beata; Pawlik, Anna; Frąc, Magdalena; Małek, Wanda; Targoński, Zdzisław; Rogalski, Jerzy; Janusz, Grzegorz

    2015-01-01

    Twelve Aspergillus sp. strains producing glucose dehydrogenase were identified using ITS region sequencing. Based on the sequences obtained, the genomic relationship of the analyzed strains was investigated. Moreover, partial gdh gene sequences were determined and aligned. The amplified fragment length polymorphism (AFLP) method was applied for genomic fingerprinting of the twelve Aspergillus isolates. Using one PstI restriction endonuclease and five selective primers in an AFLP assay, 556 DNA fragments were generated, including 532 polymorphic bands. The AFLP profiles were found to be highly specific for each strain and they unambiguously distinguished the twelve Aspergillus fungi. The AFLP-based dendrogram generated by the UPGMA method grouped all the Aspergillus fungi studied into two major clusters. All the Aspergillus strains were also characterized using Biolog FF MicroPlates to obtain data on C-substrate utilization and mitochondrial activity. The ability to decompose various substrates differed among the analyzed strains by up to threefold. All of the studied strains mainly decomposed carbohydrates.

  6. Orthology for comparative genomics in the mouse genome database.

    PubMed

    Dolan, Mary E; Baldarelli, Richard M; Bello, Susan M; Ni, Li; McAndrews, Monica S; Bult, Carol J; Kadin, James A; Richardson, Joel E; Ringwald, Martin; Eppig, Janan T; Blake, Judith A

    2015-08-01

    The mouse genome database (MGD) is the model organism database component of the mouse genome informatics system at The Jackson Laboratory. MGD is the international data resource for the laboratory mouse and facilitates the use of mice in the study of human health and disease. Since its beginnings, MGD has included comparative genomics data with a particular focus on human-mouse orthology, an essential component of the use of mouse as a model organism. Over the past 25 years, novel algorithms and addition of orthologs from other model organisms have enriched comparative genomics in MGD data, extending the use of orthology data to support the laboratory mouse as a model of human biology. Here, we describe current comparative data in MGD and review the history and refinement of orthology representation in this resource. PMID:26223881

  7. Benchmarking database performance for genomic data.

    PubMed

    Khushi, Matloob

    2015-06-01

    Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Genomic operations such as identifying overlapping/non-overlapping regions or nearest gene annotations are common research needs. The data can be saved in a database system for easy management; however, there is at present no comprehensive built-in database algorithm to identify overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although the general searching capability of both databases was almost equivalent. In addition, using the algorithm, pairwise overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were reported and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1).
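
To make the idea of finding overlapping regions in SQL concrete, here is a minimal Python sketch using the standard-library sqlite3 module. It is not the RegMap algorithm described in this record, whose details are not given here. It uses the textbook interval-overlap predicate (two closed intervals on the same chromosome overlap when each starts no later than the other ends). The table names, column names and sample rows are all hypothetical.

```python
import sqlite3

# In-memory database with two hypothetical region tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE peaks (name TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER);
    CREATE TABLE genes (name TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER);
    -- Indexes on (chrom, start_pos) let the engine narrow the join by chromosome.
    CREATE INDEX idx_peaks ON peaks (chrom, start_pos);
    CREATE INDEX idx_genes ON genes (chrom, start_pos);
    """
)

# Made-up coordinates for illustration only.
conn.executemany(
    "INSERT INTO peaks VALUES (?, ?, ?, ?)",
    [("peak1", "chr1", 100, 200), ("peak2", "chr1", 500, 600), ("peak3", "chr2", 100, 200)],
)
conn.executemany(
    "INSERT INTO genes VALUES (?, ?, ?, ?)",
    [("gene1", "chr1", 150, 250), ("gene2", "chr1", 700, 800), ("gene3", "chr2", 300, 400)],
)


def find_overlaps(connection):
    """Return (peak, gene) name pairs whose closed intervals overlap."""
    query = """
        SELECT p.name, g.name
        FROM peaks AS p
        JOIN genes AS g
          ON p.chrom = g.chrom
         AND p.start_pos <= g.end_pos   -- peak starts before gene ends
         AND g.start_pos <= p.end_pos   -- gene starts before peak ends
        ORDER BY p.name, g.name
    """
    return connection.execute(query).fetchall()


overlaps = find_overlaps(conn)
```

With the sample rows above, only peak1 (chr1:100-200) and gene1 (chr1:150-250) satisfy the predicate. A plain join like this compares every pair of regions on a chromosome in the worst case, which is why dedicated region-mapping schemes and engine choice matter at the scale the record describes.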

  8. The Saccharomyces Genome Database: Exploring Genome Features and Their Annotations.

    PubMed

    Cherry, J Michael

    2015-12-01

    Genomic-scale assays result in data that provide information over the entire genome. Such base pair resolution data cannot be summarized easily except via a graphical viewer. A genome browser is a tool that displays genomic data and experimental results as horizontal tracks. Genome browsers allow searches for a chromosomal coordinate or a feature, such as a gene name, but they do not allow searches by function or upstream binding site. Entry into a genome browser requires that you identify the gene name or chromosomal coordinates for a region of interest. A track provides a representation for genomic results and is displayed as a row of data shown as line segments to indicate regions of the chromosome with a feature. Another type of track presents a graph or wiggle plot that indicates the processed signal intensity computed for a particular experiment or set of experiments. Wiggle plots are typical for genomic assays such as the various next-generation sequencing methods (e.g., chromatin immunoprecipitation [ChIP]-seq or RNA-seq), where it represents a peak of DNA binding, histone modification, or the mapping of an RNA sequence. Here we explore the browser that has been built into the Saccharomyces Genome Database (SGD).

  9. The mouse genome informatics and the mouse genome database

    SciTech Connect

    Maltais, L.J.; Blackburn, R.E.; Bradt, D.W.

    1994-09-01

    The Mouse Genome Database (MGD) is a centralized, comprehensive database of the mouse genome that includes genetic mapping data, comparative mapping data, gene descriptions, mutant phenotype descriptions, strains and allelic polymorphism data, inbred strain characteristics, physical mapping data, and molecular probes and clones data. Data in MGD are obtained from the published literature and by electronic transfer from laboratories working on large backcross panels of mice. MGD provides tools that enable the user to search the database, retrieve data, generate reports, analyze data, annotate records, and build genetic maps. The Encyclopedia of the Mouse Genome provides a graphic user interface to mouse genome data. It consists of software tools including: LinkMap, a graphic display of genetic linkage maps with the ability to magnify regions of high locus density; CytoMap, a graphic display of cytogenetic maps showing banded chromosomes with cytogenetic locations of genes and chromosomal aberrations; and CATS, a catalog searching tool for text retrieval of mouse locus descriptions. These software tools provide access to the following data sets: Chromosome Committee Reports, MIT Genome Center data, GBASE reports, Mouse Locus Catalog (MLC), and Mouse Cytogenetic Mapping Data. The MGD is available to the scientific community through the World Wide Web (WWW) and Gopher. In addition, GBASE can be accessed via the Internet.

  10. Saccharomyces Genome Database: the genomics resource of budding yeast

    PubMed Central

    Cherry, J. Michael; Hong, Eurie L.; Amundsen, Craig; Balakrishnan, Rama; Binkley, Gail; Chan, Esther T.; Christie, Karen R.; Costanzo, Maria C.; Dwight, Selina S.; Engel, Stacia R.; Fisk, Dianna G.; Hirschman, Jodi E.; Hitz, Benjamin C.; Karra, Kalpana; Krieger, Cynthia J.; Miyasato, Stuart R.; Nash, Rob S.; Park, Julie; Skrzypek, Marek S.; Simison, Matt; Weng, Shuai; Wong, Edith D.

    2012-01-01

    The Saccharomyces Genome Database (SGD, http://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. The SGD project provides the highest-quality manually curated information from peer-reviewed literature. The experimental results reported in the literature are extracted and integrated within a well-developed database. These data are combined with quality high-throughput results and provided through Locus Summary pages, a powerful query engine and rich genome browser. The acquisition, integration and retrieval of these data allow SGD to facilitate experimental design and analysis by providing an encyclopedia of the yeast genome, its chromosomal features, their functions and interactions. Public access to these data is provided to researchers and educators via web pages designed for optimal ease of use. PMID:22110037

  11. Saccharomyces Genome Database: the genomics resource of budding yeast.

    PubMed

    Cherry, J Michael; Hong, Eurie L; Amundsen, Craig; Balakrishnan, Rama; Binkley, Gail; Chan, Esther T; Christie, Karen R; Costanzo, Maria C; Dwight, Selina S; Engel, Stacia R; Fisk, Dianna G; Hirschman, Jodi E; Hitz, Benjamin C; Karra, Kalpana; Krieger, Cynthia J; Miyasato, Stuart R; Nash, Rob S; Park, Julie; Skrzypek, Marek S; Simison, Matt; Weng, Shuai; Wong, Edith D

    2012-01-01

    The Saccharomyces Genome Database (SGD, http://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. The SGD project provides the highest-quality manually curated information from peer-reviewed literature. The experimental results reported in the literature are extracted and integrated within a well-developed database. These data are combined with quality high-throughput results and provided through Locus Summary pages, a powerful query engine and rich genome browser. The acquisition, integration and retrieval of these data allow SGD to facilitate experimental design and analysis by providing an encyclopedia of the yeast genome, its chromosomal features, their functions and interactions. Public access to these data is provided to researchers and educators via web pages designed for optimal ease of use. PMID:22110037

  12. Orchidstra: An Integrated Orchid Functional Genomics Database

    PubMed Central

    Su, Chun-lin; Chao, Ya-Ting; Yen, Shao-Hua; Chen, Chun-Yi; Chen, Wan-Chieh; Chang, Yao-Chien Alex; Shih, Ming-Che

    2013-01-01

    A specialized orchid database, named Orchidstra (URL: http://orchidstra.abrc.sinica.edu.tw), has been constructed to collect, annotate and share genomic information for orchid functional genomics studies. The Orchidaceae is a large family of Angiosperms that exhibits extraordinary biodiversity in terms of both the number of species and their distribution worldwide. Orchids exhibit many unique biological features; however, investigation of these traits is currently constrained due to the limited availability of genomic information. Transcriptome information for five orchid species and one commercial hybrid has been included in the Orchidstra database. Altogether, these comprise >380,000 non-redundant orchid transcript sequences, of which >110,000 are protein-coding genes. Sequences from the transcriptome shotgun assembly (TSA) were obtained either from output reads from next-generation sequencing technologies assembled into contigs, or from conventional cDNA library approaches. An annotation pipeline using Gene Ontology, KEGG and Pfam was built to assign gene descriptions and functional annotation to protein-coding genes. Deep sequencing of small RNA was also performed for Phalaenopsis aphrodite to search for microRNAs (miRNAs), extending the information archived for this species to miRNA annotation, precursors and putative target genes. The P. aphrodite transcriptome information was further used to design probes for an oligonucleotide microarray, and expression profiling analysis was carried out. The intensities of hybridized probes derived from microarray assays of various tissues were incorporated into the database as part of the functional evidence. In the future, the content of the Orchidstra database will be expanded with transcriptome data and genomic information from more orchid species. PMID:23324169

  13. Requirements and standards for organelle genome databases

    SciTech Connect

    Boore, Jeffrey L.

    2006-01-09

    Mitochondria and plastids (collectively called organelles) descended from prokaryotes that adopted an intracellular, endosymbiotic lifestyle within early eukaryotes. Comparisons of their remnant genomes address a wide variety of biological questions, especially when including the genomes of their prokaryotic relatives and the many genes transferred to the eukaryotic nucleus during the transitions from endosymbiont to organelle. The pace of producing complete organellar genome sequences now makes it unfeasible to do broad comparisons using the primary literature and, even if it were feasible, it is now becoming uncommon for journals to accept detailed descriptions of genome-level features. Unfortunately no database is currently useful for this task, since they have little standardization and are riddled with error. Here I outline what is currently wrong and what must be done to make this data useful to the scientific community.

  14. How to use the Candida Genome Database

    PubMed Central

    Skrzypek, Marek S.; Binkley, Jonathan; Sherlock, Gavin

    2016-01-01

    Studying Candida biology requires access to genomic sequence data in conjunction with experimental information that provides functional context to genes and proteins. The Candida Genome Database (CGD) integrates functional information about Candida genes and their products with a set of analysis tools that facilitate searching for sets of genes and exploring their biological roles. This chapter describes how the various types of information available at CGD can be searched, retrieved, and analyzed. Starting with the guided tour of the CGD Home page and Locus Summary page, this unit shows how to navigate the various assemblies of the C. albicans genome, how to use Gene Ontology tools to make sense of large-scale data, and how to access the microarray data archived at CGD. PMID:26519061

  15. How to Use the Candida Genome Database.

    PubMed

    Skrzypek, Marek S; Binkley, Jonathan; Sherlock, Gavin

    2016-01-01

    Studying Candida biology requires access to genomic sequence data in conjunction with experimental information that provides functional context to genes and proteins. The Candida Genome Database (CGD) integrates functional information about Candida genes and their products with a set of analysis tools that facilitate searching for sets of genes and exploring their biological roles. This chapter describes how the various types of information available at CGD can be searched, retrieved, and analyzed. Starting with the guided tour of the CGD Home page and Locus Summary page, this unit shows how to navigate the various assemblies of the C. albicans genome, how to use Gene Ontology tools to make sense of large-scale data, and how to access the microarray data archived at CGD.

  16. A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus

    PubMed Central

    Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong

    2015-01-01

    Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, and little is known about rare pathogens. In this study, we performed a genomics-based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, including the sterigmatocystin biosynthesis pathway that is commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters that were possibly acquired by horizontal gene transfer were identified in Aspergillus for the first time, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but far fewer with other Aspergilli such as A. niger and A. oryzae. These findings should help to understand the diversity and evolution of SM biosynthesis pathways in the genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in the clinic. PMID:25706180

  17. Potential of Aspergillus flavus Genomics for Applications in Biotechnology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is a common saprophyte and opportunistic pathogen that survives in the natural environment by extracting nutrition from plant debris, insect carcasses and a variety of other carbon sources. A. flavus produces numerous secondary metabolites and hydrolytic enzymes. The primary obj...

  18. Draft Genome Sequences of Two Aspergillus fumigatus Strains, Isolated from the International Space Station.

    PubMed

    Singh, Nitin Kumar; Blachowicz, Adriana; Checinska, Aleksandra; Wang, Clay; Venkateswaran, Kasthuri

    2016-01-01

    Draft genome sequences of Aspergillus fumigatus strains (ISSFT-021 and IF1SW-F4), opportunistic pathogens isolated from the International Space Station (ISS), were assembled to facilitate comparison of the virulence characteristics of the ISS strains with those of other clinical strains isolated on Earth. PMID:27417828

  19. Draft Genome Sequences of Two Aspergillus fumigatus Strains, Isolated from the International Space Station.

    PubMed

    Singh, Nitin Kumar; Blachowicz, Adriana; Checinska, Aleksandra; Wang, Clay; Venkateswaran, Kasthuri

    2016-07-14

    Draft genome sequences of Aspergillus fumigatus strains (ISSFT-021 and IF1SW-F4), opportunistic pathogens isolated from the International Space Station (ISS), were assembled to facilitate comparison of the virulence characteristics of the ISS strains with those of other clinical strains isolated on Earth.

  20. Draft Genome Sequences of Two Aspergillus fumigatus Strains, Isolated from the International Space Station

    PubMed Central

    Singh, Nitin Kumar; Blachowicz, Adriana; Checinska, Aleksandra; Wang, Clay

    2016-01-01

    Draft genome sequences of Aspergillus fumigatus strains (ISSFT-021 and IF1SW-F4), opportunistic pathogens isolated from the International Space Station (ISS), were assembled to facilitate comparison of the virulence characteristics of the ISS strains with those of other clinical strains isolated on Earth. PMID:27417828

  1. Advances in Aspergillus secondary metabolite research in the post-genomic era

    PubMed Central

    Sanchez, James F.; Somoza, Amber D.; Keller, Nancy P.

    2015-01-01

    This review studies the impact of whole genome sequencing on Aspergillus secondary metabolite research. There has been a proliferation of many new, intriguing discoveries since sequencing data became widely available. What is more, the genomes disclosed the surprising finding that there are many more secondary metabolite biosynthetic pathways than laboratory research had suggested. Activating these pathways has been met with some success, but many more dormant genes remain to be awakened. PMID:22228366

  2. u-Genome: a database on genome design in unicellular genomes.

    PubMed

    Sakharkar, Kishore Ramaji; Chaturvedi, Iti; Chow, Vincent T K; Kwoh, Chee Keong; Kangueane, Pandjassarame; Sakharkar, Meena Kishore

    2005-01-01

    Unicellular eukaryotes were among the first ones to be selected for complete genome sequencing because of the small size of their genomes and their interactions with humans and a broad range of animals and plants. Currently, ten completely sequenced unicellular genome sequences have been publicly released and as the number of available unicellular genomes increases, comparative genomics analysis within this group of organisms becomes more and more instructive. However, such an analysis is difficult to carry out without a suitable platform gathering not only the original annotations but also relevant information available in public databases or obtained by applying common bioinformatics methods. With the aim of solving these difficulties, we have developed a web-accessible database named u-Genome, the unicellular genome design database. The database is unique in featuring three datasets namely (1) orthologous proteins (2) paralogous proteins and (3) statistical distributions on exons, introns, intergenic DNA and correlations between them. A tool, Uniview, designed to visualize the gene structures for individual genes in the genome is also integrated. This database is of importance in understanding unicellular genome design and architecture and evolution related studies. The database is available through a web interface at http://sege.ntu.edu.sg/wester/ugenome.

  3. Draft genome sequence of an aflatoxigenic Aspergillus species, A. bombycis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the A. bombycis Type strain was sequenced using a Personal Genome Machine, followed by annotation of its predicted genes. The genome size for A. bombycis was found to be approximately 37 Mb and contained 12,266 genes. This announcement introduces a sequenced genome for an aflatoxigenic...

  4. MTGD: The Medicago truncatula genome database.

    PubMed

    Krishnakumar, Vivek; Kim, Maria; Rosen, Benjamin D; Karamycheva, Svetlana; Bidwell, Shelby L; Tang, Haibao; Town, Christopher D

    2015-01-01

    Medicago truncatula, a close relative of alfalfa (Medicago sativa), is a model legume used for studying symbiotic nitrogen fixation, mycorrhizal interactions and legume genomics. J. Craig Venter Institute (JCVI; formerly TIGR) has been involved in M. truncatula genome sequencing and annotation since 2002 and has maintained a web-based resource providing data to the community for this entire period. The website (http://www.MedicagoGenome.org) has seen major updates in the past year, where it currently hosts the latest version of the genome (Mt4.0), associated data and legacy project information, presented to users via a rich set of open-source tools. A JBrowse-based genome browser interface exposes tracks for visualization. Mutant gene symbols originally assembled and curated by the Frugoli lab are now hosted at JCVI and tie into our community annotation interface, Medicago EuCAP (to be integrated soon with our implementation of WebApollo). Literature pertinent to M. truncatula is indexed and made searchable via the Textpresso search engine. The site also implements MedicMine, an instance of InterMine that offers interconnectivity with other plant 'mines' such as ThaleMine and PhytoMine, and other model organism databases (MODs). In addition to these new features, we continue to provide keyword- and locus identifier-based searches served via a Chado-backed Tripal Instance, a BLAST search interface and bulk downloads of data sets from the iPlant Data Store (iDS). Finally, we maintain an E-mail helpdesk, facilitated by a JIRA issue tracking system, where we receive and respond to questions about the website and requests for specific data sets from the community.

  5. Genome Sequences of Eight Aspergillus flavus spp. and One A. parasiticus sp., Isolated From Peanut Seeds in Georgia

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus and A. parasiticus fungi, carcinogen-mycotoxins producers, infect peanut seeds, causing considerable impact on both human health and the economy. Here we report 9 genome sequences of Aspergillus spp. isolated from peanut seeds. The information obtained will allow conducting biodiv...

  6. The phenotypic and genomic diversity of Aspergillus strains producing glucose dehydrogenase.

    PubMed

    Rola, Beata; Pawlik, Anna; Frąc, Magdalena; Małek, Wanda; Targoński, Zdzisław; Rogalski, Jerzy; Janusz, Grzegorz

    2015-01-01

    Twelve Aspergillus sp. strains producing glucose dehydrogenase were identified using ITS region sequencing. Based on the sequences obtained, the genomic relationship of the analyzed strains was investigated. Moreover, partial gdh gene sequences were determined and aligned. The amplified fragment length polymorphism (AFLP) method was applied for genomic fingerprinting of the twelve Aspergillus isolates. Using one PstI restriction endonuclease and five selective primers in an AFLP assay, 556 DNA fragments were generated, including 532 polymorphic bands. The AFLP profiles were found to be highly specific for each strain and they unambiguously distinguished the twelve Aspergillus strains. The AFLP-based dendrogram generated by the UPGMA method grouped all the Aspergillus fungi studied into two major clusters. All the Aspergillus strains were also characterized using Biolog FF MicroPlates to obtain data on C-substrate utilization and mitochondrial activity. The ability to decompose various substrates differed among the analyzed strains by up to threefold. All of the studied strains mainly decomposed carbohydrates. PMID:26634230

  7. Recent advances in genome mining of secondary metabolites in Aspergillus terreus

    PubMed Central

    Guo, Chun-Jun; Wang, Clay C. C.

    2014-01-01

    Filamentous fungi are rich resources of secondary metabolites (SMs) with a variety of interesting biological activities. Recent advances in genome sequencing and techniques in genetic manipulation have enabled researchers to study the biosynthetic genes of these SMs. Aspergillus terreus is a well-known producer of lovastatin, a cholesterol-lowering drug. This fungus also produces other SMs, including acetylaranotin, butyrolactones, and territrem, with interesting bioactivities. This review will cover recent progress in genome mining of SMs identified in this fungus. The identification and characterization of the gene clusters for these SMs, as well as the proposed biosynthetic pathways, will be discussed in depth. PMID:25566227

  8. VCGDB: a dynamic genome database of the Chinese population

    PubMed Central

    2014-01-01

    Background: The data released by the 1000 Genomes Project contain an increasing number of genome sequences from different nations and populations with a large number of genetic variations. As a result, the focus of human genome studies is changing from single and static to complex and dynamic. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome wide association studies. Description: We used the massive amount of sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information, which contains 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive user-friendly virtual Chinese genome browser (VCGBrowser) with functions like seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software. Conclusions: VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion by providing a robust resource for genomics studies; in particular, studies aimed at finding regions of the genome associated with diseases. PMID:24708222

  9. Tomato functional genomics database (TFGD): a comprehensive collection and analysis package for tomato functional genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Tomato Functional Genomics Database (TFGD; http://ted.bti.cornell.edu) provides a comprehensive systems biology resource to store, mine, analyze, visualize and integrate large-scale tomato functional genomics datasets. The database is expanded from the previously described Tomato Expression Database...

  10. Recent updates and developments to plant genome size databases.

    PubMed

    Garcia, Sònia; Leitch, Ilia J; Anadon-Rosell, Alba; Canela, Miguel Á; Gálvez, Francisco; Garnatje, Teresa; Gras, Airy; Hidalgo, Oriane; Johnston, Emmeline; Mas de Xaxars, Gemma; Pellicer, Jaume; Siljak-Yakovlev, Sonja; Vallès, Joan; Vitales, Daniel; Bennett, Michael D

    2014-01-01

    Two plant genome size databases have been recently updated and/or extended: the Plant DNA C-values database (http://data.kew.org/cvalues), and GSAD, the Genome Size in Asteraceae database (http://www.asteraceaegenomesize.com). While the first provides information on nuclear DNA contents across land plants and some algal groups, the second is focused on one of the largest and most economically important angiosperm families, Asteraceae. Genome size data have numerous applications: they can be used in comparative studies on genome evolution, or as a tool to appraise the cost of whole-genome sequencing programs. The growing interest in genome size and increasing rate of data accumulation has necessitated the continued update of these databases. Currently, the Plant DNA C-values database (Release 6.0, Dec. 2012) contains data for 8510 species, while GSAD has 1219 species (Release 2.0, June 2013), representing increases of 17 and 51%, respectively, in the number of species with genome size data, compared with previous releases. Here we provide overviews of the most recent releases of each database, and outline new features of GSAD. The latter include (i) a tool to visually compare genome size data between species, (ii) the option to export data and (iii) a webpage containing information about flow cytometry protocols.

  11. Draft Genome Sequences of Two Closely Related Aflatoxigenic Aspergillus Species Obtained from the Ivory Coast.

    PubMed

    Moore, Geromy G; Mack, Brian M; Beltz, Shannon B

    2015-12-03

    Aspergillus ochraceoroseus and Aspergillus rambellii were isolated from soil detritus in Taï National Park, Ivory Coast, Africa. The Type strain for each species happens to be the only representative ever sampled. Both species secrete copious amounts of aflatoxin B1 and sterigmatocystin, because each of their genomes contains clustered genes for biosynthesis of these mycotoxins. We sequenced their genomes using a personal genome machine and found them to be smaller in size (A. ochraceoroseus = 23.9 Mb and A. rambellii = 26.1 Mb), as well as in numbers of predicted genes (7,837 and 7,807, respectively), compared to other sequenced Aspergilli. Our findings also showed that the A. ochraceoroseus Type strain contains a single MAT1-1 gene, while the Type strain of A. rambellii contains a single MAT1-2 gene, indicating that these species are heterothallic (self-infertile). These draft genomes will be useful for understanding the genes and pathways necessary for the cosynthesis of these two toxic secondary metabolites as well as the evolution of these pathways in aflatoxigenic fungi.

  12. Draft Genome Sequences of Two Closely Related Aflatoxigenic Aspergillus Species Obtained from the Ivory Coast.

    PubMed

    Moore, Geromy G; Mack, Brian M; Beltz, Shannon B

    2016-03-01

    Aspergillus ochraceoroseus and Aspergillus rambellii were isolated from soil detritus in Taï National Park, Ivory Coast, Africa. The Type strain for each species happens to be the only representative ever sampled. Both species secrete copious amounts of aflatoxin B1 and sterigmatocystin, because each of their genomes contains clustered genes for biosynthesis of these mycotoxins. We sequenced their genomes using a personal genome machine and found them to be smaller in size (A. ochraceoroseus = 23.9 Mb and A. rambellii = 26.1 Mb), as well as in numbers of predicted genes (7,837 and 7,807, respectively), compared to other sequenced Aspergilli. Our findings also showed that the A. ochraceoroseus Type strain contains a single MAT1-1 gene, while the Type strain of A. rambellii contains a single MAT1-2 gene, indicating that these species are heterothallic (self-infertile). These draft genomes will be useful for understanding the genes and pathways necessary for the cosynthesis of these two toxic secondary metabolites as well as the evolution of these pathways in aflatoxigenic fungi. PMID:26637470

  13. An extended bioreaction database that significantly improves reconstruction and analysis of genome-scale metabolic networks.

    PubMed

    Stelzer, Michael; Sun, Jibin; Kamphans, Tom; Fekete, Sándor P; Zeng, An-Ping

    2011-11-01

    The bioreaction database established by Ma and Zeng (Bioinformatics, 2003, 19, 270-277) for in silico reconstruction of genome-scale metabolic networks has been widely used. Based on more recent information in the reference databases KEGG LIGAND and Brenda, we upgrade the bioreaction database in this work by almost doubling the number of reactions from 3565 to 6851. Over 70% of the reactions have been manually updated/revised in terms of reversibility, reactant pairs, currency metabolites and error correction. For the first time, 41 spontaneous sugar mutarotation reactions are introduced into the biochemical database. The upgrade significantly improves the reconstruction of genome-scale metabolic networks. Many gaps or missing biochemical links can be recovered, as exemplified with three model organisms: Homo sapiens, Aspergillus niger, and Escherichia coli. The topological parameters of the constructed networks were also largely affected; however, the overall network structure remains scale-free. Furthermore, we consider the problem of computing biologically feasible shortest paths in reconstructed metabolic networks. We show that these paths are hard to compute and present solutions to find such paths in networks of small and medium size.

  14. Exploration of the Chemical Space of Public Genomic Databases

    EPA Science Inventory

    The current project aims to chemically index the content of public genomic databases to make these data accessible in relation to other publicly available, chemically-indexed toxicological information.

  15. Neisseria Base: a comparative genomics database for Neisseria meningitidis.

    PubMed

    Katz, Lee S; Humphrey, Jay C; Conley, Andrew B; Nelakuditi, Viswateja; Kislyuk, Andrey O; Agrawal, Sonia; Jayaraman, Pushkala; Harcourt, Brian H; Olsen-Rasmussen, Melissa A; Frace, Michael; Sharma, Nitya V; Mayer, Leonard W; Jordan, I King

    2011-01-01

    Neisseria meningitidis is an important pathogen, causing life-threatening diseases including meningitis, septicemia and in some cases pneumonia. Genomic studies hold great promise for N. meningitidis research, but substantial database resources are needed to deal with the wealth of information that comes with completely sequenced and annotated genomes. To address this need, we developed Neisseria Base (NBase), a comparative genomics database and genome browser that houses and displays publicly available N. meningitidis genomes. In addition to existing N. meningitidis genome sequences, we sequenced and annotated 19 new genomes using 454 pyrosequencing and the CG-Pipeline genome analysis tool. In total, NBase hosts 27 complete N. meningitidis genome sequences along with their associated annotations. The NBase platform is designed to be scalable, via the underlying database schema and modular code architecture, such that it can readily incorporate new genomes and their associated annotations. The front page of NBase provides user access to these genomes through searching, browsing and downloading. NBase search utility includes BLAST-based sequence similarity searches along with a variety of semantic search options. All genomes can be browsed using a modified version of the GBrowse platform, and a plethora of information on each gene can be viewed using a customized details page. NBase also has a whole-genome comparison tool that yields single-nucleotide polymorphism differences between two user-defined groups of genomes. Using the virulent ST-11 lineage as an example, we demonstrate how this comparative genomics utility can be used to identify novel genomic markers for molecular profiling of N. meningitidis. PMID:21930505

  16. Epidemiological and Genomic Landscape of Azole Resistance Mechanisms in Aspergillus Fungi

    PubMed Central

    Hagiwara, Daisuke; Watanabe, Akira; Kamei, Katsuhiko; Goldman, Gustavo H.

    2016-01-01

    Invasive aspergillosis is a life-threatening mycosis caused by the pathogenic fungus Aspergillus. The predominant causal species is Aspergillus fumigatus, and azole drugs are the treatment of choice. Azole drugs approved for clinical use include itraconazole, voriconazole, posaconazole, and the recently added isavuconazole. However, epidemiological research has indicated that the prevalence of azole-resistant A. fumigatus isolates has increased significantly over the last decade. What is worse is that azole-resistant strains are likely to have emerged not only in response to long-term drug treatment but also because of exposure to azole fungicides in the environment. Resistance mechanisms include amino acid substitutions in the target Cyp51A protein, tandem repeat sequence insertions at the cyp51A promoter, and overexpression of the ABC transporter Cdr1B. Environmental azole-resistant strains harboring a combination of a tandem repeat sequence and point mutations of the cyp51A gene (TR34/L98H and TR46/Y121F/T289A) have become widely disseminated across the world within a short time period. The epidemiological data also suggest that the number of isolates of Aspergillus spp. other than A. fumigatus has risen. Some non-fumigatus species intrinsically show low susceptibility to azole drugs, imposing the need for accurate identification and drug susceptibility testing in most clinical cases. Currently, our knowledge of azole resistance mechanisms in non-fumigatus Aspergillus species such as A. flavus, A. niger, A. tubingensis, A. terreus, A. fischeri, A. lentulus, A. udagawae, and A. calidoustus is limited. In this review, we present recent advances in our understanding of azole resistance mechanisms, particularly in A. fumigatus. We then provide an overview of the genome sequences of non-fumigatus species, focusing on the proteins related to azole resistance mechanisms. PMID:27708619

  17. Design and implementation of the cacao genome database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Cacao Genome Database (CGD, www.cacaogenomedb.org) is being developed to provide a comprehensive data mining resource of genomic, genetic and breeding data for Theobroma cacao. Designed using Chado and a collection of Drupal modules, known as Tripal, CGD currently contains the genetically anchor...

  18. mds_ies_db: a database of ciliate genome rearrangements.

    PubMed

    Burns, Jonathan; Kukushkin, Denys; Lindblad, Kelsi; Chen, Xiao; Jonoska, Nataša; Landweber, Laura F

    2016-01-01

    Ciliated protists exhibit nuclear dimorphism through the presence of somatic macronuclei (MAC) and germline micronuclei (MIC). In some ciliates, DNA from precursor segments in the MIC genome rearranges to form transcriptionally active genes in the mature MAC genome, making these ciliates model organisms to study the process of somatic genome rearrangement. Similar broad-scale somatic rearrangement events occur in many eukaryotic cells and tumors. The mds_ies_db (http://oxytricha.princeton.edu/mds_ies_db) is a database of genome recombination and rearrangement annotations, and it provides tools for visualization and comparative analysis of precursor and product genomes. The database currently contains annotations for two completely sequenced ciliate genomes: Oxytricha trifallax and Tetrahymena thermophila.

  19. mds_ies_db: a database of ciliate genome rearrangements

    PubMed Central

    Burns, Jonathan; Kukushkin, Denys; Lindblad, Kelsi; Chen, Xiao; Jonoska, Nataša; Landweber, Laura F.

    2016-01-01

    Ciliated protists exhibit nuclear dimorphism through the presence of somatic macronuclei (MAC) and germline micronuclei (MIC). In some ciliates, DNA from precursor segments in the MIC genome rearranges to form transcriptionally active genes in the mature MAC genome, making these ciliates model organisms to study the process of somatic genome rearrangement. Similar broad-scale somatic rearrangement events occur in many eukaryotic cells and tumors. The mds_ies_db (http://oxytricha.princeton.edu/mds_ies_db) is a database of genome recombination and rearrangement annotations, and it provides tools for visualization and comparative analysis of precursor and product genomes. The database currently contains annotations for two completely sequenced ciliate genomes: Oxytricha trifallax and Tetrahymena thermophila. PMID:26586804

  20. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different lengths (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in the analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Although automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents a genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (approximately 1 error/10,000 bp), validated through a number of computer and laboratory experiments.
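
    The two consensus error rates quoted in the abstract above (about 1 error per 2000 bp for a draft assembly, about 1 per 10,000 bp for a finished genome) can be turned into expected error counts with simple arithmetic. The following is a minimal sketch and is not taken from the source: the 5 Mbp genome size and the function name `expected_errors` are assumptions chosen purely for illustration.

```python
def expected_errors(genome_size_bp: int, bp_per_error: int) -> float:
    """Expected number of consensus errors, given one error every `bp_per_error` bases."""
    if genome_size_bp < 0 or bp_per_error <= 0:
        raise ValueError("genome size must be >= 0 and bp_per_error must be > 0")
    return genome_size_bp / bp_per_error


# Rates quoted in the abstract.
DRAFT_BP_PER_ERROR = 2_000
FINISHED_BP_PER_ERROR = 10_000

# Hypothetical genome size (assumption, not from the source).
GENOME_SIZE_BP = 5_000_000

draft = expected_errors(GENOME_SIZE_BP, DRAFT_BP_PER_ERROR)
finished = expected_errors(GENOME_SIZE_BP, FINISHED_BP_PER_ERROR)

print(f"draft assembly:    {draft:.0f} expected errors")
print(f"finished assembly: {finished:.0f} expected errors")
print(f"reduction factor:  {draft / finished:.1f}x")
```

    Under these assumptions, finishing lowers the expected number of consensus errors fivefold, which is simply the ratio of the two quoted rates and does not depend on the genome size chosen.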

  1. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants.

    PubMed

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-09-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561

  2. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants.

    PubMed

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-09-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops.

  3. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants

    PubMed Central

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-01-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561

  4. Corruption of genomic databases with anomalous sequence.

    PubMed Central

    Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

    1992-01-01

    We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%. PMID:1614861

  5. Corruption of genomic databases with anomalous sequence.

    PubMed

    Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

    1992-06-11

    We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.

  6. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database

    PubMed Central

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T.; Karra, Kalpana; Hitz, Benjamin C.; Nash, Robert S.; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J.

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains are now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org PMID:27252399

  7. Prokaryotic Genomes from Microbes Online Database

    DOE Data Explorer

    Alm, Eric J.; Huang, Katherine H.; Price, Morgan N.; Koche, Richard P.; Keller, Keith; Dubchak, Inna L.; Arkin, Adam P.

    To describe the potential functions of genes, MicrobesOnline includes protein family analyses (from InterPro and COG), metabolic maps (from KEGG), links to research papers (from UniProt and PubMed), and operon predictions for every genome. To examine each gene's evolutionary history, MicrobesOnline includes precomputed phylogenetic trees for all the gene families. It displays gene trees with genomic context or it compares the gene tree to the species tree. The tools provided with MicrobesOnline allow users to compute customized motifs, sequence alignments, and phylogenetic trees; change expression patterns in metabolic maps; and annotate genes in various ways. A browse tree tool and a genome browser are available, along with specialized search capabilities. (Specialized Interface)

  8. OryzaGenome: Genome Diversity Database of Wild Oryza Species.

    PubMed

    Ohyanagi, Hajime; Ebata, Toshinobu; Huang, Xuehui; Gong, Hao; Fujita, Masahiro; Mochizuki, Takako; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu; Feng, Qi; Wang, Zi-Xuan; Han, Bin; Kurata, Nori

    2016-01-01

    The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of next-generation sequencing (NGS) technology, a flood of Oryza species reference genomes and genomic variation information has become available in recent years. This genomic information, combined with the comprehensive phenotypic information that we are accumulating in our Oryzabase, can serve as an excellent genotype-phenotype association resource for analyzing rice functional and structural evolution, and the associated diversity of the Oryza genus. Here we integrate our previous and future phenotypic/habitat information and newly determined genotype information into a united repository, named OryzaGenome, providing the variant information with hyperlinks to Oryzabase. The current version of OryzaGenome includes genotype information of 446 O. rufipogon accessions derived by imputation and of 17 accessions derived by imputation-free deep sequencing. Two variant viewers are implemented: SNP Viewer as a conventional genome browser interface and Variant Table as a text-based browser for precise inspection of each variant one by one. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all of the closely and distantly related wild Oryza species from the NIG Wild Rice Collection will be available in future releases. All of the resources can be accessed through http://viewer.shigen.info/oryzagenome/. PMID:26578696

  9. StellaBase: the Nematostella vectensis Genomics Database.

    PubMed

    Sullivan, James C; Ryan, Joseph F; Watson, James A; Webb, Jeramy; Mullikin, James C; Rokhsar, Daniel; Finnerty, John R

    2006-01-01

    StellaBase, the Nematostella vectensis Genomics Database, is a web-based resource that will facilitate desktop and bench-top studies of the starlet sea anemone. Nematostella is an emerging model organism that has already proven useful for addressing fundamental questions in developmental evolution and evolutionary genomics. StellaBase allows users to query the assembled Nematostella genome, a confirmed gene library, and a predicted genome using both keyword and homology based search functions. Data provided by these searches will elucidate gene family evolution in early animals. Unique research tools, including a Nematostella genetic stock library, a primer library, a literature repository and a gene expression library will provide support to the burgeoning Nematostella research community. The development of StellaBase accompanies significant upgrades to CnidBase, the Cnidarian Evolutionary Genomics Database. With the completion of the first sequenced cnidarian genome, genome comparison tools have been added to CnidBase. In addition, StellaBase provides a framework for the integration of additional species-specific databases into CnidBase. StellaBase is available at http://www.stellabase.org.

  10. Complete Genome Sequence of the Filamentous Fungus Aspergillus westerdijkiae Reveals the Putative Biosynthetic Gene Cluster of Ochratoxin A

    PubMed Central

    Chakrabortti, Alolika; Li, Jinming

    2016-01-01

    Ochratoxin A (OTA) is a common mycotoxin that contaminates food and agricultural products. Sequencing of the complete genome of Aspergillus westerdijkiae, a major producer of OTA, reveals more than 50 biosynthetic gene clusters, including a putative OTA biosynthetic gene cluster that encodes a dozen enzymes, transporters, and regulatory proteins. PMID:27635003

  11. Complete Genome Sequence of the Filamentous Fungus Aspergillus westerdijkiae Reveals the Putative Biosynthetic Gene Cluster of Ochratoxin A.

    PubMed

    Chakrabortti, Alolika; Li, Jinming; Liang, Zhao-Xun

    2016-01-01

    Ochratoxin A (OTA) is a common mycotoxin that contaminates food and agricultural products. Sequencing of the complete genome of Aspergillus westerdijkiae, a major producer of OTA, reveals more than 50 biosynthetic gene clusters, including a putative OTA biosynthetic gene cluster that encodes a dozen enzymes, transporters, and regulatory proteins. PMID:27635003

  12. Genome Sequences of Eight Aspergillus flavus spp. and One A. parasiticus sp., Isolated from Peanut Seeds in Georgia.

    PubMed

    Faustinelli, Paola C; Wang, Xinye Monica; Palencia, Edwin R; Arias, Renée S

    2016-04-14

    Aspergillus flavus and A. parasiticus fungi produce carcinogenic mycotoxins in peanut seeds, causing considerable impact on both human health and the economy. Here, we report nine genome sequences of Aspergillus spp., isolated from Georgia peanut seeds in 2014. The information obtained will lead to further biodiversity studies that are essential for developing control strategies.

  13. GenColors-based comparative genome databases for small eukaryotic genomes.

    PubMed

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  14. GenColors-based comparative genome databases for small eukaryotic genomes

    PubMed Central

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources. PMID:23193285

  15. Exploring phenotypic data at the rat genome database.

    PubMed

    Twigger, Simon N; S Smith, Jennifer; Zuniga-Meyer, Angela; Bromberg, Susan K

    2006-07-01

    The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have direct relevance to human-based research. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model-organism database that provides access to a wide variety of curated rat data such as genes and their homologs, quantitative trait loci, phenotypes, comparative mapping, and genome analysis. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat. We show how to make associations with the genome and use comparative tools to link the rat with human and mouse in order to integrate results from these three species of critical biomedical importance.

  16. Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

    SciTech Connect

    Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma; Bragulat, M. Rosa; Cigliano, Riccardo Aiese; Sánchez, Armand

    2015-03-13

    In microorganisms, Ion Torrent sequencing technology has proved useful in whole-genome sequencing of bacterial genomes (5 Mbp). In our study, we used this technology for the first time to resequence a whole fungal genome (36 Mbp), that of a non-ochratoxin A-producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin that is found mainly in cereals and their products, but it also occurs in a variety of common foods and beverages. Because this strain does not produce OTA, we focused some of the bioinformatics analyses on genes involved in OTA biosynthesis, using a reference genome of an OTA-producing strain of the same species. This study revealed that in the atoxigenic strain there is a high accumulation of nonsense and missense mutations in several genes. Importantly, a twofold increase in gene mutation ratio was observed in PKS- and NRPS-encoding genes, which are suggested to be involved in OTA biosynthesis.

  17. Aspergillus flavus genomics: gateway to human and animal health, food safety, and crop resistance to diseases.

    PubMed

    Yu, Jiujiang; Cleveland, Thomas E; Nierman, William C; Bennett, Joan W

    2005-12-01

    Aspergillus flavus is an imperfect filamentous fungus that is an opportunistic pathogen causing invasive and non-invasive aspergillosis in humans, animals, and insects. It also causes allergic reactions in humans. A. flavus infects agricultural crops and stored grains and produces the most toxic and potent carcinogenic metabolites such as aflatoxins and other mycotoxins. Breakthroughs in A. flavus genomics may lead to improvement in human health, food safety, and agricultural economy. The availability of A. flavus genomic data marks a new era in research for fungal biology, medical mycology, agricultural ecology, pathogenicity, mycotoxin biosynthesis, and evolution. The availability of whole genome microarrays has equipped scientists with a new powerful tool for studying gene expression under specific conditions. They can be used to identify genes responsible for mycotoxin biosynthesis and for fungal infection in humans, animals and plants. A. flavus genomics is expected to advance the development of therapeutic drugs and to provide information for devising strategies in controlling diseases of humans and other animals. Further, it will provide vital clues for engineering commercial crops resistant to fungal infection by incorporating antifungal genes that may prevent aflatoxin contamination of agricultural harvest. PMID:16499411

  18. Megx.net: integrated database resource for marine ecological genomics.

    PubMed

    Kottmann, Renzo; Kostadinov, Ivalyo; Duhaime, Melissa Beth; Buttigieg, Pier Luigi; Yilmaz, Pelin; Hankeln, Wolfgang; Waldmann, Jost; Glöckner, Frank Oliver

    2010-01-01

    Megx.net is a database and portal that provides integrated access to georeferenced marker genes, environment data and marine genome and metagenome projects for microbial ecological genomics. All data are stored in the Microbial Ecological Genomics DataBase (MegDB), which is subdivided to hold both sequence and habitat data and global environmental data layers. The extended system provides access to several hundreds of genomes and metagenomes from prokaryotes and phages, as well as over a million small and large subunit ribosomal RNA sequences. With the refined Genes Mapserver, all data can be interactively visualized on a world map and statistics describing environmental parameters can be calculated. Sequence entries have been curated to comply with the proposed minimal standards for genomes and metagenomes (MIGS/MIMS) of the Genomic Standards Consortium. Access to data is facilitated by Web Services. The updated megx.net portal offers microbial ecologists greatly enhanced database content, and new features and tools for data analysis, all of which are freely accessible from our webpage http://www.megx.net.

  19. Querying genomic databases: refining the connectivity map.

    PubMed

    Segal, Mark R; Xiong, Hao; Bengtsson, Henrik; Bourgon, Richard; Gentleman, Robert

    2012-01-01

    The advent of high-throughput biotechnologies, which can efficiently measure gene expression on a global basis, has led to the creation and population of correspondingly rich databases and compendia. Such repositories have the potential to add enormous scientific value beyond that provided by individual studies which, due largely to cost considerations, are typified by small sample sizes. Accordingly, substantial effort has been invested in devising analysis schemes for utilizing gene-expression repositories. Here, we focus on one such scheme, the Connectivity Map (cmap), that was developed with the express purpose of identifying drugs with putative efficacy against a given disease, where the disease in question is characterized by a (differential) gene-expression signature. Initial claims surrounding cmap intimated that such tools might lead to new, previously unanticipated applications of existing drugs. However, further application suggests that its primary utility is in connecting a disease condition whose biology is largely unknown to a drug whose mechanisms of action are well understood, making cmap a tool for enhancing biological knowledge. The success of the Connectivity Map is belied by its simplicity. The aforementioned signature serves as an unordered query which is applied to a customized database of (differential) gene-expression experiments designed to elicit response to a wide range of drugs, across a spectrum of concentrations, durations, and cell lines. Such application is effected by computing a per-experiment score that measures "closeness" between the signature and the experiment. Top-scoring experiments, and the attendant drug(s), are then deemed relevant to the disease underlying the query. Inference supporting such elicitations is pursued via re-sampling. In this paper, we revisit two key aspects of the Connectivity Map implementation. Firstly, we develop new approaches to measuring closeness for the common scenario wherein the query

  20. Assessment of the pectin degrading enzyme network of Aspergillus niger by functional genomics.

    PubMed

    Martens-Uzunova, Elena S; Schaap, Peter J

    2009-03-01

    The saprobic fungus Aspergillus niger is an efficient producer of a suite of extracellular enzymes involved in carbohydrate modification and degradation. Genome mining has resulted in the prediction of at least 39 genes encoding enzymes involved in the depolymerisation of the backbone of pectin. Additional genes, encoding enzymatic activities required for the degradation of the arabinan and arabinogalactan sidechains, were predicted as well. DNA microarray analysis was used to study the condition-dependent expression of these genes, and to generate insights into possible synergistic interactions between the individual members of the pectin degrading enzyme network. For this purpose, A. niger was grown on sugarbeet pectin and on galacturonic acid, rhamnose and xylose, the main monomeric sugar constituents of pectin. An analysis of the corresponding transcriptomes revealed expression of 46 genes encoding pectinolytic enzymes. Their transcriptional profiles are discussed in detail and a cascade model of pectin degradation is proposed.

  1. CarrotDB: a genomic and transcriptomic database for carrot.

    PubMed

    Xu, Zhi-Sheng; Tan, Hua-Wei; Wang, Feng; Hou, Xi-Lin; Xiong, Ai-Sheng

    2014-01-01

    Carrot (Daucus carota L.) is an economically important vegetable worldwide and is the largest source of carotenoids and provitamin A in the human diet. Given the importance of this vegetable to humans, research and breeding communities on carrot should obtain useful genomic and transcriptomic information. The first whole-genome sequences of 'DC-27' carrot were de novo assembled and analyzed. Transcriptomic sequences of 14 carrot genotypes were downloaded from the Sequence Read Archive (SRA) database of National Center for Biotechnology Information (NCBI) and mapped to the whole-genome sequence before assembly. Based on these data sets, the first Web-based genomic and transcriptomic database for D. carota (CarrotDB) was developed (database homepage: http://apiaceae.njau.edu.cn/carrotdb). CarrotDB offers the tools of Genome Map and Basic Local Alignment Search Tool. Using these tools, users can search certain target genes and simple sequence repeats along with designed primers of 'DC-27'. Assembled transcriptomic sequences along with fragments per kilobase of transcript sequence per million base pairs sequenced (FPKM) information of 14 carrot genotypes are also provided. Users can download de novo assembled whole-genome sequences, putative gene sequences and putative protein sequences of 'DC-27'. Users can also download transcriptome sequence assemblies of 14 carrot genotypes along with their FPKM information. A total of 2826 transcription factor (TF) genes classified into 57 families were identified in the entire genome sequences. These TF genes were embedded in CarrotDB as an interface. The 'GERMPLASM' part of CarrotDB also offers taproot photos of 45 carrot genotypes and a table containing accession numbers, names, countries of origin and colors of cortex, phloem and xylem parts of taproots corresponding to each carrot genotype. CarrotDB will be continuously updated with new information. Database URL: http://apiaceae.njau.edu.cn/carrotdb/ PMID
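
    The FPKM values mentioned in the abstract above normalize a transcript's fragment count by both transcript length and sequencing depth. The sketch below uses the commonly cited definition (fragments per kilobase of transcript per million mapped fragments); it is not CarrotDB's own code, and the function name `fpkm` and all input numbers are hypothetical values chosen for illustration.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)
    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    if fragments < 0:
        raise ValueError("fragment count cannot be negative")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 20 million mapped fragments.
value = fpkm(fragments=500, transcript_length_bp=2_000, total_mapped_fragments=20_000_000)
print(f"FPKM = {value:.2f}")
```

    Because the value is scaled by library size, FPKM figures for the same transcript can be compared across samples sequenced to different depths, which is presumably why such values are provided per genotype.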

  2. CarrotDB: a genomic and transcriptomic database for carrot

    PubMed Central

    Xu, Zhi-Sheng; Tan, Hua-Wei; Wang, Feng; Hou, Xi-Lin; Xiong, Ai-Sheng

    2014-01-01

    Carrot (Daucus carota L.) is an economically important vegetable worldwide and is the largest source of carotenoids and provitamin A in the human diet. Given the importance of this vegetable to humans, research and breeding communities on carrot should obtain useful genomic and transcriptomic information. The first whole-genome sequences of ‘DC-27’ carrot were de novo assembled and analyzed. Transcriptomic sequences of 14 carrot genotypes were downloaded from the Sequence Read Archive (SRA) database of National Center for Biotechnology Information (NCBI) and mapped to the whole-genome sequence before assembly. Based on these data sets, the first Web-based genomic and transcriptomic database for D. carota (CarrotDB) was developed (database homepage: http://apiaceae.njau.edu.cn/carrotdb). CarrotDB offers the tools of Genome Map and Basic Local Alignment Search Tool. Using these tools, users can search certain target genes and simple sequence repeats along with designed primers of ‘DC-27’. Assembled transcriptomic sequences along with fragments per kilobase of transcript sequence per million base pairs sequenced (FPKM) information of 14 carrot genotypes are also provided. Users can download de novo assembled whole-genome sequences, putative gene sequences and putative protein sequences of ‘DC-27’. Users can also download transcriptome sequence assemblies of 14 carrot genotypes along with their FPKM information. A total of 2826 transcription factor (TF) genes classified into 57 families were identified in the entire genome sequences. These TF genes were embedded in CarrotDB as an interface. The ‘GERMPLASM’ part of CarrotDB also offers taproot photos of 45 carrot genotypes and a table containing accession numbers, names, countries of origin and colors of cortex, phloem and xylem parts of taproots corresponding to each carrot genotype. CarrotDB will be continuously updated with new information. Database URL: http
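
    The FPKM values mentioned in the CarrotDB records follow a standard normalization: fragment counts are scaled by transcript length (in kilobases) and by sequencing depth (in millions of mapped fragments). A minimal sketch of that arithmetic is given below; the function name and the example numbers are hypothetical and are not taken from CarrotDB.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical values for illustration only: 500 fragments on a 2 kb
# transcript in a library of 10 million mapped fragments.
example_value = fpkm(500, 2000, 10_000_000)
```

    Because the value is normalized for both length and depth, it can be compared across transcripts and across libraries of different sizes, which is presumably why the database reports it per genotype.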

  3. Genomic Databases and Biobanks in Denmark.

    PubMed

    Hartlev, Mette

    2015-01-01

    Biobanking in Denmark is regulated via patients' rights laws, data protection laws, and research ethics reviews. Danish law recognizes tissue samples as personal data for purposes of the data protection laws, meaning research with tissue samples may be subject to research ethics review, data protection laws, and patients' rights requirements depending on the circumstances of collection. However, research on information gained through whole genome sequencing is subject only to data protection laws, despite the similarity in the nature of the information. The regulatory framework treats biobank samples collected from patients differently than samples collected from research participants, particularly with respect to autonomy. Importantly, biobanks established for future unspecified research are not subject to research ethics review. Biobank-based research has gained more prominence on the national level recently, and the potential for a less fragmented and more consistent regulatory approach may emerge from this attention. PMID:26711414

  4. Choosing a Genome Browser for a Model Organism Database (MOD): Surveying the Maize Community

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As the maize genome sequencing is nearing its completion, the Maize Genetics and Genomics Database (MaizeGDB), the Model Organism Database for maize, integrated a genome browser to its already existing Web interface and database. The addition of the MaizeGDB Genome Browser to MaizeGDB will allow it ...

  5. A primer on rapid prototyping of genomic databases in Prolog

    SciTech Connect

    Yoshida, Kaoru; Smith, C.L.; Overbeek, R. (Mathematics and Computer Science Div.)

    1992-01-01

    This report presents a tutorial on how one might create an integrated database of genomic information. We outline the required steps for implementation, give a brief introduction to Prolog, and discuss the query facility supported by our system. Our goal is to enable researchers to begin constructing their own biological information system.

  6. MaizeGDB: The Maize Genetics and Genomics Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MaizeGDB is the community database for biological information about the crop plant Zea mays. Genetic, genomic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes stored at MaizeGDB. At the project's website...

  7. MaizeGDB: The Maize Genetics and Genomics Database.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MaizeGDB is the community database for biological information about the crop plant Zea mays. Genomic, genetic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes stored at MaizeGDB. At the project’s website...

  8. Genome shuffling of Aspergillus glaucus HGZ-2 for enhanced cellulase production.

    PubMed

    Zhao, Yuping; Jiang, Changxing; Yu, Hupeng; Fang, Fang; Yang, Jingzhu

    2014-10-01

    The production of cellulase from Aspergillus glaucus HGZ-2 was improved by using genome shuffling. The starting populations, obtained by UV irradiation, were subjected to recursive protoplast fusion. The optimal conditions for protoplast formation and regeneration were 7 mg/ml snailase and 5 mg/ml cellulase at 34 °C for 3.0 h using 0.7 M NaCl as an osmotic stabilizer. The protoplasts were inactivated under UV for 30 min or heated at 50 °C for 50 min, and a fusant probability of about 100 % was observed. The positive colonies were created by fusing the inactivated protoplasts. The optimal conditions for protoplast fusion were PEG6000 concentration of 35 %, CaCl2 concentration of 0.02 M, and incubation time of 12 min. After two rounds of genome shuffling, one strain (Y) was obtained. Its filter paper cellulase (FPase) and carboxymethyl cellulase (CMCase) activity reached 71 and 70 U/ml, respectively, which were increased by 1.95-fold and 1.72-fold in comparison with that of its ancestor strain. The results indicated that genome shuffling was an efficient means for the improved production of cellulases by A. glaucus HGZ-2.

  9. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases

    PubMed Central

    Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    Motivation: First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. Results: We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All data are available as described in the supplementary material. PMID:27489953
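
    The general shape of a supervised, cross-validated duplicate detector can be sketched as follows. This is not the authors' method: the abstract does not describe their models, so a single-threshold classifier on one hypothetical feature (pairwise sequence identity) stands in for them, and all names here are assumptions made for illustration.

```python
from typing import List, Tuple

# (sequence identity in [0, 1], label) where label 1 = duplicate, 0 = distinct.
Pair = Tuple[float, int]


def fit_threshold(train: List[Pair]) -> float:
    """Pick the identity cutoff that maximises accuracy on the training pairs."""
    candidates = sorted({identity for identity, _ in train})
    best_cut, best_correct = candidates[0], -1
    for cut in candidates:
        # A pair is predicted to be a duplicate when its identity >= cut.
        correct = sum((identity >= cut) == bool(label) for identity, label in train)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


def cross_validate(data: List[Pair], k: int) -> float:
    """Mean held-out accuracy over k folds (fold i holds out items i, i+k, ...)."""
    if k < 2 or len(data) < k:
        raise ValueError("need at least k items and k >= 2")
    accuracies = []
    for i in range(k):
        held_out = data[i::k]
        train = [pair for j, pair in enumerate(data) if j % k != i]
        cut = fit_threshold(train)
        hits = sum((identity >= cut) == bool(label) for identity, label in held_out)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / k
```

    A realistic system would use many features per record pair (the abstract mentions 22) and a stronger model; the point of the sketch is only that the cutoff is learned from expert-labelled pairs and scored on pairs it has not seen.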

  10. PGDD: a database of gene and genome duplication in plants

    PubMed Central

    Lee, Tae-Ho; Tang, Haibao; Wang, Xiyin; Paterson, Andrew H.

    2013-01-01

    Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD. PMID:23180799

  11. SmedGD: the Schmidtea mediterranea genome database

    PubMed Central

    Robb, Sofia M.C.; Ross, Eric; Alvarado, Alejandro Sánchez

    2008-01-01

    The planarian Schmidtea mediterranea is rapidly emerging as a model organism for the study of regeneration, tissue homeostasis and stem cell biology. The recent sequencing, assembly and annotation of its genome are expected to further buoy the biomedical importance of this organism. In order to make the extensive data associated with the genome sequence accessible to the biomedical and planarian communities, we have created the Schmidtea mediterranea Genome Database (SmedGD). SmedGD integrates in a single web-accessible portal all available data associated with the planarian genome, including predicted and annotated genes, ESTs, protein homologies, gene expression patterns and RNAi phenotypes. Moreover, SmedGD was designed using tools provided by the Generic Model Organism Database (GMOD) project, thus making its data structure compatible with other model organism databases. Because of the unique phylogenetic position of planarians, SmedGD (http://smedgd.neuro.utah.edu) will prove useful not only to the planarian research community, but also to those engaged in developmental and evolutionary biology, comparative genomics, stem cell research and regeneration. PMID:17881371

  12. Mouse Genome Database: From sequence to phenotypes and disease models.

    PubMed

    Eppig, Janan T; Richardson, Joel E; Kadin, James A; Smith, Cynthia L; Blake, Judith A; Bult, Carol J

    2015-08-01

    The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here, we describe the data acquisition process, specifics about MGD's key data areas, methods to access and query MGD data, and outreach and user help facilities. PMID:26150326

  13. EU Laws on Privacy in Genomic Databases and Biobanking.

    PubMed

    Townend, David

    2016-03-01

    Both the European Union and the Council of Europe have a bearing on privacy in genomic databases and biobanking. In terms of legislation, the processing of personal data as it relates to the right to privacy is currently largely regulated in Europe by Directive 95/46/EC, which requires that processing be "fair and lawful" and follow a set of principles, meaning that the data be processed only for stated purposes, be sufficient for the purposes of the processing, be kept only for so long as is necessary to achieve those purposes, and be kept securely and only in an identifiable state for such time as is necessary for the processing. The European privacy regime does not require the de-identification (anonymization) of personal data used in genomic databases or biobanks, and alongside this practice informed consent as well as governance and oversight mechanisms provide for the protection of genomic data. PMID:27256129

  14. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-01

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. PMID:26578582

  15. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database

    PubMed Central

    Winsor, Geoffrey L.; Griffiths, Emma J.; Lo, Raymond; Dhillon, Bhavjinder K.; Shay, Julie A.; Brinkman, Fiona S. L.

    2016-01-01

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. PMID:26578582

  16. Development of genome viewer (Web Omics Viewer) for managing databases of cucumber genome

    NASA Astrophysics Data System (ADS)

    Wojcieszek, M.; RóŻ, P.; Pawełkowicz, M.; Nowak, R.; Przybecki, Z.

    Cucumber is an important plant in horticulture and in science. Sequencing projects of the C. sativus genome enable new methodological approaches in further investigation of this species. Accessibility is crucial to fully exploit the information obtained about the detailed structure of genes, markers and other characteristic features such as contigs, scaffolds and chromosomes. A genome viewer is one of the tools that provide a plain and easy way of presenting genome data to users and for database administration. Gbrowse, the main viewer, has several very useful features but is not simple to manage. Our group developed a new genome browser, Web Omics Viewer (WOV), keeping the functionality but improving usability and access to cucumber genome data.

  17. Integrated database of information from structural genomics experiments.

    PubMed

    Asada, Yukuhiko; Sugahara, Michihiro; Mizutani, Hisashi; Naitow, Hisashi; Tanaka, Tomoyuki; Matsuura, Yoshinori; Agari, Yoshihiro; Ebihara, Akio; Shinkai, Akeo; Kuramitsu, Seiki; Yokoyama, Shigeyuki; Kaminuma, Eri; Kobayashi, Norio; Nishikata, Koro; Shimoyama, Sayoko; Toyoda, Tetsuro; Ishikawa, Tetsuya; Kunishima, Naoki

    2013-05-01

    Information from structural genomics experiments at the RIKEN SPring-8 Center, Japan has been compiled and published as an integrated database. The contents of the database are (i) experimental data from nine species of bacteria that cover a large variety of protein molecules in terms of both evolution and properties (http://database.riken.jp/db/bacpedia), (ii) experimental data from mutant proteins that were designed systematically to study the influence of mutations on the diffraction quality of protein crystals (http://database.riken.jp/db/bacpedia) and (iii) experimental data from heavy-atom-labelled proteins from the heavy-atom database HATODAS (http://database.riken.jp/db/hatodas). The database integration adopts the semantic web, which is suitable for data reuse and automatic processing, thereby allowing batch downloads of full data and data reconstruction to produce new databases. In addition, to enhance the use of data (i) and (ii) by general researchers in biosciences, a comprehensible user interface, Bacpedia (http://bacpedia.harima.riken.jp), has been developed.

  18. DemaDb: an integrated dematiaceous fungal genomes database

    PubMed Central

    Kuan, Chee Sian; Yew, Su Mei; Chan, Chai Ling; Toh, Yue Fen; Lee, Kok Wei; Cheong, Wei-Hien; Yee, Wai-Yan; Hoh, Chee-Choong; Yap, Soon-Joo; Ng, Kee Peng

    2016-01-01

    Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in humans, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database, was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that were identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser based on the KEGG pathway database allows users to look into molecular interaction and reaction networks for all KEGG annotated genes. The availability of downloadable files containing assembly, nucleic acid, as well as protein data allows direct retrieval for further downstream work. DemaDb is a useful resource for the fungal research community, especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my PMID:26980516

  19. ONTOFUSION: ontology-based integration of genomic and clinical databases.

    PubMed

    Pérez-Rey, D; Maojo, V; García-Remesal, M; Alonso-Calvo, R; Billhardt, H; Martin-Sánchez, F; Sousa, A

    2006-01-01

    ONTOFUSION is an ontology-based system designed for biomedical database integration. It is based on two processes: mapping and unification. Mapping is a semi-automated process that uses ontologies to link a database schema with a conceptual framework, named the virtual schema. There are three methodologies for creating virtual schemas, according to the origin of the domain ontology used: (1) top-down (e.g. using an existing ontology, such as the UMLS or Gene Ontology), (2) bottom-up (building a new domain ontology) and (3) a hybrid combination. Unification is an automated process for integrating ontologies and hence the database to which they are linked. Using these methods, we employed ONTOFUSION to integrate a large number of public genomic and clinical databases, as well as biomedical ontologies.

  20. The Medicago Genome Initiative: a model legume database

    PubMed Central

    Bell, Callum J.; Dixon, Richard A.; Farmer, Andrew D.; Flores, Raul; Inman, Jeff; Gonzales, Robert A.; Harrison, Maria J.; Paiva, Nancy L.; Scott, Angela D.; Weller, Jennifer W.; May, Gregory D.

    2001-01-01

    The Medicago Genome Initiative (MGI) is a database of EST sequences of the model legume Medicago truncatula. The database is available to the public and has resulted from a collaborative research effort between the Samuel Roberts Noble Foundation and the National Center for Genome Resources to investigate the genome of M. truncatula. MGI is part of the greater integrated Medicago functional genomics program at the Noble Foundation (http://www.noble.org), which is taking a global approach in studying the genetic and biochemical events associated with the growth, development and environmental interactions of this model legume. Our approach will include: large-scale EST sequencing, gene expression profiling, the generation of M. truncatula activation-tagged and promoter trap insertion mutants, high-throughput metabolic profiling, and proteome studies. These multidisciplinary information pools will be interfaced with one another to provide scientists with an integrated, holistic set of tools to address fundamental questions pertaining to legume biology. The public interface to the MGI database can be accessed at http://www.ncgr.org/research/mgi. PMID:11125064

  1. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

    PubMed

    Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

    2013-02-01

    The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up to date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a Japanese leading japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics.

  2. Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus.

    PubMed

    Nierman, William C; Pain, Arnab; Anderson, Michael J; Wortman, Jennifer R; Kim, H Stanley; Arroyo, Javier; Berriman, Matthew; Abe, Keietsu; Archer, David B; Bermejo, Clara; Bennett, Joan; Bowyer, Paul; Chen, Dan; Collins, Matthew; Coulsen, Richard; Davies, Robert; Dyer, Paul S; Farman, Mark; Fedorova, Nadia; Fedorova, Natalie; Feldblyum, Tamara V; Fischer, Reinhard; Fosker, Nigel; Fraser, Audrey; García, Jose L; García, Maria J; Goble, Arlette; Goldman, Gustavo H; Gomi, Katsuya; Griffith-Jones, Sam; Gwilliam, Ryan; Haas, Brian; Haas, Hubertus; Harris, David; Horiuchi, H; Huang, Jiaqi; Humphray, Sean; Jiménez, Javier; Keller, Nancy; Khouri, Hoda; Kitamoto, Katsuhiko; Kobayashi, Tetsuo; Konzack, Sven; Kulkarni, Resham; Kumagai, Toshitaka; Lafon, Anne; Lafton, Anne; Latgé, Jean-Paul; Li, Weixi; Lord, Angela; Lu, Charles; Majoros, William H; May, Gregory S; Miller, Bruce L; Mohamoud, Yasmin; Molina, Maria; Monod, Michel; Mouyna, Isabelle; Mulligan, Stephanie; Murphy, Lee; O'Neil, Susan; Paulsen, Ian; Peñalva, Miguel A; Pertea, Mihaela; Price, Claire; Pritchard, Bethan L; Quail, Michael A; Rabbinowitsch, Ester; Rawlins, Neil; Rajandream, Marie-Adele; Reichard, Utz; Renauld, Hubert; Robson, Geoffrey D; Rodriguez de Córdoba, Santiago; Rodríguez-Peña, Jose M; Ronning, Catherine M; Rutter, Simon; Salzberg, Steven L; Sanchez, Miguel; Sánchez-Ferrero, Juan C; Saunders, David; Seeger, Kathy; Squares, Rob; Squares, Steven; Takeuchi, Michio; Tekaia, Fredj; Turner, Geoffrey; Vazquez de Aldana, Carlos R; Weidman, Janice; White, Owen; Woodward, John; Yu, Jae-Hyuk; Fraser, Claire; Galagan, James E; Asai, Kiyoshi; Machida, Masayuki; Hall, Neil; Barrell, Bart; Denning, David W

    2005-12-22

    Aspergillus fumigatus is exceptional among microorganisms in being both a primary and opportunistic pathogen as well as a major allergen. Its conidia production is prolific, and so human respiratory tract exposure is almost constant. A. fumigatus is isolated from human habitats and vegetable compost heaps. In immunocompromised individuals, the incidence of invasive infection can be as high as 50% and the mortality rate is often about 50% (ref. 2). The interaction of A. fumigatus and other airborne fungi with the immune system is increasingly linked to severe asthma and sinusitis. Although the burden of invasive disease caused by A. fumigatus is substantial, the basic biology of the organism is mostly obscure. Here we show the complete 29.4-megabase genome sequence of the clinical isolate Af293, which consists of eight chromosomes containing 9,926 predicted genes. Microarray analysis revealed temperature-dependent expression of distinct sets of genes, as well as 700 A. fumigatus genes not present or significantly diverged in the closely related sexual species Neosartorya fischeri, many of which may have roles in the pathogenicity phenotype. The Af293 genome sequence provides an unparalleled resource for the future understanding of this remarkable fungus. PMID:16372009

  3. ICDS database: interrupted CoDing sequences in prokaryotic genomes.

    PubMed

    Perrodou, Emmanuel; Deshayes, Caroline; Muller, Jean; Schaeffer, Christine; Van Dorsselaer, Alain; Ripp, Raymond; Poch, Olivier; Reyrat, Jean-Marc; Lecompte, Odile

    2006-01-01

    Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequence (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination.
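
    One of the symptoms named above, an in-frame stop codon inside a coding sequence, can be checked with a few lines of code. The sketch below is a simple frame scan, not the similarity-based approach used by the ICDS database; it assumes the three standard stop codons, and the function names are hypothetical.

```python
from typing import List

# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str) -> List[int]:
    """Return 0-based nucleotide offsets of in-frame stop codons.

    The last complete codon is skipped because a terminator is expected there.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3
    positions = []
    for i in range(n_codons - 1):
        if seq[3 * i:3 * i + 3] in STOP_CODONS:
            positions.append(3 * i)
    return positions


def looks_interrupted(cds: str) -> bool:
    """Flag a sequence whose length is not a multiple of three or that
    contains a premature in-frame stop codon."""
    return len(cds) % 3 != 0 or bool(internal_stop_positions(cds))
```

    A scan like this only flags candidates; deciding whether an interruption is a sequencing error or a genuine pseudogene needs the kind of comparison and re-sequencing the abstract describes.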

  4. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  5. Genome Information Broker for Viruses (GIB-V): database for comparative analysis of virus genomes

    PubMed Central

    Hirahata, Masaki; Abe, Takashi; Tanaka, Naoto; Kuwana, Yoshikazu; Shigemoto, Yasumasa; Miyazaki, Satoru; Suzuki, Yoshiyuki; Sugawara, Hideaki

    2007-01-01

    Genome Information Broker for Viruses (GIB-V) is a comprehensive virus genome/segment database. We extracted 18 418 complete virus genomes/segments from the International Nucleotide Sequence Database Collaboration (INSDC) by DNA Data Bank of Japan (DDBJ), EMBL and GenBank and stored them in our system. The list of registered viruses is arranged hierarchically according to taxonomy. Keyword searches can be performed for genome/segment data or biological features of any virus stored in GIB-V. GIB-V is equipped with a BLAST search function, and search results are displayed graphically or in list form. Moreover, the BLAST results can be used online with the ClustalW feature of the DDBJ. All available virus genome/segment data can be collected by the GIB-V download function. GIB-V can be accessed at no charge. PMID:17158166

  6. A Database of Gene Expression Profiles of Korean Cancer Genome.

    PubMed

    Kim, Seon-Kyu; Chu, In-Sun

    2015-09-01

    Because there are clear molecular differences entailing different treatment effectiveness between Korean and non-Korean cancer patients, identifying distinct molecular characteristics of Korean cancers is profoundly important. Here, we report a web-based data repository, namely Korean Cancer Genome Database (KCGD), for searching gene signatures associated with Korean cancer patients. Currently, a total of 1,403 cancer genomics data were collected, processed and stored in our repository, an ever-growing database. We incorporated the most widely used statistical survival analysis methods, including the Cox proportional hazard model, log-rank test and Kaplan-Meier plot, to provide instant significance estimation for searched molecules. As an initial repository with the aim of Korean-specific marker detection, KCGD would be a promising web application for users without bioinformatics expertise to identify significant factors associated with cancer in Korean patients. PMID:26523133
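
    The Kaplan-Meier plot mentioned in this abstract rests on a simple product-limit estimate, sketched below in plain Python. This is a minimal illustration, not KCGD's implementation: the function name `kaplan_meier` and the toy follow-up data are hypothetical, and the sketch assumes the usual convention that subjects censored at time t are still at risk for events at time t.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up duration for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Remove both events and censored subjects after processing time t.
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Hypothetical data: five subjects, two of them censored.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

    Comparing two such curves (for example, high versus low expression of a gene) is what the log-rank test in the abstract is used for; that test is not shown here.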

  7. CnidBase: The Cnidarian Evolutionary Genomics Database

    PubMed Central

    Ryan, Joseph F.; Finnerty, John R.

    2003-01-01

    CnidBase, the Cnidarian Evolutionary Genomics Database, is a tool for investigating the evolutionary, developmental and ecological factors that affect gene expression and gene function in cnidarians. In turn, CnidBase will help to illuminate the role of specific genes in shaping cnidarian biodiversity in the present day and in the distant past. CnidBase highlights evolutionary changes between species within the phylum Cnidaria and structures genomic and expression data to facilitate comparisons to non-cnidarian metazoans. CnidBase aims to further the progress that has already been made in the realm of cnidarian evolutionary genomics by creating a central community resource which will help drive future research and facilitate more accurate classification and comparison of new experimental data with existing data. CnidBase is available at http://cnidbase.bu.edu/. PMID:12519972

  8. A web-based genomic sequence database for the Streptomycetaceae: a tool for systematics and genome mining

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The ARS Microbial Genome Sequence Database (http://199.133.98.43), a web-based database server, was established utilizing the BIGSdb (Bacterial Isolate Genomics Sequence Database) software package, developed at Oxford University, as a tool to manage multi-locus sequence data for the family Streptomy...

  9. Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

    PubMed

    Wang, Linhai; Yu, Jingyin; Li, Donghua; Zhang, Xiurong

    2015-01-01

    Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus) and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their importance, genetic and genomic analyses on these species have been insufficient due to a lack of reference genome information. The now available S. indicum genome will provide an unprecedented opportunity for studying both S. indicum genetic traits and comparative genomics. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a web-based database with comprehensive sesame genomic, genetic and comparative genomic information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167) and non-coding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/. PMID:25480115

  11. Addition of a breeding database in the Genome Database for Rosaceae.

    PubMed

    Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

    2013-01-01

    Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. New infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage list individuals with parents in common, and results from Individual Variety pages link to all data available on each chosen individual, including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will...

  13. The evolutionary imprint of domestication on genome variation and function of the filamentous fungus Aspergillus oryzae.

    PubMed

    Gibbons, John G; Salichos, Leonidas; Slot, Jason C; Rinker, David C; McGary, Kriston L; King, Jonas G; Klich, Maren A; Tabb, David L; McDonald, W Hayes; Rokas, Antonis

    2012-08-01

    The domestication of animals, plants, and microbes fundamentally transformed the lifestyle and demography of the human species [1]. Although the genetic and functional underpinnings of animal and plant domestication are well understood, little is known about microbe domestication [2-6]. Here, we systematically examined genome-wide sequence and functional variation between the domesticated fungus Aspergillus oryzae, whose saccharification abilities humans have harnessed for thousands of years to produce sake, soy sauce, and miso from starch-rich grains, and its wild relative A. flavus, a potentially toxigenic plant and animal pathogen [7]. We discovered dramatic changes in the sequence variation and abundance profiles of genes and wholesale primary and secondary metabolic pathways between domesticated and wild relative isolates during growth on rice. Our data suggest that, through selection by humans, an atoxigenic lineage of A. flavus gradually evolved into a "cell factory" for enzymes and metabolites involved in the saccharification process. These results suggest that whereas animal and plant domestication was largely driven by Neolithic "genetic tinkering" of developmental pathways, microbe domestication was driven by extensive remodeling of metabolism.

  14. Metabolic model integration of the bibliome, genome, metabolome and reactome of Aspergillus niger

    PubMed Central

    Andersen, Mikael Rørdam; Nielsen, Michael Lynge; Nielsen, Jens

    2008-01-01

    The release of the genome sequences of two strains of Aspergillus niger has allowed systems-level investigations of this important microbial cell factory. To this end, tools for doing data integration of multi-ome data are necessary, and especially interesting in the context of metabolism. On the basis of an A. niger bibliome survey, we present the largest model reconstruction of a metabolic network reported for a fungal species. The reconstructed gapless metabolic network is based on the reportings of 371 articles and comprises 1190 biochemically unique reactions and 871 ORFs. Inclusion of isoenzymes increases the total number of reactions to 2240. A graphical map of the metabolic network is presented. All levels of the reconstruction process were based on manual curation. From the reconstructed metabolic network, a mathematical model was constructed and validated with data on yields, fluxes and transcription. The presented metabolic network and map are useful tools for examining systemwide data in a metabolic context. Results from the validated model show a great potential for expanding the use of A. niger as a high-yield production platform. PMID:18364712

  15. Functional Genomic Analysis of Aspergillus flavus Interacting with Resistant and Susceptible Peanut

    PubMed Central

    Wang, Houmiao; Lei, Yong; Yan, Liying; Wan, Liyun; Ren, Xiaoping; Chen, Silong; Dai, Xiaofeng; Guo, Wei; Jiang, Huifang; Liao, Boshou

    2016-01-01

    In the Aspergillus flavus (A. flavus)–peanut pathosystem, development and metabolism of the fungus directly influence aflatoxin contamination. To comprehensively understand the molecular mechanism of A. flavus interaction with peanut, RNA-seq was used for global transcriptome profiling of A. flavus during interaction with resistant and susceptible peanut genotypes. In total, 67.46 Gb of high-quality bases were generated for A. flavus-resistant (af_R) and -susceptible peanut (af_S) at one (T1), three (T2) and seven (T3) days post-inoculation. The uniquely mapped reads to A. flavus reference genome in the libraries of af_R and af_S at T2 and T3 were subjected to further analysis, with more than 72% of all obtained genes expressed in the eight libraries. Comparison of expression levels both af_R vs. af_S and T2 vs. T3 uncovered 1926 differentially expressed genes (DEGs). DEGs associated with mycelial growth, conidial development and aflatoxin biosynthesis were up-regulated in af_S compared with af_R, implying that A. flavus mycelia more easily penetrate and produce much more aflatoxin in susceptible than in resistant peanut. Our results serve as a foundation for understanding the molecular mechanisms of aflatoxin production differences between A. flavus-R and -S peanut, and offer new clues to manage aflatoxin contamination in crops. PMID:26891328

  16. Genome Snapshot: a new resource at the Saccharomyces Genome Database (SGD) presenting an overview of the Saccharomyces cerevisiae genome.

    PubMed

    Hirschman, Jodi E; Balakrishnan, Rama; Christie, Karen R; Costanzo, Maria C; Dwight, Selina S; Engel, Stacia R; Fisk, Dianna G; Hong, Eurie L; Livstone, Michael S; Nash, Robert; Park, Julie; Oughtred, Rose; Skrzypek, Marek; Starr, Barry; Theesfeld, Chandra L; Williams, Jennifer; Andrada, Rey; Binkley, Gail; Dong, Qing; Lane, Christopher; Miyasato, Stuart; Sethuraman, Anand; Schroeder, Mark; Thanawala, Mayank K; Weng, Shuai; Dolinski, Kara; Botstein, David; Cherry, J Michael

    2006-01-01

    Sequencing and annotation of the entire Saccharomyces cerevisiae genome has made it possible to gain a genome-wide perspective on yeast genes and gene products. To make this information available on an ongoing basis, the Saccharomyces Genome Database (SGD) (http://www.yeastgenome.org/) has created the Genome Snapshot (http://db.yeastgenome.org/cgi-bin/genomeSnapShot.pl). The Genome Snapshot summarizes the current state of knowledge about the genes and chromosomal features of S. cerevisiae. The information is organized into two categories: (i) number of each type of chromosomal feature annotated in the genome and (ii) number and distribution of genes annotated to Gene Ontology terms. Detailed lists are accessible through SGD's Advanced Search tool (http://db.yeastgenome.org/cgi-bin/search/featureSearch), and all the data presented on this page are available from the SGD ftp site (ftp://ftp.yeastgenome.org/yeast/).

  17. Bovine Genome Database: new tools for gleaning function from the Bos taurus genome.

    PubMed

    Elsik, Christine G; Unni, Deepak R; Diesh, Colin M; Tayal, Aditi; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

    2016-01-01

    We report an update of the Bovine Genome Database (BGD) (http://BovineGenome.org). The goal of BGD is to support bovine genomics research by providing genome annotation and data mining tools. We have developed new genome and annotation browsers using JBrowse and WebApollo for two Bos taurus genome assemblies, the reference genome assembly (UMD3.1.1) and the alternate genome assembly (Btau_4.6.1). Annotation tools have been customized to highlight priority genes for annotation, and to aid annotators in selecting gene evidence tracks from 91 tissue specific RNAseq datasets. We have also developed BovineMine, based on the InterMine data warehousing system, to integrate the bovine genome, annotation, QTL, SNP and expression data with external sources of orthology, gene ontology, gene interaction and pathway information. BovineMine provides powerful query building tools, as well as customized query templates, and allows users to analyze and download genome-wide datasets. With BovineMine, bovine researchers can use orthology to leverage the curated gene pathways of model organisms, such as human, mouse and rat. BovineMine will be especially useful for gene ontology and pathway analyses in conjunction with GWAS and QTL studies.

  18. Exploring genetic, genomic, and phenotypic data at the rat genome database.

    PubMed

    Laulederkind, Stanley J F; Hayman, G Thomas; Wang, Shur-Jen; Lowry, Timothy F; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R; Dwinell, Melinda R; Jacob, Howard J; Shimoyama, Mary

    2012-12-01

    The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have relevance to human physiology and disease. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model organism database that provides access to a wide variety of curated rat data including disease associations, phenotypes, pathways, molecular functions, biological processes, and cellular components for genes, quantitative trait loci, and strains. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat.

  19. DoGSD: the dog and wolf genome SNP database

    PubMed Central

    Bai, Bing; Zhao, Wen-Ming; Tang, Bi-Xia; Wang, Yan-Qing; Wang, Lu; Zhang, Zhang; Yang, He-Chuan; Liu, Yan-Hu; Zhu, Jun-Wei; Irwin, David M.; Wang, Guo-Dong; Zhang, Ya-Ping

    2015-01-01

    The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which have simultaneously broadened our understanding of domestication and diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data more easily/friendly usable and available, we propose DoGSD (http://dogsd.big.ac.cn), the first canidae-specific database which focuses on whole genome SNP data from domesticated dogs and grey wolves. The DoGSD is a web-based, open-access resource comprising ∼19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and collected SNPs from three latest dog/wolf genetic studies (7 wolves and 68 dogs), which were taken together for analysis with the population genetic statistics, Fst. In addition, DoGSD integrates some closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog-/wolf-related studies. PMID:25404132

  20. DoGSD: the dog and wolf genome SNP database.

    PubMed

    Bai, Bing; Zhao, Wen-Ming; Tang, Bi-Xia; Wang, Yan-Qing; Wang, Lu; Zhang, Zhang; Yang, He-Chuan; Liu, Yan-Hu; Zhu, Jun-Wei; Irwin, David M; Wang, Guo-Dong; Zhang, Ya-Ping

    2015-01-01

    The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which have simultaneously broadened our understanding of domestication and diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data more easily/friendly usable and available, we propose DoGSD (http://dogsd.big.ac.cn), the first canidae-specific database which focuses on whole genome SNP data from domesticated dogs and grey wolves. The DoGSD is a web-based, open-access resource comprising ∼ 19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and collected SNPs from three latest dog/wolf genetic studies (7 wolves and 68 dogs), which were taken together for analysis with the population genetic statistics, Fst. In addition, DoGSD integrates some closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog-/wolf-related studies. PMID:25404132

  2. The integrated web service and genome database for agricultural plants with biotechnology information

    PubMed Central

    Kim, ChangKug; Park, DongSuk; Seol, YoungJoo; Hahn, JangHo

    2011-01-01

    The National Agricultural Biotechnology Information Center (NABIC) constructed an agricultural biology-based infrastructure and developed a Web based relational database for agricultural plants with biotechnology information. The NABIC has concentrated on functional genomics of major agricultural plants, building an integrated biotechnology database for agro-biotech information that focuses on genomics of major agricultural resources. This genome database provides annotated genome information from 1,039,823 records mapped to rice, Arabidopsis, and Chinese cabbage. PMID:21887015

  3. The YeastGenome app: the Saccharomyces Genome Database at your fingertips.

    PubMed

    Wong, Edith D; Karra, Kalpana; Hitz, Benjamin C; Hong, Eurie L; Cherry, J Michael

    2013-01-01

    The Saccharomyces Genome Database (SGD) is a scientific database that provides researchers with high-quality curated data about the genes and gene products of Saccharomyces cerevisiae. To provide instant and easy access to this information on mobile devices, we have developed YeastGenome, a native application for the Apple iPhone and iPad. YeastGenome can be used to quickly find basic information about S. cerevisiae genes and chromosomal features regardless of internet connectivity. With or without network access, you can view basic information and Gene Ontology annotations about a gene of interest by searching gene names and gene descriptions or by browsing the database within the app to find the gene of interest. With internet access, the app provides more detailed information about the gene, including mutant phenotypes, references and protein and genetic interactions, as well as provides hyperlinks to retrieve detailed information by showing SGD pages and views of the genome browser. SGD provides online help describing basic ways to navigate the mobile version of SGD, highlights key features and answers frequently asked questions related to the app. The app is available from iTunes (http://itunes.com/apps/yeastgenome). The YeastGenome app is provided freely as a service to our community, as part of SGD's mission to provide free and open access to all its data and annotations. PMID:23396302

  4. Tetrahymena functional genomics database (TetraFGD): an integrated resource for Tetrahymena functional genomics.

    PubMed

    Xiong, Jie; Lu, Yuming; Feng, Jinmei; Yuan, Dongxia; Tian, Miao; Chang, Yue; Fu, Chengjie; Wang, Guangying; Zeng, Honghui; Miao, Wei

    2013-01-01

    The ciliated protozoan Tetrahymena thermophila is a useful unicellular model organism for studies of eukaryotic cellular and molecular biology. Research on T. thermophila has contributed to the discovery of a series of remarkable basic biological principles. Since the macronuclear genome was sequenced, substantial progress has been made in functional genomics research on T. thermophila, including genome-wide microarray analysis of the T. thermophila life cycle, a T. thermophila gene network analysis based on the microarray data and transcriptome analysis by deep RNA sequencing. To meet the growing demands of the Tetrahymena research community, we integrated these data to provide a public access database: Tetrahymena functional genomics database (TetraFGD). TetraFGD contains three major resources: the RNA-Seq transcriptome, microarray and gene networks. The RNA-Seq data define gene structures and transcriptome, with special emphasis on exon-intron boundaries; the microarray data describe gene expression of 20 time points during three major stages of the T. thermophila life cycle; the gene network data identify potential gene-gene interactions of 15 049 genes. The TetraFGD provides user-friendly search functions that assist researchers in accessing gene models, transcripts, gene expression data and gene-gene relationships. In conclusion, the TetraFGD is an important functional genomic resource for researchers who focus on Tetrahymena or other ciliates. Database URL: http://tfgd.ihb.ac.cn/

  5. The Mouse Genome Database (MGD): mouse biology and model systems.

    PubMed

    Bult, Carol J; Eppig, Janan T; Kadin, James A; Richardson, Joel E; Blake, Judith A

    2008-01-01

    The Mouse Genome Database (MGD, http://www.informatics.jax.org/) integrates genetic, genomic and phenotypic information about the laboratory mouse, a primary animal model for studying human biology and disease. MGD data content includes comprehensive characterization of genes and their functions, standardized descriptions of mouse phenotypes, extensive integration of DNA and protein sequence data, and normalized representation of genome and genome variant information, including comparative data on mammalian genes. Data within MGD are obtained from diverse sources including manual curation of the biomedical literature, direct contributions from individual investigators' laboratories and major informatics resource centers such as Ensembl, UniProt and NCBI. MGD collaborates with the bioinformatics community on the development of data and semantic standards such as the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology. MGD provides a data-mining platform that enables the development of translational research hypotheses based on comparative genotype, phenotype and functional analyses. Both web-based querying and computational access to data are provided. Recent improvements in MGD described here include the association of gene trap data with mouse genes and a new batch query capability for customized data access and retrieval.

  6. A new database (GCD) on genome composition for eukaryote and prokaryote genome sequences and their initial analyses.

    PubMed

    Kryukov, Kirill; Sumiyama, Kenta; Ikeo, Kazuho; Gojobori, Takashi; Saitou, Naruya

    2012-01-01

    Eukaryote genomes contain many noncoding regions, and they are quite complex. To understand these complexities, we constructed a database, Genome Composition Database, of whole-genome composition statistics for 101 eukaryote genomes, as well as more than 1,000 prokaryote genomes. Frequencies of all possible oligonucleotides of length one to ten were counted for each genome, and these observed values were compared with expected values computed under observed oligonucleotide frequencies of length 1-4. Deviations from expected values were much larger for eukaryotes than prokaryotes, except for fungal genomes. Mammalian genomes showed the largest deviation among animals. The results of comparison are available online at http://esper.lab.nig.ac.jp/genome-composition-database/.
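
    The observed-versus-expected comparison described in this abstract can be sketched as follows. This is a minimal illustration under a stated simplification: the expected frequency of each oligonucleotide is taken as the product of its mononucleotide frequencies (length 1 only), whereas the abstract reports expectations based on lengths 1-4. The function names `kmer_counts` and `observed_expected` are hypothetical and are not taken from the database.

```python
from collections import Counter
from math import prod


def kmer_counts(seq, k):
    """Count every overlapping oligonucleotide of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def observed_expected(seq, k):
    """Map each k-mer seen in seq to (observed frequency, expected frequency).

    Assumption: expected frequency is the product of the single-base
    frequencies, i.e. bases are treated as independent. Ambiguity codes
    such as N are counted as ordinary symbols in this sketch.
    """
    seq = seq.upper()
    base_freq = {b: c / len(seq) for b, c in Counter(seq).items()}
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return {
        kmer: (n / total, prod(base_freq[b] for b in kmer))
        for kmer, n in counts.items()
    }


if __name__ == "__main__":
    for kmer, (obs, exp) in sorted(observed_expected("ACGTACGT", 2).items()):
        print(kmer, round(obs, 4), round(exp, 4), round(obs / exp, 2))
```

    An observed/expected ratio far from 1 marks an oligonucleotide that is over- or under-represented relative to base composition alone, which is the kind of deviation the abstract compares across genomes.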

  7. Biological Database of Images and Genomes: tools for community annotations linking image and genomic information

    PubMed Central

    Oberlin, Andrew T; Jurkovic, Dominika A; Balish, Mitchell F; Friedberg, Iddo

    2013-01-01

    Genomic data and biomedical imaging data are undergoing exponential growth. However, our understanding of the phenotype–genotype connection linking the two types of data is lagging behind. While there are many types of software that enable the manipulation and analysis of image data and genomic data as separate entities, there is no framework established for linking the two. We present a generic set of software tools, BioDIG, that allows linking of image data to genomic data. BioDIG tools can be applied to a wide range of research problems that require linking images to genomes. BioDIG features the following: rapid construction of web-based workbenches, community-based annotation, user management and web services. By using BioDIG to create websites, researchers and curators can rapidly annotate a large number of images with genomic information. Here we present the BioDIG software tools that include an image module, a genome module and a user management module. We also introduce a BioDIG-based website, MyDIG, which is being used to annotate images of mycoplasmas. Database URL: BioDIG website: http://biodig.org BioDIG source code repository: http://github.com/FriedbergLab/BioDIG The MyDIG database: http://mydig.biodig.org/ PMID:23550062

  8. Ontology searching and browsing at the Rat Genome Database.

    PubMed

    Laulederkind, Stanley J F; Tutaj, Marek; Shimoyama, Mary; Hayman, G Thomas; Lowry, Timothy F; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R; Wang, Shur-Jen; de Pons, Jeff; Dwinell, Melinda R; Jacob, Howard J

    2012-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40 000 rat gene records, as well as human and mouse orthologs, 1857 rat and 1912 human quantitative trait loci (QTLs) and 2347 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. RGD uses more than a dozen different ontologies to standardize annotation information for genes, QTLs and strains. That means a lot of time can be spent searching and browsing ontologies for the appropriate terms needed both for curating and mining the data. RGD has upgraded its ontology term search to make it more versatile and more robust. A term search result is connected to a term browser so the user can fine-tune the search by viewing parent and children terms. Most publicly available term browsers display a hierarchical organization of terms in an expandable tree format. RGD has replaced its old tree browser format with a 'driller' type of browser that allows quicker drilling up and down through the term branches, which has been confirmed by testing. The RGD ontology report pages have also been upgraded. Expanded functionality allows more choice in how annotations are displayed and what subsets of annotations are displayed. The new ontology search, browser and report features have been designed to enhance both manual data curation and manual data extraction. DATABASE URL: http://rgd.mcw.edu/rgdweb/ontology/search.html.

  9. TIARA: a database for accurate analysis of multiple personal genomes based on cross-technology

    PubMed Central

    Hong, Dongwan; Park, Sung-Soo; Ju, Young Seok; Kim, Sheehyun; Shin, Jong-Yeon; Kim, Sujung; Yu, Saet-Byeol; Lee, Won-Chul; Lee, Seungbok; Park, Hansoo; Kim, Jong-Il; Seo, Jeong-Sun

    2011-01-01

    High-throughput genomic technologies have been used to explore personal human genomes for the past few years. Although the integration of technologies is important for high-accuracy detection of personal genomic variations, no databases have been prepared to systematically archive genomes and to facilitate the comparison of personal genomic data sets prepared using a variety of experimental platforms. We describe here the Total Integrated Archive of Short-Read and Array (TIARA; http://tiara.gmi.ac.kr) database, which contains personal genomic information obtained from next generation sequencing (NGS) techniques and ultra-high-resolution comparative genomic hybridization (CGH) arrays. This database improves the accuracy of detecting personal genomic variations, such as SNPs, short indels and structural variants (SVs). At present, 36 individual genomes have been archived and may be displayed in the database. TIARA supports a user-friendly genome browser, which retrieves read-depths (RDs) and log2 ratios from NGS and CGH arrays, respectively. In addition, this database provides information on all genomic variants and the raw data, including short reads and feature-level CGH data, through anonymous file transfer protocol. More personal genomes will be archived as more individuals are analyzed by NGS or CGH array. TIARA provides a new approach to the accurate interpretation of personal genomes for genome research. PMID:21051338

  10. The Saccharomyces Genome Database: A Tool for Discovery.

    PubMed

    Cherry, J Michael

    2015-12-01

    The Saccharomyces Genome Database (SGD) is the main community repository of information for the budding yeast, Saccharomyces cerevisiae. The SGD has collected published results on chromosomal features, including genes and their products, and has become an encyclopedia of information on the biology of the yeast cell. This information includes gene and gene product function, phenotype, interactions, regulation, complexes, and pathways. All information has been integrated into a unique web resource, accessible via http://yeastgenome.org. The website also provides custom tools to allow useful searches and visualization of data. The experimentally defined functions of genes, mutant phenotypes, and sequence homologies archived in the SGD provide a platform for understanding many fields of biological research. The mission of SGD is to provide public access to all published experimental results on yeast to aid life science students, educators, and researchers. As such, the SGD has become an essential tool for the design of experiments and for the analysis of experimental results. PMID:26631132

  11. Exploring human disease using the Rat Genome Database

    PubMed Central

    Laulederkind, Stanley J. F.; De Pons, Jeff; Nigam, Rajni; Smith, Jennifer R.; Tutaj, Marek; Petri, Victoria; Hayman, G. Thomas; Wang, Shur-Jen; Ghiasvand, Omid; Thota, Jyothi; Dwinell, Melinda R.

    2016-01-01

    Rattus norvegicus, the laboratory rat, has been a crucial model for studies of the environmental and genetic factors associated with human diseases for over 150 years. It is the primary model organism for toxicology and pharmacology studies, and has features that make it the model of choice in many complex-disease studies. Since 1999, the Rat Genome Database (RGD; http://rgd.mcw.edu) has been the premier resource for genomic, genetic, phenotype and strain data for the laboratory rat. The primary role of RGD is to curate rat data and validate orthologous relationships with human and mouse genes, and make these data available for incorporation into other major databases such as NCBI, Ensembl and UniProt. RGD also provides official nomenclature for rat genes, quantitative trait loci, strains and genetic markers, as well as unique identifiers. The RGD team adds enormous value to these basic data elements through functional and disease annotations, the analysis and visual presentation of pathways, and the integration of phenotype measurement data for strains used as disease models. Because much of the rat research community focuses on understanding human diseases, RGD provides a number of datasets and software tools that allow users to easily explore and make disease-related connections among these datasets. RGD also provides comprehensive human and mouse data for comparative purposes, illustrating the value of the rat in translational research. This article introduces RGD and its suite of tools and datasets to researchers – within and beyond the rat community – who are particularly interested in leveraging rat-based insights to understand human diseases. PMID:27736745

  12. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der (Lawrence Berkeley Lab., CA)

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented a query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  13. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented a query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  14. The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease.

    PubMed

    Shimoyama, Mary; De Pons, Jeff; Hayman, G Thomas; Laulederkind, Stanley J F; Liu, Weisong; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Wang, Shur-Jen; Worthey, Elizabeth; Dwinell, Melinda; Jacob, Howard

    2015-01-01

    The Rat Genome Database (RGD, http://rgd.mcw.edu) provides the most comprehensive data repository and informatics platform related to the laboratory rat, one of the most important model organisms for disease studies. RGD maintains and updates datasets for genomic elements such as genes, transcripts and increasingly in recent years, sequence variations, as well as map positions for multiple assemblies and sequence information. Functional annotations for genomic elements are curated from published literature, submitted by researchers and integrated from other public resources. Complementing the genomic data catalogs are those associated with phenotypes and disease, including strains, QTL and experimental phenotype measurements across hundreds of strains. Data are submitted by researchers, acquired through bulk data pipelines or curated from published literature. Innovative software tools provide users with an integrated platform to query, mine, display and analyze valuable genomic and phenomic datasets for discovery and enhancement of their own research. This update highlights recent developments that reflect an increasing focus on: (i) genomic variation, (ii) phenotypes and diseases, (iii) data related to the environment and experimental conditions and (iv) datasets and software tools that allow the user to explore and analyze the interactions among these and their impact on disease.

  15. Using the Saccharomyces Genome Database (SGD) for analysis of genomic information.

    PubMed

    Skrzypek, Marek S; Hirschman, Jodi

    2011-09-01

    Analysis of genomic data requires access to software tools that place the sequence-derived information in the context of biology. The Saccharomyces Genome Database (SGD) integrates functional information about budding yeast genes and their products with a set of analysis tools that facilitate exploring their biological details. This unit describes how the various types of functional data available at SGD can be searched, retrieved, and analyzed. Starting with the guided tour of the SGD Home page and Locus Summary page, this unit highlights how to retrieve data using YeastMine, how to visualize genomic information with GBrowse, how to explore gene expression patterns with SPELL, and how to use Gene Ontology tools to characterize large-scale datasets.

  16. Aspergillus flavus whole genome and EST sequence releases and construction of homologous gene search blast server

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are toxic and carcinogenic secondary metabolites. These compounds, produced by Aspergillus flavus and A. parasiticus, contaminate pre-harvest agricultural crops in the field and post-harvest grains during storage. In order to reduce and eliminate aflatoxin contamination of food and feed...

  17. Aspergillus flavus Genomic Data Mining Provides Clues for Its Use in Producing Biobased Products

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is notorious for its ability to produce aflatoxins. It is also an opportunistic pathogen that infects plants, animals and human beings. The ability to survive in the natural environment, living on plant tissues (leaves or stalks), live or dead insects make A. flavus a ubiquitous...

  18. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

    SciTech Connect

    Fenner, Marsha W; Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

    2007-12-31

    The Genomes On Line Database (GOLD) is a comprehensive resource of information for genome and metagenome projects world-wide. GOLD provides access to complete and ongoing projects and their associated metadata through pre-computed lists and a search page. The database currently incorporates information for more than 2900 sequencing projects, of which 639 have been completed and the data deposited in the public databases. GOLD is constantly expanding to provide metadata information related to the project and the organism and is compliant with the Minimum Information about a Genome Sequence (MIGS) specifications.

  19. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data

    PubMed Central

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. PMID:25398900

  20. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data.

    PubMed

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information.

  1. ECRbase: Database of Evolutionary Conserved Regions, Promoters, and Transcription Factor Binding Sites in Vertebrate Genomes

    DOE Data Explorer

    Loots, Gabriela G. [LLNL; Ovcharenko, I. [LLNL

    Evolutionary conservation of DNA sequences provides a tool for the identification of functional elements in genomes. This database of evolutionary conserved regions (ECRs) in vertebrate genomes features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates and a comprehensive collection of promoters in all vertebrate genomes generated using multiple sources of gene annotation. The database also contains a collection of annotated transcription factor binding sites (TFBSs) in evolutionary conserved and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish, and fugu genomes. (taken from paper in Journal: Bioinformatics, November 7, 2006, pp. 122-124)

  2. ECRbase: Database of Evolutionary Conserved Regions, Promoters, and Transcription Factor Binding Sites in Vertebrate Genomes

    SciTech Connect

    Loots, G; Ovcharenko, I

    2006-08-08

    Evolutionary conservation of DNA sequences provides a tool for the identification of functional elements in genomes. We have created a database of evolutionary conserved regions (ECRs) in vertebrate genomes entitled ECRbase that is constructed from a collection of pairwise vertebrate genome alignments produced by the ECR Browser database. ECRbase features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates and a collection of promoters in all vertebrate genomes presented in the database. The database also contains a collection of annotated transcription factor binding sites (TFBS) in all ECRs and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish, and two pufferfish genomes. It is freely accessible at http://ECRbase.dcode.org.

  3. MELOGEN: an EST database for melon functional genomics

    PubMed Central

    Gonzalez-Ibeas, Daniel; Blanca, José; Roig, Cristina; González-To, Mireia; Picó, Belén; Truniger, Verónica; Gómez, Pedro; Deleu, Wim; Caño-Delgado, Ana; Arús, Pere; Nuez, Fernando; Garcia-Mas, Jordi; Puigdomènech, Pere; Aranda, Miguel A

    2007-01-01

    Background: Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. Results: We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. Conclusion: The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. This set of sequences constitutes

  4. Draft genome sequences of two closely-related aflatoxigenic Aspergillus species obtained from the Ivory Coast

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genomes of the A. ochraceoroseus and A. rambellii type strains were sequenced using a personal genome machine, followed by annotation of their genes. The genome size for A. ochraceoroseus was found to be approximately 23 Mb and contained 7,837 genes, while the A. rambellii genome was found to be...

  5. Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources.

    PubMed

    Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu

    2015-01-01

    The Brassica database (BRAD) was built initially to help users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released after its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/.

  6. Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources

    PubMed Central

    Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu

    2015-01-01

    The Brassica database (BRAD) was built initially to help users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released after its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/ PMID:26589635

  7. Genomic analysis of the aconidial and high-performance protein producer, industrially relevant Aspergillus niger SH2 strain.

    PubMed

    Yin, Chao; Wang, Bin; He, Pan; Lin, Ying; Pan, Li

    2014-05-15

    Aspergillus niger is usually regarded as a beneficial species widely used in the biotechnology industry. Obtaining the genome sequence of the widely used aconidial A. niger SH2 strain is of great importance for understanding its unusual production capability. In this study we assembled a high-quality genome sequence of A. niger SH2 with approximately 11,517 ORFs. A relatively high proportion of genes enriched for protein-expression-related FunCat items is consistent with its efficient capacity for protein production. Furthermore, genome-wide comparative analysis between A. niger SH2 and CBS513.88 reveals insights into unique properties of A. niger SH2. A. niger SH2 lacks the gene related to the initiation of asexual sporulation (PrpA), leading to its distinct aconidial phenotype. Frame shift mutations and non-synonymous SNPs in genes of cell wall integrity signaling, β-1,3-glucan synthesis and chitin synthesis influence its cell wall development, which is important for its hyphal fragmentation during industrial high-efficiency protein production.

  8. Comparative genomics of citric-acid producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    SciTech Connect

    Andersen, Mikael R.; Salazar, Margarita; Schaap, Peter; van de Vondervoort, Peter; Culley, David E.; Thykaer, Jette; Frisvad, Jens C.; Nielsen, Kristian F.; Albang, Richard; Albermann, Kaj; Berka, Randy; Braus, Gerhard; Braus-Stromeyer, Susanna A.; Corrochano, Luis; Dai, Ziyu; van Dijck, Piet; Hofmann, Gerald; Lasure, Linda L.; Magnuson, Jon K.; Menke, Hildegard; Meijer, Martin; Meijer, Susan; Nielsen, Jakob B.; Nielsen, Michael L.; van Ooyen, Albert; Pel, Herman J.; Poulsen, Lars; Samson, Rob; Stam, Hein; Tsang, Adrian; van den Brink, Johannes M.; Atkins, Alex; Aerts, Andrea; Shapiro, Harris; Pangilinan, Jasmyn; Salamov, Asaf; Lou, Yigong; Lindquist, Erika; Lucas, Susan; Grimwood, Jane; Grigoriev, Igor V.; Kubicek, Christian P.; Martinez, Diego; van Peij, Noel; Roubos, Johannes A.; Nielsen, Jens B.; Baker, Scott E.

    2011-06-01

    The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compel additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015), and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was utilized to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 megabase of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway in ATCC 1015, while CBS 513.88 showed significant up-regulation of genes associated with biosynthesis of amino acids that are abundant in glucoamylase A, tRNA-synthases and protein transporters.
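
    The SNPs/kb figures quoted in this abstract are a density measure. The sketch below shows one way such a measure might be computed from a list of SNP positions; the fixed 1 kb windows, the 0-based coordinates and the function names are assumptions for illustration, not the authors' method.

```python
def snps_per_window(snp_positions, seq_length, window=1000):
    """Count SNPs in consecutive fixed-size windows.

    Positions are assumed to be 0-based and smaller than seq_length.
    With window=1000, each count is a SNPs/kb value for that window
    (the last window may be shorter than 1 kb).
    """
    n_windows = (seq_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in snp_positions:
        counts[pos // window] += 1
    return counts


def mean_snps_per_kb(snp_positions, seq_length):
    """Average SNP density over the whole sequence, in SNPs per kilobase."""
    return len(snp_positions) / (seq_length / 1000)


if __name__ == "__main__":
    # Toy positions, not data from the study.
    toy_snps = [5, 10, 1500, 2999]
    print(snps_per_window(toy_snps, 3000))
    print(mean_snps_per_kb(toy_snps, 3000))
```

    The maximum of the per-window counts and the whole-sequence mean correspond to the two kinds of numbers the abstract reports.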

  9. Cloning and Genomic Organization of a Rhamnogalacturonase Gene from Locally Isolated Strain of Aspergillus niger.

    PubMed

    Damak, Naourez; Abdeljalil, Salma; Taeib, Noomen Hadj; Gargouri, Ali

    2015-08-01

    The rhg gene encoding a rhamnogalacturonase was isolated from the novel strain A1 of Aspergillus niger. It consists of an ORF of 1.505 kb encoding a putative protein of 446 amino acids with a predicted molecular mass of 47 kDa, belonging to the family 28 of glycosyl hydrolases. The nature and position of amino acids comprising the active site as well as the three-dimensional structure were well conserved between the A. niger CTM10548 and fungal rhamnogalacturonases. The coding region of the rhg gene is interrupted by three short introns of 56 (introns 1 and 3) and 52 (intron 2) bp in length. The comparison of the peptide sequence with A. niger rhg sequences revealed that the A1 rhg should be an endo-rhamnogalacturonase, more homologous to the known A. niger enzyme rhg A than to rhg B. The comparison of the rhg nucleotide sequence from A. niger A1 with rhg A from A. niger shows several base changes. Most of these changes (59 %) are located at the third base of codons, suggesting that the enzyme function is maintained. We used the rhamnogalacturonase A from Aspergillus aculeatus as a template to build a structural model of rhg A1 that adopted a right-handed parallel β-helix.

  10. Nencki Genomics Database--Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs.

    PubMed

    Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

    2013-01-01

    We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface.

  11. Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists.

    PubMed

    Wiley, Laura K; Sivley, R Michael; Bush, William S

    2013-01-01

    Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist. PMID:23894185
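
    To illustrate the data structure named in this record, the following is a minimal in-memory Python sketch of a nested containment list. It is not the MyNCList implementation, which lives inside MySQL; the function names, the half-open coordinate convention and the toy intervals are all assumptions made for the example.

```python
def build_nclist(intervals):
    """Build a nested containment list from (start, end) half-open intervals.

    Each node is a tuple (start, end, children). Within any one list no
    interval contains another, so starts and ends both increase together.
    """
    ordered = sorted(set(intervals), key=lambda iv: (iv[0], -iv[1]))
    top = []
    stack = []  # chain of intervals that may still contain later ones
    for start, end in ordered:
        node = (start, end, [])
        # Drop candidates that do not fully contain the new interval.
        while stack and not (stack[-1][0] <= start and end <= stack[-1][1]):
            stack.pop()
        (stack[-1][2] if stack else top).append(node)
        stack.append(node)
    return top


def _first_overlap(nodes, qstart):
    """Binary search for the first node whose end lies past qstart."""
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid][1] > qstart:
            hi = mid
        else:
            lo = mid + 1
    return lo


def query(nodes, qstart, qend):
    """Return every stored interval overlapping the half-open range [qstart, qend)."""
    hits = []
    i = _first_overlap(nodes, qstart)
    while i < len(nodes) and nodes[i][0] < qend:
        start, end, children = nodes[i]
        hits.append((start, end))
        # Only intervals nested inside an overlapping parent can overlap too.
        hits.extend(query(children, qstart, qend))
        i += 1
    return hits


# Toy annotations (hypothetical coordinates).
annotations = [(100, 500), (150, 200), (160, 180), (400, 450), (600, 700), (650, 900)]
nclist = build_nclist(annotations)
print(sorted(query(nclist, 170, 420)))
```

    The binary search skips every interval that ends before the query starts, and the scan stops at the first interval that starts after the query ends, so the work is roughly proportional to the number of hits rather than to the size of the annotation set.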

  12. Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists.

    PubMed

    Wiley, Laura K; Sivley, R Michael; Bush, William S

    2013-01-01

    Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist.

  13. Biosynthetic Pathway for the Epipolythiodioxopiperazine Acetylaranotin in Aspergillus terreus Revealed by Genome-based Deletion Analysis

    SciTech Connect

    Guo, Chun-Jun; Yeh, Hsu-Hua; Chiang, Yi Ming; Sanchez, James F.; Chang, ShuLin; Bruno, Kenneth S.; Wang, Clay C.

    2013-04-15

    Abstract Epipolythiodioxopiperazines (ETPs) are a class of fungal secondary metabolites derived from cyclic peptides. Acetylaranotin belongs to one structural subgroup of ETPs characterized by the presence of a seven-membered dihydrooxepine ring. Defining the genes involved in acetylaranotin biosynthesis should provide a means to increase production of these compounds and facilitate the engineering of second-generation molecules. The filamentous fungus Aspergillus terreus produces acetylaranotin and related natural products. Using targeted gene deletions, we have identified a cluster of 9 genes including one nonribosomal peptide synthase gene, ataP, that is required for acetylaranotin biosynthesis. Chemical analysis of the wild type and mutant strains enabled us to isolate seventeen natural products that are either intermediates in the normal biosynthetic pathway or shunt products that are produced when the pathway is interrupted through mutation. Nine of the compounds identified in this study are novel natural products. Our data allow us to propose a complete biosynthetic pathway for acetylaranotin and related natural products.

  14. Use of Genomic Databases for Inquiry-Based Learning about Influenza

    ERIC Educational Resources Information Center

    Ledley, Fred; Ndung'u, Eric

    2011-01-01

    The genome projects of the past decades have created extensive databases of biological information with applications in both research and education. We describe an inquiry-based exercise that uses one such database, the National Center for Biotechnology Information Influenza Virus Resource, to advance learning about influenza. This database…

  15. CottonGen: a genomics, genetics and breeding database for cotton research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supersedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, vis...

  16. The MaizeGDB Genome Browser Tutorial: One example of database outreach to biologists via video

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Video tutorials are an effective way for researchers to quickly learn how to use online tools offered by biological databases. At the Maize Genetics and Genomics Database (MaizeGDB), we have developed a number of video tutorials that aim to demonstrate how to use various tools as well as to explici...

  17. Comparative genomics of citric-acid producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    SciTech Connect

    Grigoriev, Igor V.; Baker, Scott E.; Andersen, Mikael R.; Salazar, Margarita P.; Schaap, Peter J.; Vondervoot, Peter J.I. van de; Culley, David; Thykaer, Jette; Frisvad, Jens C.; Nielsen, Kristen F.; Albang, Richard; Albermann, Kaj; Berka, Randy M.; Braus, Gerhard H.; Braus-Stromeyer, Susanna A.; Corrochano, Luis M.; Dai, Ziyu; Dijck, Piet W.M. van; Hofmann, Gerald; Lasure, Linda L.; Magnusson, Jon K.; Meijer, Susan L.; Nielsen, Jakob B.; Nielsen, Michael L.; Ooyen, Albert J.J. van; Panther, Kathyrn S.; Pel, Herman J.; Poulsen, Lars; Samson, Rob A.; Stam, Hen; Tsang, Adrian; Brink, Johannes M. van den; Atkins, Alex; Aerts, Andrea; Shapiro, Harris; Pangilinan, Jasmyn; Salamov, Asaf; Lou, Yigong; Lindquist, Erika; Lucas, Susan; Grimwood, Jane; Kubicek, Christian P.; Martinez, Diego; Peij, Noel N.M.E. van; Roubos, Johannes A.; Nielsen, Jens

    2011-04-28

    The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compel additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015), and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was utilized to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 megabase of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway in ATCC 1015, while CBS 513.88 showed significant up-regulation of genes relevant to glucoamylase A production, such as tRNA-synthases and protein transporters. Our results and datasets from this integrative systems biology analysis resulted in a snapshot of fungal evolution and will support further optimization of cell factories based on filamentous fungi. [Supplemental materials (10 figures, three text documents and 16 tables) have been made available

  18. GBshape: a genome browser database for DNA shape annotations.

    PubMed

    Chiu, Tsu-Pei; Yang, Lin; Zhou, Tianyin; Main, Bradley J; Parker, Stephen C J; Nuzhdin, Sergey V; Tullius, Thomas D; Rohs, Remo

    2015-01-01

    Many regulatory mechanisms require a high degree of specificity in protein-DNA binding. Nucleotide sequence does not provide an answer to the question of why a protein binds only to a small subset of the many putative binding sites in the genome that share the same core motif. Whereas higher-order effects, such as chromatin accessibility, cooperativity and cofactors, have been described, DNA shape recently gained attention as another feature that fine-tunes the DNA binding specificities of some transcription factor families. Our Genome Browser for DNA shape annotations (GBshape; freely available at http://rohslab.cmb.usc.edu/GBshape/) provides minor groove width, propeller twist, roll, helix twist and hydroxyl radical cleavage predictions for the entire genomes of 94 organisms. Additional genomes can easily be added using the GBshape framework. GBshape can be used to visualize DNA shape annotations qualitatively in a genome browser track format, and to download quantitative values of DNA shape features as a function of genomic position at nucleotide resolution. As biological applications, we illustrate the periodicity of DNA shape features that are present in nucleosome-occupied sequences from human, fly and worm, and we demonstrate structural similarities between transcription start sites in the genomes of four Drosophila species.

  19. The MaizeGDB Genome Browser tutorial: one example of database outreach to biologists via video

    PubMed Central

    Harper, Lisa C.; Schaeffer, Mary L.; Thistle, Jordan; Gardiner, Jack M.; Andorf, Carson M.; Campbell, Darwin A.; Cannon, Ethalinda K.S.; Braun, Bremen L.; Birkett, Scott M.; Lawrence, Carolyn J.; Sen, Taner Z.

    2011-01-01

    Video tutorials are an effective way for researchers to quickly learn how to use online tools offered by biological databases. At MaizeGDB, we have developed a number of video tutorials that demonstrate how to use various tools and explicitly outline the caveats researchers should know to interpret the information available to them. One such popular video currently available is ‘Using the MaizeGDB Genome Browser’, which describes how the maize genome was sequenced and assembled as well as how the sequence can be visualized and interacted with via the MaizeGDB Genome Browser. Database URL: http://www.maizegdb.org/ PMID:21565781

  20. Integrated pathway-genome databases and their role in drug discovery.

    PubMed

    Karp, P D; Krummenacker, M; Paley, S; Wagg, J

    1999-07-01

    Integrated pathway-genome databases describe the genes and genome of an organism, as well as its predicted pathways, reactions, enzymes and metabolites. In conjunction with visualization and analysis software, these databases provide a framework for improved understanding of microbial physiology and for antimicrobial drug discovery. We describe pathway-based analyses of the genomes of a number of medically relevant microorganisms and a novel software tool that visualizes gene-expression data on a diagram showing the whole metabolic network of the microorganism.

  1. Sputnik: a database platform for comparative plant genomics.

    PubMed

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

    2003-01-01

    Two million plant ESTs, from 20 different plant species and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST-derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics.

  2. Sputnik: a database platform for comparative plant genomics.

    PubMed

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

    2003-01-01

    Two million plant ESTs, from 20 different plant species and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST-derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics. PMID:12519965

  3. Sputnik: a database platform for comparative plant genomics

    PubMed Central

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F.X.

    2003-01-01

    Two million plant ESTs, from 20 different plant species and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST-derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics. PMID:12519965

  4. Mitome: dynamic and interactive database for comparative mitochondrial genomics in metazoan animals.

    PubMed

    Lee, Yong Seok; Oh, Jeongsu; Kim, Young Uk; Kim, Namchul; Yang, Sungjin; Hwang, Ui Wook

    2008-01-01

    Mitome is a specialized mitochondrial genome database designed for easy comparative analysis of various features of metazoan mitochondrial genomes such as base frequency, A+T skew, codon usage and gene arrangement pattern. A particular function of the database is the automatic reconstruction of phylogenetic relationships among metazoans selected by a user from a taxonomic tree menu based on nucleotide sequences, amino acid sequences or gene arrangement patterns. Mitome also enables us (i) to easily find the taxonomic positions of organisms of which complete mitochondrial genome sequences are publicly available; (ii) to acquire various metazoan mitochondrial genome characteristics through a graphical genome browser; (iii) to search for homology patterns in mitochondrial gene arrangements; (iv) to download nucleotide or amino acid sequences not only of an entire mitochondrial genome but also of each component; and (v) to find interesting references easily through links with PubMed. In order to provide users with a dynamic, responsive, interactive and faster web database, Mitome is constructed using two recently highlighted techniques, Ajax (Asynchronous JavaScript and XML) and Web Services. Mitome has the potential to become very useful in the fields of molecular phylogenetics and evolution and comparative organelle genomics. The database is available at: http://www.mitome.info.
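
    The sequence features listed in this record (base frequency, A+T skew, codon usage) are simple to compute. The Python sketch below shows one way to do so and is not Mitome's own code. It assumes the commonly used skew definition (A - T) / (A + T), which the abstract does not spell out, and it uses a made-up toy sequence and illustrative function names.

```python
from collections import Counter


def base_frequencies(seq):
    """Return the fraction of A, C, G and T among the unambiguous bases."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def at_skew(seq):
    """Return (A - T) / (A + T); this definition is an assumption."""
    counts = Counter(seq.upper())
    a, t = counts["A"], counts["T"]
    if a + t == 0:
        raise ValueError("sequence contains no A or T")
    return (a - t) / (a + t)


def codon_usage(cds):
    """Count codons in reading frame 0, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Toy coding sequence, invented for the example (not real mitochondrial data).
toy_cds = "ATGAAATTTGCATAA"
print(base_frequencies(toy_cds))
print(at_skew(toy_cds))
print(codon_usage(toy_cds))
```

    A positive skew value means adenine outnumbers thymine on the strand analysed; applying the same functions to each gene of a genome gives per-gene values that can be compared across taxa.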

  5. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

    PubMed Central

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254

  6. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.

    PubMed

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.

  7. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.

    PubMed

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254

  8. MIPS PlantsDB: a database framework for comparative plant genome research.

    PubMed

    Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB. PMID:23203886

  9. MicrobeDB: a locally maintainable database of microbial genomic sequences

    PubMed Central

    Langille, Morgan G. I.; Laird, Matthew R.; Hsiao, William W. L.; Chiu, Terry A.; Eisen, Jonathan A.; Brinkman, Fiona S. L.

    2012-01-01

    Summary: Analysis of microbial genomes often requires the general organization and comparison of tens to thousands of genomes both from public repositories and unpublished sources. MicrobeDB provides a foundation for such projects by the automation of downloading published, completed bacterial and archaeal genomes from key sources, parsing annotations of all genomes (both public and private) into a local database, and allowing interaction with the database through an easy to use programming interface. MicrobeDB creates a simple to use, easy to maintain, centralized local resource for various large-scale comparative genomic analyses and a back-end for future microbial application design. Availability: MicrobeDB is freely available under the GNU-GPL at: http://github.com/mlangill/microbedb/ Contact: morgan.g.i.langille@gmail.com PMID:22576174

  10. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    SciTech Connect

    Liolios, Konstantinos; Chen, Amy; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Phil; Markowitz, Victor; Kyrpides, Nikos C.

    2009-09-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification.

  11. Databases and web tools for cancer genomics study.

    PubMed

    Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong

    2015-02-01

    Publicly-accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repository and analysis tools; and we hope such introduction will promote the awareness and facilitate the usage of these resources in the cancer research community. PMID:25707591

  12. PGSB PlantsDB: updates to the database framework for comparative plant genome research

    PubMed Central

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai C.; Martis, Mihaela M.; Seidel, Michael; Kugler, Karl G.; Gundlach, Heidrun; Mayer, Klaus F.X.

    2016-01-01

    PGSB (Plant Genome and Systems Biology: formerly MIPS) PlantsDB (http://pgsb.helmholtz-muenchen.de/plant/index.jsp) is a database framework for the comparative analysis and visualization of plant genome data. The resource has been updated with new data sets and types as well as specialized tools and interfaces to address user demands for intuitive access to complex plant genome data. In its latest incarnation, we have re-worked both the layout and navigation structure and implemented new keyword search options and a new BLAST sequence search functionality. Actively involved in corresponding sequencing consortia, PlantsDB has dedicated special efforts to the integration and visualization of complex triticeae genome data, especially for barley, wheat and rye. We enhanced CrowsNest, a tool to visualize syntenic relationships between genomes, with data from the wheat sub-genome progenitor Aegilops tauschii and added functionality to the PGSB RNASeqExpressionBrowser. GenomeZipper results were integrated for the genomes of barley, rye, wheat and perennial ryegrass and interactive access is granted through PlantsDB interfaces. Data exchange and cross-linking between PlantsDB and other plant genome databases is stimulated by the transPLANT project (http://transplantdb.eu/). PMID:26527721

  13. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata.

    PubMed

    Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C

    2008-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource that provides information on genome and metagenome projects worldwide. Complete and ongoing projects and their associated metadata can be accessed in GOLD through pre-computed lists and a search page. As of September 2007, GOLD contains information on more than 2900 sequencing projects, out of which 639 have been completed and their sequence data deposited in the public databases. GOLD continues to expand with the goal of providing metadata information related to the projects and the organisms/environments towards the 'Minimum Information about a Genome Sequence' (MIGS) guideline. GOLD is available at http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece at http://gold.imbb.forth.gr/

  14. Databases and information integration for the Medicago truncatula genome and transcriptome.

    PubMed

    Cannon, Steven B; Crow, John A; Heuer, Michael L; Wang, Xiaohong; Cannon, Ethalinda K S; Dwan, Christopher; Lamblin, Anne-Francoise; Vasdewani, Jayprakash; Mudge, Joann; Cook, Andrew; Gish, John; Cheung, Foo; Kenton, Steve; Kunau, Timothy M; Brown, Douglas; May, Gregory D; Kim, Dongjin; Cook, Douglas R; Roe, Bruce A; Town, Chris D; Young, Nevin D; Retzel, Ernest F

    2005-05-01

    An international consortium is sequencing the euchromatic genespace of Medicago truncatula. Extensive bioinformatic and database resources support the marker-anchored bacterial artificial chromosome (BAC) sequencing strategy. Existing physical and genetic maps and deep BAC-end sequencing help to guide the sequencing effort, while EST databases provide essential resources for genome annotation as well as transcriptome characterization and microarray design. Finished BAC sequences are joined into overlapping sequence assemblies and undergo an automated annotation process that integrates ab initio predictions with EST, protein, and other recognizable features. Because of the sequencing project's international and collaborative nature, data production, storage, and visualization tools are broadly distributed. This paper describes databases and Web resources for the project, which provide support for physical and genetic maps, genome sequence assembly, gene prediction, and integration of EST data. A central project Web site at medicago.org/genome provides access to genome viewers and other resources project-wide, including an Ensembl implementation at medicago.org, physical map and marker resources at mtgenome.ucdavis.edu, and genome viewers at the University of Oklahoma (www.genome.ou.edu), the Institute for Genomic Research (www.tigr.org), and Munich Information for Protein Sequences Center (mips.gsf.de). PMID:15888676

  15. BambooGDB: a bamboo genome database with functional annotation and an analysis platform

    PubMed Central

    Zhao, Hansheng; Peng, Zhenhua; Fei, Benhua; Li, Lubin; Hu, Tao; Gao, Zhimin; Jiang, Zehui

    2014-01-01

    Bamboo, as one of the most important non-timber forest products and fastest-growing plants in the world, represents the only major lineage of grasses that is native to forests. Recent success with the first high-quality draft genome sequence of moso bamboo (Phyllostachys edulis) provides new insights into bamboo genetics and evolution. To further extend our understanding of the bamboo genome and facilitate future studies on the basis of previous achievements, here we have developed BambooGDB, a bamboo genome database with functional annotation and an analysis platform. The de novo sequencing data, together with the full-length complementary DNA and RNA-seq data of moso bamboo, compose the main contents of this database. Based on these sequence data, a comprehensive functional annotation of the bamboo genome was made. In addition, an analytical platform comprising comparative genomic analysis, a protein–protein interaction network, pathway analysis and visualization of genomic data was also constructed. As discovery tools to understand and identify biological mechanisms of bamboo, the platform can be used as a systematic framework to help design experiments for further validation. Moreover, diverse and powerful search tools and a convenient browser were incorporated to facilitate the navigation of these data. As far as we know, this is the first genome database for bamboo. Through integrating high-throughput sequencing data, a full functional annotation and several analysis modules, BambooGDB aims to provide worldwide researchers with a central genomic resource and an extensible analysis platform for the bamboo genome. BambooGDB is freely available at http://www.bamboogdb.org/. Database URL: http://www.bamboogdb.org PMID:24602877

  16. Databases, models, and algorithms for functional genomics: a bioinformatics perspective.

    PubMed

    Singh, Gautam B; Singh, Harkirat

    2005-02-01

    A variety of patterns have been observed on the DNA and protein sequences that serve as control points for gene expression and cellular functions. Owing to the vital role of such patterns discovered on biological sequences, they are generally cataloged and maintained within internationally shared databases. Furthermore, the variability in a family of observed patterns is often represented using computational models in order to facilitate their search within an uncharacterized biological sequence. As biological data comprise a mosaic of sequence-level motifs, it is important to unravel the synergies of macromolecular coordination utilized in cell-specific differential synthesis of proteins. This article provides an overview of the various pattern representation methodologies and surveys the pattern databases available to molecular biologists. Our aim is to describe the principles behind the computational modeling and analysis techniques utilized in bioinformatics research, with the objective of providing the insight necessary to better understand and effectively utilize the available databases and analysis tools. We also provide a detailed review of DNA sequence-level patterns responsible for structural conformations within the Scaffold or Matrix Attachment Regions (S/MARs).

  17. xBASE, a collection of online databases for bacterial comparative genomics.

    PubMed

    Chaudhuri, Roy R; Pallen, Mark J

    2006-01-01

    The schema of the previously described Escherichia coli database coliBASE has been applied to a number of other bacterial taxa, under the collective name xBASE. The new databases include CampyDB for Campylobacter, Helicobacter and Wolinella; PseudoDB for pseudomonads; ClostriDB for clostridia; RhizoDB for Rhizobium and Sinorhizobium; and MycoDB for Mycobacterium, Streptomyces and related organisms. The databases provide user-friendly access to annotation and genome comparisons through a web-based graphical interface. Newly developed features include whole genome displays, 'painting' of genes according to properties such as GC content, a pattern search system to identify conserved motifs and batch BLAST searching of every protein encoded by a region. Examples of how the databases have been, and continue to be, used to generate hypotheses for subsequent laboratory investigation are presented. xBASE is available online at http://xbase.bham.ac.uk. PMID:16381881

  18. A Ruby API to query the Ensembl database for genomic features

    PubMed Central

    Strozzi, Francesco; Aerts, Jan

    2011-01-01

    Summary: The Ensembl database makes genomic features available via its Genome Browser. It is also possible to access the underlying data through a Perl API for advanced querying. We have developed a full-featured Ruby API to the Ensembl databases, providing the same functionality as the Perl interface with additional features. A single Ruby API is used to access different releases of the Ensembl databases and is also able to query multi-species databases. Availability and Implementation: Most functionality of the API is provided using the ActiveRecord pattern. The library depends on introspection to make it release independent. The API is available through the Rubygem system and can be installed with the command gem install ruby-ensembl-api. Contact: jan.aerts@esat.kuleuven.be PMID:21278190

  19. The ASAP II database: analysis and comparative genomics of alternative splicing in 15 animal species.

    PubMed

    Kim, Namshin; Alekseyenko, Alexander V; Roy, Meenakshi; Lee, Christopher

    2007-01-01

    We have greatly expanded the Alternative Splicing Annotation Project (ASAP) database: (i) its human alternative splicing data are expanded approximately 3-fold over the previous ASAP database, to nearly 90,000 distinct alternative splicing events; (ii) it now provides genome-wide alternative splicing analyses for 15 vertebrate, insect and other animal species; (iii) it provides comprehensive comparative genomics information for comparing alternative splicing and splice site conservation across 17 aligned genomes, based on UCSC multigenome alignments; (iv) it provides an approximately 2- to 3-fold expansion in detection of tissue-specific alternative splicing events, and of cancer versus normal specific alternative splicing events. We have also constructed a novel database linking orthologous exons and orthologous introns between genomes, based on multigenome alignment of 17 animal species. It can be a valuable resource for studies of gene structure evolution. ASAP II provides a new web interface enabling more detailed exploration of the data, and integrating comparative genomics information with alternative splicing data. We provide a set of tools for advanced data-mining of ASAP II with Pygr (the Python Graph Database Framework for Bioinformatics) including powerful features such as graph query, multigenome alignment query, etc. ASAP II is available at http://www.bioinformatics.ucla.edu/ASAP2.

  20. Choosing a genome browser for a Model Organism Database: surveying the Maize community

    PubMed Central

    Sen, Taner Z.; Harper, Lisa C.; Schaeffer, Mary L.; Andorf, Carson M.; Seigfried, Trent E.; Campbell, Darwin A.; Lawrence, Carolyn J.

    2010-01-01

    As the B73 maize genome sequencing project neared completion, MaizeGDB began to integrate a graphical genome browser with its existing web interface and database. To ensure that maize researchers would optimally benefit from the potential addition of a genome browser to the existing MaizeGDB resource, personnel at MaizeGDB surveyed researchers’ needs. Collected data indicate that existing genome browsers for maize were inadequate and suggest implementation of a browser with quick interface and intuitive tools would meet most researchers’ needs. Here, we document the survey’s outcomes, review functionalities of available genome browser software platforms and offer our rationale for choosing the GBrowse software suite for MaizeGDB. Because the genome as represented within the MaizeGDB Genome Browser is tied to detailed phenotypic data, molecular marker information, available stocks, etc., the MaizeGDB Genome Browser represents a novel mechanism by which the researchers can leverage maize sequence information toward crop improvement directly. Database URL: http://gbrowse.maizegdb.org/ PMID:20627860

  1. VibrioBase: a model for next-generation genome and annotation database development.

    PubMed

    Choo, Siew Woh; Heydari, Hamed; Tan, Tze King; Siow, Cheuk Chuen; Beh, Ching Yew; Wee, Wei Yee; Mutha, Naresh V R; Wong, Guat Jah; Ang, Mia Yang; Yazdi, Amir Hessam

    2014-01-01

    To facilitate the ongoing research of Vibrio spp., a dedicated platform for the Vibrio research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. We present VibrioBase, a resource platform providing all basic features of a sequence database with the addition of unique analysis tools which could be valuable for the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes, presented in a user-friendly manner to enable the analysis of these genomic data, particularly in the field of comparative genomics. Besides general data browsing features, VibrioBase offers analysis tools such as BLAST interfaces and the JBrowse genome browser. Other important features of this platform include our newly developed in-house tools, the pairwise genome comparison (PGC) tool and the pathogenomics profiling tool (PathoProT). The PGC tool is useful in the identification and comparative analysis of two genomes, whereas PathoProT is designed for comparative pathogenomics analysis of Vibrio strains. Both of these tools will enable researchers with little experience in bioinformatics to get meaningful information from Vibrio genomes with ease. We have tested the validity and suitability of these tools and features for use in next-generation database development.

  3. coliBASE: an online database for Escherichia coli, Shigella and Salmonella comparative genomics.

    PubMed

    Chaudhuri, Roy R; Khan, Arshad M; Pallen, Mark J

    2004-01-01

    We have constructed coliBASE, a database for Escherichia coli, Shigella and Salmonella comparative genomics available online at http://colibase.bham.ac.uk. Unlike other E. coli databases, which focus on the laboratory model strain K12, coliBASE is intended to reflect the full diversity of E. coli and its relatives. The database contains comparative data including whole genome alignments and lists of putative orthologous genes, together with numerous analytical tools and links to existing online resources. The data are stored in a relational database, accessible by a number of user-friendly search methods and graphical browsers. The database schema is generic and can easily be applied to other bacterial genomes. Two such databases, CampyDB (for the analysis of Campylobacter spp.) and ClostriDB (for Clostridium spp.) are also available at http://campy.bham.ac.uk and http://clostri.bham.ac.uk, respectively. An example of the power of E. coli comparative analyses such as those available through coliBASE is presented. PMID:14681417

  5. PvTFDB: a Phaseolus vulgaris transcription factors database for expediting functional genomics in legumes

    PubMed Central

    Bhawna; Bonthala, V.S.; Gajula, MNV Prasad

    2016-01-01

    The common bean [Phaseolus vulgaris (L.)] is one of the essential proteinaceous vegetables grown in developing countries. However, its production is challenged by low yields caused by numerous biotic and abiotic stress conditions. Regulatory transcription factors (TFs) represent a key component of the genome and are the most significant targets for producing stress-tolerant crops; hence, functional genomic studies of these TFs are important. We have therefore constructed a web-accessible TF database for P. vulgaris, called PvTFDB, which contains 2370 putative TF gene models in 49 TF families. This database provides comprehensive information for each identified TF, including sequence data, functional annotation, SSRs with their primer sets, protein physical properties, chromosomal location, phylogeny, tissue-specific gene expression data, orthologues, cis-regulatory elements and gene ontology (GO) assignment. Altogether, this information can be used to expedite functional genomic studies of specific TFs of interest. The objectives of this database are to support functional genomics studies of common bean TFs and to help identify the regulatory mechanisms underlying various stress responses, easing breeding strategies for variety production, through search interfaces (by gene ID and functional annotation) and browsing interfaces (by family and by chromosome). This database will also serve as a promising central repository for researchers as well as breeders who are working towards improvement of legume crops. In addition, this database provides unrestricted public access, and users can freely download all data in the database. Database URL: http://www.multiomics.in/PvTFDB/ PMID:27465131

  7. A novel mycovirus from Aspergillus fumigatus contains four unique dsRNAs as its genome and is infectious as dsRNA

    PubMed Central

    Kanhayuwa, Lakkhana; Kotta-Loizou, Ioly; Özkan, Selin; Gunning, A. Patrick; Coutts, Robert H. A.

    2015-01-01

    We report the discovery and characterization of a double-stranded RNA (dsRNA) mycovirus isolated from the human pathogenic fungus Aspergillus fumigatus, Aspergillus fumigatus tetramycovirus-1 (AfuTmV-1), which reveals several unique features not found previously in positive-strand RNA viruses, including the fact that it represents the first dsRNA (to our knowledge) that is not only infectious as a purified entity but also as a naked dsRNA. The AfuTmV-1 genome consists of four capped dsRNAs, the largest of which encodes an RNA-dependent RNA polymerase (RdRP) containing a unique GDNQ motif normally characteristic of negative-strand RNA viruses. The third largest dsRNA encodes an S-adenosyl methionine–dependent methyltransferase capping enzyme and the smallest dsRNA a P-A-S–rich protein that apparently coats but does not encapsidate the viral genome as visualized by atomic force microscopy. A combination of a capping enzyme with a picorna-like RdRP in the AfuTmV-1 genome is a striking case of chimerism and the first example (to our knowledge) of such a phenomenon. AfuTmV-1 appears to be intermediate between dsRNA and positive-strand ssRNA viruses, as well as between encapsidated and capsidless RNA viruses. PMID:26139522

  8. Genome Shuffling of Mangrove Endophytic Aspergillus luchuensis MERV10 for Improving the Cholesterol-Lowering Agent Lovastatin under Solid State Fermentation

    PubMed Central

    El-Gendy, Mervat Morsy Abbas Ahmed; Al-Zahrani, Hind A. A.

    2016-01-01

    In the screening of marine mangrove derived fungi for lovastatin productivity, endophytic Aspergillus luchuensis MERV10 exhibited the highest lovastatin productivity (9.5 mg/gds) in solid state fermentation (SSF) using rice bran. Aspergillus luchuensis MERV10 was used as the parental strain in which to induce genetic variabilities after application of different mixtures as well as doses of mutagens followed by three successive rounds of genome shuffling. Four potent mutants, UN6, UN28, NE11, and NE23, with lovastatin productivity equal to 2.0-, 2.11-, 1.95-, and 2.11-fold higher than the parental strain, respectively, were applied for three rounds of genome shuffling as the initial mutants. Four hereditarily stable recombinants (F3/3, F3/7, F3/9, and F3/13) were obtained with lovastatin productivity equal to 50.8, 57.0, 49.7, and 51.0 mg/gds, respectively. Recombinant strain F3/7 yielded 57.0 mg/gds of lovastatin, which is 6-fold and 2.85-fold higher, respectively, than the initial parental strain and the highest mutants UN28 and NE23. It was therefore selected for the optimization of lovastatin production through improvement of SSF parameters. Lovastatin productivity was increased 32-fold through strain improvement methods, including mutations and three successive rounds of genome shuffling followed by optimizing SSF factors. PMID:27790068
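    The fold-change figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable and function names are hypothetical, "fold higher" is assumed to mean a simple ratio of yields, and the mutant yield is derived from the reported 2.11-fold value rather than taken from the abstract. Under those assumptions the ratio to the parental strain comes out at 6.00, and the ratio to the best mutants at about 2.84, close to the reported 2.85 (the small gap is plausibly due to rounding in the reported fold values).

```python
# Yields quoted in the abstract (mg lovastatin per gram dry substrate, mg/gds).
parent_yield = 9.5        # parental strain A. luchuensis MERV10
recombinant_yield = 57.0  # recombinant strain F3/7
mutant_fold = 2.11        # mutants UN28 and NE23, relative to the parent


def fold_change(new, reference):
    """Return the ratio of `new` to `reference` (assumed meaning of 'fold')."""
    return new / reference


# The mutant yield is not stated directly in the abstract;
# it is derived here from the reported fold value (an assumption).
mutant_yield = parent_yield * mutant_fold

fold_vs_parent = fold_change(recombinant_yield, parent_yield)
fold_vs_mutant = fold_change(recombinant_yield, mutant_yield)

print(f"F3/7 vs parental strain: {fold_vs_parent:.2f}-fold")
print(f"F3/7 vs best mutants:    {fold_vs_mutant:.2f}-fold")
```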

  9. Short Interspersed Nuclear Element (SINE) Sequences in the Genome of the Human Pathogenic Fungus Aspergillus fumigatus Af293

    PubMed Central

    Kanhayuwa, Lakkhana; Coutts, Robert H. A.

    2016-01-01

    Novel families of short interspersed nuclear element (SINE) sequences in the human pathogenic fungus Aspergillus fumigatus, clinical isolate Af293, were identified and categorised into tRNA-related and 5S rRNA-related SINEs. Eight predicted tRNA-related SINE families originating from different tRNAs, and nominated as AfuSINE2 sequences, contained target site duplications of short direct repeat sequences (4–14 bp) flanking the elements, an extended tRNA-unrelated region and typical features of RNA polymerase III promoter sequences. The elements ranged in size from 140–493 bp and were present in low copy number in the genome, and five out of eight were actively transcribed. One putative tRNAArg-derived sequence, AfuSINE2-1a, possessed a unique feature of repeated trinucleotide ACT residues at its 3'-terminus. This element was similar in sequence to the I-4_AO element found in A. oryzae and an I-1_AF long nuclear interspersed element-like sequence identified in A. fumigatus Af293. Families of 5S rRNA-related SINE sequences, nominated as AfuSINE3, were also identified, and their 5'-5S rRNA-related regions show 50–65% and 60–75% similarity to A. fumigatus 5S rRNAs and to SINE3-1_AO found in A. oryzae, respectively. A. fumigatus Af293 contains five copies of AfuSINE3 sequences ranging in size from 259–343 bp, and two out of five AfuSINE3 sequences were actively transcribed. Investigations on AfuSINE distribution in the fungal genome revealed that the elements are enriched in pericentromeric and subtelomeric regions and inserted within gene-rich regions. We also demonstrated that some, but not all, AfuSINE sequences are targeted by host RNA silencing mechanisms. Finally, we demonstrated that infection of the fungus with mycoviruses had no apparent effects on SINE activity. PMID:27736869

  10. The Mouse Genome Database: integration of and access to knowledge about the laboratory mouse.

    PubMed

    Blake, Judith A; Bult, Carol J; Eppig, Janan T; Kadin, James A; Richardson, Joel E

    2014-01-01

    The Mouse Genome Database (MGD) (http://www.informatics.jax.org) is the community model organism database resource for the laboratory mouse, a premier animal model for the study of genetic and genomic systems relevant to human biology and disease. MGD maintains a comprehensive catalog of genes, functional RNAs and other genome features as well as heritable phenotypes and quantitative trait loci. The genome feature catalog is generated by the integration of computational and manual genome annotations generated by NCBI, Ensembl and Vega/HAVANA. MGD curates and maintains the comprehensive listing of functional annotations for mouse genes using the Gene Ontology, and MGD curates and integrates comprehensive phenotype annotations including associations of mouse models with human diseases. Recent improvements include integration of the latest mouse genome build (GRCm38), improved access to comparative and functional annotations for mouse genes with expanded representation of comparative vertebrate genomes and new loads of phenotype data from high-throughput phenotyping projects. All MGD resources are freely available to the research community.

  12. Update on Genomic Databases and Resources at the National Center for Biotechnology Information.

    PubMed

    Tatusova, Tatiana

    2016-01-01

    The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expression, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI website, whose text-based search and retrieval system provides a fast and easy way to navigate across diverse biological databases. Comparative genome analysis tools lead to further understanding of evolutionary processes, quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for the information management system and the visualization tools. New strategies have been designed to bring order to this genome sequence shockwave and improve the usability of associated data. PMID:27115625

  13. Development of a grape genomics database using IBM DB2 content manager software

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A relational database was created for the North American Grapevine Genome project at the Viticultural Research Center, at Florida A&M University. The collaborative project with USDA, ARS researchers is an important resource for viticulture production of new grapevine varieties which will be adapted ...

  14. A DATABASE FOR TRACKING TOXICOGENOMIC SAMPLES AND PROCEDURES WITH GENOMIC, PROTEOMIC AND METABONOMIC COMPONENTS

    EPA Science Inventory

    A Database for Tracking Toxicogenomic Samples and Procedures with Genomic, Proteomic and Metabonomic Components
    Wenjun Bao1, Jennifer Fostel2, Michael D. Waters2, B. Alex Merrick2, Drew Ekman3, Mitchell Kostich4, Judith Schmid1, David Dix1
    Office of Research and Developmen...

  15. The Changing Face of Scientific Discourse: Analysis of Genomic and Proteomic Database Usage and Acceptance.

    ERIC Educational Resources Information Center

    Brown, Cecelia

    2003-01-01

    Discusses the growth in use and acceptance of Web-based genomic and proteomic databases (GPD) in scholarly communication. Confirms the role of GPD in the scientific literature cycle, suggests GPD are a storage and retrieval mechanism for molecular biology information, and recommends that existing models of scientific communication be updated to…

  16. BmTEdb: a collective database of transposable elements in the silkworm genome.

    PubMed

    Xu, Hong-En; Zhang, Hua-Hao; Xia, Tian; Han, Min-Jin; Shen, Yi-Hong; Zhang, Ze

    2013-01-01

    The silkworm, Bombyx mori, is one of the major insect model organisms, and its draft and fine genome sequences became available in 2004 and 2008, respectively. Transposable elements (TEs) constitute ~40% of the silkworm genome. To better understand the roles of TEs in the organization, structure and evolution of the silkworm genome, we used a combination of de novo, structure-based and homology-based approaches to identify silkworm TEs and found 1308 silkworm TE families. These TE families and their classification information were organized into a comprehensive and easy-to-use web-based database, BmTEdb. Users can browse, search and download the sequences in the database. Sequence analysis tools such as BLAST, HMMER and EMBOSS GetORF are also provided in BmTEdb. This database will facilitate studies of silkworm genomics, TE functions in the silkworm and comparative analysis of insect TEs. Database URL: http://gene.cqu.edu.cn/BmTEdb/.

  17. CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics.

    PubMed

    Verma, Mohit; Kumar, Vinay; Patel, Ravi K; Garg, Rohini; Jain, Mukesh

    2015-01-01

    Chickpea is an important grain legume used as a rich source of protein in the human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides a comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database features many tools for similarity search, functional annotation (putative function, PFAM domain and gene ontology) search and comparative gene expression analysis. The current release of CTDB (v2.0) hosts transcriptome datasets with high-quality functional annotation from cultivated (desi and kabuli types) and wild chickpea. A catalog of transcription factor families and their expression profiles in chickpea is available in the database. The gene expression data have been integrated to study the expression profiles of chickpea transcripts in major tissues/organs and various stages of flower development. Utilities such as similarity search, ortholog identification and comparative gene expression have also been implemented in the database to facilitate comparative genomic studies among different legumes and Arabidopsis. Furthermore, the CTDB represents a resource for the discovery of functional molecular markers (microsatellites and single nucleotide polymorphisms) between different chickpea types. We anticipate that the integrated information content of this database will accelerate functional and applied genomic research for improvement of chickpea. The CTDB web service is freely available at http://nipgr.res.in/ctdb.html. PMID:26322998

  18. The Genome Sequence DataBase (GSDB): improving data quality and data access.

    PubMed Central

    Harger, C; Skupski, M; Bingham, J; Farmer, A; Hoisie, S; Hraber, P; Kiphart, D; Krakowski, L; McLeod, M; Schwertfeger, J; Seluja, G; Siepel, A; Singh, G; Stamper, D; Steadman, P; Thayer, N; Thompson, R; Wargo, P; Waugh, M; Zhuang, J J; Schad, P A

    1998-01-01

    In 1997 the primary focus of the Genome Sequence DataBase (GSDB; www.ncgr.org/gsdb) located at the National Center for Genome Resources was to improve data quality and accessibility. Efforts to increase the quality of data within the database included two major projects: one to identify and remove all vector contamination from sequences in the database and one to create premier sequence sets (including both alignments and discontiguous sequences). Data accessibility was improved during the course of the last year in several ways. First, a graphical database sequence viewer was made available to researchers. Second, an update process was implemented for the web-based query tool, Maestro. Third, a web-based tool, Excerpt, was developed to retrieve selected regions of any sequence in the database. And lastly, a GSDB flatfile that contains annotation unique to GSDB (e.g., sequence analysis and alignment data) was developed. Additionally, the GSDB web site provides a tool for the detection of matrix attachment regions (MARs), which can be used to identify regions of high coding potential. The ultimate goal of this work is to make GSDB a more useful resource for genomic comparison studies and gene level studies by improving data quality and by providing data access capabilities that are consistent with the needs of both types of studies. PMID:9399793

  19. CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics.

    PubMed

    Verma, Mohit; Kumar, Vinay; Patel, Ravi K; Garg, Rohini; Jain, Mukesh

    2015-01-01

    Chickpea is an important grain legume used as a rich source of protein in human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides the comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database features many tools for similarity search, functional annotation (putative function, PFAM domain and gene ontology) search and comparative gene expression analysis. The current release of CTDB (v2.0) hosts transcriptome datasets with high quality functional annotation from cultivated (desi and kabuli types) and wild chickpea. A catalog of transcription factor families and their expression profiles in chickpea are available in the database. The gene expression data have been integrated to study the expression profiles of chickpea transcripts in major tissues/organs and various stages of flower development. The utilities, such as similarity search, ortholog identification and comparative gene expression have also been implemented in the database to facilitate comparative genomic studies among different legumes and Arabidopsis. Furthermore, the CTDB represents a resource for the discovery of functional molecular markers (microsatellites and single nucleotide polymorphisms) between different chickpea types. We anticipate that integrated information content of this database will accelerate the functional and applied genomic research for improvement of chickpea. The CTDB web service is freely available at http://nipgr.res.in/ctdb.html.

  20. Sentra: a database of signal transduction proteins for comparative genome analysis.

    SciTech Connect

    D'Souza, M.; Glass, E. M.; Syed, M. H.; Zhang, Y.; Rodriguez, A.; Maltsev, N.; Galperin, M. Y.; Mathematics and Computer Science; Univ. of Chicago; NIH

    2007-01-01

    Sentra (http://compbio.mcs.anl.gov/sentra), a database of signal transduction proteins encoded in completely sequenced prokaryotic genomes, has been updated to reflect recent advances in understanding signal transduction events on a whole-genome scale. Sentra consists of two principal components, a manually curated list of signal transduction proteins in 202 completely sequenced prokaryotic genomes and an automatically generated listing of predicted signaling proteins in 235 sequenced genomes that are awaiting manual curation. In addition to two-component histidine kinases and response regulators, the database now lists manually curated Ser/Thr/Tyr protein kinases and protein phosphatases, as well as adenylate and diguanylate cyclases and c-di-GMP phosphodiesterases, as defined in several recent reviews. All entries in Sentra are extensively annotated with relevant information from public databases (e.g. UniProt, KEGG, PDB and NCBI). Sentra's infrastructure was redesigned to support interactive cross-genome comparisons of signal transduction capabilities of prokaryotic organisms from a taxonomic and phenotypic perspective and in the framework of signal transduction pathways from KEGG. Sentra leverages the PUMA2 system to support interactive analysis and annotation of signal transduction proteins by the users.

  1. Integrated Database And Knowledge Base For Genomic Prospective Cohort Study In Tohoku Medical Megabank Toward Personalized Prevention And Medicine.

    PubMed

    Ogishima, Soichi; Takai, Takako; Shimokawa, Kazuro; Nagaie, Satoshi; Tanaka, Hiroshi; Nakaya, Jun

    2015-01-01

    The Tohoku Medical Megabank project is a national project to revitalize the area of the Tohoku region struck by the Great East Japan Earthquake, and it has conducted a large-scale prospective genome-cohort study. Alongside the prospective genome-cohort study, we have developed an integrated database and knowledge base, which will be a key database for realizing personalized prevention and medicine.

  2. Gene3D: Structural Assignment for Whole Genes and Genomes Using the CATH Domain Structure Database

    PubMed Central

    Buchan, Daniel W.A.; Shepherd, Adrian J.; Lee, David; Pearl, Frances M.G.; Rison, Stuart C.G.; Thornton, Janet M.; Orengo, Christine A.

    2002-01-01

    We present a novel web-based resource, Gene3D, of precalculated structural assignments to gene sequences and whole genomes. This resource assigns structural domains from the CATH database to whole genes and links these to their curated functional and structural annotations within the CATH domain structure database, the functional Dictionary of Homologous Superfamilies (DHS) and PDBsum. Currently Gene3D provides annotation for 36 complete genomes (two eukaryotes, six archaea, and 28 bacteria). On average, between 30% and 40% of the genes of a given genome can be structurally annotated. Matches to structural domains are found using the profile-based method (PSI-BLAST), and a novel protocol, DRange, is used to resolve conflicts in matches involving different homologous superfamilies. PMID:11875040

  3. The Mouse Genome Database (MGD): from genes to mice--a community resource for mouse biology.

    PubMed

    Eppig, Janan T; Bult, Carol J; Kadin, James A; Richardson, Joel E; Blake, Judith A; Anagnostopoulos, A; Baldarelli, R M; Baya, M; Beal, J S; Bello, S M; Boddy, W J; Bradt, D W; Burkart, D L; Butler, N E; Campbell, J; Cassell, M A; Corbani, L E; Cousins, S L; Dahmen, D J; Dene, H; Diehl, A D; Drabkin, H J; Frazer, K S; Frost, P; Glass, L H; Goldsmith, C W; Grant, P L; Lennon-Pierce, M; Lewis, J; Lu, I; Maltais, L J; McAndrews-Hill, M; McClellan, L; Miers, D B; Miller, L A; Ni, L; Ormsby, J E; Qi, D; Reddy, T B K; Reed, D J; Richards-Smith, B; Shaw, D R; Sinclair, R; Smith, C L; Szauter, P; Walker, M B; Walton, D O; Washburn, L L; Witham, I T; Zhu, Y

    2005-01-01

    The Mouse Genome Database (MGD) forms the core of the Mouse Genome Informatics (MGI) system (http://www.informatics.jax.org), a model organism database resource for the laboratory mouse. MGD provides essential integration of experimental knowledge for the mouse system with information annotated from both literature and online sources. MGD curates and presents consensus and experimental data representations of genotype (sequence) through phenotype information, including highly detailed reports about genes and gene products. Primary foci of integration are through representations of relationships among genes, sequences and phenotypes. MGD collaborates with other bioinformatics groups to curate a definitive set of information about the laboratory mouse and to build and implement the data and semantic standards that are essential for comparative genome analysis. Recent improvements in MGD discussed here include the enhancement of phenotype resources, the re-development of the International Mouse Strain Resource, IMSR, the update of mammalian orthology datasets and the electronic publication of classic books in mouse genetics.

  4. Design and implementation of a database for Brucella melitensis genome annotation.

    PubMed

    De Hertogh, Benoît; Lahlimi, Leïla; Lambert, Christophe; Letesson, Jean-Jacques; Depiereux, Eric

    2008-03-18

    The genome sequences of three Brucella biovars and of some species close to Brucella sp. have become available, leading to new relationship analysis. Moreover, the automatic genome annotation of the pathogenic bacterium Brucella melitensis has been manually corrected by a consortium of experts, leading to 899 modifications of start site predictions among the 3198 open reading frames (ORFs) examined. This new annotation, coupled with the results of automatic annotation tools applied to the complete genome sequences of B. melitensis (including BLASTs to 9 genomes close to Brucella), provides numerous data sets related to predicted functions, biochemical properties and phylogenic comparisons. To make these results available, alphaPAGe, a functional auto-updatable database of the corrected genome sequence of B. melitensis, has been built, using the entity-relationship (ER) approach and a multi-purpose database structure. A friendly graphical user interface has been designed, and users can retrieve different kinds of information through three levels of queries: (1) the basic search uses classical keywords or sequence identifiers; (2) the original advanced search engine allows users to combine (by using logical operators) numerous criteria: (a) keywords (textual comparison) related to the pCDS's function, family domains and cellular localization; (b) physico-chemical characteristics (numerical comparison) such as isoelectric point or molecular weight and structural criteria such as the nucleic length or the number of transmembrane helices (TMH); (c) similarity scores with Escherichia coli and 10 species phylogenetically close to B. melitensis; (3) complex queries can be performed by using a SQL field, which allows all queries respecting the database's structure. The database is publicly available through a Web server at the following URL: http://www.fundp.ac.be/urbm/bioinfo/aPAGe.
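
    The advanced search described in this record combines textual and numerical criteria with logical operators. The following is a minimal sketch of what such a combined query might look like, using Python's built-in sqlite3 module. The table name `pcds`, its columns, and the three sample rows are hypothetical and chosen only for illustration; they are not the actual alphaPAGe schema or data.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the real alphaPAGe database structure.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE pcds (
           seq_id TEXT PRIMARY KEY,
           function TEXT,
           isoelectric_point REAL,
           molecular_weight REAL,
           tmh_count INTEGER
       )"""
)
# Made-up example rows, used only to exercise the query.
conn.executemany(
    "INSERT INTO pcds VALUES (?, ?, ?, ?, ?)",
    [
        ("pCDS_0001", "ABC transporter permease", 9.1, 31500.0, 6),
        ("pCDS_0002", "DNA gyrase subunit A", 5.4, 101200.0, 0),
        ("pCDS_0003", "sugar transporter", 8.7, 45800.0, 12),
    ],
)


def advanced_search(conn, keyword, min_pi, min_tmh):
    """Combine one textual criterion with two numerical criteria using AND."""
    rows = conn.execute(
        """SELECT seq_id FROM pcds
           WHERE function LIKE ?
             AND isoelectric_point >= ?
             AND tmh_count >= ?
           ORDER BY seq_id""",
        (f"%{keyword}%", min_pi, min_tmh),
    )
    return [row[0] for row in rows]


# Transporters with an isoelectric point of at least 8.0 and at least 6 TMHs.
hits = advanced_search(conn, "transporter", 8.0, 6)
print(hits)
```

    Parameter placeholders keep user-supplied search terms out of the SQL text itself, which matters for any web form that forwards criteria to a database.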

  5. OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes.

    PubMed

    Chetal, Kashish; Janga, Sarath Chandra

    2015-01-01

    Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons: codirectionally organized genes in prokaryotic genomes with the presence of a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from having a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. The user interface is simple and easy to use, in terms of visualization, downloading, and querying of data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics.
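
    The operon definition in this record (adjacent, codirectional genes) can be illustrated with a naive adjacency heuristic. The sketch below is not the RNA-seq based prediction used by operomeDB: it only groups neighbouring genes that share a strand and lie within an assumed maximum intergenic gap. The `Gene` type, the `max_gap` threshold of 50 bp, and the gene coordinates are all hypothetical.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def group_codirectional(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group neighbouring same-strand genes whose intergenic gap is <= max_gap.

    This is a simplified positional heuristic (an assumption for illustration),
    not an expression-based operon prediction.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    groups: List[List[Gene]] = []
    for gene in ordered:
        if groups:
            prev = groups[-1][-1]
            same_strand = gene.strand == prev.strand
            close_enough = gene.start - prev.end <= max_gap
            if same_strand and close_enough:
                groups[-1].append(gene)
                continue
        groups.append([gene])
    return [[g.name for g in group] for group in groups]


# Made-up coordinates for five hypothetical genes.
genes = [
    Gene("geneA", 100, 900, "+"),
    Gene("geneB", 920, 1500, "+"),
    Gene("geneC", 1530, 2100, "+"),
    Gene("geneD", 2400, 3000, "+"),
    Gene("geneE", 3020, 3600, "-"),
]
candidates = group_codirectional(genes)
print(candidates)
```

    Condition-specific resources add transcript evidence on top of such positional rules, because the same gene neighbourhood may or may not be co-transcribed under different conditions.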

  6. CottonGen: a genomics, genetics and breeding database for cotton research

    PubMed Central

    Yu, Jing; Jung, Sook; Cheng, Chun-Huai; Ficklin, Stephen P.; Lee, Taein; Zheng, Ping; Jones, Don; Percy, Richard G.; Main, Dorrie

    2014-01-01

    CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supersedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, visualization and data retrieval of cotton research data. CottonGen contains annotated whole genome sequences, unigenes from expressed sequence tags (ESTs), markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Annotated whole genome sequences of Gossypium raimondii are available with aligned genetic markers and transcripts. These whole genome data can be accessed through genome pages, search tools and GBrowse, a popular genome browser. Most of the published cotton genetic maps can be viewed and compared using CMap, a comparative map viewer, and are searchable via map search tools. Search tools also exist for markers, quantitative trait loci (QTLs), germplasm, publications and trait evaluation data. CottonGen also provides online analysis tools such as NCBI BLAST and Batch BLAST. PMID:24203703

  7. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata.

    PubMed

    Liolios, Konstantinos; Chen, I-Min A; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M; Kyrpides, Nikos C

    2010-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/

  8. Large-Scale Phylogenetic Classification of Fungal Chitin Synthases and Identification of a Putative Cell-Wall Metabolism Gene Cluster in Aspergillus Genomes

    PubMed Central

    Pacheco-Arjona, Jose Ramon; Ramirez-Prado, Jorge Humberto

    2014-01-01

    The cell wall is a protective and versatile structure distributed in all fungi. The component responsible for its rigidity is chitin, a product of chitin synthase (Chsp) enzymes. There are seven classes of chitin synthase genes (CHS) and the amount and type encoded in fungal genomes varies considerably from one species to another. Previous Chsp sequence analyses focused on their study as individual units, regardless of genomic context. The identification of blocks of conserved genes between genomes can provide important clues about the interactions and localization of chitin synthases. In the present study, we carried out an in silico search of all putative Chsp encoded in 54 full fungal genomes, encompassing 21 orders from five phyla. Phylogenetic studies of these Chsp were able to confidently classify 347 out of the 369 Chsp identified (94%). Patterns in the distribution of Chsp related to taxonomy were identified, the most prominent being related to the type of fungal growth. More importantly, a synteny analysis for genomic blocks centered on class IV Chsp (the most abundant and widely distributed Chsp class) identified a putative cell wall metabolism gene cluster in members of the genus Aspergillus, the first such association reported for any fungal genome. PMID:25148134

  9. Chemodiversity in the genus Aspergillus.

    PubMed

    Frisvad, Jens C; Larsen, Thomas O

    2015-10-01

    Isolates of Aspergillus species are able to produce a large number of secondary metabolites. The profiles of biosynthetic families of secondary metabolites are species specific, whereas individual secondary metabolite families can occur in other species, even those phylogenetically and ecologically unrelated to Aspergillus. Furthermore, there is a high degree of chemo-consistency from isolate to isolate in a species even though certain metabolite gene clusters are silenced in some isolates. Genome sequencing projects have shown that the diversity of secondary metabolites is much larger in each species than previously thought. The potential of finding even further new bioactive drug candidates in Aspergillus is evident, despite the fact that many secondary metabolites have already been structure elucidated and chemotaxonomic studies have shown that many new secondary metabolites have yet to be characterized. The genus Aspergillus is cladistically holophyletic but phenotypically polythetic and very diverse and is associated with quite different sexual states. Following the one fungus one name system, the genus Aspergillus is restricted to a holophyletic clade that includes the morphologically different genera Aspergillus, Dichotomomyces, Phialosimplex, Polypaecilum and Cristaspora. Secondary metabolites common between the subgenera and sections of Aspergillus are surprisingly few, but many metabolites are common to a majority of species within the sections. We call small molecule extrolites in the same biosynthetic family isoextrolites. However, it appears that secondary metabolites from one Aspergillus section have analogous metabolites in other sections (here also called heteroisoextrolites). In this review, we give a genus-wide overview of secondary metabolite production in Aspergillus species. Extrolites appear to have evolved because of ecological challenges rather than being inherited from ancestral species, at least when comparing the species in the different

  10. dbEM: A database of epigenetic modifiers curated from cancerous and normal genomes

    PubMed Central

    Singh Nanda, Jagpreet; Kumar, Rahul; Raghava, Gajendra P. S.

    2016-01-01

    We have developed a database called dbEM (database of Epigenetic Modifiers) to maintain the genomic information of about 167 epigenetic modifiers/proteins, which are considered potential cancer targets. In dbEM, modifiers are classified on a functional basis and comprise 48 histone methyl transferases, 33 chromatin remodelers and 31 histone demethylases. dbEM maintains genomic information such as mutations, copy number variation and gene expression in thousands of tumor samples, cancer cell lines and healthy samples. This information is obtained from public resources, viz. COSMIC, CCLE and the 1000-genome project. Gene essentiality data retrieved from the COLT database further highlights the importance of various epigenetic proteins for cancer survival. We have also reported the sequence profiles, tertiary structures and post-translational modifications of these epigenetic proteins in cancer. It also contains information on 54 drug molecules against different epigenetic proteins. A wide range of tools have been integrated in dbEM, e.g. Search, BLAST, Alignment and Profile based prediction. In our analysis, we found that epigenetic proteins DNMT3A, HDAC2, KDM6A, and TET2 are highly mutated in a variety of cancers. We are confident that dbEM will be very useful in cancer research, particularly in the field of epigenetic protein-based cancer therapeutics. This database is publicly available at URL: http://crdd.osdd.net/raghava/dbem. PMID:26777304

  11. RiceVarMap: a comprehensive database of rice genomic variations

    PubMed Central

    Zhao, Hu; Yao, Wen; Ouyang, Yidan; Yang, Wanneng; Wang, Gongwei; Lian, Xingming; Xing, Yongzhong; Chen, Lingling; Xie, Weibo

    2015-01-01

    Rice Variation Map (RiceVarMap, http://ricevarmap.ncpgr.cn) is a database of rice genomic variations. The database provides comprehensive information on 6 551 358 single nucleotide polymorphisms (SNPs) and 1 214 627 insertions/deletions (INDELs) identified from sequencing data of 1479 rice accessions. The SNP genotypes of all accessions were imputed and evaluated, resulting in an overall missing data rate of 0.42% and an estimated accuracy greater than 99%. The SNP/INDEL genotypes of all accessions are available for online query and download. Users can search SNPs/INDELs by identifiers of the SNPs/INDELs, genomic regions, gene identifiers and keywords of gene annotation. Allele frequencies within various subpopulations and the effects of the variation that may alter the protein sequence of a gene are also listed for each SNP/INDEL. The database also provides geographical details and phenotype images for various rice accessions. In particular, the database provides tools to construct haplotype networks and design PCR-primers by taking into account surrounding known genomic variations. These data and tools are highly useful for exploring genetic variations and evolution studies of rice and other species. PMID:25274737

  12. HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

    PubMed

    Forster, Samuel C; Browne, Hilary P; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Lawley, Trevor D

    2016-01-01

    The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease. PMID:26578596

  13. HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

    PubMed

    Forster, Samuel C; Browne, Hilary P; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Lawley, Trevor D

    2016-01-01

    The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease.

  14. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes

    PubMed Central

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species. Database URL: http://geve.med.u-tokai.ac.jp PMID:27242033

  15. Unlimited Thirst for Genome Sequencing, Data Interpretation, and Database Usage in Genomic Era: The Road towards Fast-Track Crop Plant Improvement

    PubMed Central

    Govindaraj, Mahalingam

    2015-01-01

    The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next generation sequencing methods. Databases have become an integral part of all aspects of science research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, has kept expanding in recent years. The databases and associated web portals provide at a minimum a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in dealing with crop plant database utilization in the advancing genomic era. The utilization of databases for variation analysis with other comparative genomics tools, and data interpretation platforms are well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement programs is still being achieved widely; otherwise, the end for sequencing is not far away. PMID:25874133

  16. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes.

    PubMed

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species. Database URL: http://geve.med.u-tokai.ac.jp.

  17. MEMOSys 2.0: an update of the bioinformatics database for genome-scale models and genomic data.

    PubMed

    Pabinger, Stephan; Snajder, Rene; Hardiman, Timo; Willi, Michaela; Dander, Andreas; Trajanoski, Zlatko

    2014-01-01

    The MEtabolic MOdel research and development System (MEMOSys) is a versatile database for the management, storage and development of genome-scale models (GEMs). Since its initial release, the database has undergone major improvements, and the new version introduces several new features. First, the novel concept of derived models allows users to create model hierarchies that automatically propagate modifications along their order. Second, all stored components can now be easily enhanced with additional annotations that can be directly extracted from a supplied Systems Biology Markup Language (SBML) file. Third, the web application has been substantially revised and now features new query mechanisms, an easy search system for reactions and new link-out services to publicly available databases. Fourth, the updated database now contains 20 publicly available models, which can be easily exported into standardized formats for further analysis. Fifth, MEMOSys 2.0 is now also available as a fully configured virtual image and can be found online at http://www.icbi.at/memosys and http://memoys.i-med.ac.at. Database URL: http://memosys.i-med.ac.at.

  18. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    PubMed

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl.
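
    The SEG definition given in this record (a nuclear, protein-coding gene that lacks introns in its CDS) can be expressed as a simple filter. The sketch below is only an illustration of that definition, not the SinEx DB prediction pipeline; the `GeneModel` type, its fields, and the example records are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GeneModel:
    # Hypothetical record layout, assumed for illustration only.
    gene_id: str
    is_nuclear: bool
    is_protein_coding: bool
    cds_segments: List[Tuple[int, int]]  # (start, end) of each CDS segment


def is_single_exon_gene(gene: GeneModel) -> bool:
    """Apply the definition: nuclear, protein-coding, and an uninterrupted CDS."""
    return (
        gene.is_nuclear
        and gene.is_protein_coding
        and len(gene.cds_segments) == 1
    )


# Made-up gene models exercising each clause of the definition.
models = [
    GeneModel("gene1", True, True, [(100, 1000)]),               # qualifies
    GeneModel("gene2", True, True, [(100, 400), (600, 1000)]),   # intron in CDS
    GeneModel("gene3", False, True, [(50, 700)]),                # not nuclear
    GeneModel("gene4", True, False, [(10, 300)]),                # not protein-coding
]
seg_ids = [m.gene_id for m in models if is_single_exon_gene(m)]
print(seg_ids)
```

    Because the test looks only at CDS segments, a gene whose introns fall entirely in untranslated regions would still count as a SEG under this definition.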

  19. SinEx DB: a database for single exon coding sequences in mammalian genomes

    PubMed Central

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F.; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S.

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as ‘single exon genes’ (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl PMID:27278816

  20. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    PubMed

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl. PMID:27278816
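
    The SEG definition in the SinEx DB abstract (a nuclear, protein-coding gene with no introns in its CDS) can be illustrated with a minimal sketch. The function names, the inclusive 1-based coordinates and the example genes below are hypothetical and are not taken from SinEx DB; the point is only that an intron lying outside the CDS (for example in a UTR) does not disqualify a gene under this definition.

```python
def cds_segments(exons, cds_start, cds_end):
    """Clip exon intervals (start, end), inclusive, to the CDS span."""
    segments = []
    for start, end in sorted(exons):
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo <= hi:  # this exon overlaps the coding sequence
            segments.append((lo, hi))
    return segments


def is_seg(exons, cds_start, cds_end, nuclear=True, protein_coding=True):
    """True if the CDS is carried by exactly one exon (no intron inside the CDS)."""
    return nuclear and protein_coding and len(cds_segments(exons, cds_start, cds_end)) == 1


# Hypothetical gene A: the only intron (101..200) lies upstream of the CDS (250..800).
gene_a = [(1, 100), (201, 900)]
# Hypothetical gene B: the intron (301..400) interrupts the CDS (250..800).
gene_b = [(1, 300), (401, 900)]

print(is_seg(gene_a, 250, 800))
print(is_seg(gene_b, 250, 800))
```

    Under these assumptions gene A counts as a SEG and gene B does not, because only gene B has its coding sequence split across two exons.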

  1. Genome Data from DOOR: a Database for prOkaryotic OpeRons

    DOE Data Explorer

    DOOR (Database of prOkaryotic OpeRons) is an operon database developed by the Computational Systems Biology Lab (CSBL) at the University of Georgia. Although the operons in the database are based on prediction, there are some unique features. These are:
    • The algorithm is consistently best in all aspects, including sensitivity and specificity for both true positives and true negatives, and the overall accuracy reaches 90 percent. The prediction algorithm is based on this paper: P. Dam, V. Olman, K. Harris, Z. Su, Y. Xu, Operon prediction using both genome-specific and general genomic information, Nucleic Acids Res., 35(1):288-98, 2007.
    • DOOR provides one of the largest data sets of operon information available to the public. DOOR provides operons for 675 prokaryotic genomes. Although most of the operons in DOOR are not verified by experiments, the creators are also trying to provide some limited literature information, which is extracted from ODB. They emphasize that if the users are looking for strictly experimentally verified operons, they should look into DBTBS and RegulonDB first.
    • Operons that include RNA genes, which are rarely seen in other operon databases, especially predicted operon databases.
    • Defined similarity scores between operons, which are based on weighted maximum matching between operons. Similar operon groups can be used to predict accurate orthologous genes, and their upstream regions can be used to find the consensus binding motifs.
    • Integration of two motif finding programs in the database: MEME and CUBIC.
    DOOR provides an Organism View for browsing, a gene search tool, an operon search tool, and the operon prediction interface. [Text taken and edited from http://csbl1.bmb.uga.edu/OperonDB/tutorial.php]
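
    The DOOR description says operon similarity is based on weighted maximum matching between operons but gives no formula. The sketch below shows only the general idea, under stated assumptions: the gene names and pairwise scores are invented, the function `operon_similarity` is hypothetical, and the exhaustive search over pairings is a stand-in suitable for small operons, not DOOR's actual implementation.

```python
from itertools import permutations


def operon_similarity(operon_a, operon_b, gene_score):
    """Best total score over all one-to-one pairings of genes between two operons.

    This is a brute-force maximum-weight bipartite matching, practical only
    for operons with a handful of genes.
    """
    small, large = sorted((operon_a, operon_b), key=len)
    best = 0
    for candidate in permutations(large, len(small)):
        total = sum(gene_score(g, h) for g, h in zip(small, candidate))
        best = max(best, total)
    return best


# Hypothetical pairwise gene similarities (for example, percent identity).
SCORES = {
    frozenset(("a1", "b1")): 90,
    frozenset(("a1", "b2")): 80,
    frozenset(("a2", "b1")): 85,
    frozenset(("a2", "b2")): 10,
}


def score(gene_x, gene_y):
    # Unlisted pairs are treated as having no similarity.
    return SCORES.get(frozenset((gene_x, gene_y)), 0)


print(operon_similarity(["a1", "a2"], ["b1", "b2", "b3"], score))
```

    In this invented example, pairing each gene greedily with its best partner (a1 with b1, leaving a2 with at most 10) gives a lower total than the matching a1 with b2 and a2 with b1, which is why a matching over the whole operon is used rather than per-gene best hits.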

  2. Biological database of images and genomes: tools for community annotations linking image and genomic information.

    PubMed

    Oberlin, Andrew T; Jurkovic, Dominika A; Balish, Mitchell F; Friedberg, Iddo

    2013-01-01

    Genomic data and biomedical imaging data are undergoing exponential growth. However, our understanding of the phenotype-genotype connection linking the two types of data is lagging behind. While there are many types of software that enable the manipulation and analysis of image data and genomic data as separate entities, there is no framework established for linking the two. We present a generic set of software tools, BioDIG, that allows linking of image data to genomic data. BioDIG tools can be applied to a wide range of research problems that require linking images to genomes. BioDIG features the following: rapid construction of web-based workbenches, community-based annotation, user management and web services. By using BioDIG to create websites, researchers and curators can rapidly annotate a large number of images with genomic information. Here we present the BioDIG software tools that include an image module, a genome module and a user management module. We also introduce a BioDIG-based website, MyDIG, which is being used to annotate images of mycoplasmas. PMID:23550062

  3. SubtiList: the reference database for the Bacillus subtilis genome.

    PubMed

    Moszer, Ivan; Jones, Louis M; Moreira, Sandrine; Fabry, Cécilia; Danchin, Antoine

    2002-01-01

    SubtiList is the reference database dedicated to the genome of Bacillus subtilis 168, the paradigm of Gram-positive endospore-forming bacteria. Developed in the framework of the B.subtilis genome project, SubtiList provides a curated dataset of DNA and protein sequences, combined with the relevant annotations and functional assignments. Information about gene functions and products is continuously updated by linking relevant bibliographic references. Recently, sequence corrections arising from both systematic verifications and submissions by individual scientists were included in the reference genome sequence. SubtiList is based on a generic relational data schema and a World Wide Web interface developed for the handling of bacterial genomes, called GenoList. The World Wide Web interface was designed to allow users to easily browse through genome data and retrieve information according to common biological queries. SubtiList also provides more elaborate tools, such as pattern searching, which are tightly connected to the overall browsing system. SubtiList is accessible at http://genolist.pasteur.fr/SubtiList/. Similar bacterial databases are accessible at http://genolist.pasteur.fr/. PMID:11752255

  4. SorghumFDB: sorghum functional genomics database with multidimensional network analysis

    PubMed Central

    Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen

    2016-01-01

    Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrated knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, comprised of the co-expression data, protein–protein interactions and miRNA-target pairs. We took effective measures to combine the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants. Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. PMID:27352859

  5. SorghumFDB: sorghum functional genomics database with multidimensional network analysis.

    PubMed

    Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen

    2016-01-01

    Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrated knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, comprised of the co-expression data, protein-protein interactions and miRNA-target pairs. We took effective measures to combine the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants. Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. PMID:27352859

  6. Use of functional genomics to assess the impact of climate change on Aspergillus flavus and aflatoxin production

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is an opportunistic pathogenic fungus that infects several crops of agricultural importance, among them, corn, cotton, and peanuts. Once established as a pathogen the fungus may secrete secondary metabolites commonly known as mycotoxins, that if consumed by humans or animals may r...

  7. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    NASA Astrophysics Data System (ADS)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Recently, bioinformatics has become a new field of science, indispensable for analyzing the millions of nucleic acid sequences currently deposited in international databases (public or private); these databases contain information on genes, RNA, ORFs, proteins and intergenic regions, including entire genomes of some species. Analyzing this information requires computer programs, which have been renewed through the use of new mathematical methods and the introduction of artificial intelligence, in addition to the constant creation of supercomputing units designed to withstand the heavy workload of sequence analysis. However, innovation is still needed on platforms that allow faster and more effective genomic analyses, with a technological understanding of all biological processes.

  8. EchoBASE: an integrated post-genomic database for Escherichia coli.

    PubMed

    Misra, Raju V; Horler, Richard S P; Reindl, Wolfgang; Goryanin, Igor I; Thomas, Gavin H

    2005-01-01

    EchoBASE (http://www.ecoli-york.org) is a relational database designed to contain and manipulate information from post-genomic experiments using the model bacterium Escherichia coli K-12. Its aim is to collate information from a wide range of sources to provide clues to the functions of the approximately 1500 gene products that have no confirmed cellular function. The database is built on an enhanced annotation of the updated genome sequence of strain MG1655 and the association of experimental data with the E.coli genes and their products. Experiments that can be held within EchoBASE include proteomics studies, microarray data, protein-protein interaction data, structural data and bioinformatics studies. EchoBASE also contains annotated information on 'orphan' enzyme activities from this microbe to aid characterization of the proteins that catalyse these elusive biochemical reactions. PMID:15608209

  9. A genome-wide microsatellite polymorphism database for the indica and japonica rice.

    PubMed

    Zhang, Zhonghua; Deng, Yajun; Tan, Jun; Hu, Songnian; Yu, Jun; Xue, Qingzhong

    2007-02-28

    Microsatellite (MS) polymorphism is an important source of genetic diversity, providing support for map-based cloning and molecular breeding. We have developed a new database that contains 52 845 polymorphic MS loci between indica and japonica, composed of ample Class II MS markers, and integrated 18 828 MS loci from IRGSP and genetic markers from RGP. Based on genetic marker positions on the rice genome (http://rise.genomics.org.cn/rice2/index.jsp), we determined the approximate genetic distances of these MS loci and validated 100 randomly selected markers experimentally with a 90% success rate. In addition, we recorded polymorphic MS positions in indica cv. 9311, which is the most important paternal parent of the two-line hybrid rice in China. Our database will undoubtedly facilitate the application of MS markers in genetic research and marker-assisted breeding. The data set is freely available from www.wigs.zju.edu.cn/achievment/polySSR. PMID:17452422

  10. GénoPlante-Info (GPI): a collection of databases and bioinformatics resources for plant genomics

    PubMed Central

    Samson, Delphine; Legeai, Fabrice; Karsenty, Emmanuelle; Reboux, Sébastien; Veyrieras, Jean-Baptiste; Just, Jeremy; Barillot, Emmanuel

    2003-01-01

    Génoplante is a partnership program between public French institutes (INRA, CIRAD, IRD and CNRS) and private companies (Biogemma, Bayer CropScience and Bioplante) that aims at developing genome analysis programs for crop species (corn, wheat, rapeseed, sunflower and pea) and model plants (Arabidopsis and rice). The outputs of these programs form a wealth of information (genomic sequence, transcriptome, proteome, allelic variability, mapping and synteny, and mutation data) and tools (databases, interfaces, analysis software), that are being integrated and made public at the public bioinformatics resource centre of Génoplante: GénoPlante-Info (GPI). This continuous flood of data and tools is regularly updated and will grow continuously during the coming two years. Access to the GPI databases and tools is available at http://genoplante-info.infobiogen.fr/. PMID:12519976

  11. Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database.

    PubMed

    Drabkin, Harold J; Blake, Judith A

    2012-01-01

    The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented particularly in regards to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as 'GO' or 'homology' or 'phenotype'. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as 'papers selected for GO that refer to genes with NO GO annotation'. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. 
    All data associations are supported

  12. RPFdb: a database for genome wide information of translated mRNA generated from ribosome profiling

    PubMed Central

    Xie, Shang-Qian; Nie, Peng; Wang, Yan; Wang, Hongwei; Li, Hongyu; Yang, Zhilong; Liu, Yizhi; Ren, Jian; Xie, Zhi

    2016-01-01

    Translational control is crucial in the regulation of gene expression and deregulation of translation is associated with a wide range of cancers and human diseases. Ribosome profiling is a technique that provides genome wide information of mRNA in translation based on deep sequencing of ribosome protected mRNA fragments (RPF). RPFdb is a comprehensive resource for hosting, analyzing and visualizing RPF data, available at www.rpfdb.org or http://sysbio.sysu.edu.cn/rpfdb/index.html. The current version of the database contains 777 samples from 82 studies in 8 species, processed and reanalyzed by a unified pipeline. There are two ways to query the database: by keywords of studies or by genes. The outputs are presented in three levels. (i) Study level: including meta information of studies and reprocessed data for gene expression of translated mRNAs; (ii) Sample level: including global perspective of translated mRNA and a list of the most translated mRNA of each sample from a study; (iii) Gene level: including normalized sequence counts of translated mRNA on different genomic locations of a gene from multiple samples and studies. To explore the rich information provided by RPF, RPFdb also provides a genome browser to query and visualize context-specific translated mRNA. Overall, our database provides a simple way to search, analyze, compare, visualize and download RPF data sets. PMID:26433228

  13. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    PubMed

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively low attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we have developed an algorithm and software program GeneTack for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programed frameshifts (related to recoding events).
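
    The GeneTack abstract describes frameshifts caused by indels inside protein-coding regions. The sketch below illustrates only that underlying effect: a single-base deletion moves every downstream codon into a different reading frame. It is not the GeneTack algorithm; the toy sequence and the `translate` helper are assumptions made for illustration, and the codon table is the standard genetic code written in TCAG order.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame=0):
    """Translate complete codons of seq, starting at offset `frame` (0, 1 or 2)."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


gene = "ATGGCCAAATTTGGGTAA"         # toy intronless gene: M A K F G stop
mutated = gene[:6] + gene[7:]       # delete one base inside the third codon

print(translate(gene))              # intact reading frame
print(translate(mutated))           # frame 0 is disrupted after the deletion
print(translate(mutated, frame=2))  # downstream codons reappear in another frame
```

    Reading the mutated sequence in frame 0 loses the original C-terminal residues and the stop codon, while frame 2 recovers them; a frameshift finder looks for this kind of frame transition within what should be a single coding region.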

  14. Comprehensive coverage of cardiovascular disease data in the disease portals at the Rat Genome Database.

    PubMed

    Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary

    2016-08-01

    Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality. PMID:27287925

  15. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata

    PubMed Central

    Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.

    2012-01-01

    The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11 472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293

  16. Aspergillus parasiticus SU-1 genome sequence, predicted chromosome structure, and comparative gene expression under aflatoxin-inducing conditions: evidence that differential expression contributes to species phenotype.

    PubMed

    Linz, John E; Wee, Josephine; Roze, Ludmila V

    2014-08-01

    The filamentous fungi Aspergillus parasiticus and Aspergillus flavus produce the carcinogenic secondary metabolite aflatoxin on susceptible crops. These species differ in the quantity of aflatoxins B1, B2, G1, and G2 produced in culture, in the ability to produce the mycotoxin cyclopiazonic acid, and in morphology of mycelia and conidiospores. To understand the genetic basis for differences in biochemistry and morphology, we conducted next-generation sequence (NGS) analysis of the A. parasiticus strain SU-1 genome and comparative gene expression (RNA sequence analysis [RNA Seq]) analysis of A. parasiticus SU-1 and A. flavus strain NRRL 3357 (3357) grown under aflatoxin-inducing and -noninducing culture conditions. Although A. parasiticus SU-1 and A. flavus 3357 are highly similar in genome structure and gene organization, we observed differences in the presence of specific mycotoxin gene clusters and differential expression of specific mycotoxin genes and gene clusters that help explain differences in the type and quantity of mycotoxins synthesized. Using computer-aided analysis of secondary metabolite clusters (antiSMASH), we demonstrated that A. parasiticus SU-1 and A. flavus 3357 may carry up to 93 secondary metabolite gene clusters, and surprisingly, up to 10% of the genome appears to be dedicated to secondary metabolite synthesis. The data also suggest that fungus-specific zinc binuclear cluster (C6) transcription factors play an important role in regulation of secondary metabolite cluster expression. Finally, we identified uniquely expressed genes in A. parasiticus SU-1 that encode C6 transcription factors and genes involved in secondary metabolism and stress response/cellular defense. Future work will focus on these differentially expressed A. parasiticus SU-1 loci to reveal their role in determining distinct species characteristics. PMID:24951444

  17. Human Ageing Genomic Resources: Integrated databases and tools for the biology and genetics of ageing

    PubMed Central

    Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E.; de Magalhães, João Pedro

    2013-01-01

    The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR features now several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while further continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology. PMID:23193293

  18. Human Ageing Genomic Resources: integrated databases and tools for the biology and genetics of ageing.

    PubMed

    Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E; de Magalhães, João Pedro

    2013-01-01

    The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR features now several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while further continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology.

  19. The Generic Genome Browser: A Building Block for a Model Organism System Database

    PubMed Central

    Stein, Lincoln D.; Mungall, Christopher; Shu, ShengQiang; Caudy, Michael; Mangone, Marco; Day, Allen; Nickerson, Elizabeth; Stajich, Jason E.; Harris, Todd W.; Arva, Adrian; Lewis, Suzanna

    2002-01-01

    The Generic Model Organism System Database Project (GMOD) seeks to develop reusable software components for model organism system databases. In this paper we describe the Generic Genome Browser (GBrowse), a Web-based application for displaying genomic annotations and other features. For the end user, features of the browser include the ability to scroll and zoom through arbitrary regions of a genome, to enter a region of the genome by searching for a landmark or performing a full text search of all features, and the ability to enable and disable tracks and change their relative order and appearance. The user can upload private annotations to view them in the context of the public ones, and publish those annotations to the community. For the data provider, features of the browser software include reliance on readily available open source components, simple installation, flexible configuration, and easy integration with other components of a model organism system Web site. GBrowse is freely available under an open source license. The software, its documentation, and support are available at http://www.gmod.org. PMID:12368253

  20. SmedGD 2.0: The Schmidtea mediterranea genome database.

    PubMed

    Robb, Sofia M C; Gotting, Kirsten; Ross, Eric; Sánchez Alvarado, Alejandro

    2015-08-01

    Planarians have emerged as excellent models for the study of key biological processes such as stem cell function and regulation, axial polarity specification, regeneration, and tissue homeostasis among others. The most widely used organism for these studies is the free-living flatworm Schmidtea mediterranea. In 2007, the Schmidtea mediterranea Genome Database (SmedGD) was first released to provide a much needed resource for the small, but growing planarian community. SmedGD 1.0 has been a depository for genome sequence, a draft assembly, and related experimental data (e.g., RNAi phenotypes, in situ hybridization images, and differential gene expression results). We report here a comprehensive update to SmedGD (SmedGD 2.0) that aims to expand its role as an interactive community resource. The new database includes more recent, and up-to-date transcription data, provides tools that enhance interconnectivity between different genome assemblies and transcriptomes, including next-generation assemblies for both the sexual and asexual biotypes of S. mediterranea. SmedGD 2.0 (http://smedgd.stowers.org) not only provides significantly improved gene annotations, but also tools for data sharing, attributes that will help both the planarian and biomedical communities to more efficiently mine the genomics and transcriptomics of S. mediterranea.

  1. Exploring novel candidate genes from the Mouse Genome Informatics database: Potential implications for avian migration research.

    PubMed

    Contina, Andrea; Bridge, Eli S; Kelly, Jeffrey F

    2016-07-01

    To search for genes associated with migratory phenotypes in songbirds, we selected candidate genes through annotations from the Mouse Genome Informatics database and assembled an extensive candidate-gene library. Then, we implemented a next-generation sequencing approach to obtain DNA sequences from the Painted Bunting genome. We focused on those sequences that were conserved across avian species and that aligned with candidate genes in our mouse library. We genotyped short sequence repeats from the following candidate genes: ADRA1d, ANKRD17, CISH and MYH7. We studied the possible correlations between allelic variations occurring in these novel candidate migration genes and avian migratory phenotypes available from the published literature. We found that allele variation at MYH7 correlated with a calculated index of speed of migration (km/day) across 11 species of songbirds. We highlight the potential of the Mouse Genome Informatics database in providing new candidate genes that might play a crucial role in regulating migration in birds and possibly in other taxa. Our research effort shows the benefits and limitations of working with extensive genomic datasets and offers a snapshot of the challenges related to cross-species validation in behavioral and molecular ecology studies.

  2. The MiST2 database: a comprehensive genomics resource on microbial signal transduction

    PubMed Central

    Ulrich, Luke E.; Zhulin, Igor B.

    2010-01-01

    The MiST2 database (http://mistdb.com) identifies and catalogs the repertoire of signal transduction proteins in microbial genomes. Signal transduction systems regulate the majority of cellular activities including the metabolism, development, host-recognition, biofilm production, virulence, and antibiotic resistance of human pathogens. Thus, knowledge of the proteins and interactions that comprise these communication networks is an essential component to furthering biomedical discovery. These are identified by searching protein sequences for specific domain profiles that implicate a protein in signal transduction. Compared to the previous version of the database, MiST2 contains a host of new features and improvements including the following: draft genomes; extracytoplasmic function (ECF) sigma factor protein identification; enhanced classification of signaling proteins; novel, high-quality domain models for identifying histidine kinases and response regulators; neighboring two-component genes; gene cart; better search capabilities; enhanced taxonomy browser; advanced genome browser; and a modern, biologist-friendly web interface. MiST2 currently contains 966 complete and 157 draft bacterial and archaeal genomes, which collectively contain more than 245 000 signal transduction proteins. The majority (66%) of these are one-component systems, followed by two-component proteins (26%), chemotaxis (6%), and finally ECF factors (2%). PMID:19900966
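    As a rough worked example of the breakdown reported above, the percentages can be turned into approximate protein counts. This is only a sketch: it assumes a total of exactly 245,000 signal transduction proteins, whereas the abstract says "more than 245 000", so the true counts are somewhat higher.

```python
# Approximate counts implied by the reported percentage breakdown.
# ASSUMPTION: the total is taken as exactly 245,000; the abstract only
# states "more than 245 000", so these figures are lower bounds.
TOTAL_PROTEINS = 245_000

SHARES_PERCENT = {
    "one-component systems": 66,
    "two-component proteins": 26,
    "chemotaxis": 6,
    "ECF factors": 2,
}


def approximate_counts(total, shares_percent):
    """Return the number of proteins implied by each percentage share."""
    return {name: total * pct // 100 for name, pct in shares_percent.items()}


counts = approximate_counts(TOTAL_PROTEINS, SHARES_PERCENT)
for name, n in counts.items():
    print(f"{name}: ~{n:,}")
```

    The four shares sum to 100%, so under this assumption the categories account for the whole set.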

  3. UCNEbase--a database of ultraconserved non-coding elements and genomic regulatory blocks.

    PubMed

    Dimitrieva, Slavica; Bucher, Philipp

    2013-01-01

    UCNEbase (http://ccg.vital-it.ch/UCNEbase) is a free, web-accessible information resource on the evolution and genomic organization of ultra-conserved non-coding elements (UCNEs). It currently covers 4351 such elements in 18 different species. The majority of UCNEs are supposed to be transcriptional regulators of key developmental genes. As most of them occur as clusters near potential target genes, the database is organized along two hierarchical levels: individual UCNEs and ultra-conserved genomic regulatory blocks (UGRBs). UCNEbase introduces a coherent nomenclature for UCNEs reflecting their respective associations with likely target genes. Orthologous and paralogous UCNEs share components of their names and are systematically cross-linked. Detailed synteny maps between the human and other genomes are provided for all UGRBs. UCNEbase is managed by a relational database system and can be accessed by a variety of web-based query pages. As it relies on the UCSC genome browser as visualization platform, a large part of its data content is also available as browser viewable custom track files. UCNEbase is potentially useful to any computational, experimental or evolutionary biologist interested in conserved non-coding DNA elements in vertebrates. PMID:23193254

  4. SmedGD 2.0: The Schmidtea mediterranea genome database

    PubMed Central

    Robb, Sofia M.C.; Gotting, Kirsten; Ross, Eric; Sánchez Alvarado, Alejandro

    2016-01-01

    Planarians have emerged as excellent models for the study of key biological processes such as stem cell function and regulation, axial polarity specification, regeneration, and tissue homeostasis among others. The most widely used organism for these studies is the free-living flatworm Schmidtea mediterranea. In 2007, the Schmidtea mediterranea Genome Database (SmedGD) was first released to provide a much needed resource for the small, but growing planarian community. SmedGD 1.0 has been a depository for genome sequence, a draft assembly, and related experimental data (e.g., RNAi phenotypes, in situ hybridization images, and differential gene expression results). We report here a comprehensive update to SmedGD (SmedGD 2.0) that aims to expand its role as an interactive community resource. The new database includes more recent, and up-to-date transcription data, provides tools that enhance interconnectivity between different genome assemblies and transcriptomes, including next generation assemblies for both the sexual and asexual biotypes of S. mediterranea. SmedGD 2.0 (http://smedgd.stowers.org) not only provides significantly improved gene annotations, but also tools for data sharing, attributes that will help both the planarian and biomedical communities to more efficiently mine the genomics and transcriptomics of S. mediterranea. PMID:26138588

  5. Exploring novel candidate genes from the Mouse Genome Informatics database: Potential implications for avian migration research.

    PubMed

    Contina, Andrea; Bridge, Eli S; Kelly, Jeffrey F

    2016-07-01

    To search for genes associated with migratory phenotypes in songbirds, we selected candidate genes through annotations from the Mouse Genome Informatics database and assembled an extensive candidate-gene library. Then, we implemented a next-generation sequencing approach to obtain DNA sequences from the Painted Bunting genome. We focused on those sequences that were conserved across avian species and that aligned with candidate genes in our mouse library. We genotyped short sequence repeats from the following candidate genes: ADRA1d, ANKRD17, CISH and MYH7. We studied the possible correlations between allelic variations occurring in these novel candidate migration genes and avian migratory phenotypes available from the published literature. We found that allele variation at MYH7 correlated with a calculated index of speed of migration (km/day) across 11 species of songbirds. We highlight the potential of the Mouse Genome Informatics database in providing new candidate genes that might play a crucial role in regulating migration in birds and possibly in other taxa. Our research effort shows the benefits and limitations of working with extensive genomic datasets and offers a snapshot of the challenges related to cross-species validation in behavioral and molecular ecology studies. PMID:27061206

  7. PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database

    PubMed Central

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R.; Yoo, Hyunseung

    2016-01-01

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods. PMID:26903996
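    The two-stage idea described above (group proteins by assigned function, then split each group with Markov clustering) can be sketched as follows. This is a minimal illustration, not the PATRIC implementation: the protein IDs, function labels and similarity scores are hypothetical, and the MCL routine is a bare-bones pure-Python version rather than the MCL software itself.

```python
from collections import defaultdict


def group_by_function(assignments):
    """Stage 1: bucket protein IDs by their assigned function label."""
    groups = defaultdict(list)
    for protein_id, function in assignments.items():
        groups[function].append(protein_id)
    return dict(groups)


def _normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(similarity, inflation=2.0, iterations=30, eps=1e-6):
    """Stage 2: basic Markov clustering on a symmetric similarity matrix.

    Returns clusters as sorted lists of row/column indices.
    """
    n = len(similarity)
    # Add self-loops so every column has a non-zero sum.
    m = [[1.0 if i == j else float(similarity[i][j]) for j in range(n)]
         for i in range(n)]
    m = _normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power and re-normalize, which
        # strengthens strong flows and weakens weak ones.
        m = _normalize_columns([[v ** inflation for v in row] for row in m])
    # Rows with a non-zero diagonal act as attractors; each attractor's
    # non-zero columns form one cluster.
    clusters = {
        frozenset(j for j in range(n) if m[i][j] > eps)
        for i in range(n) if m[i][i] > eps
    }
    return sorted(sorted(c) for c in clusters)


# Hypothetical function assignments (stand-ins for k-mer-based calls).
assignments = {"p1": "kinase", "p2": "kinase", "p3": "transporter"}
groups = group_by_function(assignments)

# Hypothetical similarity scores inside one function group of six proteins:
# proteins 0-2 resemble each other, as do proteins 3-5.
similarity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
families = mcl(similarity)
print(groups)
print(families)
```

    Grouping by function first keeps each clustering problem small, which is the scalability argument made in the abstract; the clustering step then only has to separate proteins that already share a function label.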

  8. PIPEMicroDB: microsatellite database and primer generation tool for pigeonpea genome.

    PubMed

    Sarika; Arora, Vasu; Iquebal, M A; Rai, Anil; Kumar, Dinesh

    2013-01-01

    Molecular markers play a significant role in crop improvement for desirable characteristics such as high yield, disease resistance and others that will benefit the crop in the long term. Pigeonpea (Cajanus cajan L.) is a legume recently sequenced by a global consortium led by ICRISAT (Hyderabad, India) and has been analysed for gene prediction, synteny maps, markers, etc. We present the PIgeonPEa Microsatellite DataBase (PIPEMicroDB) with an automated primer-design tool for the pigeonpea genome, based on chromosome-wise as well as location-wise search of primers. A total of 123 387 Short Tandem Repeats (STRs) were extracted from the publicly available pigeonpea genome using the MIcroSAtellite tool (MISA). The database is an online relational database based on a 'three-tier architecture' that catalogues microsatellite information in MySQL, with a user-friendly interface developed in PHP. Searches for STRs may be customized by limiting their location on a chromosome as well as the number of markers in that range. This is a novel approach that has not been implemented in any existing marker database. The database has been further coupled with Primer3 for designing primers for selected markers, with left and right flanking regions of up to 500 bp. This will enable researchers to select markers of choice at a desired interval along the chromosome. Furthermore, one can use individual STRs from a targeted chromosomal region to narrow down the location of a gene of interest or linked Quantitative Trait Loci (QTLs). Although it is an in silico approach, marker search based on the characteristics and location of STRs is expected to be beneficial for researchers. Database URL: http://cabindb.iasri.res.in/pigeonpea/ PMID:23396298
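    To illustrate what extracting STRs from a sequence involves, here is a minimal regex-based sketch. It is not MISA: it assumes perfect repeats only and a single minimum-repeat threshold for all motif lengths, and the function name, parameters and demo sequence are hypothetical.

```python
import re


def find_strs(sequence, min_unit=1, max_unit=6, min_repeats=5):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    start is a 0-based offset into the sequence. The shortest motif that
    satisfies the repeat threshold at a given position is reported.
    """
    # Group 1 captures the motif lazily (shortest first); the backreference
    # then requires at least (min_repeats - 1) further copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Hypothetical sequence containing an (AT)6 and a (CAG)5 repeat.
demo = "GGC" + "AT" * 6 + "CCGT" + "CAG" * 5 + "TT"
for start, motif, repeats in find_strs(demo):
    print(f"offset {start}: ({motif}){repeats}")
```

    Each reported offset could then be combined with up to 500 bp of flanking sequence on either side as input to a primer-design step, as the abstract describes for Primer3.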

  9. SymbioGenomesDB: a database for the integration and access to knowledge on host-symbiont relationships.

    PubMed

    Reyes-Prieto, Mariana; Vargas-Chávez, Carlos; Latorre, Amparo; Moya, Andrés

    2015-01-01

    Symbiotic relationships occur naturally throughout the tree of life, either in a commensal, mutualistic or pathogenic manner. The genomes of multiple organisms involved in symbiosis are rapidly being sequenced and becoming available, especially those from the microbial world. Currently, there are numerous databases that offer information on specific organisms or models, but none offer a global understanding on relationships between organisms, their interactions and capabilities within their niche, as well as their role as part of a system, in this case, their role in symbiosis. We have developed the SymbioGenomesDB as a community database resource for laboratories which intend to investigate and use information on the genetics and the genomics of organisms involved in these relationships. The ultimate goal of SymbioGenomesDB is to host and support the growing and vast symbiotic-host relationship information, to uncover the genetic basis of such associations. SymbioGenomesDB maintains a comprehensive organization of information on genomes of symbionts from diverse hosts throughout the Tree of Life, including their sequences, their metadata and their genomic features. This catalog of relationships was generated using computational tools, custom R scripts and manual integration of data available in public literature. As a highly curated and comprehensive systems database, SymbioGenomesDB provides web access to all the information of symbiotic organisms, their features and links to the central database NCBI. Three different tools can be found within the database to explore symbiosis-related organisms, their genes and their genomes. Also, we offer an orthology search for one or multiple genes in one or multiple organisms within symbiotic relationships, and every table, graph and output file is downloadable and easy to parse for further analysis. The robust SymbioGenomesDB will be constantly updated to cope with all the data being generated and included in major

  10. SymbioGenomesDB: a database for the integration and access to knowledge on host–symbiont relationships

    PubMed Central

    Reyes-Prieto, Mariana; Vargas-Chávez, Carlos; Latorre, Amparo; Moya, Andrés

    2015-01-01

    Symbiotic relationships occur naturally throughout the tree of life, either in a commensal, mutualistic or pathogenic manner. The genomes of multiple organisms involved in symbiosis are rapidly being sequenced and becoming available, especially those from the microbial world. Currently, there are numerous databases that offer information on specific organisms or models, but none offer a global understanding on relationships between organisms, their interactions and capabilities within their niche, as well as their role as part of a system, in this case, their role in symbiosis. We have developed the SymbioGenomesDB as a community database resource for laboratories which intend to investigate and use information on the genetics and the genomics of organisms involved in these relationships. The ultimate goal of SymbioGenomesDB is to host and support the growing and vast symbiotic–host relationship information, to uncover the genetic basis of such associations. SymbioGenomesDB maintains a comprehensive organization of information on genomes of symbionts from diverse hosts throughout the Tree of Life, including their sequences, their metadata and their genomic features. This catalog of relationships was generated using computational tools, custom R scripts and manual integration of data available in public literature. As a highly curated and comprehensive systems database, SymbioGenomesDB provides web access to all the information of symbiotic organisms, their features and links to the central database NCBI. Three different tools can be found within the database to explore symbiosis-related organisms, their genes and their genomes. Also, we offer an orthology search for one or multiple genes in one or multiple organisms within symbiotic relationships, and every table, graph and output file is downloadable and easy to parse for further analysis. The robust SymbioGenomesDB will be constantly updated to cope with all the data being generated and included in major

  12. ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes

    SciTech Connect

    Novichkov, Pavel S.; Ratnere, Igor; Wolf, Yuri I.; Koonin, Eugene V.; Dubchak, Inna

    2009-07-23

    The database of Alignable Tight Genomic Clusters (ATGCs) consists of closely related genomes of archaea and bacteria, and is a resource for research into prokaryotic microevolution. Construction of a data set with appropriate characteristics is a major hurdle for this type of study. With the current rate of genome sequencing, it is difficult to follow the progress of the field and to determine which of the available genome sets meet the requirements of a given research project, in particular, with respect to the minimum and maximum levels of similarity between the included genomes. Additionally, extraction of specific content, such as genomic alignments or families of orthologs, from a selected set of genomes is a complicated and time-consuming process. The database addresses these problems by providing an intuitive and efficient web interface to browse precomputed ATGCs, select appropriate ones and access ATGC-derived data such as multiple alignments of orthologous proteins, matrices of pairwise intergenomic distances based on genome-wide analysis of synonymous and nonsynonymous substitution rates and others. The ATGC database will be regularly updated following new releases of the NCBI RefSeq. The database is hosted by the Genomics Division at Lawrence Berkeley National Laboratory and is publicly available at http://atgc.lbl.gov.

  13. ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes.

    PubMed

    Novichkov, Pavel S; Ratnere, Igor; Wolf, Yuri I; Koonin, Eugene V; Dubchak, Inna

    2009-01-01

    The database of Alignable Tight Genomic Clusters (ATGCs) consists of closely related genomes of archaea and bacteria, and is a resource for research into prokaryotic microevolution. Construction of a data set with appropriate characteristics is a major hurdle for this type of study. With the current rate of genome sequencing, it is difficult to follow the progress of the field and to determine which of the available genome sets meet the requirements of a given research project, in particular, with respect to the minimum and maximum levels of similarity between the included genomes. Additionally, extraction of specific content, such as genomic alignments or families of orthologs, from a selected set of genomes is a complicated and time-consuming process. The database addresses these problems by providing an intuitive and efficient web interface to browse precomputed ATGCs, select appropriate ones and access ATGC-derived data such as multiple alignments of orthologous proteins, matrices of pairwise intergenomic distances based on genome-wide analysis of synonymous and nonsynonymous substitution rates and others. The ATGC database will be regularly updated following new releases of the NCBI RefSeq. The database is hosted by the Genomics Division at Lawrence Berkeley National Laboratory and is publicly available at http://atgc.lbl.gov.

  14. Construction of an Ortholog Database Using the Semantic Web Technology for Integrative Analysis of Genomic Data

    PubMed Central

    Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo

    2015-01-01

    Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis. PMID:25875762
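    The idea of using ortholog information as a hub that links other data can be sketched with a toy in-memory triple list. This is only an illustration: the predicate names (hasMember, inTaxon, annotatedWith) and the group and gene identifiers are hypothetical and are not actual OrthO or MBGD terms, and a real deployment would query the RDF data through SPARQL rather than Python lists.

```python
# Toy subject-predicate-object triples. All identifiers and predicate names
# are HYPOTHETICAL placeholders, not real OrthO/MBGD vocabulary.
TRIPLES = [
    ("group:G1", "hasMember", "gene:ecoA"),
    ("group:G1", "hasMember", "gene:bsuA"),
    ("gene:ecoA", "inTaxon", "taxon:Escherichia_coli"),
    ("gene:bsuA", "inTaxon", "taxon:Bacillus_subtilis"),
    ("gene:ecoA", "annotatedWith", "GO:0003677"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to subject through predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def group_summary(group, triples=TRIPLES):
    """Follow an ortholog group to its member genes, then on to the
    taxonomy and Gene Ontology data attached to those genes."""
    members = objects(group, "hasMember", triples)
    taxa = {t for g in members for t in objects(g, "inTaxon", triples)}
    go_terms = {t for g in members for t in objects(g, "annotatedWith", triples)}
    return {"taxa": sorted(taxa), "go_terms": sorted(go_terms)}


print(group_summary("group:G1"))
```

    The ortholog group is the only node that connects the two genes, which is why annotation attached to one organism's gene becomes reachable from the other's.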

  15. Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data.

    PubMed

    Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo

    2015-01-01

    Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.

  17. Databases of genomic variation and phenotypes: existing resources and future needs

    PubMed Central

    Johnston, Jennifer J.; Biesecker, Leslie G.

    2013-01-01

    Massively parallel sequencing (MPS) has become an important tool for identifying medically significant variants in both research and the clinic. Accurate variation and genotype–phenotype databases are critical in our ability to make sense of the vast amount of information that MPS generates. The purpose of this review is to summarize the state of the art of variation and genotype–phenotype databases, how they can be used, and opportunities to improve these resources. Our working assumption is that the objective of the clinical genomicist is to identify highly penetrant variants that could explain existing disease or predict disease risk for individual patients or research participants. We have detailed how current databases contribute to this goal providing frequency data, literature reviews and predictions of causation for individual variants. For variant annotation, databases vary greatly in their ease of use, the use of standard mutation nomenclature, the comprehensiveness of the variant cataloging and the degree of expert opinion. Ultimately, we need a dynamic and comprehensive reference database of medically important variants that is easily cross referenced to exome and genome sequence data and allows for an accumulation of expert opinion. PMID:23962721

  18. Rat Genome Database: a unique resource for rat, human, and mouse quantitative trait locus data.

    PubMed

    Nigam, Rajni; Laulederkind, Stanley J F; Hayman, G Thomas; Smith, Jennifer R; Wang, Shur-Jen; Lowry, Timothy F; Petri, Victoria; De Pons, Jeff; Tutaj, Marek; Liu, Weisong; Jayaraman, Pushkala; Munzenmaier, Diane H; Worthey, Elizabeth A; Dwinell, Melinda R; Shimoyama, Mary; Jacob, Howard J

    2013-09-16

    The rat has been widely used as a disease model in a laboratory setting, resulting in an abundance of genetic and phenotype data from a wide variety of studies. These data can be found at the Rat Genome Database (RGD, http://rgd.mcw.edu/), which provides a platform for researchers interested in linking genomic variations to phenotypes. Quantitative trait loci (QTLs) form one of the earliest and core datasets, allowing researchers to identify loci harboring genes associated with disease. These QTLs are not only important for those using the rat to identify genes and regions associated with disease, but also for cross-organism analyses of syntenic regions on the mouse and the human genomes to identify potential regions for study in these organisms. Currently, RGD has data on >1,900 rat QTLs that include details about the methods and animals used to determine the respective QTL along with the genomic positions and markers that define the region. RGD also curates human QTLs (>1,900) and houses >4,000 mouse QTLs (imported from Mouse Genome Informatics). Multiple ontologies are used to standardize traits, phenotypes, diseases, and experimental methods to facilitate queries, analyses, and cross-organism comparisons. QTLs are visualized in tools such as GBrowse and GViewer, with additional tools for analysis of gene sets within QTL regions. The QTL data at RGD provide valuable information for the study of mapped phenotypes and identification of candidate genes for disease associations.
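    Finding the gene set within a QTL region comes down to an interval-overlap test on genomic coordinates. The sketch below shows that test under stated assumptions: the QTL name, gene names and coordinates are hypothetical, and this is not RGD's own tooling or data model.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A named genomic interval with inclusive start and end positions."""
    name: str
    chrom: str
    start: int
    end: int


def genes_in_qtl(qtl, genes):
    """Return names of genes on the QTL's chromosome that overlap it."""
    return [
        g.name
        for g in genes
        if g.chrom == qtl.chrom and g.start <= qtl.end and g.end >= qtl.start
    ]


# HYPOTHETICAL QTL and gene coordinates, for illustration only.
qtl = Region("Qtl1", "chr1", 1_000, 5_000)
genes = [
    Region("GeneA", "chr1", 900, 1_100),    # straddles the QTL start
    Region("GeneB", "chr1", 6_000, 7_000),  # same chromosome, outside
    Region("GeneC", "chr2", 2_000, 3_000),  # different chromosome
    Region("GeneD", "chr1", 4_000, 4_500),  # fully inside
]
print(genes_in_qtl(qtl, genes))
```

    Genes that only partly overlap the region are kept here; whether to require full containment is a design choice that depends on how the QTL boundaries were defined.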

  19. The phytophthora genome initiative database: informatics and analysis for distributed pathogenomic research.

    PubMed

    Waugh, M; Hraber, P; Weller, J; Wu, Y; Chen, G; Inman, J; Kiphart, D; Sobral, B

    2000-01-01

    The Phytophthora Genome Initiative (PGI) is a distributed collaboration to study the genome and evolution of a particularly destructive group of plant pathogenic oomycete, with the goal of understanding the mechanisms of infection and resistance. NCGR provides informatics support for the collaboration as well as a centralized data repository. In the pilot phase of the project, several investigators prepared Phytophthora infestans and Phytophthora sojae EST and Phytophthora sojae BAC libraries and sent them to another laboratory for sequencing. Data from sequencing reactions were transferred to NCGR for analysis and curation. An analysis pipeline transforms raw data by performing simple analyses (i.e., vector removal and similarity searching) that are stored and can be retrieved by investigators using a web browser. Here we describe the database and access tools, provide an overview of the data therein and outline future plans. This resource has provided a unique opportunity for the distributed, collaborative study of a genus from which relatively little sequence data are available. Results may lead to insight into how better to control these pathogens. The homepage of PGI can be accessed at http://www.ncgr.org/pgi, with database access through the database access hyperlink.

  20. Developing genomic knowledge bases and databases to support clinical management: current perspectives.

    PubMed

    Huser, Vojtech; Sincan, Murat; Cimino, James J

    2014-01-01

    Personalized medicine, the ability to tailor diagnostic and treatment decisions for individual patients, is seen as the evolution of modern medicine. We characterize here the informatics resources available today or envisioned in the near future that can support clinical interpretation of genomic test results. We assume a clinical sequencing scenario (germline whole-exome sequencing) in which a clinical specialist, such as an endocrinologist, needs to tailor patient management decisions within his or her specialty (targeted findings) but relies on a genetic counselor to interpret off-target incidental findings. We characterize the genomic input data and list various types of knowledge bases that provide genomic knowledge for generating clinical decision support. We highlight the need for patient-level databases with detailed lifelong phenotype content in addition to genotype data and provide a list of recommendations for personalized medicine knowledge bases and databases. We conclude that no single knowledge base can currently support all aspects of personalized recommendations and that consolidation of several current resources into larger, more dynamic and collaborative knowledge bases may offer a future path forward.

  1. Tree shrew database (TreeshrewDB): a genomic knowledge base for the Chinese tree shrew.

    PubMed

    Fan, Yu; Yu, Dandan; Yao, Yong-Gang

    2014-11-21

    The tree shrew (Tupaia belangeri) is a small mammal with a close relationship to primates and it has been proposed as an alternative experimental animal to primates in biomedical research. The recent release of a high-quality Chinese tree shrew genome enables more researchers to use this species as the model animal in their studies. With the aim of making access to an extensively annotated genome database straightforward and easy, we have created the Tree shrew Database (TreeshrewDB). This is a web-based platform that integrates the currently available data from the tree shrew genome, including an updated gene set, with a systematic functional annotation and an mRNA expression pattern. In addition, to assist with automatic gene sequence analysis, we have integrated the common programs Blast, Muscle, GBrowse, GeneWise and codeml into TreeshrewDB. We have also developed a pipeline for the analysis of positive selection. The user-friendly interface of TreeshrewDB, which is available at http://www.treeshrewdb.org, will undoubtedly help in many areas of biological research into the tree shrew.

  2. A survey of locus-specific database curation. Human Genome Variation Society.

    PubMed

    Cotton, Richard G H; Phillips, Kate; Horaitis, Ourania

    2007-04-01

    It is widely accepted that curation of variation in genes is best performed by experts in those genes and their variation. However, obtaining funding for such curation is difficult even though up-to-date lists of variations in genes are essential for optimum delivery of genetic healthcare and for medical research. This study was undertaken to gather information on gene-specific databases (locus-specific databases) in an effort to understand their functioning, funding and needs. A questionnaire was sent to 125 curators and we received 47 responses. Individuals performed curation of up to 69 genes. The time curators spent curating was extremely variable. This ranged from 0 h per week up to 5 curators spending over 4 h per week. The funding required ranged from US$600 to US$45,000 per year. Most databases were stimulated by the Human Genome Organization-Mutation Database Initiative and used their guidelines. Many databases reported unpublished mutations, with all but one respondent reporting errors in the literature. Of the 13 who reported hit rates, 9 reported over 52,000 hits per year. On the basis of this, five recommendations were made to improve the curation of variation information, particularly that of mutations causing single-gene disorder: 1. A curator for each gene, who is an expert in it, should be identified or nominated. 2. Curation at a minimum of 2 h per week at US$2000 per gene per year should be encouraged. 3. Guidelines and custom software use should be encouraged to facilitate easy setup and curation. 4. Hits per week on the website should be recorded to allow the importance of the site to be illustrated for grant-giving purposes. 5. Published protocols should be followed in the establishment of locus-specific databases.

  3. A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data.

    PubMed

    Butyaev, Alexander; Mavlyutov, Ruslan; Blanchette, Mathieu; Cudré-Mauroux, Philippe; Waldispühl, Jérôme

    2015-09-18

    Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, the storage technology and visualization tools need to evolve to offer to the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data (3DBG), and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions, and importantly gracefully scales with the size of data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/.

  4. A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data.

    PubMed

    Butyaev, Alexander; Mavlyutov, Ruslan; Blanchette, Mathieu; Cudré-Mauroux, Philippe; Waldispühl, Jérôme

    2015-09-18

    Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, the storage technology and visualization tools need to evolve to offer to the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data (3DBG), and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions, and importantly gracefully scales with the size of data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/. PMID:25990738

  5. A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data

    PubMed Central

    Butyaev, Alexander; Mavlyutov, Ruslan; Blanchette, Mathieu; Cudré-Mauroux, Philippe; Waldispühl, Jérôme

    2015-01-01

    Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, the storage technology and visualization tools need to evolve to offer to the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data (3DBG), and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions, and importantly gracefully scales with the size of data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/. PMID:25990738

  6. REBASE--a database for DNA restriction and modification: enzymes, genes and genomes.

    PubMed

    Roberts, Richard J; Vincze, Tamas; Posfai, Janos; Macelis, Dana

    2010-01-01

    REBASE is a comprehensive database of information about restriction enzymes, DNA methyltransferases and related proteins involved in the biological process of restriction-modification (R-M). It contains fully referenced information about recognition and cleavage sites, isoschizomers, neoschizomers, commercial availability, methylation sensitivity, crystal and sequence data. Experimentally characterized homing endonucleases are also included. The fastest growing segment of REBASE contains the putative R-M systems found in the sequence databases. Comprehensive descriptions of the R-M content of all fully sequenced genomes are available including summary schematics. The contents of REBASE may be browsed from the web (http://rebase.neb.com) and selected compilations can be downloaded by ftp (ftp.neb.com). Additionally, monthly updates can be requested via email.

  7. The Eukaryotic Pathogen Databases: a functional genomic resource integrating data from human and veterinary parasites.

    PubMed

    Harb, Omar S; Roos, David S

    2015-01-01

    Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.

  8. The Eukaryotic Pathogen Databases: a functional genomic resource integrating data from human and veterinary parasites.

    PubMed

    Harb, Omar S; Roos, David S

    2015-01-01

    Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods. PMID:25388105

  9. Overlap of the cancer genome atlas and the immune epitope database

    PubMed Central

    Sait, Shaimaa; Fawcett, Timothy; Blanck, George

    2016-01-01

    Mutant peptides resulting from cancer drivers or passenger mutations are expected to have the potential to serve as a basis for cancer vaccines. However, a number of parameters regulate vaccine-associated immunogenicity, including the suitability of a peptide for binding to an antigen-presenting molecule or antibody. In order to obtain a basic indication of the prospect of human cancer epitope identification via current database development strategies, an overlap of the mutant Homo sapiens epitopes listed on the Immune Epitope Database (IEDB) and the mutant peptides indicated by The Cancer Genome Atlas (TCGA) somatic mutation database was obtained. No putative TCGA mutant peptides were detected among the 8,890 14–18 amino acid (AA) IEDB peptides available. In total, 3 IEDB mutant epitopes that encompassed a TCGA mutant AA position, but did not overlap the exact position of the TCGA mutant AA, were detected. The results of the present analysis confirm that verification of certain aspects of cancer epitope function can be obtained via the continued and systematic expansion of databases representing human protein epitopes. However, the analysis also indicates that there is relatively limited systematic information available regarding antigen-presenting molecule epitopes and cancer-related mutant peptides. PMID:27703532

  10. Strategies to explore functional genomics data sets in NCBI's GEO database.

    PubMed

    Wilhite, Stephen E; Barrett, Tanya

    2012-01-01

    The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze, and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries.

  11. The Maize Genetics and Genomics Database. The Community Resource for Access to Diverse Maize Data

    PubMed Central

    Lawrence, Carolyn J.; Seigfried, Trent E.; Brendel, Volker

    2005-01-01

    The Maize Genetics and Genomics Database (MaizeGDB) serves the maize (Zea mays) research community by making a wealth of genetics and genomics data available through an intuitive Web-based interface. The goals of the MaizeGDB project are 3-fold: to provide a central repository for public maize information; to present the data through the MaizeGDB Web site in a way that recapitulates biological relationships; and to provide an array of computational tools that address biological questions in an easy-to-use manner at the site. In addition to these primary tasks, MaizeGDB team members also serve the community of maize geneticists by lending technical support for community activities, including the annual Maize Genetics Conference and various workshops, teaching researchers to use both the MaizeGDB Web site and Community Curation Tools, and engaging in collaboration with individual research groups to make their unique data types available through MaizeGDB. PMID:15888678

  12. Research Update: The materials genome initiative: Data sharing and the impact of collaborative ab initio databases

    NASA Astrophysics Data System (ADS)

    Jain, Anubhav; Persson, Kristin A.; Ceder, Gerbrand

    2016-05-01

    Materials innovations enable new technological capabilities and drive major societal advancements but have historically required long and costly development cycles. The Materials Genome Initiative (MGI) aims to greatly reduce this time and cost. In this paper, we focus on data reuse in the MGI and, in particular, discuss the impact of three different computational databases based on density functional theory methods to the research community. We also discuss and provide recommendations on technical aspects of data reuse, outline remaining fundamental challenges, and present an outlook on the future of MGI's vision of data sharing.

  13. Mining the Plasmodium genome database to define organellar function: what does the apicoplast do?

    PubMed Central

    Roos, David S; Crawford, Michael J; Donald, Robert G K; Fraunholz, Martin; Harb, Omar S; He, Cynthia Y; Kissinger, Jessica C; Shaw, Michael K; Striepen, Boris

    2002-01-01

    Apicomplexan species constitute a diverse group of parasitic protozoa, which are responsible for a wide range of diseases in many organisms. Despite differences in the diseases they cause, these parasites share an underlying biology, from the genetic controls used to differentiate through the complex parasite life cycle, to the basic biochemical pathways employed for intracellular survival, to the distinctive cell biology necessary for host cell attachment and invasion. Different parasites lend themselves to the study of different aspects of parasite biology: Eimeria for biochemical studies, Toxoplasma for molecular genetic and cell biological investigation, etc. The Plasmodium falciparum Genome Project contributes the first large-scale genomic sequence for an apicomplexan parasite. The Plasmodium Genome Database (http://PlasmoDB.org) has been designed to permit individual investigators to ask their own questions, even prior to formal release of the reference P. falciparum genome sequence. As a case in point, PlasmoDB has been exploited to identify metabolic pathways associated with the apicomplexan plastid, or 'apicoplast' - an essential organelle derived by secondary endosymbiosis of an alga, and retention of the algal plastid. PMID:11839180

  14. Genomic clones of Aspergillus nidulans containing alcA, the structural gene for alcohol dehydrogenase and alcR, a regulatory gene for ethanol metabolism.

    PubMed

    Doy, C H; Pateman, J A; Olsen, J E; Kane, H J; Creaser, E H

    1985-04-01

    Our aim was to obtain from Aspergillus nidulans a genomic bank and then clone a region we expected from earlier genetic mapping to contain two closely linked genes, alcA, the structural gene for alcohol dehydrogenase (ADH) and alcR, a positive trans-acting regulatory gene for ethanol metabolism. The expression of alcA is repressed by carbon catabolites. A genomic restriction fragment characteristic of the alcA-alcR region was identified, cloned in pBR322, and used to select from a genomic bank in lambda EMBL3A three overlapping clones covering 24 kb of DNA. Southern genomic analysis of wild-type, alcA and alcR mutants showed that the mutants contained extra DNA at sites near the center of the cloned DNA and are close together, as expected for alcA and alcR. Transcription from the cloned DNA and hybridization with a clone carrying the Saccharomyces cerevisiae gene for ADHI (ADC1) are both confined to the alcA-alcR region. At least one of several species of mature mRNA is about 1 kb, the size required to code for ADH. For all species, carbon catabolite repression overrides control by induction. The overall characteristics of transcription, hybridization to ADC1 and earlier work suggest that alcA consists of a number of exons and/or that the alcA-alcR region represents a cluster of alcA-related genes or sequences.

  15. DoriC 5.0: an updated database of oriC regions in both bacterial and archaeal genomes.

    PubMed

    Gao, Feng; Luo, Hao; Zhang, Chun-Ting

    2013-01-01

    Replication of chromosomes is one of the central events in the cell cycle. Chromosome replication begins at specific sites, called origins of replication (oriCs), for all three domains of life. However, the origins of replication still remain unknown in a considerably large number of bacterial and archaeal genomes completely sequenced so far. The increasing availability of complete bacterial and archaeal genomes has created challenges and opportunities for identification of their oriCs in silico, as well as in vivo. Based on the Z-curve theory, we have developed a web-based system Ori-Finder to predict oriCs in bacterial genomes with high accuracy and reliability by taking advantage of comparative genomics, and the predicted oriC regions have been organized into an online database DoriC, which has been publicly available at http://tubic.tju.edu.cn/doric/ since 2007. Five years after we constructed DoriC, the number of bacterial genomes in the database has increased about 4-fold. Additionally, oriC regions in archaeal genomes identified by in vivo experiments, as well as in silico analyses, have also been added to the database. Consequently, the latest release of DoriC contains oriCs for >1500 bacterial genomes and 81 archaeal genomes.
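
    The abstract above refers to the Z-curve theory without spelling it out. As a rough illustration, the standard Z-curve transform maps a DNA sequence to three cumulative components: purine vs. pyrimidine (x), amino vs. keto (y), and weak vs. strong hydrogen bonding (z). The sketch below is a minimal version of that transform only; it is not the Ori-Finder implementation, and the function name `z_curve` is a hypothetical choice made for this example.

```python
def z_curve(dna):
    """Return the cumulative Z-curve points (x_n, y_n, z_n) for a DNA string.

    x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
    y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
    z_n = (A_n + T_n) - (C_n + G_n)   weak vs. strong hydrogen bonds
    where A_n, C_n, G_n, T_n are base counts in the first n positions.
    Characters other than A, C, G, T are skipped (an assumption of this sketch).
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in dna.upper():
        if base not in counts:
            continue  # ignore ambiguity codes such as N
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (c + g)))
    return points


# Toy input, chosen only to show the shape of the output.
curve = z_curve("AACG")
```

    Extrema in components such as these are the kind of signal that origin-prediction tools can inspect, but how Ori-Finder combines them with comparative genomics is not described in the abstract.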

  16. PhenoMiner: quantitative phenotype curation at the rat genome database.

    PubMed

    Laulederkind, Stanley J F; Liu, Weisong; Smith, Jennifer R; Hayman, G Thomas; Wang, Shur-Jen; Nigam, Rajni; Petri, Victoria; Lowry, Timothy F; de Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2013-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses >40 000 rat gene records as well as human and mouse orthologs, >2000 rat and 1900 human quantitative trait loci (QTLs) records and >2900 rat strain records. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. Recently, a project was initiated at RGD to incorporate quantitative phenotype data for rat strains, in addition to the currently existing qualitative phenotype data for rat strains, QTLs and genes. A specialized curation tool was designed to generate manual annotations with up to six different ontologies/vocabularies used simultaneously to describe a single experimental value from the literature. Concurrently, three of those ontologies needed extensive addition of new terms to move the curation forward. The curation interface development, as well as ontology development, was an ongoing process during the early stages of the PhenoMiner curation project. Database URL: http://rgd.mcw.edu.

  17. OntoMate: a text-mining tool aiding curation at the Rat Genome Database.

    PubMed

    Liu, Weisong; Laulederkind, Stanley J F; Hayman, G Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu.

  18. OntoMate: a text-mining tool aiding curation at the Rat Genome Database

    PubMed Central

    Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558

  19. The mouse genome database (MGD): new features facilitating a model system.

    PubMed

    Eppig, Janan T; Blake, Judith A; Bult, Carol J; Kadin, James A; Richardson, Joel E

    2007-01-01

    The mouse genome database (MGD, http://www.informatics.jax.org/), the international community database for mouse, provides access to extensive integrated data on the genetics, genomics and biology of the laboratory mouse. The mouse is an excellent and unique animal surrogate for studying normal development and disease processes in humans. Thus, MGD's primary goals are to facilitate the use of mouse models for studying human disease and enable the development of translational research hypotheses based on comparative genotype, phenotype and functional analyses. Core MGD data content includes gene characterization and functions, phenotype and disease model descriptions, DNA and protein sequence data, polymorphisms, gene mapping data and genome coordinates, and comparative gene data focused on mammals. Data are integrated from diverse sources, ranging from major resource centers to individual investigator laboratories and the scientific literature, using a combination of automated processes and expert human curation. MGD collaborates with the bioinformatics community on the development of data and semantic standards, and it incorporates key ontologies into the MGD annotation system, including the Gene Ontology (GO), the Mammalian Phenotype Ontology, and the Anatomical Dictionary for Mouse Development and the Adult Anatomy. MGD is the authoritative source for mouse nomenclature for genes, alleles, and mouse strains, and for GO annotations to mouse genes. MGD provides a unique platform for data mining and hypothesis generation where one can express complex queries simultaneously addressing phenotypic effects, biochemical function and process, sub-cellular location, expression, sequence, polymorphism and mapping data. Both web-based querying and computational access to data are provided. Recent improvements in MGD described here include the incorporation of single nucleotide polymorphism data and search tools, the addition of PIR gene superfamily classifications

  20. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions

    PubMed Central

    Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui

    2016-01-01

    The recent advancement of next generation sequencing technology has enabled the fast and low-cost detection of all genetic variants spreading across the entire human genome, making the application of whole-genome sequencing a trend in the study of disease-causing genetic variants. Nevertheless, there is still no repository that collects predictions of functionally damaging effects of human genetic variants, though it has been well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases. PMID:26989155

  1. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions.

    PubMed

    Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui

    2016-01-01

    The recent advancement of next generation sequencing technology has enabled the fast and low-cost detection of all genetic variants spreading across the entire human genome, making the application of whole-genome sequencing a trend in the study of disease-causing genetic variants. Nevertheless, there is still no repository that collects predictions of functionally damaging effects of human genetic variants, though it has been well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases. PMID:26989155

  2. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    DOE PAGESBeta

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R.; Yoo, Hyunseung

    2016-02-08

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). In conclusion, this new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.
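
    The abstract names the Markov Cluster algorithm (MCL) but gives no implementation details. The sketch below shows the generic textbook MCL loop (expansion by matrix squaring, then inflation by element-wise powering and column normalization) on a small similarity matrix. It is not the PATtyFams pipeline: the function names, the inflation value of 2.0, the fixed iteration count and the toy input are all assumptions made for illustration.

```python
def _normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, iterations=50):
    """Cluster items from a symmetric similarity matrix with generic MCL.

    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    n = len(similarity)
    # Self-loops keep every column sum non-zero and stabilise the iteration.
    m = [[float(similarity[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = _normalize_columns(m)
    for _ in range(iterations):
        m = _matmul(m, m)                                   # expansion
        m = [[v ** inflation for v in row] for row in m]    # inflation
        m = _normalize_columns(m)
    # Each row that still carries weight lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy input: proteins 0-2 resemble each other, as do proteins 3-5.
similarity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
families = mcl(similarity)
```

    In this toy case the two groups share no similarity, so they separate trivially; on real data the inflation parameter controls how finely groups are split.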

  3. InvFEST, a database integrating information of polymorphic inversions in the human genome.

    PubMed

    Martínez-Fundichely, Alexander; Casillas, Sònia; Egea, Raquel; Ràmia, Miquel; Barbadilla, Antonio; Pantano, Lorena; Puig, Marta; Cáceres, Mario

    2014-01-01

    The newest genomic advances have uncovered an unprecedented degree of structural variation throughout genomes, with great amounts of data accumulating rapidly. Here we introduce InvFEST (http://invfestdb.uab.cat), a database combining multiple sources of information to generate a complete catalogue of non-redundant human polymorphic inversions. Due to the complexity of this type of changes and the underlying high false-positive discovery rate, it is necessary to integrate all the available data to get a reliable estimate of the real number of inversions. InvFEST automatically merges predictions into different inversions, refines the breakpoint locations, and finds associations with genes and segmental duplications. In addition, it includes data on experimental validation, population frequency, functional effects and evolutionary history. All this information is readily accessible through a complete and user-friendly web report for each inversion. In its current version, InvFEST combines information from 34 different studies and contains 1092 candidate inversions, which are categorized based on internal scores and manual curation. Therefore, InvFEST aims to represent the most reliable set of human inversions and become a central repository to share information, guide future studies and contribute to the analysis of the functional and evolutionary impact of inversions on the human genome.

  4. Genome-wide Mycobacterium tuberculosis variation (GMTV) database: a new tool for integrating sequence variations and epidemiology

    PubMed Central

    2014-01-01

    Background: Tuberculosis (TB) poses a worldwide threat due to advancing multidrug-resistant strains and deadly co-infections with Human immunodeficiency virus. Today large amounts of Mycobacterium tuberculosis whole genome sequencing data are being assessed broadly and yet there exists no comprehensive online resource that connects M. tuberculosis genome variants with geographic origin, with drug resistance or with clinical outcome. Description: Here we describe a broadly inclusive unifying Genome-wide Mycobacterium tuberculosis Variation (GMTV) database (http://mtb.dobzhanskycenter.org) that catalogues genome variations of M. tuberculosis strains collected across Russia. GMTV contains a broad spectrum of data derived from different sources and related to M. tuberculosis molecular biology, epidemiology, TB clinical outcome, year and place of isolation, drug resistance profiles and displays the variants across the genome using a dedicated genome browser. GMTV database, which includes 1084 genomes and over 69,000 SNP or Indel variants, can be queried about M. tuberculosis genome variation and putative associations with drug resistance, geographical origin, and clinical stages and outcomes. Conclusions: Implementation of GMTV tracks the pattern of changes of M. tuberculosis strains in different geographical areas, facilitates disease gene discoveries associated with drug resistance or different clinical sequelae, and automates comparative genomic analyses among M. tuberculosis strains. PMID:24767249

  5. Genome-wide analysis of the Zn(II)2Cys6 zinc cluster-encoding gene family in Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Proteins with a Zn(II)2Cys6 domain, Cys-X2-Cys-X6-Cys-X5-12-Cys-X2-Cys-X6-9-Cys (hereafter, referred to as the C6 domain), form a subclass of zinc finger proteins found exclusively in fungi and yeast. Genome sequence databases of Saccharomyces cerevisiae and Candida albicans have provided an overvie...

  6. KONAGAbase: a genomic and transcriptomic database for the diamondback moth, Plutella xylostella

    PubMed Central

    2013-01-01

    Background: The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests for crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides such as pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. Therefore, it is important to develop genomic and transcriptomic DBM resources for analysis of genes related to insecticide resistance, both to clarify the mechanism of resistance of DBM and to facilitate the development of insecticides with a novel mode of action for more effective and environmentally less harmful insecticide rotation. To contribute to this goal, we developed KONAGAbase, a genomic and transcriptomic database for DBM (KONAGA is the Japanese word for DBM). Description: KONAGAbase provides (1) transcriptomic sequences of 37,340 ESTs/mRNAs and 147,370 RNA-seq contigs which were clustered and assembled into 84,570 unigenes (30,695 contigs, 50,548 pseudo singletons, and 3,327 singletons); and (2) genomic sequences of 88,530 WGS contigs with 246,244 degenerate contigs and 106,455 singletons from which 6,310 de novo identified repeat sequences and 34,890 predicted gene-coding sequences were extracted. The unigenes and predicted gene-coding sequences were clustered and 32,800 representative sequences were extracted as a comprehensive putative gene set. These sequences were annotated with BLAST descriptions, Gene Ontology (GO) terms, and Pfam descriptions, respectively. KONAGAbase contains rich graphical user interface (GUI)-based web interfaces for easy and efficient searching, browsing, and downloading sequences and annotation data. Five useful search interfaces consisting of BLAST search, keyword search, BLAST result-based search, GO tree-based search, and genome browser are provided. KONAGAbase is publicly available from our website (http://dbm.dna.affrc.go.jp/px/) through standard web browsers. Conclusions: KONAGAbase provides DBM comprehensive transcriptomic

  7. Expanded microbial genome coverage and improved protein family annotation in the COG database

    PubMed Central

    Galperin, Michael Y.; Makarova, Kira S.; Wolf, Yuri I.; Koonin, Eugene V.

    2015-01-01

    Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. 
The new version of the

  8. SBMDb: first whole genome putative microsatellite DNA marker database of sugarbeet for bioenergy and industrial applications.

    PubMed

    Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U B; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh

    2015-01-01

    DNA marker plays important role as valuable tools to increase crop productivity by finding plausible answers to genetic variations and linking the Quantitative Trait Loci (QTL) of beneficial trait. Prior approaches in development of Short Tandem Repeats (STR) markers were time consuming and inefficient. Recent methods invoking the development of STR markers using whole genomic or transcriptomics data has gained wide importance with immense potential in developing breeding and cultivator improvement approaches. Availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report world's first sugarbeet whole genome marker discovery having 145 K markers along with 5 K functional domain markers unified in common platform using MySQL, Apache and PHP in SBMDb. Embedded markers and corresponding location information can be selected for desired chromosome, location/interval and primers can be generated using Primer3 core, integrated at backend. Our analyses revealed abundance of 'mono' repeat (76.82%) over 'di' repeats (13.68%). Highest density (671.05 markers/Mb) was found in chromosome 1 and lowest density (341.27 markers/Mb) in chromosome 6. Current investigation of sugarbeet genome marker density has direct implications in increasing mapping marker density. This will enable present linkage map having marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in panel of five genotypes. These markers can be used for DUS test of variety identification and MAS/GAS in variety improvement program. The present database presents wide source of potential markers for developing and implementing new approaches for molecular breeding required to accelerate industrious use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help in improvement of bioenergy trait of
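
    To make the 'mono' and 'di' repeat classes and the markers/Mb density figures concrete, here is a small sketch of in silico microsatellite detection. It is not the SBMDb pipeline: the minimum repeat counts (10 for mono, 6 for di), the function names `find_ssrs` and `markers_per_mb`, and the toy sequence are assumptions for illustration only.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = ((1, 10), (2, 6))


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for simple tandem repeats."""
    hits = []
    for motif_len, min_count in min_repeats:
        # A motif followed by at least (min_count - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if motif_len == 2 and motif[0] == motif[1]:
                continue  # "AA" repeated is a mono repeat, not a di repeat
            copies = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, copies))
    return sorted(hits)


def markers_per_mb(marker_count, length_bp):
    """Marker density expressed as markers per megabase."""
    return marker_count * 1_000_000 / length_bp


toy = "GG" + "A" * 12 + "CT" + "AG" * 7 + "TT"
print(find_ssrs(toy))  # [(2, 'A', 12), (16, 'AG', 7)]
print(markers_per_mb(len(find_ssrs(toy)), len(toy)))  # 62500.0
```

    The density of the toy sequence is unrealistically high because it is only 32 bp long; the per-chromosome values quoted in the abstract come from the same count-per-length ratio applied to whole chromosomes.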

  9. SBMDb: first whole genome putative microsatellite DNA marker database of sugarbeet for bioenergy and industrial applications.

    PubMed

    Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U B; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh

    2015-01-01

    DNA marker plays important role as valuable tools to increase crop productivity by finding plausible answers to genetic variations and linking the Quantitative Trait Loci (QTL) of beneficial trait. Prior approaches in development of Short Tandem Repeats (STR) markers were time consuming and inefficient. Recent methods invoking the development of STR markers using whole genomic or transcriptomics data has gained wide importance with immense potential in developing breeding and cultivator improvement approaches. Availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report world's first sugarbeet whole genome marker discovery having 145 K markers along with 5 K functional domain markers unified in common platform using MySQL, Apache and PHP in SBMDb. Embedded markers and corresponding location information can be selected for desired chromosome, location/interval and primers can be generated using Primer3 core, integrated at backend. Our analyses revealed abundance of 'mono' repeat (76.82%) over 'di' repeats (13.68%). Highest density (671.05 markers/Mb) was found in chromosome 1 and lowest density (341.27 markers/Mb) in chromosome 6. Current investigation of sugarbeet genome marker density has direct implications in increasing mapping marker density. This will enable present linkage map having marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in panel of five genotypes. These markers can be used for DUS test of variety identification and MAS/GAS in variety improvement program. The present database presents wide source of potential markers for developing and implementing new approaches for molecular breeding required to accelerate industrious use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help in improvement of bioenergy trait of

  10. SBMDb: first whole genome putative microsatellite DNA marker database of sugarbeet for bioenergy and industrial applications

    PubMed Central

    Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U.B.; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh

    2015-01-01

    DNA marker plays important role as valuable tools to increase crop productivity by finding plausible answers to genetic variations and linking the Quantitative Trait Loci (QTL) of beneficial trait. Prior approaches in development of Short Tandem Repeats (STR) markers were time consuming and inefficient. Recent methods invoking the development of STR markers using whole genomic or transcriptomics data has gained wide importance with immense potential in developing breeding and cultivator improvement approaches. Availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report world’s first sugarbeet whole genome marker discovery having 145 K markers along with 5 K functional domain markers unified in common platform using MySQL, Apache and PHP in SBMDb. Embedded markers and corresponding location information can be selected for desired chromosome, location/interval and primers can be generated using Primer3 core, integrated at backend. Our analyses revealed abundance of ‘mono’ repeat (76.82%) over ‘di’ repeats (13.68%). Highest density (671.05 markers/Mb) was found in chromosome 1 and lowest density (341.27 markers/Mb) in chromosome 6. Current investigation of sugarbeet genome marker density has direct implications in increasing mapping marker density. This will enable present linkage map having marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in panel of five genotypes. These markers can be used for DUS test of variety identification and MAS/GAS in variety improvement program. The present database presents wide source of potential markers for developing and implementing new approaches for molecular breeding required to accelerate industrious use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help in improvement of bioenergy trait

  11. TP53 Variations in Human Cancers: New Lessons from the IARC TP53 Database and Genomics Data.

    PubMed

    Bouaoun, Liacine; Sonkin, Dmitriy; Ardin, Maude; Hollstein, Monica; Byrnes, Graham; Zavadil, Jiri; Olivier, Magali

    2016-09-01

    TP53 gene mutations are one of the most frequent somatic events in cancer. The IARC TP53 Database (http://p53.iarc.fr) is a popular resource that compiles occurrence and phenotype data on TP53 germline and somatic variations linked to human cancer. The deluge of data coming from cancer genomic studies generates new data on TP53 variations and attracts a growing number of database users for the interpretation of TP53 variants. Here, we present the current contents and functionalities of the IARC TP53 Database and perform a systematic analysis of TP53 somatic mutation data extracted from this database and from genomic data repositories. This analysis showed that IARC has more TP53 somatic mutation data than genomic repositories (29,000 vs. 4,000). However, the more complete screening achieved by genomic studies highlighted some overlooked facts about TP53 mutations, such as the presence of a significant number of mutations occurring outside the DNA-binding domain in specific cancer types. We also provide an update on TP53 inherited variants including the ones that should be considered as neutral frequent variations. We thus provide an update of current knowledge on TP53 variations in human cancer as well as inform users on the efficient use of the IARC TP53 Database.

  12. TP53 Variations in Human Cancers: New Lessons from the IARC TP53 Database and Genomics Data.

    PubMed

    Bouaoun, Liacine; Sonkin, Dmitriy; Ardin, Maude; Hollstein, Monica; Byrnes, Graham; Zavadil, Jiri; Olivier, Magali

    2016-09-01

    TP53 gene mutations are one of the most frequent somatic events in cancer. The IARC TP53 Database (http://p53.iarc.fr) is a popular resource that compiles occurrence and phenotype data on TP53 germline and somatic variations linked to human cancer. The deluge of data coming from cancer genomic studies generates new data on TP53 variations and attracts a growing number of database users for the interpretation of TP53 variants. Here, we present the current contents and functionalities of the IARC TP53 Database and perform a systematic analysis of TP53 somatic mutation data extracted from this database and from genomic data repositories. This analysis showed that IARC has more TP53 somatic mutation data than genomic repositories (29,000 vs. 4,000). However, the more complete screening achieved by genomic studies highlighted some overlooked facts about TP53 mutations, such as the presence of a significant number of mutations occurring outside the DNA-binding domain in specific cancer types. We also provide an update on TP53 inherited variants including the ones that should be considered as neutral frequent variations. We thus provide an update of current knowledge on TP53 variations in human cancer as well as inform users on the efficient use of the IARC TP53 Database. PMID:27328919

  13. Complete mitochondrial genome database and standardized classification system for Canis lupus familiaris.

    PubMed

    Duleba, Anna; Skonieczna, Katarzyna; Bogdanowicz, Wiesław; Malyarchuk, Boris; Grzybowski, Tomasz

    2015-11-01

    To contribute to the complete mitogenome database of the species Canis lupus familiaris and shed more light on its origin, we have sequenced mitochondrial genomes of 120 modern dogs from worldwide populations. Together with all the previously published mitogenome sequences of acceptable quality, we have reconstructed a global phylogenetic tree of 555 C. l. familiaris mitogenomes and standardized haplogroup nomenclature. The phylogenetic tree presented here and available online at http://clf.mtdna.tree.cm.umk.pl/ could be further used by forensic and evolutionary geneticists as well cynologists, for data quality control and unambiguous haplogroup classification. Our in-depth phylogeographic analysis of all C. l. familiaris mitogenomes confirmed that domestic dogs may have originated in East Asia during the Mesolithic and Upper Paleolithic time periods and started to expand to other parts of the world during Neolithic times. PMID:26218982

  14. REBASE--a database for DNA restriction and modification: enzymes, genes and genomes.

    PubMed

    Roberts, Richard J; Vincze, Tamas; Posfai, Janos; Macelis, Dana

    2015-01-01

    REBASE is a comprehensive and fully curated database of information about the components of restriction-modification (RM) systems. It contains fully referenced information about recognition and cleavage sites for both restriction enzymes and methyltransferases as well as commercial availability, methylation sensitivity, crystal and sequence data. All genomes that are completely sequenced are analyzed for RM system components, and with the advent of PacBio sequencing, the recognition sequences of DNA methyltransferases (MTases) are appearing rapidly. Thus, Type I and Type III systems can now be characterized in terms of recognition specificity merely by DNA sequencing. The contents of REBASE may be browsed from the web http://rebase.neb.com and selected compilations can be downloaded by FTP (ftp.neb.com). Monthly updates are also available via email.

  15. Analysis of disease-associated objects at the Rat Genome Database.

    PubMed

    Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G T; Smith, Jennifer R; Petri, Victoria; Lowry, Timothy F; Nigam, Rajni; Dwinell, Melinda R; Worthey, Elizabeth A; Munzenmaier, Diane H; Shimoyama, Mary; Jacob, Howard J

    2013-01-01

    The Rat Genome Database (RGD) is the premier resource for genetic, genomic and phenotype data for the laboratory rat, Rattus norvegicus. In addition to organizing biological data from rats, the RGD team focuses on manual curation of gene-disease associations for rat, human and mouse. In this work, we have analyzed disease-associated strains, quantitative trait loci (QTL) and genes from rats. These disease objects form the basis for seven disease portals. Among disease portals, the cardiovascular disease and obesity/metabolic syndrome portals have the highest number of rat strains and QTL. These two portals share 398 rat QTL, and these shared QTL are highly concentrated on rat chromosomes 1 and 2. For disease-associated genes, we performed gene ontology (GO) enrichment analysis across portals using RatMine enrichment widgets. Fifteen GO terms, five from each GO aspect, were selected to profile enrichment patterns of each portal. Of the selected biological process (BP) terms, 'regulation of programmed cell death' was the top enriched term across all disease portals except in the obesity/metabolic syndrome portal where 'lipid metabolic process' was the most enriched term. 'Cytosol' and 'nucleus' were common cellular component (CC) annotations for disease genes, but only the cancer portal genes were highly enriched with 'nucleus' annotations. Similar enrichment patterns were observed in a parallel analysis using the DAVID functional annotation tool. The relationship between the preselected 15 GO terms and disease terms was examined reciprocally by retrieving rat genes annotated with these preselected terms. The individual GO term-annotated gene list showed enrichment in physiologically related diseases. For example, the 'regulation of blood pressure' genes were enriched with cardiovascular disease annotations, and the 'lipid metabolic process' genes with obesity annotations. Furthermore, we were able to enhance enrichment of neurological diseases by combining 'G

  16. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases.

    PubMed

    Caspi, Ron; Altman, Tomer; Billington, Richard; Dreher, Kate; Foerster, Hartmut; Fulcher, Carol A; Holland, Timothy A; Keseler, Ingrid M; Kothari, Anamika; Kubo, Aya; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S; Weerasinghe, Deepika; Zhang, Peifen; Karp, Peter D

    2014-01-01

    The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible database describing metabolic pathways and enzymes from all domains of life. MetaCyc pathways are experimentally determined, mostly small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains >2100 pathways derived from >37,000 publications, and is the largest curated collection of metabolic pathways currently available. BioCyc (BioCyc.org) is a collection of >3000 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems and pathway-hole fillers. Additions to BioCyc over the past 2 years include YeastCyc, a PGDB for Saccharomyces cerevisiae, and 891 new genomes from the Human Microbiome Project. The BioCyc Web site offers a variety of tools for querying and analysis of PGDBs, including Omics Viewers and tools for comparative analysis. New developments include atom mappings in reactions, a new representation of glycan degradation pathways, improved compound structure display, better coverage of enzyme kinetic data, enhancements of the Web Groups functionality, improvements to the Omics viewers, a new representation of the Enzyme Commission system and, for the desktop version of the software, the ability to save display states.

  17. Uncovering the Genome-Wide Transcriptional Responses of the Filamentous Fungus Aspergillus niger to Lignocellulose Using RNA Sequencing

    PubMed Central

    Gaddipati, Sanyasi; Kokolski, Matthew; Malla, Sunir; Blythe, Martin J.; Ibbett, Roger; Campbell, Maria; Liddell, Susan; Aboobaker, Aziz; Tucker, Gregory A.; Archer, David B.

    2012-01-01

    A key challenge in the production of second generation biofuels is the conversion of lignocellulosic substrates into fermentable sugars. Enzymes, particularly those from fungi, are a central part of this process, and many have been isolated and characterised. However, relatively little is known of how fungi respond to lignocellulose and produce the enzymes necessary for dis-assembly of plant biomass. We studied the physiological response of the fungus Aspergillus niger when exposed to wheat straw as a model lignocellulosic substrate. Using RNA sequencing we showed that, 24 hours after exposure to straw, gene expression of known and presumptive plant cell wall–degrading enzymes represents a huge investment for the cells (about 20% of the total mRNA). Our results also uncovered new esterases and surface interacting proteins that might form part of the fungal arsenal of enzymes for the degradation of plant biomass. Using transcription factor deletion mutants (xlnR and creA) to study the response to both lignocellulosic substrates and low carbon source concentrations, we showed that a subset of genes coding for degradative enzymes is induced by starvation. Our data support a model whereby this subset of enzymes plays a scouting role under starvation conditions, testing for available complex polysaccharides and liberating inducing sugars, that triggers the subsequent induction of the majority of hydrolases. We also showed that antisense transcripts are abundant and that their expression can be regulated by growth conditions. PMID:22912594

  18. Developments in FINDbase worldwide database for clinically relevant genomic variation allele frequencies.

    PubMed

    Papadopoulos, Petros; Viennas, Emmanouil; Gkantouna, Vassiliki; Pavlidis, Cristiana; Bartsakoulia, Marina; Ioannou, Zafeiria-Marina; Ratbi, Ilham; Sefiani, Abdelaziz; Tsaknakis, John; Poulas, Konstantinos; Tzimas, Giannis; Patrinos, George P

    2014-01-01

    FINDbase (http://www.findbase.org) aims to document frequencies of clinically relevant genomic variations, namely causative mutations and pharmacogenomic markers, worldwide. Each database record includes the population, ethnic group or geographical region, the disorder name and the related gene, accompanied by links to any related databases and the genetic variation together with its frequency in that population. Here, we report, in addition to the regular data content updates, significant developments in FINDbase, related to data visualization and querying, data submission, interrelation with other resources and a new module for genetic disease summaries. In particular, (i) we have developed new data visualization tools that facilitate data querying and comparison among different populations, (ii) we have generated a new FINDbase module, built around Microsoft's PivotViewer (http://www.getpivot.com) software, based on Microsoft Silverlight technology (http://www.silverlight.net), that includes 259 genetic disease summaries from five populations, systematically collected from the literature representing the documented genetic makeup of these populations and (iii) the implementation of a generic data submission tool for every module currently available in FINDbase.

  19. The Saccharomyces Genome Database: Gene Product Annotation of Function, Process, and Component.

    PubMed

    Cherry, J Michael

    2015-12-01

    An ontology is a highly structured form of controlled vocabulary. Each entry in the ontology is commonly called a term. These terms are used when talking about an annotation. However, each term has a definition that, like the definition of a word found within a dictionary, provides the complete usage and detailed explanation of the term. It is critical to consult a term's definition because the distinction between terms can be subtle. The use of ontologies in biology started as a way of unifying communication between scientific communities and to provide a standard dictionary for different topics, including molecular functions, biological processes, mutant phenotypes, chemical properties and structures. The creation of ontology terms and their definitions often requires debate to reach agreement but the result has been a unified descriptive language used to communicate knowledge. In addition to terms and definitions, ontologies require a relationship used to define the type of connection between terms. In an ontology, a term can have more than one parent term, the term above it in an ontology, as well as more than one child, the term below it in the ontology. Many ontologies are used to construct annotations in the Saccharomyces Genome Database (SGD), as in all modern biological databases; however, Gene Ontology (GO), a descriptive system used to categorize gene function, is the most extensively used ontology in SGD annotations. Examples included in this protocol illustrate the structure and features of this ontology.
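
    The structure described here, where a term can have several parents and each link carries a relationship type, is a directed acyclic graph. A minimal sketch follows; the term identifiers are placeholders rather than real GO terms, and the `ancestors` helper is hypothetical, although `is_a` and `part_of` are relationship types that GO does use.

```python
# Each term maps to a list of (relationship, parent_term) pairs.
# Term names are hypothetical placeholders, not real GO identifiers.
ONTOLOGY = {
    "term_root": [],
    "term_B": [("is_a", "term_root")],
    "term_C": [("is_a", "term_root")],
    "term_D": [("is_a", "term_B"), ("part_of", "term_C")],  # two parents
}


def ancestors(term, ontology):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in ontology.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def children(term, ontology):
    """List the terms that name `term` as a direct parent."""
    return sorted(child for child, links in ontology.items()
                  if any(parent == term for _rel, parent in links))


print(sorted(ancestors("term_D", ONTOLOGY)))  # ['term_B', 'term_C', 'term_root']
print(children("term_root", ONTOLOGY))        # ['term_B', 'term_C']
```

    Because a gene annotated to a term is implicitly annotated to that term's ancestors, a traversal like this is the usual first step when annotations are rolled up to broader terms.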

  20. The Candida Genome Database: the new homology information page highlights protein similarity and phylogeny.

    PubMed

    Binkley, Jonathan; Arnaud, Martha B; Inglis, Diane O; Skrzypek, Marek S; Shah, Prachi; Wymore, Farrell; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin

    2014-01-01

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The goal of CGD is to facilitate and accelerate research into Candida pathogenesis and biology. The CGD Web site is organized around Locus pages, which display information collected about individual genes. Locus pages have multiple tabs for accessing different types of information; the default Summary tab provides an overview of the gene name, aliases, phenotype and Gene Ontology curation, whereas other tabs display more in-depth information, including protein product details for coding genes, notes on changes to the sequence or structure of the gene and a comprehensive reference list. Here, in this update to previous NAR Database articles featuring CGD, we describe a new tab that we have added to the Locus page, entitled the Homology Information tab, which displays phylogeny and gene similarity information for each locus.

  1. CrAgDb--a database of annotated chaperone repertoire in archaeal genomes.

    PubMed

    Rani, Shikha; Srivastava, Abhishikha; Kumar, Manish; Goel, Manisha

    2016-03-01

    Chaperones are a diverse class of ubiquitous proteins that assist other cellular proteins in folding correctly and maintaining their native structure. Many different chaperones cooperate to constitute the 'proteostasis' machinery in the cells. It has been proposed earlier that archaeal organisms could be ideal model systems for deciphering the basic functioning of the 'protein folding machinery' in higher eukaryotes. Several chaperone families have been characterized in archaea over the years but mostly one protein at a time, making it difficult to decipher the composition and mechanistics of the protein folding system as a whole. In order to deal with these lacunae, we have developed a database of all archaeal chaperone proteins, CrAgDb (Chaperone repertoire in Archaeal genomes). The data have been presented in a systematic way with intuitive browse and search facilities for easy retrieval of information. Access to these curated datasets should expedite large-scale analysis of archaeal chaperone networks and significantly advance our understanding of operation and regulation of the protein folding machinery in archaea. Researchers could then translate this knowledge to comprehend the more complex protein folding pathways in eukaryotic systems. The database is freely available at http://14.139.227.92/mkumar/cragdb/. PMID:26862144

  2. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    SciTech Connect

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  3. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified, intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate data of varying quality into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402
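
    The four project levels named in the abstract (studies, Biosamples, sequencing projects, analysis projects) can be pictured as a small nested data model. The following is only an illustrative sketch: the class names, the fields, the strict parent-to-child nesting and the `count_levels` helper are all assumptions made for this example and are not taken from the GOLD schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical classes: one per level named in the abstract.
@dataclass
class AnalysisProject:
    name: str


@dataclass
class SequencingProject:
    name: str
    analysis_projects: List[AnalysisProject] = field(default_factory=list)


@dataclass
class Biosample:
    name: str
    sequencing_projects: List[SequencingProject] = field(default_factory=list)


@dataclass
class Study:
    name: str
    biosamples: List[Biosample] = field(default_factory=list)


def count_levels(studies: List[Study]) -> Dict[str, int]:
    """Count the records held at each of the four levels."""
    counts = {
        "studies": 0,
        "biosamples": 0,
        "sequencing_projects": 0,
        "analysis_projects": 0,
    }
    for study in studies:
        counts["studies"] += 1
        for biosample in study.biosamples:
            counts["biosamples"] += 1
            for seq_project in biosample.sequencing_projects:
                counts["sequencing_projects"] += 1
                counts["analysis_projects"] += len(seq_project.analysis_projects)
    return counts
```

    A strict tree is a simplification chosen to keep the sketch short; a real metadata warehouse may well allow records at one level to link to several records at the level above.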

  4. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases.

    PubMed

    Caspi, Ron; Billington, Richard; Ferrer, Luciana; Foerster, Hartmut; Fulcher, Carol A; Keseler, Ingrid M; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S; Karp, Peter D

    2016-01-01

    The MetaCyc database (MetaCyc.org) is a freely accessible comprehensive database describing metabolic pathways and enzymes from all domains of life. The majority of MetaCyc pathways are small-molecule metabolic pathways that have been experimentally determined. MetaCyc contains more than 2400 pathways derived from >46,000 publications, and is the largest curated collection of metabolic pathways. BioCyc (BioCyc.org) is a collection of 5700 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems, and pathway-hole fillers. The BioCyc website offers a variety of tools for querying and analyzing PGDBs, including Omics Viewers and tools for comparative analysis. This article provides an update of new developments in MetaCyc and BioCyc during the last two years, including addition of Gibbs free energy values for compounds and reactions; redesign of the primary gene/protein page; addition of a tool for creating diagrams containing multiple linked pathways; several new search capabilities, including searching for genes based on sequence patterns, searching for databases based on an organism's phenotypes, and a cross-organism search; and a metabolite identifier translation service.

  5. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Billington, Richard; Ferrer, Luciana; Foerster, Hartmut; Fulcher, Carol A.; Keseler, Ingrid M.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Ong, Quang; Paley, Suzanne; Subhraveti, Pallavi; Weaver, Daniel S.; Karp, Peter D.

    2016-01-01

    The MetaCyc database (MetaCyc.org) is a freely accessible comprehensive database describing metabolic pathways and enzymes from all domains of life. The majority of MetaCyc pathways are small-molecule metabolic pathways that have been experimentally determined. MetaCyc contains more than 2400 pathways derived from >46 000 publications, and is the largest curated collection of metabolic pathways. BioCyc (BioCyc.org) is a collection of 5700 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems, and pathway-hole fillers. The BioCyc website offers a variety of tools for querying and analyzing PGDBs, including Omics Viewers and tools for comparative analysis. This article provides an update of new developments in MetaCyc and BioCyc during the last two years, including addition of Gibbs free energy values for compounds and reactions; redesign of the primary gene/protein page; addition of a tool for creating diagrams containing multiple linked pathways; several new search capabilities, including searching for genes based on sequence patterns, searching for databases based on an organism's phenotypes, and a cross-organism search; and a metabolite identifier translation service. PMID:26527732

  6. Xenbase, the Xenopus model organism database; new virtualized system, data types and genomes.

    PubMed

    Karpinka, J Brad; Fortriede, Joshua D; Burns, Kevin A; James-Zorn, Christina; Ponferrada, Virgilio G; Lee, Jacqueline; Karimi, Kamran; Zorn, Aaron M; Vize, Peter D

    2015-01-01

    Xenbase (http://www.xenbase.org), the Xenopus frog model organism database, integrates a wide variety of data from this biomedical model genus. Two closely related species are represented: the allotetraploid Xenopus laevis that is widely used for microinjection and tissue explant-based protocols, and the diploid Xenopus tropicalis which is used for genetics and gene targeting. The two species are extremely similar, and protocols, reagents and results from each species are often interchangeable. Xenbase imports, indexes, curates and manages data from both species, all of which are mapped via unique IDs and can be queried in either a species-specific or species-agnostic manner. All our services have now migrated to a private cloud to achieve better performance and reliability. We have added new content, including providing full support for morpholino reagents, used to inhibit mRNA translation or splicing and binding to regulatory microRNAs. New genomes assembled by the JGI for both species are displayed in GBrowse and are also available for searches using BLAST. Researchers can easily navigate from genome content to gene page reports, literature, experimental reagents and many other features using hyperlinks. Xenbase has also greatly expanded image content for figures published in papers describing Xenopus research via PubMedCentral.

  7. Xenbase, the Xenopus model organism database; new virtualized system, data types and genomes

    PubMed Central

    Karpinka, J. Brad; Fortriede, Joshua D.; Burns, Kevin A.; James-Zorn, Christina; Ponferrada, Virgilio G.; Lee, Jacqueline; Karimi, Kamran; Zorn, Aaron M.; Vize, Peter D.

    2015-01-01

    Xenbase (http://www.xenbase.org), the Xenopus frog model organism database, integrates a wide variety of data from this biomedical model genus. Two closely related species are represented: the allotetraploid Xenopus laevis that is widely used for microinjection and tissue explant-based protocols, and the diploid Xenopus tropicalis which is used for genetics and gene targeting. The two species are extremely similar, and protocols, reagents and results from each species are often interchangeable. Xenbase imports, indexes, curates and manages data from both species, all of which are mapped via unique IDs and can be queried in either a species-specific or species-agnostic manner. All our services have now migrated to a private cloud to achieve better performance and reliability. We have added new content, including providing full support for morpholino reagents, used to inhibit mRNA translation or splicing and binding to regulatory microRNAs. New genomes assembled by the JGI for both species are displayed in GBrowse and are also available for searches using BLAST. Researchers can easily navigate from genome content to gene page reports, literature, experimental reagents and many other features using hyperlinks. Xenbase has also greatly expanded image content for figures published in papers describing Xenopus research via PubMedCentral. PMID:25313157

  8. openSputnik--a database to ESTablish comparative plant genomics using unsaturated sequence collections.

    PubMed

    Rudd, Stephen

    2005-01-01

    The public expressed sequence tag collections are continually being enriched with high-quality sequences that represent an ever-expanding range of taxonomically diverse plant species. While these sequence collections provide biased insight into the populations of expressed genes available within individual species and their associated tissues, the information is conceivably of wider relevance in a comparative context. When we consider the available expressed sequence tag (EST) collections of summer 2004, most of the major plant taxonomic clades are at least superficially represented. Investigation of the five million available plant ESTs provides a wealth of information that has applications in modelling the routes of plant genome evolution and the identification of lineage-specific genes and gene families. Over four million ESTs from over 50 distinct plant species have been collated within an EST analysis pipeline called openSputnik. The ESTs were resolved down into approximately one million unigene sequences. These have been annotated using orthology-based annotation transfer from reference plant genomes and using a variety of contemporary bioinformatics methods to assign peptide, structural and functional attributes. The openSputnik database is available at http://sputnik.btk.fi. PMID:15608275

  10. Aspergillus: sex and recombination.

    PubMed

    Varga, János; Szigeti, Gyöngyi; Baranyi, Nikolett; Kocsubé, Sándor; O'Gorman, Céline M; Dyer, Paul S

    2014-12-01

    The genus Aspergillus is one of the most widespread groups of fungi on Earth, comprised of about 300-350 species with very diverse lifestyles. Most species produce asexual propagula (conidia) on conidial heads. Despite their ubiquity, a sexual cycle has not yet been identified for most of the aspergilli. Where sexual reproduction is present, species exhibit either homothallic (self fertile) or heterothallic (obligate outcrossing) breeding systems. A parasexual cycle has also been described in some Aspergillus species. As in other fungi, sexual reproduction is governed by mating-type (MAT) genes, which determine sexual identity and are involved in regulating later stages of sexual development. Previous population genetic studies have indicated that some supposedly asexual aspergilli exhibit evidence of a recombining population structure, suggesting the presence of a cryptic sexual cycle. In addition, genome analyses have revealed networks of genes necessary for sexual reproduction in several Aspergillus species, again consistent with latent sexuality in these fungi. Knowledge of MAT gene presence has then successfully been applied to induce sexual reproduction between MAT1-1 and MAT1-2 isolates of certain supposedly asexual aspergilli. Recent progress in understanding the extent and significance of sexual reproduction is described here, with special emphasis on findings that are relevant to clinically important aspergilli. PMID:25118872

  11. Comparison of gene expression signatures of diamide, H2O2 and menadione exposed Aspergillus nidulans cultures – linking genome-wide transcriptional changes to cellular physiology

    PubMed Central

    Pócsi, István; Miskei, Márton; Karányi, Zsolt; Emri, Tamás; Ayoubi, Patricia; Pusztahelyi, Tünde; Balla, György; Prade, Rolf A

    2005-01-01

    Background: In addition to their cytotoxic nature, reactive oxygen species (ROS) are also signal molecules in diverse cellular processes in eukaryotic organisms. Linking genome-wide transcriptional changes to cellular physiology in oxidative stress-exposed Aspergillus nidulans cultures provides the opportunity to estimate the sizes of peroxide (O₂²⁻), superoxide (O₂•⁻) and glutathione/glutathione disulphide (GSH/GSSG) redox imbalance responses. Results: Genome-wide transcriptional changes triggered by diamide, H2O2 and menadione in A. nidulans vegetative tissues were recorded using DNA microarrays containing 3533 unique PCR-amplified probes. Evaluation of LOESS-normalized data indicated that 2499 gene probes were affected by at least one stress-inducing agent. The stresses induced by diamide and H2O2 were pulse-like, with recovery after 1 h of exposure, whereas no recovery was observed with menadione. The distribution of stress-responsive gene probes among major physiological functional categories was approximately the same for each agent. The gene group sizes solely responsive to changes in intracellular O₂²⁻ or O₂•⁻ concentrations or to GSH/GSSG redox imbalance were estimated at 7.7, 32.6 and 13.0%, respectively. Gene groups responsive to diamide, H2O2 and menadione treatments and gene groups influenced by GSH/GSSG, O₂²⁻ and O₂•⁻ overlapped only partly and showed distinct enrichment profiles within functional categories. Changes in the GSH/GSSG redox state influenced expression of genes coding for a PBS2-like MAPK kinase homologue, a PSK2 kinase homologue, the AtfA transcription factor, and many elements of ubiquitin tagging, cell division cycle regulators, translation machinery proteins, defense and stress proteins, transport proteins as well as many enzymes of the primary and secondary metabolisms. Meanwhile, a separate set of genes encoding transport proteins, CpcA and JlbA amino acid starvation-responsive transcription factors, and some elements of sexual development

  12. Characterization and fine localization of two new genes in Xq28 using the genomic sequence/EST database screening approach

    SciTech Connect

    Faranda, S.; Frattini, A.; Zucchi, I.

    1996-06-15

    Two new genes were identified and mapped by searching the EST databases with genomic sequences obtained from putative CpG islands of the rodent-human hybrid X3000. Previous mapping of these CpG islands in the proximity of the host cell factor (HCFC1) and GdX genes automatically localized these two new genes to Xq28 in the interval between the L1 cell adhesion molecule (L1CAM) and the glucose-6-phosphate dehydrogenase (G6PD) loci. Both genes are relatively short, contain ORFs of 261 and 105 amino acids, respectively, and are ubiquitously expressed. Combining sequencing of selected CpG islands, derived from hybrids containing small portions of the human genome, with an EST database search is an easy method of identifying and mapping new genes to specific regions of the genome. 17 refs., 4 figs.

  13. The new modern era of yeast genomics: community sequencing and the resulting annotation of multiple Saccharomyces cerevisiae strains at the Saccharomyces Genome Database

    PubMed Central

    Engel, Stacia R.; Cherry, J. Michael

    2013-01-01

    The first completed eukaryotic genome sequence was that of the yeast Saccharomyces cerevisiae, and the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the original model organism database. SGD remains the authoritative community resource for the S. cerevisiae reference genome sequence and its annotation, and continues to provide comprehensive biological information correlated with S. cerevisiae genes and their products. A diverse set of yeast strains have been sequenced to explore commercial and laboratory applications, and a brief history of those strains is provided. The publication of these new genomes has motivated the creation of new tools, and SGD will annotate and provide comparative analyses of these sequences, correlating changes with variations in strain phenotypes and protein function. We are entering a new era at SGD, as we incorporate these new sequences and make them accessible to the scientific community, all in an effort to continue in our mission of educating researchers and facilitating discovery. Database URL: http://www.yeastgenome.org/ PMID:23487186

  14. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases

    PubMed Central

    Caspi, Ron; Altman, Tomer; Dreher, Kate; Fulcher, Carol A.; Subhraveti, Pallavi; Keseler, Ingrid M.; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A.; Ong, Quang; Paley, Suzanne; Pujar, Anuradha; Shearer, Alexander G.; Travers, Michael; Weerasinghe, Deepika; Zhang, Peifen; Karp, Peter D.

    2012-01-01

    The MetaCyc database (http://metacyc.org/) provides a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains more than 1800 pathways derived from more than 30 000 publications, and is the largest curated collection of metabolic pathways currently available. Most reactions in MetaCyc pathways are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes and literature citations. BioCyc (http://biocyc.org/) is a collection of more than 1700 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference database, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs contain additional features, including predicted operons, transport systems and pathway-hole fillers. The BioCyc website and Pathway Tools software offer many tools for querying and analysis of PGDBs, including Omics Viewers and comparative analysis. New developments include a zoomable web interface for diagrams; flux-balance analysis model generation from PGDBs; web services; and a new tool called Web Groups. PMID:22102576

  15. T4SP Database 2.0: An Improved Database for Type IV Secretion Systems in Bacterial Genomes with New Online Analysis Tools

    PubMed Central

    Han, Na; Yu, Weiwen; Qiang, Yujun

    2016-01-01

    The type IV secretion system (T4SS) can mediate the passage of macromolecules across cellular membranes and is essential for virulence and genetic material exchange among bacterial species. The Type IV Secretion Project 2.0 (T4SP 2.0) database is an improved and extended version of the platform released in 2013, aimed at assisting with the detection of T4SS in bacterial genomes. This advanced version provides users with web server tools for detecting the existence and variations of T4SS genes online. The new interface for the genome browser provides user-friendly access to the most complete and accurate resource of T4SS gene information (e.g., gene number, name, type, position, sequence, related articles, and quick links to other websites). Currently, this online database includes T4SS information for 5239 bacterial strains. Conclusions: T4SS is one of the most versatile secretion systems necessary for the virulence and survival of bacteria and the secretion of protein and/or DNA substrates from a donor to a recipient cell. This database on virB/D genes of the T4SS system will help scientists worldwide to improve their knowledge of secretion systems and also identify potential pathogenic mechanisms of various microbial species. PMID:27738451

  16. Gene evolution and gene expression after whole genome duplication in fish: the PhyloFish database.

    PubMed

    Pasquier, Jeremy; Cabau, Cédric; Nguyen, Thaovi; Jouanno, Elodie; Severac, Dany; Braasch, Ingo; Journot, Laurent; Pontarotti, Pierre; Klopp, Christophe; Postlethwait, John H; Guiguen, Yann; Bobe, Julien

    2016-01-01

    With more than 30,000 species, ray-finned fish represent approximately half of vertebrates. The evolution of ray-finned fish was impacted by several whole genome duplication (WGD) events including a teleost-specific WGD event (TGD) that occurred at the root of the teleost lineage about 350 million years ago (Mya) and more recent WGD events in salmonids, carps, suckers and others. In plants and animals, WGD events are associated with adaptive radiations and evolutionary innovations. WGD-spurred innovation may be especially relevant in the case of teleost fish, which colonized a wide diversity of habitats on earth, including many extreme environments. Fish biodiversity, the use of fish models for human medicine and ecological studies, and the importance of fish in human nutrition, fuel an important need for the characterization of gene expression repertoires and corresponding evolutionary histories of ray-finned fish genes. To this aim, we performed transcriptome analyses and developed the PhyloFish database to provide (i) de novo assembled gene repertoires in 23 different ray-finned fish species including two holosteans (i.e. a group that diverged from teleosts before TGD) and 21 teleosts (including six salmonids), and (ii) gene expression levels in ten different tissues and organs (and embryos for many) in the same species. This resource was generated using a common deep RNA sequencing protocol to obtain the most exhaustive gene repertoire possible in each species that allows between-species comparisons to study the evolution of gene expression in different lineages. The PhyloFish database described here can be accessed and searched using RNAbrowse, a simple and efficient solution to give access to RNA-seq de novo assembled transcripts. PMID:27189481

  17. dbSUPER: a database of super-enhancers in mouse and human genome

    PubMed Central

    Khan, Aziz; Zhang, Xuegong

    2016-01-01

    Super-enhancers are clusters of transcriptional enhancers that drive cell-type-specific gene expression and are crucial to cell identity. Many disease-associated sequence variations are enriched in super-enhancer regions of disease-relevant cell types. Thus, super-enhancers can be used as potential biomarkers for disease diagnosis and therapeutics. Current studies have identified super-enhancers in more than 100 cell types and demonstrated their functional importance. However, a centralized resource to integrate all these findings is not currently available. We developed dbSUPER (http://bioinfo.au.tsinghua.edu.cn/dbsuper/), the first integrated and interactive database of super-enhancers, with the primary goal of providing a resource for assistance in further studies related to transcriptional control of cell identity and disease. dbSUPER provides a responsive and user-friendly web interface to facilitate efficient and comprehensive search and browsing. The data can be easily sent to Galaxy instances, GREAT and Cistrome web-servers for downstream analysis, and can also be visualized in the UCSC genome browser where custom tracks can be added automatically. The data can be downloaded and exported in a variety of formats. Furthermore, dbSUPER lists genes associated with super-enhancers and also links to external databases such as GeneCards, UniProt and Entrez. dbSUPER also provides an overlap analysis tool to annotate user-defined regions. We believe dbSUPER is a valuable resource for the biology and genetic research communities. PMID:26438538

  18. Database management research for the Human Genome Project. Final progress report for period: 02/01/99 - 06/14/00

    SciTech Connect

    Bult, Carol J.

    1999-11-01

    The MouseBLAST server allows researchers to search a sequence within mouse/rodent sequence databases to find matching sequences that may be associated with mouse genes. Query results may be linked to gene detail records in the Mouse Genome Database (MGD). Searches are performed using WU-BLAST 2.0. All sequence databases are updated on a weekly basis.

  19. The Biofuel Feedstock Genomics Resource: a web-based portal and database to enable functional genomics of plant biofuel feedstock species.

    PubMed

    Childs, Kevin L; Konganti, Kranti; Buell, C Robin

    2012-01-01

    Major feedstock sources for future biofuel production are likely to be high biomass producing plant species such as poplar, pine, switchgrass, sorghum and maize. One active area of research in these species is genome-enabled improvement of lignocellulosic biofuel feedstock quality and yield. To facilitate genomic-based investigations in these species, we developed the Biofuel Feedstock Genomics Resource (BFGR), a database and web portal that provides high-quality, uniform and integrated functional annotation of gene and transcript assembly sequences from species of interest to lignocellulosic biofuel feedstock researchers. The BFGR includes sequence data from 54 species and permits researchers to view, analyze and obtain annotation at the gene, transcript, protein and genome level. Annotation of biochemical pathways permits the identification of key genes and transcripts central to the improvement of lignocellulosic properties in these species. The integrated nature of the BFGR in terms of annotation methods, orthologous/paralogous relationships and linkage to seven species with complete genome sequences allows comparative analyses for biofuel feedstock species with limited sequence resources. Database URL: http://bfgr.plantbiology.msu.edu.

  20. Improving the quality of genome, protein sequence, and taxonomy databases: a prerequisite for microbiome meta-omics 2.0.

    PubMed

    Pible, Olivier; Armengaud, Jean

    2015-10-01

    High-throughput shotgun metaproteomic approaches on environmental or medical microbiomes are producing huge amounts of tandem mass spectrometry data. These can be interpreted either with a general protein sequence database comprising tens of thousands of sequenced genomes or with a more customized database such as those obtained after metagenome sequencing of the DNA extracted from the same sample. However, not all entries in a nucleotide or protein sequence database are of equal quality and this can critically impact metaproteomic data interpretation. In this viewpoint article, we exemplify several key issues. First, the interpretation of genome or transcriptome data may be erroneous because of inaccurate contig assembly and gene prediction; metaproteogenomic strategies could offer an interesting way to mitigate this. Errors in sample handling and taxonomical characterization may also be problematic. Cross-contamination of genome sequences is also underestimated despite being frequent. As a consequence of these structural errors regarding protein sequences and additional problems due to homology-based functional annotation of proteins, specific efforts for better interpretation of metaproteomic data are required. We propose the development of new bioinformatic pipelines devoted to detection and correction of errors and contaminations to improve the overall quality of sequence and taxonomy databases for metaproteomics. PMID:26038180

  2. FunCoup 3.0: database of genome-wide functional coupling networks.

    PubMed

    Schmitt, Thomas; Ogris, Christoph; Sonnhammer, Erik L L

    2014-01-01

    We present an update of the FunCoup database (http://FunCoup.sbc.su.se) of functional couplings, or functional associations, between genes and gene products. Identifying these functional couplings is an important step in the understanding of higher-level mechanisms performed by complex cellular processes. FunCoup distinguishes between four classes of couplings: participation in the same signaling cascade, participation in the same metabolic process, co-membership in a protein complex and physical interaction. For each of these four classes, several types of experimental and statistical evidence are combined by Bayesian integration to predict genome-wide functional coupling networks. The FunCoup framework has been completely re-implemented to allow for more frequent future updates. It contains many improvements, such as a regularization procedure to automatically downweight redundant evidence and a novel method to incorporate phylogenetic profile similarity. Several datasets have been updated and new data have been added in FunCoup 3.0. Furthermore, we have developed a new Web site, which provides powerful tools to explore the predicted networks and to retrieve detailed information about the data underlying each prediction.
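
    To make the idea of combining several evidence types by Bayesian integration concrete, here is a minimal generic sketch of naive Bayesian integration, in which independent log-likelihood ratios are summed and converted to a posterior probability. It is not FunCoup's actual procedure: the function names, the prior of 0.01, the short class labels and the independence assumption are all hypothetical, and the regularization and phylogenetic-profile steps mentioned in the abstract are not modelled.

```python
import math
from typing import Dict, Iterable

# Short labels (hypothetical) for the four coupling classes named in the abstract.
COUPLING_CLASSES = ("signaling", "metabolic", "complex", "physical")


def posterior_from_llrs(llrs: Iterable[float], prior: float = 0.01) -> float:
    """Naive Bayes: add natural-log likelihood ratios to the prior log-odds."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior)) + sum(llrs)
    # Numerically stable logistic transform from log-odds to probability.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


def predict_couplings(
    evidence_by_class: Dict[str, Iterable[float]], prior: float = 0.01
) -> Dict[str, float]:
    """Return one posterior per coupling class for a single gene pair."""
    return {
        cls: posterior_from_llrs(evidence_by_class.get(cls, []), prior)
        for cls in COUPLING_CLASSES
    }
```

    With no evidence for a class, the posterior simply equals the assumed prior; each positive log-likelihood ratio raises it and each negative one lowers it.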

  3. Genome-Wide Transcriptome Analysis of Cotton (Gossypium hirsutum L.) Identifies Candidate Gene Signatures in Response to Aflatoxin Producing Fungus Aspergillus flavus

    PubMed Central

    Bedre, Renesh; Rajasekaran, Kanniah; Mangu, Venkata Ramanarao; Sanchez Timm, Luis Eduardo; Bhatnagar, Deepak; Baisakh, Niranjan

    2015-01-01

    Aflatoxins are toxic and potently carcinogenic metabolites produced by the fungi Aspergillus flavus and A. parasiticus. Aflatoxins can contaminate cottonseed under conducive preharvest and postharvest conditions. United States federal regulations restrict the use of aflatoxin-contaminated cottonseed at >20 ppb for animal feed. Several strategies have been proposed for controlling aflatoxin contamination, and much success has been achieved by the application of an atoxigenic strain of A. flavus in cotton, peanut and maize fields. Development of cultivars resistant to aflatoxin through overexpression of resistance-associated genes and/or knocking down aflatoxin biosynthesis of A. flavus will be an effective strategy for controlling aflatoxin contamination in cotton. In this study, genome-wide transcriptome profiling was performed to identify differentially expressed genes in response to infection with both toxigenic and atoxigenic strains of A. flavus on cotton (Gossypium hirsutum L.) pericarp and seed. The genes involved in antifungal response, oxidative burst, transcription factors, defense signaling pathways and stress response were highly differentially expressed in pericarp and seed tissues in response to A. flavus infection. The cell-wall modifying genes and genes involved in the production of antimicrobial substances were more active in pericarp than in seed. The genes involved in auxin and cytokinin signaling were also induced. Most of the genes involved in defense response in cotton were more highly induced in pericarp than in seed. The global gene expression analysis in response to fungal invasion in cotton will serve as a source for identifying biomarkers for breeding and potential candidate genes for transgenic manipulation, and will help in understanding complex plant-fungal interaction for future downstream research. PMID:26366857

  4. Genome-Wide Transcriptome Analysis of Cotton (Gossypium hirsutum L.) Identifies Candidate Gene Signatures in Response to Aflatoxin Producing Fungus Aspergillus flavus.

    PubMed

    Bedre, Renesh; Rajasekaran, Kanniah; Mangu, Venkata Ramanarao; Sanchez Timm, Luis Eduardo; Bhatnagar, Deepak; Baisakh, Niranjan

    2015-01-01

    Aflatoxins are toxic and potently carcinogenic metabolites produced by the fungi Aspergillus flavus and A. parasiticus. Aflatoxins can contaminate cottonseed under conducive preharvest and postharvest conditions. United States federal regulations restrict the use of aflatoxin-contaminated cottonseed at >20 ppb for animal feed. Several strategies have been proposed for controlling aflatoxin contamination, and much success has been achieved by the application of an atoxigenic strain of A. flavus in cotton, peanut and maize fields. Development of cultivars resistant to aflatoxin through overexpression of resistance-associated genes and/or knocking down aflatoxin biosynthesis of A. flavus will be an effective strategy for controlling aflatoxin contamination in cotton. In this study, genome-wide transcriptome profiling was performed to identify differentially expressed genes in response to infection with both toxigenic and atoxigenic strains of A. flavus on cotton (Gossypium hirsutum L.) pericarp and seed. The genes involved in antifungal response, oxidative burst, transcription factors, defense signaling pathways and stress response were highly differentially expressed in pericarp and seed tissues in response to A. flavus infection. The cell-wall modifying genes and genes involved in the production of antimicrobial substances were more active in pericarp than in seed. The genes involved in auxin and cytokinin signaling were also induced. Most of the genes involved in defense response in cotton were more highly induced in pericarp than in seed. The global gene expression analysis in response to fungal invasion in cotton will serve as a source for identifying biomarkers for breeding and potential candidate genes for transgenic manipulation, and will help in understanding complex plant-fungal interaction for future downstream research.

  5. Genome-Wide Transcriptome Analysis of Cotton (Gossypium hirsutum L.) Identifies Candidate Gene Signatures in Response to Aflatoxin Producing Fungus Aspergillus flavus.

    PubMed

    Bedre, Renesh; Rajasekaran, Kanniah; Mangu, Venkata Ramanarao; Sanchez Timm, Luis Eduardo; Bhatnagar, Deepak; Baisakh, Niranjan

    2015-01-01

    Aflatoxins are toxic and potently carcinogenic metabolites produced by the fungi Aspergillus flavus and A. parasiticus. Aflatoxins can contaminate cottonseed under conducive preharvest and postharvest conditions. United States federal regulations restrict the use of aflatoxin-contaminated cottonseed at >20 ppb for animal feed. Several strategies have been proposed for controlling aflatoxin contamination, and much success has been achieved by the application of an atoxigenic strain of A. flavus in cotton, peanut and maize fields. Development of cultivars resistant to aflatoxin through overexpression of resistance-associated genes and/or knocking down aflatoxin biosynthesis of A. flavus will be an effective strategy for controlling aflatoxin contamination in cotton. In this study, genome-wide transcriptome profiling was performed to identify differentially expressed genes in response to infection with both toxigenic and atoxigenic strains of A. flavus on cotton (Gossypium hirsutum L.) pericarp and seed. The genes involved in antifungal response, oxidative burst, transcription factors, defense signaling pathways and stress response were highly differentially expressed in pericarp and seed tissues in response to A. flavus infection. The cell-wall modifying genes and genes involved in the production of antimicrobial substances were more active in pericarp than in seed. The genes involved in auxin and cytokinin signaling were also induced. Most of the genes involved in defense response in cotton were more highly induced in pericarp than in seed. The global gene expression analysis in response to fungal invasion in cotton will serve as a source for identifying biomarkers for breeding and potential candidate genes for transgenic manipulation, and will help in understanding complex plant-fungal interaction for future downstream research. PMID:26366857

  6. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects

    PubMed Central

    2011-01-01

    Background: Second-generation sequencing technologies are precipitating major shifts with regard to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation because, unlike first-generation projects, there are no pre-existing 'gold-standard' gene models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. Results: We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality, and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. Conclusions: MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets. PMID:22192575

  7. Genome-wide transcriptome analysis of cotton (Gossypium hirsutum L.) identifies candidate gene signatures in response to aflatoxin producing fungus Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are toxic metabolites and potent carcinogens produced by the asexual fungi Aspergillus flavus and A. parasiticus. Aflatoxins can contaminate cottonseed under conducive preharvest and postharvest conditions. U.S. federal regulations restrict the use of aflatoxin-contaminated cottonseed at >20...

  8. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes.

    PubMed

    Chan, Patricia P; Lowe, Todd M

    2016-01-01

    Transfer RNAs represent the largest, most ubiquitous class of non-protein coding RNA genes found in all living organisms. The tRNAscan-SE search tool has become the de facto standard for annotating tRNA genes in genomes, and the Genomic tRNA Database (GtRNAdb) was created as a portal for interactive exploration of these gene predictions. Since its published description in 2009, the GtRNAdb has steadily grown in content, and remains the most commonly cited web-based source of tRNA gene information. In this update, we describe not only a major increase in the number of tRNA predictions (>367000) and genomes analyzed (>4370), but more importantly, the integration of new analytic and functional data to improve the quality and biological context of tRNA gene predictions. New information drawn from other sources includes tRNA modification data, epigenetic data, single nucleotide polymorphisms, gene expression and evolutionary conservation. A richer set of analytic data is also presented, including better tRNA functional prediction, non-canonical features, predicted structural impacts from sequence variants and minimum free energy structural predictions. Views of tRNA genes in genomic context are provided via direct links to the UCSC genome browsers. The database can be searched by sequence or gene features, and is available at http://gtrnadb.ucsc.edu/.

  9. VaDE: a manually curated database of reproducible associations between various traits and human genomic polymorphisms

    PubMed Central

    Nagai, Yoko; Takahashi, Yasuko; Imanishi, Tadashi

    2015-01-01

    Genome-wide association studies (GWASs) have identified numerous single nucleotide polymorphisms (SNPs) associated with the development of common diseases. However, it is clear that genetic risk factors of common diseases are heterogeneous among human populations. Therefore, we developed a database of genomic polymorphisms that are reproducibly associated with disease susceptibilities, drug responses and other traits for each human population: ‘VarySysDB Disease Edition’ (VaDE; http://bmi-tokai.jp/VaDE/). SNP-trait association data were obtained from the National Human Genome Research Institute GWAS (NHGRI GWAS) catalog and RAvariome, and we added detailed information on sample populations by curating original papers. In addition, we collected and curated original papers, and registered the detailed information on SNP-trait associations in VaDE. Then, we evaluated the reproducibility of associations in each population by counting the number of studies reporting a significant association. VaDE provides literature-based SNP-trait association data and functional genomic region annotation for SNP functional research. SNP functional annotation data include experimental data from the ENCODE project, H-InvDB transcripts and the 1000 Genomes Project. A user-friendly web interface was developed to assist quick search, easy download and fast swapping among viewers. We believe that our database will contribute to the future establishment of personalized medicine and increase our understanding of genetic factors underlying diseases. PMID:25361969
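
    The reproducibility measure described in this abstract (counting significant studies per population) can be sketched as follows. The record layout, the function name and the 5e-8 significance threshold are assumptions for illustration; the abstract does not state VaDE's actual criteria.

```python
from collections import Counter

def count_significant_studies(records, p_threshold=5e-8):
    """Count studies with a significant association per (SNP, trait, population).

    `records` is an iterable of (snp, trait, population, p_value) tuples,
    one per study. The threshold is a commonly used genome-wide cutoff,
    chosen here as an assumption.
    """
    counts = Counter()
    for snp, trait, population, p_value in records:
        if p_value < p_threshold:
            counts[(snp, trait, population)] += 1
    return counts

# Hypothetical study records, for illustration only.
studies = [
    ("rs0001", "trait A", "East Asian", 1e-9),
    ("rs0001", "trait A", "East Asian", 3e-8),
    ("rs0001", "trait A", "European", 1e-3),
]
reproduced = count_significant_studies(studies)
```

    An association with a count of two or more in one population would then be treated as reproduced in that population under this sketch.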

  10. Shanghai RAPESEED Database: a resource for functional genomics studies of seed development and fatty acid metabolism of Brassica.

    PubMed

    Wu, Guo-Zhang; Shi, Qiu-Ming; Niu, Ya; Xing, Mei-Qing; Xue, Hong-Wei

    2008-01-01

    The Shanghai RAPESEED Database (RAPESEED, http://rapeseed.plantsignal.cn/) was created to provide a solid platform for functional genomics studies of oilseed crops, with an emphasis on seed development and fatty acid metabolism. RAPESEED includes 8462 unique ESTs, of which 3526 clones contain full-length cDNA; the expression profiles of 8095 genes; and Serial Analysis of Gene Expression (SAGE, 23,895 unique tags) and tag-to-gene data during seed development. In addition, approximately 14,700 M3 mutant populations were generated by ethylmethanesulfonate (EMS) mutagenesis, and related seed quality information was determined using the Foss NIR System. Further, a TILLING (Targeting Induced Local Lesions IN Genomes) platform was established based on the generated EMS mutant population. The relevant information is collected in the RAPESEED database, which can be searched by keywords, nucleotide or protein sequences, or seed quality parameters, and downloaded.

  11. A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database.

    PubMed

    Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

    2016-01-01

    The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome's content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by <100 nt). Some 73% of the SSRs were composed of dinucleotide motifs. The SSRs were categorized by number of repeats and overall length, and were allocated to their linkage group. A total of 4,761 perfect and 6,583 imperfect SSRs were present in 3,781 genes (14.11% of the total), corresponding to an overall density across the gene space of 32.5 and 44.9 SSRs/Mbp for perfect and imperfect motifs, respectively. A putative function has been assigned, using the gene ontology approach, to the set of genes harboring at least one SSR. The same search parameters were applied to reveal the SSR content of 14 other plant species for which genome sequence is available. Certain species-specific SSR motifs were identified, along with a hexa-nucleotide motif shared only with the other two Compositae species (sunflower (Helianthus annuus) and horseweed (Conyza canadensis)) included in the study. Finally, a database, called "Cynara cardunculus MicroSatellite DataBase" (CyMSatDB), was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic vs genic context, perfect vs imperfect repeat, motif type, motif sequence and repeat number. The SSR markers were validated via an in silico PCR analysis using two available assembled transcriptomes, derived from contrasting globe artichoke accessions, as templates.
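
    To make the notions of a perfect dinucleotide SSR and of SSR density (SSRs/Mbp) concrete, here is a minimal sketch. It is not the search procedure used in the study: the function names, the regular expression and the minimum of five repeats are assumptions for illustration.

```python
import re

def find_dinucleotide_ssrs(seq, min_repeats=5):
    """Return (start, end, motif) for perfect dinucleotide repeats.

    The motif is two different bases (homopolymer runs are excluded),
    repeated at least `min_repeats` times without interruption.
    """
    pattern = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_repeats - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]

def ssr_density(n_ssrs, length_bp):
    """SSRs per megabase pair for a sequence of `length_bp` base pairs."""
    return n_ssrs / (length_bp / 1e6)

# Toy sequence with one (AT)6 repeat flanked by non-repetitive bases.
hits = find_dinucleotide_ssrs("GGG" + "AT" * 6 + "CCC")
```

    As a consistency check on the figures quoted above, roughly 177,000 perfect SSRs at 244.5 SSRs/Mbp imply a scanned sequence of about 724 Mbp.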

  12. A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database.

    PubMed

    Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

    2016-01-01

    The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome's content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by <100 nt). Some 73% of the SSRs were composed of dinucleotide motifs. The SSRs were categorized by number of repeats and overall length, and were allocated to their linkage group. A total of 4,761 perfect and 6,583 imperfect SSRs were present in 3,781 genes (14.11% of the total), corresponding to an overall density across the gene space of 32.5 and 44.9 SSRs/Mbp for perfect and imperfect motifs, respectively. A putative function has been assigned, using the gene ontology approach, to the set of genes harboring at least one SSR. The same search parameters were applied to reveal the SSR content of 14 other plant species for which genome sequence is available. Certain species-specific SSR motifs were identified, along with a hexa-nucleotide motif shared only with the other two Compositae species (sunflower (Helianthus annuus) and horseweed (Conyza canadensis)) included in the study. Finally, a database, called "Cynara cardunculus MicroSatellite DataBase" (CyMSatDB), was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic vs genic context, perfect vs imperfect repeat, motif type, motif sequence and repeat number. The SSR markers were validated via an in silico PCR analysis using two available assembled transcriptomes, derived from contrasting globe artichoke accessions, as templates. PMID:27648830

  13. BactPepDB: a database of predicted peptides from an exhaustive survey of complete prokaryote genomes

    PubMed Central

    Rey, Julien; Deschavanne, Patrick; Tuffery, Pierre

    2014-01-01

    With the recent progress in complete genome sequencing, mining the increasing amount of genomic information available should in theory provide the means to discover new classes of peptides. However, annotation pipelines often do not consider small reading frames likely to be expressed. BactPepDB, available online at http://bactpepdb.rpbs.univ-paris-diderot.fr, is a database that aims at providing an exhaustive re-annotation of all complete prokaryotic genomes (chromosomal and plasmid DNA) available in RefSeq for coding sequences ranging between 10 and 80 amino acids. The identified peptides are classified as (i) previously identified in RefSeq, (ii) entity-overlapping (intragenic) or intergenic, and (iii) potential pseudogenes, that is, intergenic sequences corresponding to a portion of a previously annotated larger gene. Additional information covers homologs within the same order, predicted signal sequence, transmembrane segments, disulfide bonds, secondary structure, and the existence of a related 3D structure in the Protein Data Bank. As a result, BactPepDB provides insights about candidate peptides and information about their conservation, together with some of their expected biological/structural features. The BactPepDB interface allows users to search for candidate peptides in the database, or to search for peptides similar to a query, according to the multiple properties predicted or related to genomic localization. Database URL: http://bactpepdb.rpbs.univ-paris-diderot.fr PMID:25377257
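
    The idea of scanning a genome for coding sequences of 10 to 80 amino acids can be sketched as below. This is not the BactPepDB pipeline: the function name is hypothetical, only the forward strand is scanned, and ATG is assumed to be the only start codon, which is a simplification for prokaryotes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def short_orfs(seq, min_aa=10, max_aa=80):
    """Return (start, end, n_aa) for forward-strand ORFs of min_aa..max_aa residues.

    `end` is exclusive and includes the stop codon; `n_aa` counts the
    codons from the start codon up to, but not including, the stop.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa <= max_aa:
                    hits.append((start, i + 3, n_aa))
                start = None  # look for the next ORF in this frame
    return hits

# Toy ORF: ATG followed by 11 alanine codons and a stop (12 residues).
orfs = short_orfs("ATG" + "GCT" * 11 + "TAA")
```

    A fuller scan would also cover the reverse complement and alternative start codons, and would compare hits with existing annotations to classify them as in the abstract.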

  14. The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome

    PubMed Central

    Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A.

    2015-01-01

    A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser. PMID:25324314
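
    Since a TTS is described here as a stretch of polypurines, a first-pass candidate search can be sketched as a scan for uninterrupted runs of A and G. This is not how TTSMI defines or filters its sites: the function name and the minimum length of 15 nt are assumptions for illustration.

```python
import re

def polypurine_tracts(seq, min_len=15):
    """Return (start, end) of uninterrupted purine (A/G) runs of at least min_len nt."""
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]

# Toy sequence: a 16-nt purine run flanked by pyrimidines.
tracts = polypurine_tracts("CT" + "AGGA" * 4 + "TC")
```

    Only one strand is scanned; a polypyrimidine run on this strand corresponds to a polypurine run on the complementary strand.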

  15. The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome.

    PubMed

    Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A

    2015-01-01

    A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser.

  16. Comparison of transcriptome technologies in the pathogenic fungus Aspergillus fumigatus reveals novel insights into the genome and MpkA dependent gene expression

    PubMed Central

    2012-01-01

    Background: The filamentous fungus Aspergillus fumigatus has become the most important airborne fungal pathogen causing life-threatening infections in immuno-compromised patients. Recently developed high-throughput transcriptome and proteome technologies, such as microarrays, RNA deep-sequencing, and LC-MS/MS of peptide mixtures, are of enormous value for systematically investigating pathogenic organisms. In the field of infection biology, one of the priorities is to collect and standardise data, in order to generate datasets that can be used to investigate and compare pathways and gene responses involved in pathogenicity. The “omics” era provides a multitude of inputs that need to be integrated and assessed. We therefore evaluated the potential of paired-end mRNA-Seq for investigating the regulatory role of the central mitogen activated protein kinase (MpkA). This kinase is involved in the cell wall integrity signalling pathway of A. fumigatus and essential for maintaining an intact cell wall in response to stress. Results: The comparison of the transcriptome and proteome of an A. fumigatus wild-type strain with an mpkA null mutant strain revealed that 70.4% of the genome was found to be expressed and that MpkA plays a significant role in the regulation of many genes involved in cell wall remodelling, oxidative stress and iron starvation response, and secondary metabolite biosynthesis. Moreover, absence of the mpkA gene also strongly affects the expression of genes involved in primary metabolism. The data were further processed to evaluate the potential of the mRNA-Seq technique. We comprehensively matched up our data to published transcriptome studies and were able to show an improved data comparability of mRNA-Seq experiments independently of the technique used. Analysis of transcriptome and proteome data revealed only a weak correlation between mRNA and protein abundance. Conclusions: High-throughput analysis of MpkA-dependent gene expression confirmed many

  17. Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases.

    PubMed

    Sanderson, Lacey-Anne; Ficklin, Stephen P; Cheng, Chun-Huai; Jung, Sook; Feltus, Frank A; Bett, Kirstin E; Main, Dorrie

    2013-01-01

    Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including 'Feature Map', 'Genetic', 'Publication', 'Project', 'Contact' and the 'Natural Diversity' modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. DATABASE URL: http://tripal.info/. PMID:24163125

  18. Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

    PubMed Central

    Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

    2012-01-01

    Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings: A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086

  19. Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases

    PubMed Central

    Sanderson, Lacey-Anne; Ficklin, Stephen P.; Cheng, Chun-Huai; Jung, Sook; Feltus, Frank A.; Bett, Kirstin E.; Main, Dorrie

    2013-01-01

    Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including ‘Feature Map’, ‘Genetic’, ‘Publication’, ‘Project’, ‘Contact’ and the ‘Natural Diversity’ modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. Database URL: http://tripal.info/ PMID:24163125

  20. The FlyBase database of the Drosophila genome projects and community literature

    SciTech Connect

    Gelbart, William; Bayraktaroglu, Leyla; Bettencourt, Brian; Campbell, Kathy; Crosby, Madeline; Emmert, David; Hradecky, Pavel; Huang, Yanmei; Letovsky, Stan; Matthews, Beverly; Russo, Susan; Schroeder, Andrew; Smutniak, Frank; Zhou, Pinglei; Zytkovicz, Mark; Ashburner, Michael; Drysdale, Rachel; de Grey, Aubrey; Foulger, Rebecca; Millburn, Gillian; Yamada, Chihiro; Kaufman, Thomas; Matthews, Kathy; Gilbert, Don; Grumbling, Gary; Strelets, Victor; Shemen, C.; Rubin, Gerald; Berman, Brian; Frise, Erwin; Gibson, Mark; Harris, Nomi; Kaminker, Josh; Lewis, Suzanna; Marshall, Brad; Misra, Sima; Mungall, Christopher; Prochnik, Simon; Richter, John; Smith, Christopher; Shu, ShengQiang; Tupy, Jonathan; Wiel, Colin

    2002-09-16

    FlyBase (http://flybase.bio.indiana.edu/) provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. FlyBase has primary responsibility for the continual reannotation of the D. melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. A complete revision of the annotations of the now-finished euchromatic genomic sequence has been completed. There are many points of entry to the genome within FlyBase, most notably through maps, gene products and ontologies, structured phenotypic and gene expression data, and anatomy.

  1. Fungal plant cell wall-degrading enzyme database: a platform for comparative and evolutionary genomics in fungi and Oomycetes

    PubMed Central

    2013-01-01

    Background: Plant cell wall-degrading enzymes (PCWDEs) play significant roles throughout fungal life, including acquisition of nutrients and decomposition of plant cell walls. In addition, many PCWDEs are also utilized by the biofuel and pulp industries. In order to develop a comparative genomics platform focused on fungal PCWDEs and provide a resource for evolutionary studies, the Fungal PCWDE Database (FPDB) was constructed (http://pcwde.riceblast.snu.ac.kr/). Results: In order to archive fungal PCWDEs, 22 sequence profiles were constructed and searched on 328 genomes of fungi, Oomycetes, plants and animals. A total of 6,682 putative genes encoding PCWDEs were predicted, showing differential distribution by life style, host range and taxonomy. Genes known to be involved in fungal pathogenicity, including polygalacturonase (PG) and pectin lyase, were enriched in plant pathogens. Furthermore, crop pathogens had more PCWDEs than rot fungi, implying that the PCWDEs analysed in this study are needed more for invading plant hosts than for wood-decaying processes. Evolutionary analysis of PGs in 34 selected genomes revealed that gene duplication and loss events were mainly driven by taxonomic divergence and partly by such events at the species level, especially in plant pathogens. Conclusions: The FPDB would provide a fungi-specialized genomics platform, a resource for evolutionary studies of PCWDE gene families and extended analysis options by implementing Favorite, which is a data exchange and analysis hub built in the Comparative Fungal Genomics Platform (CFGP 2.0; http://cfgp.snu.ac.kr/). PMID:24564786

  2. OrthoMaM v8: a database of orthologous exons and coding sequences for comparative genomics in mammals.

    PubMed

    Douzery, Emmanuel J P; Scornavacca, Celine; Romiguier, Jonathan; Belkhir, Khalid; Galtier, Nicolas; Delsuc, Frédéric; Ranwez, Vincent

    2014-07-01

    Comparative genomic studies rely extensively on alignments of orthologous sequences. Yet, selecting, gathering, and aligning orthologous exons and protein-coding sequences (CDS) that are relevant for a given evolutionary analysis can be a difficult and time-consuming task. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of orthologous genes in mammalian genomes using a phylogenetic framework. Since its first release in 2007, OrthoMaM has regularly evolved, not only to include newly available genomes but also to incorporate up-to-date software in its analytic pipeline. This eighth release integrates the 40 complete mammalian genomes available in Ensembl v73 and provides alignments, phylogenies, evolutionary descriptor information, and functional annotations for 13,404 single-copy orthologous CDS and 6,953 long exons. The graphical interface allows users to easily explore OrthoMaM to identify markers with specific characteristics (e.g., taxa availability, alignment size, %G+C, evolutionary rate, chromosome location). It hence provides an efficient solution to sample preprocessed markers adapted to user-specific needs. OrthoMaM has proven to be a valuable resource for researchers interested in mammalian phylogenomics and evolutionary genomics, and has served as a source of benchmark empirical data sets in several methodological studies. OrthoMaM is available for browsing, query and complete or filtered downloads at http://www.orthomam.univ-montp2.fr/.

  3. Conserved Secondary Structures in Aspergillus

    PubMed Central

    McGuire, Abigail Manson; Galagan, James E.

    2008-01-01

    Background: Recent evidence suggests that the number and variety of functional RNAs (ncRNAs as well as cis-acting RNA elements within mRNAs) is much higher than previously thought; thus, the ability to computationally predict and analyze RNAs has taken on new importance. We have computationally studied the secondary structures in an alignment of six Aspergillus genomes. Little is known about the RNAs present in this set of fungi, and this diverse set of genomes has an optimal level of sequence conservation for observing the correlated evolution of base-pairs seen in RNAs. Methodology/Principal Findings: We report the results of a whole-genome search for evolutionarily conserved secondary structures, as well as the results of clustering these predicted secondary structures by structural similarity. We find a total of 7450 predicted secondary structures, including a new predicted ∼60 bp long hairpin motif found primarily inside introns. We find no evidence for microRNAs. Different types of genomic regions are over-represented in different classes of predicted secondary structures. Exons contain the longest motifs (primarily long, branched hairpins), 5′ UTRs primarily contain groupings of short hairpins located near the start codon, and 3′ UTRs contain very little secondary structure compared to other regions. There is a large concentration of short hairpins just inside the boundaries of exons. The density of predicted intronic RNAs increases with the length of introns, and the density of predicted secondary structures within mRNA coding regions increases with the number of introns in a gene. Conclusions/Significance: There are many conserved, high-confidence RNAs of unknown function in these Aspergillus genomes, as well as interesting spatial distributions of predicted secondary structures. This study increases our knowledge of secondary structure in these Aspergillus organisms. PMID:18665251

  4. PlanTE-MIR DB: a database for transposable element-related microRNAs in plant genomes.

    PubMed

    R Lorenzetti, Alan P; A de Antonio, Gabriel Y; Paschoal, Alexandre R; Domingues, Douglas S

    2016-05-01

    Transposable elements (TEs) comprise a major fraction of many plant genomes and are known to drive their organization and evolution. Several studies show that these repetitive elements have a prominent role in shaping noncoding regions of the genome such as microRNA (miRNA) loci, which are components of post-transcriptional regulation mechanisms. Although some studies have reported initial formation of miRNA loci from TE sequences, especially in model plants, the approaches that were used did not employ systems that would allow results to be delivered by a user-friendly database. In this study, we identified 152 precursor miRNAs overlapping TEs in 10 plant species. PlanTE-MIR DB was designed to assemble this data and deliver it to the scientific community interested in miRNA origin, evolution, and regulation pathways. Users can browse the database through a web interface and search for entries using various parameters. This resource is cross-referenced with repetitive element (Repbase Update) and miRNA (miRBase) repositories, where sequences can be checked for further analysis. All data in PlanTE-MIR DB are publicly available for download in several file formats to facilitate their understanding and use. The database is hosted at http://bioinfo-tool.cp.utfpr.edu.br/plantemirdb/ .

  5. VitisExpDB: A Database Resource for Grape Functional Genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for Vitis vinifera and non-vinifera grape varieties. Currently, the database stores ~320,000 EST sequences derived from 8 species/hybrids, their annotation details and gene ontology based...

  6. The two genome sequence release and blast server construction for aflatoxin-producing L and S strains Aspergillus parasiticus and A. flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are toxic and carcinogenic secondary metabolites. These compounds, produced by Aspergillus flavus and A. parasiticus, contaminate pre-harvest agricultural crops in the field and post-harvest grains during storage. In order to reduce and eliminate aflatoxin contamination of food and feed...

  7. [The first case of persistent vaginitis due to Aspergillus protuberus in an immunocompetent patient].

    PubMed

    Borsa, Barış Ata; Özgün, Gonca; Houbraken, Jos; Ökmen, Fırat

    2015-01-01

    ITS regions were amplified and sequenced from isolated DNA for genomic characterization. The obtained sequences were compared with the NCBI database and internal databases of the CBS-KNAW Fungal Biodiversity Centre and confirmed as Aspergillus section Versicolores. As a result of recent changes in classification of fungi, analysis of partial β-tubulin and calmodulin sequences have also been used to obtain a detailed and precise characterization. Eventually, the strain has been identified as A. protuberus which is a recently accepted species distinct from Aspergillus section Versicolores. As the patient could not be contacted after the preliminary report, detailed demographical information, probable origin and route of transmission of the agent and prognosis of infection remained obscure. In conclusion, the first case of vaginitis caused by A. protuberus was described in this report with the support of clinical, pathological, microbiological and molecular data.

  8. The National Microbial Pathogen Database Resource (NMPDR): a genomics platform based on subsystem annotation.

    PubMed

    McNeil, Leslie Klis; Reich, Claudia; Aziz, Ramy K; Bartels, Daniela; Cohoon, Matthew; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Hwang, Kaitlyn; Kubal, Michael; Margaryan, Gohar Rem; Meyer, Folker; Mihalo, William; Olsen, Gary J; Olson, Robert; Osterman, Andrei; Paarmann, Daniel; Paczian, Tobias; Parrello, Bruce; Pusch, Gordon D; Rodionov, Dmitry A; Shi, Xinghua; Vassieva, Olga; Vonstein, Veronika; Zagnitko, Olga; Xia, Fangfang; Zinner, Jenifer; Overbeek, Ross; Stevens, Rick

    2007-01-01

    The National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org) is a National Institute of Allergy and Infectious Diseases (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the set of genes in common or that differentiates two groups of organisms. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development. PMID:17145713

  9. The National Microbial Pathogen Database Resource (NMPDR): a genomics platform based on subsystem annotation

    PubMed Central

    McNeil, Leslie Klis; Reich, Claudia; Aziz, Ramy K.; Bartels, Daniela; Cohoon, Matthew; Disz, Terry; Edwards, Robert A.; Gerdes, Svetlana; Hwang, Kaitlyn; Kubal, Michael; Margaryan, Gohar Rem; Meyer, Folker; Mihalo, William; Olsen, Gary J.; Olson, Robert; Osterman, Andrei; Paarmann, Daniel; Paczian, Tobias; Parrello, Bruce; Pusch, Gordon D.; Rodionov, Dmitry A.; Shi, Xinghua; Vassieva, Olga; Vonstein, Veronika; Zagnitko, Olga; Xia, Fangfang; Zinner, Jenifer; Overbeek, Ross; Stevens, Rick

    2007-01-01

    The National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org) is a National Institute of Allergy and Infectious Diseases (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of ∼50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the set of genes in common or that differentiates two groups of organisms. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development. PMID:17145713

  10. On the way toward systems biology of Aspergillus fumigatus infection.

    PubMed

    Albrecht, Daniela; Kniemeyer, Olaf; Mech, Franziska; Gunzer, Matthias; Brakhage, Axel; Guthke, Reinhard

    2011-06-01

    Pathogenicity of Aspergillus fumigatus is multifactorial. Thus, global studies are essential for the understanding of the infection process. Therefore, a data warehouse was established where genome sequence, transcriptome and proteome data are stored. These data are analyzed for the elucidation of virulence determinants. The data analysis workflow starts with pre-processing, including imputation of missing values and normalization. The last step is the identification of differentially expressed genes/proteins as interesting candidates for further analysis, in particular for functional categorization and correlation studies. Sequence data and other prior knowledge extracted from databases are integrated to support the inference of gene regulatory networks associated with pathogenicity. This knowledge-assisted data analysis aims at establishing mathematical models with predictive strength to assist further experimental work. Recently, first steps were taken to extend the integrative data analysis and computational modeling by evaluating spatio-temporal data (movies) that monitor interactions of A. fumigatus morphotypes (e.g. conidia) with host immune cells.
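    The workflow named in this abstract (imputing missing values, normalization, then identifying differentially expressed genes) can be illustrated with a minimal Python sketch. The abstract does not say which methods were used, so everything concrete here is an assumption chosen for illustration: row-mean imputation, log2 transformation with per-sample median centering, a fixed log2 fold-change threshold, and the toy gene names and intensities.

```python
import math
from statistics import mean, median

def impute_missing(rows):
    """Replace None in each gene's row with the mean of that row's observed values."""
    out = {}
    for gene, values in rows.items():
        observed = [v for v in values if v is not None]
        fill = mean(observed)
        out[gene] = [fill if v is None else v for v in values]
    return out

def median_normalize(rows):
    """Log2-transform, then subtract each sample's median so samples are comparable."""
    logged = {g: [math.log2(v) for v in vals] for g, vals in rows.items()}
    n_samples = len(next(iter(logged.values())))
    medians = [median([vals[i] for vals in logged.values()]) for i in range(n_samples)]
    return {g: [v - medians[i] for i, v in enumerate(vals)] for g, vals in logged.items()}

def differential(rows, control_idx, treated_idx, min_abs_log2fc=1.0):
    """Return genes whose mean log2 fold change (treated - control) passes the threshold."""
    hits = {}
    for gene, vals in rows.items():
        lfc = mean([vals[i] for i in treated_idx]) - mean([vals[i] for i in control_idx])
        if abs(lfc) >= min_abs_log2fc:
            hits[gene] = lfc
    return hits

# Hypothetical intensities: samples 0-1 are controls, samples 2-3 are treated.
expression = {
    "g1": [10.0, 10.0, 40.0, 40.0],     # four-fold increase
    "g2": [50.0, 50.0, 50.0, 50.0],
    "g3": [200.0, None, 200.0, 200.0],  # one missing measurement
    "g4": [80.0, 80.0, 80.0, 80.0],
    "g5": [300.0, 300.0, 300.0, 300.0],
}

imputed = impute_missing(expression)
normalized = median_normalize(imputed)
hits = differential(normalized, control_idx=[0, 1], treated_idx=[2, 3])
print(hits)
```

    A real analysis would normally add a statistical test with multiple-testing correction rather than rely on a fold-change cut-off alone; the threshold here only keeps the sketch short.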

  11. Genome-wide development of transposable elements-based markers in foxtail millet and construction of an integrated database.

    PubMed

    Yadav, Chandra Bhan; Bonthala, Venkata Suresh; Muthamilarasan, Mehanathan; Pandey, Garima; Khan, Yusuf; Prasad, Manoj

    2015-02-01

    Transposable elements (TEs) are major components of plant genomes and are reported to play significant roles in functional genome diversity and phenotypic variations. Several TEs are highly polymorphic for insert location in the genome and this facilitates development of TE-based markers for various genotyping purposes. Considering this, a genome-wide analysis was performed in the model plant foxtail millet. A total of 30,706 TEs were identified and classified as DNA transposons (24,386), full-length Copia type (1,038), partial or solo Copia type (10,118), full-length Gypsy type (1,570), partial or solo Gypsy type (23,293) and Long- and Short-Interspersed Nuclear Elements (3,659 and 53, respectively). Further, 20,278 TE-based markers were developed, namely Retrotransposon-Based Insertion Polymorphisms (4,801, ∼24%), Inter-Retrotransposon Amplified Polymorphisms (3,239, ∼16%), Repeat Junction Markers (4,451, ∼22%), Repeat Junction-Junction Markers (329, ∼2%), Insertion-Site-Based Polymorphisms (7,401, ∼36%) and Retrotransposon-Microsatellite Amplified Polymorphisms (57, 0.2%). A total of 134 Repeat Junction Markers were screened in 96 accessions of Setaria italica and 3 wild Setaria accessions of which 30 showed polymorphism. Moreover, an open access database for these developed resources was constructed (Foxtail millet Transposable Elements-based Marker Database; http://59.163.192.83/ltrdb/index.html). Taken together, this study would serve as a valuable resource for large-scale genotyping applications in foxtail millet and related grass species.
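    The marker counts quoted in this abstract can be cross-checked against the stated total of 20,278 with a few lines of Python. This is only an arithmetic sketch: the counts are copied from the abstract, while the dictionary and function names are illustrative choices.

```python
# Marker counts as reported in the abstract, keyed by marker type.
marker_counts = {
    "Retrotransposon-Based Insertion Polymorphisms": 4801,
    "Inter-Retrotransposon Amplified Polymorphisms": 3239,
    "Repeat Junction Markers": 4451,
    "Repeat Junction-Junction Markers": 329,
    "Insertion-Site-Based Polymorphisms": 7401,
    "Retrotransposon-Microsatellite Amplified Polymorphisms": 57,
}

total_markers = sum(marker_counts.values())

def percent_share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)

shares = {name: percent_share(n, total_markers) for name, n in marker_counts.items()}

print("total markers:", total_markers)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

    The six categories sum to the reported total, and the computed shares agree with the rounded percentages in the abstract to within rounding.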

  12. A Genome-Wide Survey of the Microsatellite Content of the Globe Artichoke Genome and the Development of a Web-Based Database

    PubMed Central

    Portis, Ezio; Portis, Flavio; Valente, Luisa; Moglia, Andrea; Barchi, Lorenzo; Lanteri, Sergio; Acquadro, Alberto

    2016-01-01

    The recently acquired genome sequence of globe artichoke (Cynara cardunculus var. scolymus) has been used to catalog the genome’s content of simple sequence repeat (SSR) markers. More than 177,000 perfect SSRs were revealed, equivalent to an overall density across the genome of 244.5 SSRs/Mbp, but some 224,000 imperfect SSRs were also identified. About 21% of these SSRs were complex (two stretches of repeats separated by <100 nt). Some 73% of the SSRs were composed of dinucleotide motifs. The SSRs were categorized for the numbers of repeats present, their overall length and were allocated to their linkage group. A total of 4,761 perfect and 6,583 imperfect SSRs were present in 3,781 genes (14.11% of the total), corresponding to an overall density across the gene space of 32.5 and 44.9 SSRs/Mbp for perfect and imperfect motifs, respectively. A putative function has been assigned, using the gene ontology approach, to the set of genes harboring at least one SSR. The same search parameters were applied to reveal the SSR content of 14 other plant species for which genome sequence is available. Certain species-specific SSR motifs were identified, along with a hexa-nucleotide motif shared only with the other two Compositae species (sunflower (Helianthus annuus) and horseweed (Conyza canadensis)) included in the study. Finally, a database, called “Cynara cardunculus MicroSatellite DataBase” (CyMSatDB) was developed to provide a searchable interface to the SSR data. CyMSatDB facilitates the retrieval of SSR markers, as well as suggested forward and reverse primers, on the basis of genomic location, genomic vs genic context, perfect vs imperfect repeat, motif type, motif sequence and repeat number. The SSR markers were validated via an in silico based PCR analysis adopting two available assembled transcriptomes, derived from contrasting globe artichoke accessions, as templates. PMID:27648830
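    The densities in this abstract are counts divided by sequence length, so the reported figures can be checked for internal consistency. The Python sketch below is an arithmetic illustration only. It reads the gene-space densities as 32.5 and 44.9 SSRs/Mbp, treats "more than 177,000" as exactly 177,000 (so the implied genome length is approximate), and uses variable names that are illustrative rather than taken from the paper.

```python
# Genome-wide figures from the abstract.
perfect_ssrs = 177_000             # reported as "more than 177,000"
perfect_density_per_mbp = 244.5    # SSRs per Mbp across the genome

# density = count / length, so length = count / density
implied_genome_mbp = perfect_ssrs / perfect_density_per_mbp

# Genic figures from the abstract.
genic_perfect = 4_761
genic_imperfect = 6_583
gene_space_from_perfect = genic_perfect / 32.5
gene_space_from_imperfect = genic_imperfect / 44.9

# 3,781 SSR-containing genes are stated to be 14.11% of all genes.
genes_with_ssr = 3_781
implied_total_genes = genes_with_ssr / 0.1411

print(f"implied assembled length: {implied_genome_mbp:.1f} Mbp")
print(f"implied gene space: {gene_space_from_perfect:.1f} / "
      f"{gene_space_from_imperfect:.1f} Mbp")
print(f"implied gene count: {implied_total_genes:.0f}")
```

    Both genic densities imply nearly the same gene-space length, which suggests the two figures were computed over the same denominator.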

  13. Practical Value of Food Pathogen Traceability through Building a Whole-Genome Sequencing Network and Database.

    PubMed

    Allard, Marc W; Strain, Errol; Melka, David; Bunning, Kelly; Musser, Steven M; Brown, Eric W; Timme, Ruth

    2016-08-01

    The FDA has created a United States-based open-source whole-genome sequencing network of state, federal, international, and commercial partners. The GenomeTrakr network represents a first-of-its-kind distributed genomic food shield for characterizing and tracing foodborne outbreak pathogens back to their sources. The GenomeTrakr network is leading investigations of outbreaks of foodborne illnesses and compliance actions with more accurate and rapid recalls of contaminated foods as well as more effective monitoring of preventive controls for food manufacturing environments. An expanded network would serve to provide an international rapid surveillance system for pathogen traceback, which is critical to support an effective public health response to bacterial outbreaks. PMID:27008877

  15. Pristionchus.org: a genome-centric database of the nematode satellite species Pristionchus pacificus.

    PubMed

    Dieterich, Christoph; Roeseler, Waltraud; Sobetzko, Patrick; Sommer, Ralf J

    2007-01-01

    Comparative studies have been of invaluable importance to the understanding of evolutionary biology. The evolution of developmental programs can be studied in nematodes at a single cell resolution given their fixed cell lineage. We have established Pristionchus pacificus as a major satellite organism for evolutionary developmental biology relative to Caenorhabditis elegans, the model nematode. Online genomic information to support studies in this satellite system can be accessed at http://www.pristionchus.org. Our web resource offers diverse content covering genome browsing, genetic and physical maps, similarity searches, a community platform and assembly details. Content will be continuously improved as we annotate the P. pacificus genome, and will be an indispensable resource for P. pacificus genomics.

  16. The Littorina sequence database (LSD)--an online resource for genomic data.

    PubMed

    Canbäck, Björn; André, Carl; Galindo, Juan; Johannesson, Kerstin; Johansson, Tomas; Panova, Marina; Tunlid, Anders; Butlin, Roger

    2012-01-01

    We present an interactive, searchable expressed sequence tag database for the periwinkle snail Littorina saxatilis, an upcoming model species in evolutionary biology. The database is the result of a hybrid assembly between Sanger and 454 sequences, 1290 and 147,491 sequences respectively. Normalized and non-normalized cDNA was obtained from different ecotypes of L. saxatilis collected in the UK and Sweden. The Littorina sequence database (LSD) contains 26,537 different contigs, of which 2453 showed similarity with annotated proteins in UniProt. Querying the LSD permits the selection of the taxonomic origin of blast hits for each contig, and the search can be restricted to particular taxonomic groups. The database allows access to UniProt annotations, blast output, protein family domains (PFAM) and Gene Ontology. The database will allow users to search for genetic markers and identify candidate genes or genes for expression analyses. It is open for additional deposition of sequence information for L. saxatilis and other species of the genus Littorina. The LSD is available at http://mbio-serv2.mbioekol.lu.se/Littorina/. PMID:21707958

  17. BloodChIP: a database of comparative genome-wide transcription factor binding profiles in human blood cells.

    PubMed

    Chacon, Diego; Beck, Dominik; Perera, Dilmi; Wong, Jason W H; Pimanda, John E

    2014-01-01

    The BloodChIP database (http://www.med.unsw.edu.au/CRCWeb.nsf/page/BloodChIP) supports exploration and visualization of combinatorial transcription factor (TF) binding at a particular locus in human CD34-positive and other normal and leukaemic cells or retrieval of target gene sets for user-defined combinations of TFs across one or more cell types. Increasing numbers of genome-wide TF binding profiles are being added to public repositories, and this trend is likely to continue. For the power of these data sets to be fully harnessed by experimental scientists, there is a need for these data to be placed in context and easily accessible for downstream applications. To this end, we have built a user-friendly database that has at its core the genome-wide binding profiles of seven key haematopoietic TFs in human stem/progenitor cells. These binding profiles are compared with binding profiles in normal differentiated and leukaemic cells. We have integrated these TF binding profiles with chromatin marks and expression data in normal and leukaemic cell fractions. All queries can be exported into external sites to construct TF-gene and protein-protein networks and to evaluate the association of genes with cellular processes and tissue expression.

  18. BloodChIP: a database of comparative genome-wide transcription factor binding profiles in human blood cells

    PubMed Central

    Chacon, Diego; Beck, Dominik; Perera, Dilmi; Wong, Jason W. H.; Pimanda, John E.

    2014-01-01

    The BloodChIP database (http://www.med.unsw.edu.au/CRCWeb.nsf/page/BloodChIP) supports exploration and visualization of combinatorial transcription factor (TF) binding at a particular locus in human CD34-positive and other normal and leukaemic cells or retrieval of target gene sets for user-defined combinations of TFs across one or more cell types. Increasing numbers of genome-wide TF binding profiles are being added to public repositories, and this trend is likely to continue. For the power of these data sets to be fully harnessed by experimental scientists, there is a need for these data to be placed in context and easily accessible for downstream applications. To this end, we have built a user-friendly database that has at its core the genome-wide binding profiles of seven key haematopoietic TFs in human stem/progenitor cells. These binding profiles are compared with binding profiles in normal differentiated and leukaemic cells. We have integrated these TF binding profiles with chromatin marks and expression data in normal and leukaemic cell fractions. All queries can be exported into external sites to construct TF–gene and protein–protein networks and to evaluate the association of genes with cellular processes and tissue expression. PMID:24185696

  19. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    PubMed

    Wang, Jia; Chen, Dijun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling

    2014-01-01

    Citrus is one of the most important and widely grown fruit crops, with global production ranking first among all fruit crops in the world. Sweet orange accounts for more than half of Citrus production, both as fresh fruit and as processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to researchers working on sweet orange and other woody plants, and is freely available at http://citrus.hzau.edu.cn/.

  20. FGF: a web tool for Fishing Gene Family in a whole genome database.

    PubMed

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong; Li, Yuan; Vang, Søren; Fan, Wei; Wang, Junyi; Zhang, Zhang; Wang, Wen; Kristiansen, Karsten; Wang, Jun

    2007-07-01

    Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) program to efficiently search for and identify gene families. The FGF output displays the results as visual phylogenetic trees including information on gene structure, chromosome position, duplication fate and selective pressure. It is particularly useful to identify pseudogenes and detect changes in gene structure. FGF is freely available on a web server at http://fgf.genomics.org.cn/

  1. The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database.

    PubMed

    Hayman, G Thomas; Laulederkind, Stanley J F; Smith, Jennifer R; Wang, Shur-Jen; Petri, Victoria; Nigam, Rajni; Tutaj, Marek; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2016-01-01

    The Rat Genome Database (RGD; http://rgd.mcw.edu/) provides critical datasets and software tools to a diverse community of rat and non-rat researchers worldwide. To meet the needs of the many users whose research is disease oriented, RGD has created a series of Disease Portals and has prioritized its curation efforts on the datasets important to understanding the mechanisms of various diseases. Gene-disease relationships for three species, rat, human and mouse, are annotated to capture biomarkers, genetic associations, molecular mechanisms and therapeutic targets. To generate gene-disease annotations more effectively and in greater detail, RGD initially adopted the MEDIC disease vocabulary from the Comparative Toxicogenomics Database and adapted it for use by expanding this framework with the addition of over 1000 terms to create the RGD Disease Ontology (RDO). The RDO provides the foundation for, at present, 10 comprehensive disease area-related dataset and analysis platforms at RGD, the Disease Portals. Two major disease areas are the focus of data acquisition and curation efforts each year, leading to the release of the related Disease Portals. Collaborative efforts to realize a more robust disease ontology are underway. Database URL: http://rgd.mcw.edu.

  2. The Disease Portals, disease–gene annotation and the RGD disease ontology at the Rat Genome Database

    PubMed Central

    Hayman, G. Thomas; Laulederkind, Stanley J. F.; Smith, Jennifer R.; Wang, Shur-Jen; Petri, Victoria; Nigam, Rajni; Tutaj, Marek; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

    2016-01-01

    The Rat Genome Database (RGD; http://rgd.mcw.edu/) provides critical datasets and software tools to a diverse community of rat and non-rat researchers worldwide. To meet the needs of the many users whose research is disease oriented, RGD has created a series of Disease Portals and has prioritized its curation efforts on the datasets important to understanding the mechanisms of various diseases. Gene-disease relationships for three species, rat, human and mouse, are annotated to capture biomarkers, genetic associations, molecular mechanisms and therapeutic targets. To generate gene–disease annotations more effectively and in greater detail, RGD initially adopted the MEDIC disease vocabulary from the Comparative Toxicogenomics Database and adapted it for use by expanding this framework with the addition of over 1000 terms to create the RGD Disease Ontology (RDO). The RDO provides the foundation for, at present, 10 comprehensive disease area-related dataset and analysis platforms at RGD, the Disease Portals. Two major disease areas are the focus of data acquisition and curation efforts each year, leading to the release of the related Disease Portals. Collaborative efforts to realize a more robust disease ontology are underway. Database URL: http://rgd.mcw.edu PMID:27009807

  3. Design and implementation of a twin-family database for behavior genetics and genomics studies.

    PubMed

    Boomsma, Dorret I; Willemsen, Gonneke; Vink, Jacqueline M; Bartels, Meike; Groot, Paul; Hottenga, Jouke Jan; van Beijsterveldt, C E M Toos; Stroet, Therese; van Dijk, Rob; Wertheim, Rien; Visser, Marco; van der Kleij, Frank

    2008-06-01

    In this article we describe the design and implementation of a database for extended twin families. The database does not focus on probands or on index twins, as this approach becomes problematic when larger multigenerational families are included, when more than one set of multiples is present within a family, or when families turn out to be part of a larger pedigree. Instead, we present an alternative approach that uses a highly flexible notion of persons and relations. The relations among the subjects in the database have a one-to-many structure, are user-definable and extendible and support arbitrarily complicated pedigrees. Some additional characteristics of the database are highlighted, such as the storage of historical data, predefined expressions for advanced queries, output facilities for individuals and relations among individuals and an easy-to-use multi-step wizard for contacting participants. This solution presents a flexible approach to accommodate pedigrees of arbitrary size, multiple biological and nonbiological relationships among participants and dynamic changes in these relations that occur over time, which can be implemented for any type of multigenerational family study. PMID:18498212
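
A minimal sketch of the person-and-relation idea described in this abstract, assuming nothing about the authors' actual schema: the class names, the relation kinds ("parent_of", "twin_of", "partner_of") and the year-based validity fields are all hypothetical and chosen only for illustration. It shows how one-to-many, user-definable relations with start and end dates can represent a pedigree without designating a proband or index twin.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Person:
    person_id: int
    name: str


@dataclass(frozen=True)
class Relation:
    kind: str                         # user-definable label (hypothetical examples)
    source: int                       # person_id the relation starts from
    target: int                       # person_id the relation points to
    start_year: Optional[int] = None  # None means "no known start"
    end_year: Optional[int] = None    # None means "still current"

    def active_in(self, year: Optional[int]) -> bool:
        """True if the relation held in `year` (or always, if year is None)."""
        if year is None:
            return True
        started = self.start_year is None or self.start_year <= year
        not_ended = self.end_year is None or year <= self.end_year
        return started and not_ended


class FamilyStore:
    """In-memory store: one person can be the source of many relations."""

    def __init__(self) -> None:
        self.persons: Dict[int, Person] = {}
        self.relations: List[Relation] = []

    def add_person(self, person: Person) -> None:
        self.persons[person.person_id] = person

    def relate(self, kind: str, source: int, target: int,
               start_year: Optional[int] = None,
               end_year: Optional[int] = None) -> None:
        if source not in self.persons or target not in self.persons:
            raise KeyError("both persons must exist before they are related")
        self.relations.append(Relation(kind, source, target, start_year, end_year))

    def related(self, person_id: int, kind: str,
                year: Optional[int] = None) -> List[int]:
        """Targets of `kind` relations from person_id, optionally at a given year."""
        return [r.target for r in self.relations
                if r.source == person_id and r.kind == kind and r.active_in(year)]

    def ancestors(self, person_id: int) -> Set[int]:
        """Follow 'parent_of' relations upward through any number of generations."""
        found: Set[int] = set()
        stack = [person_id]
        while stack:
            current = stack.pop()
            for r in self.relations:
                if r.kind == "parent_of" and r.target == current and r.source not in found:
                    found.add(r.source)
                    stack.append(r.source)
        return found
```

Because relation kinds are plain labels, a nonbiological tie such as an adoptive parent would be added as another relation kind rather than as a schema change, and keeping the end year on the relation rather than overwriting it preserves historical data.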

  4. Design and implementation of a twin-family database for behavior genetics and genomics studies.

    PubMed

    Boomsma, Dorret I; Willemsen, Gonneke; Vink, Jacqueline M; Bartels, Meike; Groot, Paul; Hottenga, Jouke Jan; van Beijsterveldt, C E M Toos; Stroet, Therese; van Dijk, Rob; Wertheim, Rien; Visser, Marco; van der Kleij, Frank

    2008-06-01

    In this article we describe the design and implementation of a database for extended twin families. The database does not focus on probands or on index twins, as this approach becomes problematic when larger multigenerational families are included, when more than one set of multiples is present within a family, or when families turn out to be part of a larger pedigree. Instead, we present an alternative approach that uses a highly flexible notion of persons and relations. The relations among the subjects in the database have a one-to-many structure, are user-definable and extendible and support arbitrarily complicated pedigrees. Some additional characteristics of the database are highlighted, such as the storage of historical data, predefined expressions for advanced queries, output facilities for individuals and relations among individuals and an easy-to-use multi-step wizard for contacting participants. This solution presents a flexible approach to accommodate pedigrees of arbitrary size, multiple biological and nonbiological relationships among participants and dynamic changes in these relations that occur over time, which can be implemented for any type of multigenerational family study.

  5. Islander: A database of precisely mapped genomic islands in tRNA and tmRNA genes

    SciTech Connect

    Hudson, Corey M.; Lau, Britney Y.; Williams, Kelly P.

    2014-11-05

    Genomic islands are mobile DNAs that are major agents of bacterial and archaeal evolution. Integration into prokaryotic chromosomes usually occurs site-specifically at tRNA or tmRNA gene (together, tDNA) targets, catalyzed by tyrosine integrases. This splits the target gene, yet sequences within the island restore the disrupted gene; the regenerated target and its displaced fragment precisely mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm identifies tDNAs, finds fragments of those tDNAs in the same replicon and removes unlikely candidate islands through a series of filters. A search for islands in 2168 whole prokaryotic genomes produced 3919 candidates. The website Islander (recently moved to http://bioinformatics.sandia.gov/islander/) presents these precisely mapped candidate islands, the gene content and the island sequence. The algorithm further insists that each island encode an integrase, and attachment site sequence identity is carefully noted; therefore, the database also serves in the study of integrase site-specificity and its evolution.
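
The endpoint principle in this abstract (an intact tDNA at one end of the island, a displaced fragment of it at the other) can be illustrated with a deliberately simplified sketch. This is not the Islander algorithm: it assumes exact string matching on the forward strand only, a fixed-length 3' fragment, and a single length filter standing in for the authors' series of filters. The function name, parameters and thresholds are hypothetical, and the integrase requirement is not modeled.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    tdna_end: int        # end of the intact tDNA (0-based, exclusive)
    fragment_start: int  # start of the displaced fragment copy
    fragment_len: int    # length of the matched 3' fragment

    @property
    def island_length(self) -> int:
        """Bases between the intact tDNA and its displaced fragment."""
        return self.fragment_start - self.tdna_end


def find_candidates(replicon: str, tdna_start: int, tdna_end: int,
                    fragment_len: int = 10,
                    min_island: int = 5,
                    max_island: int = 200_000) -> List[Candidate]:
    """Look downstream of a tDNA for exact copies of its 3' end.

    Assumption: the displaced fragment is an exact copy of the last
    `fragment_len` bases of the tDNA and lies on the same strand.
    """
    if tdna_end - tdna_start < fragment_len:
        raise ValueError("tDNA is shorter than the requested fragment length")
    fragment = replicon[tdna_end - fragment_len:tdna_end]
    candidates: List[Candidate] = []
    pos = replicon.find(fragment, tdna_end)
    while pos != -1:
        candidate = Candidate(tdna_end, pos, fragment_len)
        # Stand-in for the filtering stage: keep plausible island sizes only.
        if min_island <= candidate.island_length <= max_island:
            candidates.append(candidate)
        pos = replicon.find(fragment, pos + 1)
    return candidates
```

A real search would need approximate matching, both strands and the additional filters the abstract mentions; the sketch only shows why the tDNA and its fragment delimit the island.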

  6. Islander: A database of precisely mapped genomic islands in tRNA and tmRNA genes

    DOE PAGESBeta

    Hudson, Corey M.; Lau, Britney Y.; Williams, Kelly P.

    2014-11-05

    Genomic islands are mobile DNAs that are major agents of bacterial and archaeal evolution. Integration into prokaryotic chromosomes usually occurs site-specifically at tRNA or tmRNA gene (together, tDNA) targets, catalyzed by tyrosine integrases. This splits the target gene, yet sequences within the island restore the disrupted gene; the regenerated target and its displaced fragment precisely mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm identifies tDNAs, finds fragments of those tDNAs in the same replicon and removes unlikely candidate islands through a series of filters. A search for islands in 2168 whole prokaryotic genomes produced 3919 candidates. The website Islander (recently moved to http://bioinformatics.sandia.gov/islander/) presents these precisely mapped candidate islands, the gene content and the island sequence. The algorithm further insists that each island encode an integrase, and attachment site sequence identity is carefully noted; therefore, the database also serves in the study of integrase site-specificity and its evolution.

  7. Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae

    PubMed Central

    2013-01-01

    Background Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research. Results We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation. Conclusions This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites. PMID:23617571

  8. Aspergillus spinal epidural abscess

    SciTech Connect

    Byrd, B.F. III; Weiner, M.H.; McGee, Z.A.

    1982-12-17

    A spinal epidural abscess developed in a renal transplant recipient; results of a serum radioimmunoassay for Aspergillus antigen were positive. Laminectomy disclosed an abscess of the L4-5 interspace and L-5 vertebral body that contained hyphal forms and from which Aspergillus species was cultured. Serum Aspergillus antigen radioimmunoassay may be a valuable, specific early diagnostic test when systemic aspergillosis is a consideration in an immunosuppressed host.

  9. Under-representation of repetitive sequences in whole-genome shotgun sequence databases: an illustration using a recently acquired transposable element.

    PubMed

    Koga, Akihiko

    2012-02-01

    It is widely accepted in a conceptual framework that repetitive sequences, especially those with high sequence homogeneity among copies, tend to be under-represented in whole-genome shotgun sequence databases, because of the difficulty of assembling sequence reads into contigs. Although this is easily inferred, there is no quantitative illustration of this phenomenon. An example using a currently used database is expected to contribute to the intuitive understanding of how serious the under-representation is. The present study provides the first quantitative example (in the case of 16 copies of virtually identical, 4.7-kb sequences in a genome of 7 × 10^8 bp) by comparing the results of BLAST searches of a sequence database (contig N50; 9.8 kb) with those of Southern blot analysis of genomic DNA. This has revealed that the internal regions of the repetitive sequences are under-represented to a striking extent.
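
The abstract quotes a contig N50 without defining it. As a small illustration using the standard definition (the contig length at which the contigs, taken from longest to shortest, first cover at least half of the total assembled length), the statistic can be computed as below. The function name and the example lengths are illustrative and are not taken from the study.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the contig at which the running total, taken from the
    longest contig downward, first reaches half of the assembly size."""
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the running total always reaches half")
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8: the total is 54, and the three longest contigs (10 + 9 + 8 = 27) already cover half of it.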

  10. The CATH extended protein-family database: providing structural annotations for genome sequences.

    PubMed

    Pearl, Frances M G; Lee, David; Bray, James E; Buchan, Daniel W A; Shepherd, Adrian J; Orengo, Christine A

    2002-02-01

    An automatic sequence search and analysis protocol (DomainFinder) based on PSI-BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI-BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH-PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut-offs and these relationships are maintained within neighbour tables in the CATH Oracle database, pending further evidence of their suggested evolutionary relationship. Analysis of the CATH-PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI-BLAST profiles have been generated for each of the sequence families in the expanded CATH-PFDB and a web server has been provided so that new sequences may be scanned against the profile library and be assigned to a structure and homologous superfamily.

  11. Comparative Genometrics (CG): a database dedicated to biometric comparisons of whole genomes

    PubMed Central

    Roten, Claude-Alain H.; Gamba, Patrick; Barblan, Jean-Luc; Karamata, Dimitri

    2002-01-01

    The ever increasing rate at which whole genome sequences are becoming accessible to the scientific community has created an urgent need for tools enabling comparison of chromosomes of different species. We have applied biometric methods to available chromosome sequences and posted the results on our Comparative Genometrics (CG) web site. By genometrics, a term coined by Elston and Wilson [Genet. Epidemiol. (1990), 7, 17–19], we understand a biometric analysis of chromosomes. During the initial phase, our web site displays, for all completely sequenced prokaryotic genomes, three genometric analyses: the DNA walk [Lobry (1999) Microbiology Today, 26, 164–165] and two complementary representations, i.e. the cumulative GC- and TA-skew analyses, capable of identifying, at the level of whole genomes, features inherent to chromosome organization and functioning. It appears that the latter features are taxon-specific. Although primarily focused on prokaryotic chromosomes, the CG web site contains genometric information on paradigm plasmids, phages, viruses and eukaryotic organelles. Relevant data and methods can be readily used by the scientific community for further analyses as well as for tutorial purposes. Our data posted at the CG web site are freely available on the World Wide Web at http://www.unil.ch/comparativegenometrics. PMID:11752276
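
A minimal sketch of a cumulative skew calculation of the kind named in this abstract, assuming the commonly used windowed definition in which the skew of each window is (G - C)/(G + C), or (T - A)/(T + A) for the TA skew, and the cumulative curve is the running sum over consecutive windows. The window size and function name are illustrative choices; the exact parameters used by the CG web site are not stated in the abstract.

```python
from typing import List


def cumulative_skew(seq: str, plus: str = "G", minus: str = "C",
                    window: int = 1000) -> List[float]:
    """Running sum of (plus - minus) / (plus + minus) over consecutive
    non-overlapping windows of the sequence."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    curve: List[float] = []
    running = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        p = chunk.count(plus)
        m = chunk.count(minus)
        if p + m:  # windows with neither base contribute nothing
            running += (p - m) / (p + m)
        curve.append(running)
    return curve
```

Calling `cumulative_skew(genome)` gives the cumulative GC skew and `cumulative_skew(genome, "T", "A")` the cumulative TA skew. In many bacterial chromosomes the extrema of the cumulative GC-skew curve fall near the replication origin and terminus, which is one example of the whole-genome features such plots are used to reveal.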

  12. Maize databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  13. LC-MS/MS-based proteome profiling in Daphnia pulex and Daphnia longicephala: the Daphnia pulex genome database as a key for high throughput proteomics in Daphnia

    PubMed Central

    Fröhlich, Thomas; Arnold, Georg J; Fritsch, Rainer; Mayr, Tobias; Laforsch, Christian

    2009-01-01

    Background Daphniids, commonly known as waterfleas, serve as important model systems for ecology, evolution and the environmental sciences. The sequencing and annotation of the Daphnia pulex genome both open future avenues of research on this model organism. As proteomics is not only essential to our understanding of cell function but also a powerful validation tool for predicted genes in genome annotation projects, a first proteomic dataset is presented in this article. Results A comprehensive set of 701,274 peptide tandem-mass-spectra, derived from Daphnia pulex, was generated, which led to the identification of 531 proteins. To measure the impact of the Daphnia pulex filtered models database for mass spectrometry based Daphnia protein identification, this result was compared with results obtained with the Swiss-Prot and the Drosophila melanogaster database. To further validate the utility of the Daphnia pulex database for research on other Daphnia species, an additional 407,778 peptide tandem-mass-spectra, obtained from Daphnia longicephala, were generated and evaluated, leading to the identification of 317 proteins. Conclusion Peptides identified in our approach provide the first experimental evidence for the translation of a broad variety of predicted coding regions within the Daphnia genome. Furthermore it could be demonstrated that identification of Daphnia longicephala proteins using the Daphnia pulex protein database is feasible but shows a slightly reduced identification rate. Data provided in this article clearly demonstrate that the Daphnia genome database is the key for mass spectrometry based high throughput proteomics in Daphnia. PMID:19383153

  14. Aspergillus fumigatus and Aspergillosis

    PubMed Central

    Latgé, Jean-Paul

    1999-01-01

    Aspergillus fumigatus is one of the most ubiquitous of the airborne saprophytic fungi. Humans and animals constantly inhale numerous conidia of this fungus. The conidia are normally eliminated in the immunocompetent host by innate immune mechanisms, and aspergilloma and allergic bronchopulmonary aspergillosis, uncommon clinical syndromes, are the only infections observed in such hosts. Thus, A. fumigatus was considered for years to be a weak pathogen. With increases in the number of immunosuppressed patients, however, there has been a dramatic increase in severe and usually fatal invasive aspergillosis, now the most common mold infection worldwide. In this review, the focus is on the biology of A. fumigatus and the diseases it causes. Included are discussions of (i) genomic and molecular characterization of the organism, (ii) clinical and laboratory methods available for the diagnosis of aspergillosis in immunocompetent and immunocompromised hosts, (iii) identification of host and fungal factors that play a role in the establishment of the fungus in vivo, and (iv) problems associated with antifungal therapy. PMID:10194462

  15. GELBANK: A database of annotated two-dimensional gel electrophoresis patterns of biological systems with completed genomes.

    SciTech Connect

    Babnigg, G.; Giometti, C. S.; Biosciences Division

    2004-01-01

    GELBANK is a publicly available database of two-dimensional gel electrophoresis (2DE) gel patterns of proteomes from organisms with known genome information (available at and ftp://bioinformatics.anl.gov/gelbank/). Currently it includes 131 completed, mostly microbial proteomes available from the National Center for Biotechnology Information. A web interface allows the upload of 2D gel patterns and their annotation for registered users. The images are organized by species, tissue type, separation method, sample type and staining method. The database can be queried based on protein or 2DE-pattern attributes. A web interface allows registered users to assign molecular weight and pH gradient profiles to their own 2D gel patterns as well as to link protein identifications to a given spot on the pattern. The website presents all of the submitted 2D gel patterns where the end-user can dynamically display the images or parts of images along with molecular weight, pH profile information and linked protein identification. A collection of images can be selected for the creation of animations from which the user can select sub-regions of interest and unlimited 2D gel patterns for visualization. The website currently presents 233 identifications for 81 gel patterns for Homo sapiens, Methanococcus jannaschii, Pyrococcus furiosus, Shewanella oneidensis, Escherichia coli and Deinococcus radiodurans.

  16. MitoFish and MitoAnnotator: A Mitochondrial Genome Database of Fish with an Accurate and Automatic Annotation Pipeline

    PubMed Central

    Iwasaki, Wataru; Fukunaga, Tsukasa; Isagozawa, Ryota; Yamada, Koichiro; Maeda, Yasunobu; Satoh, Takashi P.; Sado, Tetsuya; Mabuchi, Kohji; Takeshima, Hirohiko; Miya, Masaki; Nishida, Mutsumi

    2013-01-01

    MitoFish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface. PMID:23955518

  17. MitoFish and MitoAnnotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline.

    PubMed

    Iwasaki, Wataru; Fukunaga, Tsukasa; Isagozawa, Ryota; Yamada, Koichiro; Maeda, Yasunobu; Satoh, Takashi P; Sado, Tetsuya; Mabuchi, Kohji; Takeshima, Hirohiko; Miya, Masaki; Nishida, Mutsumi

    2013-11-01

    MitoFish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface.

  18. A genome-scale metabolic flux model of Escherichia coli K–12 derived from the EcoCyc database

    PubMed Central

    2014-01-01

    Background Constraint-based models of Escherichia coli metabolic flux have played a key role in computational studies of cellular metabolism at the genome scale. We sought to develop a next-generation constraint-based E. coli model that achieved improved phenotypic prediction accuracy while being frequently updated and easy to use. We also sought to compare model predictions with experimental data to highlight open questions in E. coli biology. Results We present EcoCyc–18.0–GEM, a genome-scale model of the E. coli K–12 MG1655 metabolic network. The model is automatically generated from the current state of EcoCyc using the MetaFlux software, enabling the release of multiple model updates per year. EcoCyc–18.0–GEM encompasses 1445 genes, 2286 unique metabolic reactions, and 1453 unique metabolites. We demonstrate a three-part validation of the model that breaks new ground in breadth and accuracy: (i) Comparison of simulated growth in aerobic and anaerobic glucose culture with experimental results from chemostat culture and simulation results from the E. coli modeling literature. (ii) Essentiality prediction for the 1445 genes represented in the model, in which EcoCyc–18.0–GEM achieves an improved accuracy of 95.2% in predicting the growth phenotype of experimental gene knockouts. (iii) Nutrient utilization predictions under 431 different media conditions, for which the model achieves an overall accuracy of 80.7%. The model’s derivation from EcoCyc enables query and visualization via the EcoCyc website, facilitating model reuse and validation by inspection. We present an extensive investigation of disagreements between EcoCyc–18.0–GEM predictions and experimental data to highlight areas of interest to E. coli modelers and experimentalists, including 70 incorrect predictions of gene essentiality on glucose, 80 incorrect predictions of gene essentiality on glycerol, and 83 incorrect predictions of nutrient utilization. Conclusion Significant

  19. DLGP: A database for lineage-conserved and lineage-specific gene pairs in animal and plant genomes.

    PubMed

    Wang, Dapeng

    2016-01-15

    The conservation of gene organization in the genome with lineage-specificity is an invaluable resource to decipher their potential functionality with diverse selective constraints, especially in higher animals and plants. Gene pairs appear to be the minimal structure for this kind of gene cluster that tends to reside in preferred locations, representing the distinctive genomic characteristics of a single species or a given lineage. Although gene families have been investigated widely, the definition of gene pair families in various taxa still lacks adequate attention. To address this issue, we report DLGP (http://lcgbase.big.ac.cn/DLGP/), which stores the pre-calculated lineage-based gene pairs in 134 currently available animal and plant genomes and inspects them under the same analytical framework, bringing out a set of innovative features. First, the taxonomy or lineage has been classified into four levels: Kingdom, Phylum, Class and Order. It adopts an all-to-all comparison strategy to identify the possible conserved gene pairs in all species for each gene pair in a given species and reckons those that are conserved in over a significant proportion of species in a given lineage (e.g. Primates, Diptera or Poales) as the lineage-conserved gene pairs. Furthermore, it predicts the lineage-specific gene pairs by retaining the above-mentioned lineage-conserved gene pairs that are not conserved in any other lineages. Second, it carries out pairwise comparison for the gene pairs between two compared species and creates a table including all the conserved gene pairs and an image elucidating the conservation degree of gene pairs at the chromosomal level. Third, it supplies a gene order browser to extend gene pairs to gene clusters, allowing users to view the evolution dynamics in the gene context in an intuitive manner. This database will be able to facilitate the particular comparison between animals and plants, between vertebrates and arthropods, and

  20. MitoZoa 2.0: a database resource and search tools for comparative and evolutionary analyses of mitochondrial genomes in Metazoa

    PubMed Central

    D'Onorio de Meo, Paolo; D'Antonio, Mattia; Griggio, Francesca; Lupi, Renato; Borsani, Massimiliano; Pavesi, Giulio; Castrignanò, Tiziana; Pesole, Graziano; Gissi, Carmela

    2012-01-01

    The MITOchondrial genome database of metaZOAns (MitoZoa) is a public resource for comparative analyses of metazoan mitochondrial genomes (mtDNA) at both the sequence and genomic organizational levels. The main characteristics of the MitoZoa database are the careful revision of mtDNA entry annotations and the possibility of retrieving gene order and non-coding region (NCR) data in appropriate formats. The MitoZoa retrieval system enables basic and complex queries at various taxonomic levels using different search menus. MitoZoa 2.0 has been enhanced in several aspects, including: a re-annotation pipeline to check the correctness of protein-coding gene predictions; a standardized annotation of introns and of precursor ORFs whose functionality is post-transcriptionally recovered by RNA editing or programmed translational frameshifting; updates of taxon-related fields and a BLAST sequence similarity search tool. Database novelties and the definition of standard mtDNA annotation rules, together with the user-friendly retrieval system and the BLAST service, make MitoZoa a valuable resource for comparative and evolutionary analyses as well as a reference database to assist in the annotation of novel mtDNA sequences. MitoZoa is freely accessible at http://www.caspur.it/mitozoa. PMID:22123747

  1. Cas-Database: web-based genome-wide guide RNA library design for gene knockout screens using CRISPR-Cas9

    PubMed Central

    Park, Jeongbin; Kim, Jin-Soo; Bae, Sangsu

    2016-01-01

    Motivation: CRISPR-derived RNA guided endonucleases (RGENs) have been widely used for both gene knockout and knock-in at the level of single or multiple genes. RGENs are now available for forward genetic screens at genome scale, but single guide RNA (sgRNA) selection at this scale is difficult. Results: We develop an online tool, Cas-Database, a genome-wide gRNA library design tool for Cas9 nucleases from Streptococcus pyogenes (SpCas9). With an easy-to-use web interface, Cas-Database allows users to select optimal target sequences simply by changing the filtering conditions. Furthermore, it provides a powerful way to select multiple optimal target sequences from thousands of genes at once for the creation of a genome-wide library. Cas-Database also provides a web application programming interface (web API) for advanced bioinformatics users. Availability and implementation: Free access at http://www.rgenome.net/cas-database/. Contact: sangsubae@hanyang.ac.kr or jskim01@snu.ac.kr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153724

  2. MitoZoa 2.0: a database resource and search tools for comparative and evolutionary analyses of mitochondrial genomes in Metazoa.

    PubMed

    D'Onorio de Meo, Paolo; D'Antonio, Mattia; Griggio, Francesca; Lupi, Renato; Borsani, Massimiliano; Pavesi, Giulio; Castrignanò, Tiziana; Pesole, Graziano; Gissi, Carmela

    2012-01-01

    The MITOchondrial genome database of metaZOAns (MitoZoa) is a public resource for comparative analyses of metazoan mitochondrial genomes (mtDNA) at both the sequence and genomic organizational levels. The main characteristics of the MitoZoa database are the careful revision of mtDNA entry annotations and the possibility of retrieving gene order and non-coding region (NCR) data in appropriate formats. The MitoZoa retrieval system enables basic and complex queries at various taxonomic levels using different search menus. MitoZoa 2.0 has been enhanced in several aspects, including: a re-annotation pipeline to check the correctness of protein-coding gene predictions; a standardized annotation of introns and of precursor ORFs whose functionality is post-transcriptionally recovered by RNA editing or programmed translational frameshifting; updates of taxon-related fields and a BLAST sequence similarity search tool. Database novelties and the definition of standard mtDNA annotation rules, together with the user-friendly retrieval system and the BLAST service, make MitoZoa a valuable resource for comparative and evolutionary analyses as well as a reference database to assist in the annotation of novel mtDNA sequences. MitoZoa is freely accessible at http://www.caspur.it/mitozoa.

  3. Creation of a genome-wide metabolic pathway database for Populus trichocarpa using a new approach for reconstruction and curation of metabolic pathways for plants.

    PubMed

    Zhang, Peifen; Dreher, Kate; Karthikeyan, A; Chi, Anjo; Pujar, Anuradha; Caspi, Ron; Karp, Peter; Kirkup, Vanessa; Latendresse, Mario; Lee, Cynthia; Mueller, Lukas A; Muller, Robert; Rhee, Seung Yon

    2010-08-01

    Metabolic networks reconstructed from sequenced genomes or transcriptomes can help visualize and analyze large-scale experimental data, predict metabolic phenotypes, discover enzymes, engineer metabolic pathways, and study metabolic pathway evolution. We developed a general approach for reconstructing metabolic pathway complements of plant genomes. Two new reference databases were created and added to the core of the infrastructure: a comprehensive, all-plant reference pathway database, PlantCyc, and a reference enzyme sequence database, RESD, for annotating metabolic functions of protein sequences. PlantCyc (version 3.0) includes 714 metabolic pathways and 2,619 reactions from over 300 species. RESD (version 1.0) contains 14,187 literature-supported enzyme sequences from across all kingdoms. We used RESD, PlantCyc, and MetaCyc (an all-species reference metabolic pathway database), in conjunction with the pathway prediction software Pathway Tools, to reconstruct a metabolic pathway database, PoplarCyc, from the recently sequenced genome of Populus trichocarpa. PoplarCyc (version 1.0) contains 321 pathways with 1,807 assigned enzymes. Comparing PoplarCyc (version 1.0) with AraCyc (version 6.0, Arabidopsis [Arabidopsis thaliana]) showed comparable numbers of pathways distributed across all domains of metabolism in both databases, except for a higher number of AraCyc pathways in secondary metabolism and a 1.5-fold increase in carbohydrate metabolic enzymes in PoplarCyc. Here, we introduce these new resources and demonstrate the feasibility of using them to identify candidate enzymes for specific pathways and to analyze metabolite profiling data through concrete examples. These resources can be searched by text or BLAST, browsed, and downloaded from our project Web site (http://plantcyc.org).

  4. Human Mitochondrial Protein Database

    National Institute of Standards and Technology Data Gateway

    SRD 131 Human Mitochondrial Protein Database (Web, free access)   The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.

  5. Aspergillus as a multi-purpose cell factory: current status and perspectives.

    PubMed

    Meyer, Vera; Wu, Bo; Ram, Arthur F J

    2011-03-01

    Aspergilli have a long history in biotechnology as expression platforms for the production of food ingredients, pharmaceuticals and enzymes. The achievements made during the last years, however, have the potential to revolutionize Aspergillus biotechnology and to assure Aspergillus a dominant place among microbial cell factories. This mini-review will highlight most recent breakthroughs in fundamental and applied Aspergillus research with a focus on new molecular tools, techniques and products. New trends and concepts related to Aspergillus genomics and systems biology will be discussed as well as the challenges that have to be met to integrate omics data with metabolic engineering attempts.

  6. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999

    SciTech Connect

    Harger, Carol A.

    1999-10-28

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, based upon statistical inferences, the location of matrix attachment regions (MARs) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.]

  7. Development of novel simple sequence repeat markers from a genomic sequence survey database and their application for diversity assessment in Jatropha curcas germplasm from Guatemala.

    PubMed

    Raposo, R S; Souza, I G B; Veloso, M E C; Kobayashi, A K; Laviola, B G; Diniz, F M

    2014-08-07

    The last few years have seen a significant increase in the number of large-scale sequencing projects generating whole genome databases. These sequence databases can be surveyed (genome sequence survey) for tandem repeats as an alternative means to develop microsatellites for monitoring and selecting natural populations and cultivars of Jatropha curcas. A total of 100 tandem repeats were revealed from mining 368 genomic surveyed sequences available in the Kazusa DNA Research Institute database. Twenty microsatellite sequences were successfully amplified, resulting in repeatable and scorable polymerase chain reaction products. Genotyping of J. curcas accessions from the Guatemalan population revealed 18 polymorphic loci. The average number of alleles per locus was 6.9, and allelic sizes ranged from 94 to 299 bp. Expected and observed heterozygosities ranged from 0.118 to 0.906 and from 0.082 to 0.794, respectively. Polymorphic information content values ranged from 0.114 (JcSSR-34) to 0.886 (JcSSR-33) with an average of 0.627. Analysis with Micro-Checker indicated few null alleles for locus JcSSR-37 in Guatemalan populations, which may be a possible cause of its deviation from Hardy-Weinberg equilibrium, even after Bonferroni's correction. No loci showed significant linkage disequilibrium. These microsatellite loci are expected to be valuable molecular markers in J. curcas because they show high levels of polymorphism and heterozygosity.
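
    The expected heterozygosity and polymorphic information content (PIC) values reported above are both functions of per-locus allele frequencies. The sketch below shows how such values are commonly computed. It assumes the standard definitions (He = 1 - sum of squared frequencies, and PIC as defined by Botstein et al., 1980); the abstract does not state which estimators the authors used, and the allele frequencies in the example are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """Expected heterozygosity: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical allele frequencies for one locus (assumed, not from the study).
example_freqs = [0.5, 0.3, 0.2]
he = expected_heterozygosity(example_freqs)
pic = polymorphic_information_content(example_freqs)
```

    For the hypothetical frequencies 0.5, 0.3 and 0.2, He is 0.62 and PIC is 0.5478; PIC never exceeds He because the cross term is non-negative.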

  8. Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries.

    PubMed

    Rodrigues, N B; Loverde, P T; Romanha, A J; Oliveira, G

    2002-01-01

    In the last decade microsatellites have become one of the most useful genetic markers used in a large number of organisms due to their abundance and high level of polymorphism. Microsatellites have been used for individual identification, paternity tests, forensic studies and population genetics. Data on microsatellite abundance comes preferentially from microsatellite enriched libraries and DNA sequence databases. We have conducted a search in GenBank of more than 16,000 Schistosoma mansoni ESTs and 42,000 BAC sequences. In addition, we obtained 300 sequences from CA and AT microsatellite enriched genomic libraries. The sequences were searched for simple repeats using the RepeatMasker software. Of 16,022 ESTs, we detected 481 (3%) sequences that contained 622 microsatellites (434 perfect, 164 imperfect and 24 compounds). Of the 481 ESTs, 194 were grouped in 63 clusters containing 2 to 15 ESTs per cluster. Polymorphisms were observed in 16 clusters. The 287 remaining ESTs were orphan sequences. Of the 42,017 BAC end sequences, 1,598 (3.8%) contained microsatellites (2,335 perfect, 287 imperfect and 79 compounds). Of the 1,598 BAC end sequences, 80 were grouped into 17 clusters containing 3 to 17 BAC end sequences per cluster. Microsatellites were present in 67 out of 300 sequences from microsatellite enriched libraries (55 perfect, 38 imperfect and 15 compounds). From all of the observed loci, 55 were selected for having the longest perfect repeats and flanking regions that allowed the design of primers for PCR amplification. Additionally, we describe two new polymorphic microsatellite loci.
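
    The study above used RepeatMasker to locate simple repeats. As a much simpler illustration of what a "perfect" microsatellite is, the sketch below scans a sequence with a regular expression for uninterrupted tandem repeats. It is a stand-in for illustration only, not the RepeatMasker algorithm; the function name, the thresholds (unit length 1 to 6 bp, at least 5 copies) and the test sequence are assumptions.

```python
import re


def find_perfect_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Return (start, unit, copies) for each uninterrupted tandem repeat.

    The lazy quantifier on the unit makes the regex prefer the shortest
    repeating unit, so 'CACACACACA' is reported as (CA)5 rather than (CACA)2.
    """
    pattern = re.compile(
        r"(([ACGT]{%d,%d}?)\2{%d,})" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(2)
        copies = len(match.group(1)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical sequence: two flanking bases on each side of a (CA)5 repeat.
example_hits = find_perfect_microsatellites("TT" + "CA" * 5 + "GG")
```

    Imperfect and compound microsatellites, which the abstract also counts, would need mismatch-tolerant matching that this sketch does not attempt.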

  9. New species of Aspergillus producing sterigmatocystin.

    PubMed Central

    Rabie, C J; Steyn, M; van Schalkwyk, G C

    1977-01-01

    A number of species belonging to the genus Aspergillus were evaluated for their toxicity to ducklings and the ability to produce sterigmatocystin. Three new species capable of producing sterigmatocystin were found, namely, Aspergillus aurantio-brunneus, Aspergillus quadrilineatus, and Aspergillus ustus. All three were toxic to ducklings. The production of sterigmatocystin by Aspergillus rugulosus was confirmed, and the toxicity of Aspergillus stellatus and Aspergillus multicolor is described. PMID:406838

  10. Development in Aspergillus

    PubMed Central

    Krijgsheld, P.; Bleichrodt, R.; van Veluw, G.J.; Wang, F.; Müller, W.H.; Dijksterhuis, J.; Wösten, H.A.B.

    2013-01-01

    The genus Aspergillus represents a diverse group of fungi that are among the most abundant fungi in the world. Germination of a spore can lead to a vegetative mycelium that colonizes a substrate. The hyphae within the mycelium are highly heterogeneous with respect to gene expression, growth, and secretion. Aspergilli can reproduce both asexually and sexually. To this end, conidiophores and ascocarps are produced that form conidia and ascospores, respectively. This review describes the molecular mechanisms underlying growth and development of Aspergillus. PMID:23450714

  11. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements.

    PubMed

    Luo, Hao; Lin, Yan; Gao, Feng; Zhang, Chun-Ting; Zhang, Ren

    2014-01-01

    The combination of high-density transposon-mediated mutagenesis and high-throughput sequencing has led to significant advancements in research on essential genes, resulting in a dramatic increase in the number of identified prokaryotic essential genes under diverse conditions and a revised essential-gene concept that includes all essential genomic elements, rather than focusing on protein-coding genes only. DEG 10, a new release of the Database of Essential Genes (available at http://www.essentialgene.org), has been developed to accommodate these quantitative and qualitative advancements. In addition to increasing the number of bacterial and archaeal essential genes determined by genome-wide gene essentiality screens, DEG 10 also harbors essential noncoding RNAs, promoters, regulatory sequences and replication origins. These essential genomic elements are determined not only in vitro, but also in vivo, under diverse conditions including those for survival, pathogenesis and antibiotic resistance. We have developed customizable BLAST tools that allow users to perform species- and experiment-specific BLAST searches for a single gene, a list of genes, annotated or unannotated genomes. Therefore, DEG 10 includes essential genomic elements under different conditions in three domains of life, with customizable BLAST tools.

  12. [Aspergillus insulicola Sp. Nov].

    PubMed

    de Montemayor, L; Santiago, A R

    1975-04-30

    A strain of Aspergillus sp. is described and proposed as a new species under the name "Aspergillus insulicola sp. nov." Montemayor & Santiago, 1973. This strain was isolated from soil samples taken in "Aves Island" during a scientific expedition. Aves Island, situated at 15°40'42" N and 63°36'47" W, about 665 km from the coast of Venezuela, has very special ecological conditions. Because of its small size (550 m long and 40 to 120 m across) and its low profile (only 3 m above sea level), it is swept by the sea during the periodic storms and hurricanes in the area. It thus has a very interesting fauna and flora. We took a series of soil samples to study its mycological flora. Forty samples were inoculated by the dilution method. In this first paper a species is described and proposed as a new species because of its macroscopic and microscopic characteristics, as well as its biological properties, under the name "Aspergillus insulicola sp. nov.". In its study we have tried to follow as closely as possible the methods recommended by Kenneth B. Raper & Dorothy Fennell, world authorities on the genera Aspergillus and Penicillium. The strain is being kept in USB under the number T1, and has been sent to ATCC & CBSC to be incorporated in their collections.

  13. KEGG orthology-based annotation of the predicted proteome of Acropora digitifera: ZoophyteBase - an open access and searchable database of a coral genome

    PubMed Central

    2013-01-01

    Background Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics. Description Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes. We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca2+-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics. Conclusions We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of

  14. DOR – a Database of Olfactory Receptors – Integrated Repository for Sequence and Secondary Structural Information of Olfactory Receptors in Selected Eukaryotic Genomes

    PubMed Central

    Nagarathnam, Balasubramanian; Karpe, Snehal D; Harini, Krishnan; Sankar, Kannan; Iftekhar, Mohammed; Rajesh, Durairaj; Giji, Sadasivam; Archunan, Govidaraju; Balakrishnan, Veluchamy; Gromiha, M Michael; Nemoto, Wataru; Fukui, Kazhuhiko; Sowdhamini, Ramanathan

    2014-01-01

    Olfaction is the response to odors and is mediated by a class of membrane-bound proteins called olfactory receptors (ORs). An understanding of these receptors serves as a good model for basic signal transduction mechanisms and also provides important clues for the strategies adopted by organisms for their ultimate survival using chemosensory perception in search of food or defense against predators. Prior research on cross-genome phylogenetic analyses from our group motivated the addressal of conserved evolutionary trends, clustering, and ortholog prediction of ORs. The database of olfactory receptors (DOR) is a repository that provides sequence and structural information on ORs of selected organisms (such as Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, and Homo sapiens). Users can download OR sequences, study predicted membrane topology, and obtain cross-genome sequence alignments and phylogeny, including three-dimensional (3D) structural models of 100 selected ORs and their predicted dimer interfaces. The database can be accessed from http://caps.ncbs.res.in/DOR. Such a database should be helpful in designing experiments on point mutations to probe into the possible dimerization modes of ORs and to even understand the evolutionary changes between different receptors. PMID:25002814

  15. A novel non-thermostable deuterolysin from Aspergillus oryzae.

    PubMed

    Maeda, Hiroshi; Katase, Toru; Sakai, Daisuke; Takeuchi, Michio; Kusumoto, Ken-Ichi; Amano, Hitoshi; Ishida, Hiroki; Abe, Keietsu; Yamagata, Youhei

    2016-09-01

    Three putative deuterolysin (EC 3.4.24.29) genes (deuA, deuB, and deuC) were found in the Aspergillus oryzae genome database (http://www.bio.nite.go.jp/dogan/project/view/AO). One of these genes, deuA, corresponded to the previously reported NpII gene. DeuA and DeuB were overexpressed in recombinant A. oryzae and purified. The degradation profiles of both enzymes against protein substrates were similar, but DeuB showed wider substrate specificity against peptidyl MCA-substrates than DeuA. The enzymatic profiles of DeuB, except for thermostability, also resembled those of DeuA. Unlike the thermostable DeuA, DeuB was inactivated by heat treatment above 80 °C. Transcription analysis in wild-type A. oryzae showed that only deuB was expressed in liquid culture, and the addition of the proteinous substrate upregulated its transcription. Furthermore, the addition of NaNO3 seems to eliminate the effect of the proteinous substrate on the transcription of deuB. PMID:27050120

  16. Genomics and Health Impact Update

    MedlinePlus

    ... Genomics in Practice Newborn Screening Pharmacogenomics Reproductive Health Tools and Databases About the Genomics & Health Impact Update The Office of Public Health Genomics provides updated and credible ...

  17. The PlaNet Consortium: A Network of European Plant Databases Connecting Plant Genome Data in an Integrated Biological Knowledge Resource

    PubMed Central

    Ernst, R.; Mayer, K. F. X.

    2004-01-01

    The completion of the Arabidopsis genome and the large collections of other plant sequences generated in recent years have sparked extensive functional genomics efforts. However, the utilization of this data is inefficient, as data sources are distributed and heterogeneous and efforts at data integration are lagging behind. PlaNet aims to overcome the limitations of individual efforts as well as the limitations of heterogeneous, independent data collections. PlaNet is a distributed effort among European bioinformatics groups and plant molecular biologists to establish a comprehensive integrated database in a collaborative network. Objectives are the implementation of infrastructure and data sources to capture plant genomic information into a comprehensive, integrated platform. This will facilitate the systematic exploration of Arabidopsis and other plants. New methods for data exchange, database integration and access are being developed to create a highly integrated, federated data resource for research. The connection between the individual resources is realized with BioMOBY. BioMOBY provides an architecture for the discovery and distribution of biological data through web services. While knowledge is centralized, data is maintained at its primary source without a need for warehousing. To standardize nomenclature and data representation, ontologies and generic data models are defined in interaction with the relevant communities. Minimal data models should make it simple to allow broad integration, while inheritance allows detail and depth to be added to more complex data objects without losing integration. To allow expert annotation and keep databases curated, local and remote annotation interfaces are provided. Easy and direct access to all data is key to the project. PMID:18629059

  18. New taxa in Aspergillus section Usti

    PubMed Central

    Samson, R.A.; Varga, J.; Meijer, M.; Frisvad, J.C.

    2011-01-01

    Based on phylogenetic analysis of sequence data, Aspergillus section Usti includes 21 species, including two teleomorphic species Aspergillus heterothallicus (= Emericella heterothallica) and Fennellia monodii. Aspergillus germanicus sp. nov. was isolated from indoor air in Germany. This species has identical ITS sequences with A. insuetus CBS 119.27, but is clearly distinct from that species based on β-tubulin and calmodulin sequence data. This species is unable to grow at 37 °C, similarly to A. keveii and A. insuetus. Aspergillus carlsbadensis sp. nov. was isolated from the Carlsbad Caverns National Park in New Mexico. This taxon is related to, but distinct from a clade including A. calidoustus, A. pseudodeflectus, A. insuetus and A. keveii on all trees. This species is also unable to grow at 37 °C, and acid production was not observed on CREA. Aspergillus californicus sp. nov. is proposed for an isolate from chamise chaparral (Adenostoma fasciculatum) in California. It is related to a clade including A. subsessilis and A. kassunensis on all trees. This species grew well at 37 °C, and acid production was not observed on CREA. The strain CBS 504.65 from soil in Turkey was shown to be clearly distinct from the A. deflectus ex-type strain, indicating that this isolate represents a distinct species in this section. We propose the name A. turkensis sp. nov. for this taxon. This species grew, although rather restrictedly, at 37 °C, and acid production was not observed on CREA. Isolates from stored maize, South Africa, as a culture contaminant of Bipolaris sorokiniana from indoor air in Finland proved to be related to, but different from A. ustus and A. puniceus. The taxon is proposed as the new species A. pseudoustus. Although supported only by low bootstrap values, F. monodii was found to belong to section Usti based on phylogenetic analysis of either loci. BLAST searches to the GenBank database also resulted in closest hits from section Usti. This species obviously

  19. New taxa in Aspergillus section Usti.

    PubMed

    Samson, R A; Varga, J; Meijer, M; Frisvad, J C

    2011-06-30

    Based on phylogenetic analysis of sequence data, Aspergillus section Usti includes 21 species, including two teleomorphic species Aspergillus heterothallicus (= Emericella heterothallica) and Fennellia monodii. Aspergillus germanicus sp. nov. was isolated from indoor air in Germany. This species has identical ITS sequences with A. insuetus CBS 119.27, but is clearly distinct from that species based on β-tubulin and calmodulin sequence data. This species is unable to grow at 37 °C, similarly to A. keveii and A. insuetus. Aspergillus carlsbadensis sp. nov. was isolated from the Carlsbad Caverns National Park in New Mexico. This taxon is related to, but distinct from a clade including A. calidoustus, A. pseudodeflectus, A. insuetus and A. keveii on all trees. This species is also unable to grow at 37 °C, and acid production was not observed on CREA. Aspergillus californicus sp. nov. is proposed for an isolate from chamise chaparral (Adenostoma fasciculatum) in California. It is related to a clade including A. subsessilis and A. kassunensis on all trees. This species grew well at 37 °C, and acid production was not observed on CREA. The strain CBS 504.65 from soil in Turkey was shown to be clearly distinct from the A. deflectus ex-type strain, indicating that this isolate represents a distinct species in this section. We propose the name A. turkensis sp. nov. for this taxon. This species grew, although rather restrictedly, at 37 °C, and acid production was not observed on CREA. Isolates from stored maize, South Africa, as a culture contaminant of Bipolaris sorokiniana from indoor air in Finland proved to be related to, but different from A. ustus and A. puniceus. The taxon is proposed as the new species A. pseudoustus. Although supported only by low bootstrap values, F. monodii was found to belong to section Usti based on phylogenetic analysis of either loci. BLAST searches to the GenBank database also resulted in closest hits from section Usti. This species obviously

  20. Identification of alternative splice variants in Aspergillus flavus through comparison of multiple tandem MS search algorithms

    PubMed Central

    2011-01-01

    Background Database searching is the most frequently used approach for automated peptide assignment and protein inference of tandem mass spectra. The results, however, depend on the sequences in target databases and on search algorithms. Recently by using an alternative splicing database, we identified more proteins than with the annotated proteins in Aspergillus flavus. In this study, we aimed at finding a greater number of eligible splice variants based on newly available transcript sequences and the latest genome annotation. The improved database was then used to compare four search algorithms: Mascot, OMSSA, X! Tandem, and InsPecT. Results The updated alternative splicing database predicted 15833 putative protein variants, 61% more than the previous results. There was transcript evidence for 50% of the updated genes compared to the previous 35% coverage. Database searches were conducted using the same set of spectral data, search parameters, and protein database but with different algorithms. The false discovery rates of the peptide-spectrum matches were estimated < 2%. The numbers of the total identified proteins varied from 765 to 867 between algorithms. Whereas 42% (1651/3891) of peptide assignments were unanimous, the comparison showed that 51% (568/1114) of the RefSeq proteins and 15% (11/72) of the putative splice variants were inferred by all algorithms. 12 plausible isoforms were discovered by focusing on the consensus peptides which were detected by at least three different algorithms. The analysis found different conserved domains in two putative isoforms of UDP-galactose 4-epimerase. Conclusions We were able to detect dozens of new peptides using the improved alternative splicing database with the recently updated annotation of the A. flavus genome. Unlike the identifications of the peptides and the RefSeq proteins, large variations existed between the putative splice variants identified by different algorithms. 12 candidates of putative isoforms
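
    The comparison above rests on two set operations over the peptide assignments of the four search engines: peptides found by every algorithm (the "unanimous" assignments) and consensus peptides found by at least three. The sketch below shows one way to compute both. The function names and peptide labels are hypothetical, and it assumes the unanimous percentage is taken over the union of all assigned peptides, which the abstract does not spell out.

```python
from collections import Counter


def consensus_peptides(assignments, min_algorithms=3):
    """Peptides reported by at least `min_algorithms` search engines."""
    counts = Counter(
        peptide for peptides in assignments.values() for peptide in set(peptides)
    )
    return {peptide for peptide, n in counts.items() if n >= min_algorithms}


def unanimous_fraction(assignments):
    """Fraction of all assigned peptides that every search engine reported.

    Assumes at least one engine and at least one assigned peptide.
    """
    peptide_sets = [set(peptides) for peptides in assignments.values()]
    union = set().union(*peptide_sets)
    unanimous = set.intersection(*peptide_sets)
    return len(unanimous) / len(union)


# Hypothetical peptide labels; only the engine names come from the abstract.
example = {
    "Mascot": {"PEPA", "PEPB", "PEPC"},
    "OMSSA": {"PEPA", "PEPB"},
    "X! Tandem": {"PEPA", "PEPB", "PEPD"},
    "InsPecT": {"PEPA", "PEPC"},
}
consensus = consensus_peptides(example)
fraction = unanimous_fraction(example)
```

    In this toy input, PEPA and PEPB reach the three-engine consensus, and one of the four distinct peptides is unanimous, giving a fraction of 0.25.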

  1. Enhancing a Pathway-Genome Database (PGDB) to Capture Subcellular Localization of Metabolites and Enzymes: The Nucleotide-Sugar Biosynthetic Pathways of Populus trichocarpa

    SciTech Connect

    Nag, A.; Karpinets, T. V.; Chang, C. H.; Bar-Peled, M.

    2012-01-01

    Understanding how cellular metabolism works and is regulated requires that the underlying biochemical pathways be adequately represented and integrated with large metabolomic data sets to establish a robust network model. Genetically engineering energy crops to be less recalcitrant to saccharification requires detailed knowledge of plant polysaccharide structures and a thorough understanding of the metabolic pathways involved in forming and regulating cell-wall synthesis. Nucleotide-sugars are building blocks for synthesis of cell wall polysaccharides. The biosynthesis of nucleotide-sugars is catalyzed by a multitude of enzymes that reside in different subcellular organelles, and precise representation of these pathways requires accurate capture of this biological compartmentalization. The lack of simple localization cues in genomic sequence data and annotations however leads to missing compartmentalization information for eukaryotes in automatically generated databases, such as the Pathway-Genome Databases (PGDBs) of the SRI Pathway Tools software that drives much biochemical knowledge representation on the internet. In this report, we provide an informal mechanism using the existing Pathway Tools framework to integrate protein and metabolite sub-cellular localization data with the existing representation of the nucleotide-sugar metabolic pathways in a prototype PGDB for Populus trichocarpa. The enhanced pathway representations have been successfully used to map SNP abundance data to individual nucleotide-sugar biosynthetic genes in the PGDB. The manually curated pathway representations are more conducive to the construction of a computational platform that will allow the simulation of natural and engineered nucleotide-sugar precursor fluxes into specific recalcitrant polysaccharide(s).

  2. Modern taxonomy of biotechnologically important Aspergillus and Penicillium species.

    PubMed

    Houbraken, Jos; de Vries, Ronald P; Samson, Robert A

    2014-01-01

    Taxonomy is a dynamic discipline and name changes of fungi with biotechnological, industrial, or medical importance are often difficult to understand for researchers in the applied field. Species belonging to the genera Aspergillus and Penicillium are commonly used or isolated, and inadequate taxonomy or uncertain nomenclature of these genera can therefore lead to tremendous confusion. Misidentification of strains used in biotechnology can be traced back to (1) recent changes in nomenclature, (2) new taxonomic insights, including description of new species, and/or (3) incorrect identifications. Changes in the recently published International Code of Nomenclature for Algae, Fungi and Plants will lead to numerous name changes of existing Aspergillus and Penicillium species, and an overview of the current names of biotechnologically important species is given. Furthermore, in (biotechnological) literature old and invalid names are still used, such as Aspergillus awamori, A. foetidus, A. kawachii, Talaromyces emersonii, Acremonium cellulolyticus, and Penicillium funiculosum. An overview of these and other species with their correct names is presented. Furthermore, the biotechnologically important species Talaromyces thermophilus is here combined in Thermomyces as Th. dupontii. The importance of Aspergillus, Penicillium, and related genera is also illustrated by the high number of undertaken genome sequencing projects. A number of these strains are incorrectly identified or atypical strains are selected for these projects. Recommendations for correct strain selection are given here. Phylogenetic analysis shows a close relationship between the genome-sequenced strains of Aspergillus, Penicillium, and Monascus. Talaromyces stipitatus and T. marneffei (syn. Penicillium marneffei) are closely related to Thermomyces lanuginosus and Th. dupontii (syn. Talaromyces thermophilus), and these species appear to be distantly related to Aspergillus and Penicillium. In the last part of

  3. SpinachDB: A Well-Characterized Genomic Database for Gene Family Classification and SNP Information of Spinach.

    PubMed

    Yang, Xue-Dong; Tan, Hua-Wei; Zhu, Wei-Min

    2016-01-01

    Spinach (Spinacia oleracea L.), which originated in central and western Asia, belongs to the family Amaranthaceae. Spinach is one of the most important leafy vegetables with a high nutritional value as well as being a perfect research material for plant sex chromosome models. Following the completion of genome assembly and gene prediction of spinach, we developed SpinachDB (http://222.73.98.124/spinachdb) to store, annotate, mine and analyze genomics and genetics datasets efficiently. In this study, all 21702 spinach genes were annotated. A total of 15741 spinach genes were catalogued into 4351 families, including identification of a substantial number of transcription factors. To construct a high-density genetic map, a total of 131592 SSRs and 1125743 potential SNPs located in 548801 loci of the spinach genome were identified in 11 cultivated and wild spinach cultivars. Expression profiles were also computed from RNA-seq data using the FPKM method, which could be used to compare the genes. Paralogs in spinach and the orthologous genes in Arabidopsis, grape, sugar beet and rice were identified for comparative genome analysis. Finally, the SpinachDB website contains seven main sections, including the homepage; the GBrowse map that integrates genome, genes, SSR and SNP marker information; the Blast alignment service; the gene family classification search tool; the orthologous and paralogous gene pairs search tool; and the download and useful contact information. SpinachDB will be continually expanded to include newly generated robust genomics and genetics data sets along with the associated data mining and analysis tools. PMID:27148975
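
    The FPKM method mentioned above normalizes a gene's fragment count by both gene length and sequencing depth. A minimal sketch of the standard formula follows; the function name and the example numbers are hypothetical, and the abstract does not describe the authors' actual pipeline.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments * 10^9 / (gene length in bp * total mapped fragments)
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and total mapped fragments must be positive")
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments on a 2,000 bp gene, 10 million mapped fragments.
example_fpkm = fpkm(500, 2000, 10_000_000)
```

    With these made-up inputs the value is 25.0. Because the length and depth terms cancel out gene size and library size, values computed this way can be compared between genes within a sample.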

  4. SpinachDB: A Well-Characterized Genomic Database for Gene Family Classification and SNP Information of Spinach

    PubMed Central

    Zhu, Wei-Min

    2016-01-01

    Spinach (Spinacia oleracea L.), which originated in central and western Asia, belongs to the family Amaranthaceae. Spinach is one of the most important leafy vegetables with a high nutritional value as well as being a perfect research material for plant sex chromosome models. Following the completion of genome assembly and gene prediction of spinach, we developed SpinachDB (http://222.73.98.124/spinachdb) to store, annotate, mine and analyze genomics and genetics datasets efficiently. In this study, all 21702 spinach genes were annotated. A total of 15741 spinach genes were catalogued into 4351 families, including identification of a substantial number of transcription factors. To construct a high-density genetic map, a total of 131592 SSRs and 1125743 potential SNPs located in 548801 loci of the spinach genome were identified in 11 cultivated and wild spinach cultivars. Expression profiles were also computed from RNA-seq data using the FPKM method, which could be used to compare the genes. Paralogs in spinach and the orthologous genes in Arabidopsis, grape, sugar beet and rice were identified for comparative genome analysis. Finally, the SpinachDB website contains seven main sections, including the homepage; the GBrowse map that integrates genome, genes, SSR and SNP marker information; the Blast alignment service; the gene family classification search tool; the orthologous and paralogous gene pairs search tool; and the download and useful contact information. SpinachDB will be continually expanded to include newly generated robust genomics and genetics data sets along with the associated data mining and analysis tools. PMID:27148975

  5. What's My Substrate? Computational Function Assignment of Candida parapsilosis ADH5 by Genome Database Search, Virtual Screening, and QM/MM Calculations.

    PubMed

    Dhoke, Gaurao V; Ensari, Yunus; Davari, Mehdi D; Ruff, Anna Joëlle; Schwaneberg, Ulrich; Bocola, Marco

    2016-07-25

    Zinc-dependent medium chain reductase from Candida parapsilosis can be used in the reduction of carbonyl compounds to pharmacologically important chiral secondary alcohols. To date, cpADH5 has been named inconsistently in the literature (CPCR2/RCR/SADH), and its natural substrate is not known. In this study, we utilized a substrate docking based virtual screening method combined with KEGG, MetaCyc pathway, and Candida genome databases search for the discovery of natural substrates of cpADH5. The virtual screening of 7834 carbonyl compounds from the ZINC database provided 94 aldehydes or methyl/ethyl ketones as putative carbonyl substrates. Of these, 52 carbonyl substrates of cpADH5 with a catalytically active docking pose were identified by employing a mechanism based substrate docking protocol. Comparison of the virtual screening results with KEGG, MetaCyc database search, and Candida genome pathway analysis suggests that cpADH5 might be involved in the Ehrlich pathway (reduction of fusel aldehydes in leucine, isoleucine, and valine degradation). Our QM/MM calculations and experimental activity measurements affirmed that butyraldehyde substrates are the potential natural substrates of cpADH5, suggesting a carbonyl reductase role for this enzyme in butyraldehyde reduction in aliphatic amino acid degradation pathways. Phylogenetic tree analysis of known ADHs from Candida albicans shows that cpADH5 is close to caADH5. We therefore propose, according to the experimental substrate identification and sequence similarity, the common name butyraldehyde dehydrogenase cpADH5 for Candida parapsilosis CPCR2/RCR/SADH. PMID:27387009

  6. Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L.).

    PubMed

    Han, Bin; Wang, Changbiao; Tang, Zhaohui; Ren, Yongkang; Li, Yali; Zhang, Dayong; Dong, Yanhui; Zhao, Xinghua

    2015-01-01

    Microsatellites or simple sequence repeats (SSRs) are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Though an ordered draft sequence of hexaploid bread wheat has been announced, a systematic analysis of SSRs in wheat has not yet been reported. In the present study, we identified 364,347 SSRs from among 10,603,760 sequences of the Chinese spring wheat (CSW) genome, which were present at a density of 36.68 SSR/Mb. In total, we detected 488 types of motifs ranging from di- to hexanucleotides, among which dinucleotide repeats dominated, accounting for approximately 42.52% of the genome. The density of tri- to hexanucleotide repeats was 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent repeats among di- to hexanucleotide repeats. Among the 21 chromosomes of CSW, the density of repeats was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats on each chromosome, and even on the whole genome, were almost identical. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, which cover the entire genome at a density of 29.73 per Mb. All of the SSR markers were validated by reverse electronic-Polymerase Chain Reaction (re-PCR); 70,564 (23.9%) were found to be monomorphic and 224,703 (76.1%) were found to be polymorphic. A total of 45 monomorphic markers were selected randomly for validation purposes; 24 (53.3%) amplified one locus, 8 (17.8%) amplified multiple identical loci, and 13 (28.9%) did not amplify any fragments from the genomic DNA of CSW. Then a dendrogram was generated based on the 24 monomorphic SSR markers among 20 wheat cultivars and three species of its diploid ancestors showing that monomorphic SSR markers represented a promising source to
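
    The record above does not say which tool or thresholds were used to detect SSRs. Purely as an illustrative sketch, a regular-expression scan for perfect di- to hexanucleotide repeats could look like the following. The function name `find_ssrs` and the minimum repeat counts are assumptions chosen for the example, not the authors' settings.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed, not from the study).
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(sequence, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    thresholds = DEFAULT_MIN_REPEATS if min_repeats is None else min_repeats
    seq = sequence.upper()
    hits = []
    for unit_len, min_count in sorted(thresholds.items()):
        # One captured motif followed by at least (min_count - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are just a shorter motif repeated (e.g. AGAG, AAA).
            is_compound = any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if is_compound:
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence with an (AG)8 and an (AAG)5 repeat.
toy = "TTC" + "AG" * 8 + "CCGT" + "AAG" * 5 + "T"
toy_hits = find_ssrs(toy)
```

    A density such as the reported SSR/Mb figure would then be the number of hits divided by the scanned sequence length in megabases.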

  7. The Candida genome database incorporates multiple Candida species: multispecies search and analysis tools with curated gene and protein information for Candida albicans and Candida glabrata.

    PubMed

    Inglis, Diane O; Arnaud, Martha B; Binkley, Jonathan; Shah, Prachi; Skrzypek, Marek S; Wymore, Farrell; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin

    2012-01-01

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is an internet-based resource that provides centralized access to genomic sequence data and manually curated functional information about genes and proteins of the fungal pathogen Candida albicans and other Candida species. As the scope of Candida research, and the number of sequenced strains and related species, has grown in recent years, the need for expanded genomic resources has also grown. To answer this need, CGD has expanded beyond storing data solely for C. albicans, now integrating data from multiple species. Herein we describe the incorporation of this multispecies information, which includes curated gene information and the reference sequence for C. glabrata, as well as orthology relationships that interconnect Locus Summary pages, allowing easy navigation between genes of C. albicans and C. glabrata. These orthology relationships are also used to predict GO annotations of their products. We have also added protein information pages that display domains, structural information and physicochemical properties; bibliographic pages highlighting important topic areas in Candida biology; and a laboratory strain lineage page that describes the lineage of commonly used laboratory strains. All of these data are freely available at http://www.candidagenome.org/. We welcome feedback from the research community at candida-curator@lists.stanford.edu.

  8. A new single-nucleotide polymorphisms database for rainbow trout generated through whole genome resequencing of selected samples

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout, SNP discovery has been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), RNA sequencing, and whole...

  9. Interstrain variability in the virulence of Aspergillus fumigatus and Aspergillus terreus in a Toll-deficient Drosophila fly model of invasive aspergillosis.

    PubMed

    Ben-Ami, Ronen; Lamaris, Gregory A; Lewis, Russell E; Kontoyiannis, Dimitrios P

    2010-03-01

    Members of the genus Aspergillus are opportunistic fungal pathogens characterized by their genomic diversity. However, whether variations among Aspergillus strains and species at the genome level translate into significant differences in virulence is unclear. Therefore, we studied the interstrain and interspecies variations in virulence for a collection of Aspergillus fumigatus and Aspergillus terreus isolates using a previously described model of invasive aspergillosis in Toll-deficient fruit flies. We then looked for associations between survival in the fly model and strain relatedness as defined by repetitive-sequence polymerase chain reaction (rep-PCR). We observed no significant differences in the survival of flies infected with A. fumigatus vs. A. terreus or flies infected with colonizing vs. invasive isolates of either species. However, in both Aspergillus species we observed significant interstrain variability in fly survival (P<0.001 by the log-rank test). Using rep-PCR, we identified two dominant A. fumigatus clades that were associated with significantly different survival rates in Toll-deficient flies (P=0.007). We conclude that the fly model of invasive aspergillosis enables high-throughput screening of Aspergillus species for variations in virulence and may uncover distinct A. fumigatus clades that differ in their pathogenicity.

  10. Metabolomics of Aspergillus fumigatus.

    PubMed

    Frisvad, Jens C; Rank, Christian; Nielsen, Kristian F; Larsen, Thomas O

    2009-01-01

    Aspergillus fumigatus is the most important species in Aspergillus causing infective lung diseases. This species has been reported to produce a large number of extrolites, including secondary metabolites, acids, and proteins such as hydrophobins and extracellular enzymes. At least 226 potentially bioactive secondary metabolites have been reported from A. fumigatus that can be ordered into 24 biosynthetic families. Of these families we have detected representatives from the following families of secondary metabolites: fumigatins, fumigaclavines, fumiquinazolines, trypacidin and monomethylsulochrin, fumagillins, gliotoxins, pseurotins, chloroanthraquinones, fumitremorgins, verruculogen, helvolic acids, and pyripyropenes by HPLC with diode array detection and mass spectrometric detection. There is still doubt whether A. fumigatus can produce tryptoquivalins, but all isolates produce the related fumiquinazolines. We also tentatively detected sphingofungins in A. fumigatus Af293 and in an isolate of A. lentulus. The sphingofungins may have a similar role as the toxic fumonisins, found in A. niger. A further number of mycotoxins, including ochratoxin A, and other secondary metabolites have been reported from A. fumigatus, but in those cases either the fungus or its metabolite appear to be misidentified. PMID:18763205

  11. Previously unknown species of Aspergillus.

    PubMed

    Gautier, M; Normand, A-C; Ranque, S

    2016-08-01

    The use of multi-locus DNA sequence analysis has led to the description of previously unknown 'cryptic' Aspergillus species, whereas classical morphology-based identification of Aspergillus remains limited to the section or species-complex level. The current literature highlights two main features concerning these 'cryptic' Aspergillus species. First, the prevalence of such species in clinical samples is relatively high compared with emergent filamentous fungal taxa such as Mucorales, Scedosporium or Fusarium. Second, it is clearly important to identify these species in the clinical laboratory because of the high frequency of antifungal drug-resistant isolates of such Aspergillus species. Matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS) has recently been shown to enable the identification of filamentous fungi with an accuracy similar to that of DNA sequence-based methods. As MALDI-TOF MS is well suited to the routine clinical laboratory workflow, it facilitates the identification of these 'cryptic' Aspergillus species at the routine mycology bench. The rapid establishment of enhanced filamentous fungi identification facilities will lead to a better understanding of the epidemiology and clinical importance of these emerging Aspergillus species. Based on routine MALDI-TOF MS-based identification results, we provide original insights into the key interpretation issues of a positive Aspergillus culture from a clinical sample. Which ubiquitous species that are frequently isolated from air samples are rarely involved in human invasive disease? Can both the species and the type of biological sample indicate Aspergillus carriage, colonization or infection in a patient? Highly accurate routine filamentous fungi identification is central to enhance the understanding of these previously unknown Aspergillus species, with a vital impact on further improved patient care. PMID:27263029

  12. Expression Atlas update—a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments

    PubMed Central

    Petryszak, Robert; Burdett, Tony; Fiorelli, Benedetto; Fonseca, Nuno A.; Gonzalez-Porta, Mar; Hastings, Emma; Huber, Wolfgang; Jupp, Simon; Keays, Maria; Kryvych, Nataliya; McMurry, Julie; Marioni, John C.; Malone, James; Megy, Karine; Rustici, Gabriella; Tang, Amy Y.; Taubert, Jan; Williams, Eleanor; Mannion, Oliver; Parkinson, Helen E.; Brazma, Alvis

    2014-01-01

    Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of ‘baseline’ expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful ‘contrasts’, i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets, as well as genes and a more powerful search interface designed to maximize the biological value provided to the user. PMID:24304889

  13. The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution.

    PubMed

    Stenson, Peter D; Ball, Edward V; Mort, Matthew; Phillips, Andrew D; Shaw, Katy; Cooper, David N

    2012-09-01

    The Human Gene Mutation Database (HGMD) constitutes a comprehensive core collection of data on germ-line mutations in nuclear genes underlying or associated with human inherited disease (http://www.hgmd.org). Data cataloged include single-base-pair substitutions in coding, regulatory, and splicing-relevant regions, micro-deletions and micro-insertions, indels, and triplet repeat expansions, as well as gross gene deletions, insertions, duplications, and complex rearrangements. Each mutation is entered into HGMD only once, in order to avoid confusion between recurrent and identical-by-descent lesions. By March 2012, the database contained in excess of 123,600 different lesions (HGMD Professional release 2012.1) detected in 4,514 different nuclear genes, with new entries currently accumulating at a rate in excess of 10,000 per annum. ∼6,000 of these entries constitute disease-associated and functional polymorphisms. HGMD also includes cDNA reference sequences for more than 98% of the listed genes.

  14. Evaluation of 16SpathDB 2.0, an automated 16S rRNA gene sequence database, using 689 complete bacterial genomes.

    PubMed

    Teng, Jade L L; Ho, Tom C C; Yeung, Ronald S Y; Wong, Annette Y P; Wang, Haiyin; Chen, Chen; Fung, Kitty S C; Lau, Susanna K P; Woo, Patrick C Y

    2014-02-01

    Interpretation of 16S rRNA sequences is a difficult problem faced by clinical microbiologists and technicians. In this study, we evaluated the updated 16SpathDB 2.0 database, using 689 16S rRNA sequences from 689 complete genomes of medically important bacteria. Among these 689 16S rRNA sequences, none was wrongly identified, with 35.8% reported as a single bacterial species having >98% identity with the query sequence (category 1), 63.9% reported as more than 1 bacterial species having >98% identity with the query sequence (category 2), 0.3% reported to the genus level (category 3), and none reported as no match (category 4). For the 16S rRNA sequences of non-duplicated bacterial species reported as category 1 or 2, the percentage of bacterial species reported as category 1 was significantly higher for anaerobic Gram-positive/Gram-negative bacteria than aerobic/facultative anaerobic Gram-positive/Gram-negative bacteria. 16SpathDB 2.0 is a user-friendly and accurate database for 16S rRNA sequence interpretation in clinical laboratories.

  15. Aspergillus prosthetic valve endocarditis.

    PubMed Central

    Petheram, I S; Seal, R M

    1976-01-01

    The clinical, laboratory, and histopathological features of seven cases of Aspergillus fumigatus prosthetic valve endocarditis are presented. The exact nature of the lesion, a combination of infective fungal endocarditis and thrombosis on the prosthetic valve, is discussed and the difficulties in clinical diagnosis are emphasized. Helpful indications were sudden unexplained heart failure with the appearance of new murmurs, and emboli to large or medium-sized systemic arteries. Fever and anaemia were inconstant, and in no case was blood culture or precipitin investigation helpful. Spore contamination of operating theatre air was the likely source of infection, and measures taken to overcome this and other predisposing factors are discussed. Since medical diagnosis is usually late and the few reported cures in this condition have included replacement of the prosthesis, early surgical intervention combined with antifungal chemotherapy is advised. PMID:788218

  16. Tremorgenic Mycotoxins from Aspergillus Caespitosus

    PubMed Central

    Schroeder, H. W.; Cole, R. J.; Hein, H.; Kirksey, J. W.

    1975-01-01

    Two tremorgenic mycotoxins were isolated from Aspergillus caespitosus, and identified as verruculogen and fumitremorgin B. They were produced at the rate of 172 and 325 mg per kg, respectively, on autoclaved cracked field corn. PMID:1155935

  17. Tremorgenic mycotoxins from Aspergillus caespitosus.

    PubMed

    Schroeder, H W; Cole, R J; Hein, H; Kirksey, J W

    1975-06-01

    Two tremorgenic mycotoxins were isolated from Aspergillus caespitosus, and identified as verruculogen and fumitremorgin B. They were produced at the rate of 172 and 325 mg per kg, respectively, on autoclaved cracked field corn. PMID:1155935

  18. RegTransBase - A Database Of Regulatory Sequences and Interactions in a Wide Range of Prokaryotic Genomes

    SciTech Connect

    Kazakov, Alexei E.; Cipriano, Michael J.; Novichkov, Pavel S.; Minovitsky, Simon; Vinogradov, Dmitry V.; Arkin, Adam; Mironov, Andrey A.; Gelfand, Mikhail S.; Dubchak, Inna

    2006-07-01

    RegTransBase, a manually curated database of regulatory interactions in prokaryotes, captures the knowledge in published scientific literature using a controlled vocabulary. Although a number of databases describing interactions between regulatory proteins and their binding sites are currently being maintained, they focus mostly on the model organisms Escherichia coli and Bacillus subtilis, or are entirely computationally derived. RegTransBase describes a large number of regulatory interactions reported in many organisms and contains various types of experimental data, in particular: the activation or repression of transcription by an identified direct regulator; determining the transcriptional regulatory function of a protein (or RNA) directly binding to DNA (RNA); mapping or prediction of binding site for a regulatory protein; characterization of regulatory mutations. Currently, the RegTransBase content is derived from about 3000 relevant articles describing over 7000 experiments in relation to 128 microbes. It contains data on the regulation of about 7500 genes and evidence for 6500 interactions with 650 regulators. RegTransBase also contains manually created position weight matrices (PWM) that can be used to identify candidate regulatory sites in over 60 species. RegTransBase is available at http://regtransbase.lbl.gov.
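
    To make the idea of scanning for candidate sites with a position weight matrix concrete, here is a minimal, generic log-odds sketch. It is not RegTransBase's own procedure: the pseudocount, the uniform background, the function names and the toy binding sites are all assumptions made for illustration.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (list of per-position dicts) from aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(width):
        column = [site[i].upper() for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window, or None."""
    width = len(pwm)
    seq = sequence.upper()
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(ch not in "ACGT" for ch in window):
            continue  # skip windows containing ambiguous bases
        score = sum(column[base] for column, base in zip(pwm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical aligned sites and a toy sequence to scan.
toy_sites = ["TTGACA", "TTGACA", "TTGACT", "TTTACA"]
toy_pwm = build_pwm(toy_sites)
toy_best = best_hit(toy_pwm, "CCCCTTGACAGGGG")
```

    In practice a score threshold would be applied to every window rather than reporting only the single best one; the single-hit version is kept here for brevity.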

  19. GWASdb v2: an update database for human genetic variants identified by genome-wide association studies

    PubMed Central

    Li, Mulin Jun; Liu, Zipeng; Wang, Panwen; Wong, Maria P.; Nelson, Matthew R.; Kocher, Jean-Pierre A.; Yeager, Meredith; Sham, Pak Chung; Chanock, Stephen J.; Xia, Zhengyuan; Wang, Junwen

    2016-01-01

    Genome-wide association studies (GWASs), now as a routine approach to study single-nucleotide polymorphism (SNP)-trait association, have uncovered over ten thousand significant trait/disease associated SNPs (TASs). Here, we updated GWASdb (GWASdb v2, http://jjwanglab.org/gwasdb) which provides comprehensive data curation and knowledge integration for GWAS TASs. These updates include: (i) Up to August 2015, we collected 2479 unique publications from PubMed and other resources; (ii) We further curated moderate SNP-trait associations (P-value < 1.0×10−3) from each original publication, and generated a total of 252 530 unique TASs in all GWASdb v2 collected studies; (iii) We manually mapped 1610 GWAS traits to 501 Human Phenotype Ontology (HPO) terms, 435 Disease Ontology (DO) terms and 228 Disease Ontology Lite (DOLite) terms. For each ontology term, we also predicted the putative causal genes; (iv) We curated the detailed sub-populations and related sample size for each study; (v) Importantly, we performed extensive function annotation for each TAS by incorporating gene-based information, ENCODE ChIP-seq assays, eQTL, population haplotype, functional prediction across multiple biological domains, evolutionary signals and disease-related annotation; (vi) Additionally, we compiled a SNP-drug response association dataset for 650 pharmacogenetic studies involving 257 drugs in this update; (vii) Last, we improved the user interface of website. PMID:26615194

  20. Canadian Open Genetics Repository (COGR): a unified clinical genomics database as a community resource for standardising and sharing genetic interpretations

    PubMed Central

    Lerner-Ellis, Jordan; Wang, Marina; White, Shana; Lebo, Matthew S

    2015-01-01

    Background The Canadian Open Genetics Repository is a collaborative effort for the collection, storage, sharing and robust analysis of variants reported by medical diagnostics laboratories across Canada. As clinical laboratories adopt modern genomics technologies, the need for this type of collaborative framework is increasingly important. Methods A survey to assess existing protocols for variant classification and reporting was delivered to clinical genetics laboratories across Canada. Based on feedback from this survey, a variant assessment tool was made available to all laboratories. Each participating laboratory was provided with an instance of GeneInsight, a software featuring versioning and approval processes for variant assessments and interpretations and allowing for variant data to be shared between instances. Guidelines were established for sharing data among clinical laboratories and in the final outreach phase, data will be made readily available to patient advocacy groups for general use. Results The survey demonstrated the need for improved standardisation and data sharing across the country. A variant assessment template was made available to the community to aid with standardisation. Instances of the GeneInsight tool were provided to clinical diagnostic laboratories across Canada for the purpose of uploading, transferring, accessing and sharing variant data. Conclusions As an ongoing endeavour and a permanent resource, the Canadian Open Genetics Repository aims to serve as a focal point for the collaboration of Canadian laboratories with other countries in the development of tools that take full advantage of laboratory data in diagnosing, managing and treating genetic diseases. PMID:25904639

  1. Aspergillus mulundensis sp. nov., a new species for the fungus producing the antifungal echinocandin lipopeptides, mulundocandins.

    PubMed

    Bills, Gerald F; Yue, Qun; Chen, Li; Li, Yan; An, Zhiqiang; Frisvad, Jens C

    2016-03-01

    The invalidly published name Aspergillus sydowii var. mulundensis was proposed for a strain of Aspergillus that produced new echinocandin metabolites designated as the mulundocandins. Reinvestigation of this strain (Y-30462=DSMZ 5745) using phylogenetic, morphological, and metabolic data indicated that it is a distinct and novel species of Aspergillus sect. Nidulantes. The taxonomic novelty, Aspergillus mulundensis, is introduced for this historically important echinocandin-producing strain. The closely related A. nidulans FGSC A4 has one of the most extensively characterized secondary metabolomes of any filamentous fungus. Comparison of the full-genome sequences of DSMZ 5745 and FGSC A4 indicated that the two strains share 33 secondary metabolite biosynthetic gene clusters. These shared gene clusters represent ~45% of the total secondary metabolome of each strain, thus indicating a high level of intraspecific divergence in terms of secondary metabolism.

  2. Peanut gene expression profiling in developing seeds at different reproduction stages during Aspergillus parasiticus infection

    PubMed Central

    Guo, Baozhu; Chen, Xiaoping; Dang, Phat; Scully, Brian T; Liang, Xuanqiang; Holbrook, C Corley; Yu, Jiujiang; Culbreath, Albert K

    2008-01-01

    Background Peanut (Arachis hypogaea L.) is an important crop economically and nutritionally, and is one of the most susceptible host crops to colonization of Aspergillus parasiticus and subsequent aflatoxin contamination. Knowledge from molecular genetic studies could help to devise strategies in alleviating this problem; however, few peanut DNA sequences are available in the public database. In order to understand the molecular basis of host resistance to aflatoxin contamination, a large-scale project was conducted to generate expressed sequence tags (ESTs) from developing seeds to identify resistance-related genes involved in defense response against Aspergillus infection and subsequent aflatoxin contamination. Results We constructed six different cDNA libraries derived from developing peanut seeds at three reproduction stages (R5, R6 and R7) from a resistant and a susceptible cultivated peanut genotypes, 'Tifrunner' (susceptible to Aspergillus infection with higher aflatoxin contamination and resistant to TSWV) and 'GT-C20' (resistant to Aspergillus with reduced aflatoxin contamination and susceptible to TSWV). The developing peanut seed tissues were challenged by A. parasiticus and drought stress in the field. A total of 24,192 randomly selected cDNA clones from six libraries were sequenced. After removing vector sequences and quality trimming, 21,777 high-quality EST sequences were generated. Sequence clustering and assembling resulted in 8,689 unique EST sequences with 1,741 tentative consensus EST sequences (TCs) and 6,948 singleton ESTs. Functional classification was performed according to MIPS functional catalogue criteria. The unique EST sequences were divided into twenty-two categories. A similarity search against the non-redundant protein database available from NCBI indicated that 84.78% of total ESTs showed significant similarity to known proteins, of which 165 genes had been previously reported in peanuts. There were differences in overall expression

  3. Databases for Microbiologists

    DOE PAGESBeta

    Zhulin, Igor B.

    2015-05-26

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.

  4. Databases for Microbiologists

    PubMed Central

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  5. TreeTFDB: An Integrative Database of the Transcription Factors from Six Economically Important Tree Crops for Functional Predictions and Comparative and Functional Genomics

    PubMed Central

    Mochida, Keiichi; Yoshida, Takuhiro; Sakurai, Tetsuya; Yamaguchi-Shinozaki, Kazuko; Shinozaki, Kazuo; Tran, Lam-Son Phan

    2013-01-01

    Crop plants, whose productivity is affected by a wide range of growing and environmental conditions, are grown for economic purposes. Transcription factors (TFs) play a central role in the regulation of many biological processes, including plant development and responses to environmental stimuli, by activating or repressing spatiotemporal gene expression. Here, we describe the TreeTFDB (http://treetfdb.bmep.riken.jp/index.pl) that houses the TF repertoires of six economically important tree crop species: Jatropha curcas, papaya, cassava, poplar, castor bean and grapevine. Among these, the TF repertoire of J. curcas has not been reported by any other TF databases. In addition to their basic information, such as sequence and domain features, domain alignments, gene ontology assignment and sequence comparison, information on available full-length cDNAs, identity and positions of all types of known cis-motifs found in the promoter regions, and gene expression data are provided. With its newly designed and friendly interface and its unique features, TreeTFDB will enable the research community to predict the functions of, and access available genetic resources for, the crop TFs when performing comparative and functional genomics, either individually or at whole family level, in a comprehensive and convenient manner. PMID:23284086

  6. Vaccination approaches against opportunistic fungal infections caused by Aspergillus fumigatus.

    PubMed

    Reichard, Utz; Herrmann, Sahra; Asif, Abdul R

    2014-01-01

    Although innate immunity primarily combats systemic infections of opportunistic fungi such as Aspergillus and Candida spp., acquired and protective immunoreactions were observed long ago in animal trials following sublethal systemic infections caused by viable fungi or after challenging animals with inactivated fungal cells. Based on these observations, fungal antigens should exist which mediate such protective immunoreactions and have in part already been identified. In this context, this review focuses primarily on the various approaches that have been used to identify protection-mediating Aspergillus-antigens and their rationale. Emphasis is placed on screening methods that have exploited genetic or proteomic approaches on the basis of the corresponding fungal genome projects. Thereby, a survey and description is given of the antigens so far known to be capable of inducing immune responses that protect animals against acquiring lethal systemic aspergillosis.

  7. The chemical identification and analysis of Aspergillus nidulans secondary metabolites

    PubMed Central

    Sanchez, James F.

    2013-01-01

    Filamentous fungi have long been recognized to be a rich source of secondary metabolites with potential medicinal applications. The recent genomic sequencing of several Aspergillus species has revealed that many secondary metabolite gene clusters are apparently silent under standard laboratory conditions. Several successful approaches have been utilized to upregulate these genes and unearth the corresponding natural products. A straightforward, reliable method to purify and characterize new metabolites therefore should be useful. Details are provided herein on the cultivation of Aspergillus nidulans and the LC/MS analysis of the metabolic profile. Following is an explanation of silica gel chromatography, HPLC, and preparative TLC. Finally, the NMR characterization of previously unknown A. nidulans metabolites is detailed. PMID:23065610

  8. Enhanced diversity and aflatoxigenicity in interspecific hybrids of Aspergillus flavus and Aspergillus parasiticus.

    PubMed

    Olarte, Rodrigo A; Worthington, Carolyn J; Horn, Bruce W; Moore, Geromy G; Singh, Rakhi; Monacell, James T; Dorner, Joe W; Stone, Eric A; Xie, De-Yu; Carbone, Ignazio

    2015-04-01

    Aspergillus flavus and A. parasiticus are the two most important aflatoxin-producing fungi responsible for the contamination of agricultural commodities worldwide. Both species are heterothallic and undergo sexual reproduction in laboratory crosses. Here we examine the possibility of interspecific matings between A. flavus and A. parasiticus. These species can be distinguished morphologically and genetically, as well as by their mycotoxin profiles. Aspergillus flavus produces both B aflatoxins and cyclopiazonic acid (CPA), B aflatoxins or CPA alone, or neither mycotoxin; Aspergillus parasiticus produces B and G aflatoxins or the aflatoxin precursor O-methylsterigmatocystin, but not CPA. Only four of forty-five attempted interspecific crosses between opposite mating types of A. flavus and A. parasiticus were fertile and produced viable ascospores. Single ascospore strains from each cross were shown to be recombinant hybrids using multilocus genotyping and array comparative genome hybridization. Conidia of parents and their hybrid progeny were haploid and predominantly monokaryons and dikaryons based on flow cytometry. Multilocus phylogenetic inference showed that experimental hybrid progeny were grouped with naturally occurring A. flavus L strain and A. parasiticus. Higher total aflatoxin concentrations in some F1 progeny strains compared to midpoint parent aflatoxin levels indicate synergism in aflatoxin production; moreover, three progeny strains synthesized G aflatoxins that were not produced by the parents, and there was evidence of allopolyploidization in one strain. These results suggest that hybridization is an important diversifying force resulting in the genesis of novel toxin profiles in these agriculturally important fungi.
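
    The comparison of F1 progeny against "midpoint parent aflatoxin levels" mentioned above can be illustrated with a minimal sketch. It assumes that the midpoint is the arithmetic mean of the two parental concentrations; the function names, strain labels and numbers are hypothetical and are not taken from the study.

```python
def midparent_value(parent_a, parent_b):
    """Arithmetic mean of the two parental toxin concentrations (assumed definition)."""
    return (parent_a + parent_b) / 2.0


def progeny_above_midparent(progeny, parent_a, parent_b):
    """Return the progeny strains whose concentration exceeds the midparent value.

    progeny: dict mapping a (hypothetical) strain label to its total
    aflatoxin concentration, in the same units as the parents.
    """
    midpoint = midparent_value(parent_a, parent_b)
    return {strain: value for strain, value in progeny.items() if value > midpoint}


if __name__ == "__main__":
    # Illustrative numbers only, not data from the abstract.
    f1 = {"F1-1": 120.0, "F1-2": 70.0, "F1-3": 80.0}
    print(progeny_above_midparent(f1, parent_a=100.0, parent_b=60.0))
```

    A strain flagged by this check only exceeds the parental midpoint; whether that reflects synergism would still depend on replication and statistics that the abstract does not describe.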

  9. Identification by Molecular Methods and Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry and Antifungal Susceptibility Profiles of Clinically Significant Rare Aspergillus Species in a Referral Chest Hospital in Delhi, India.

    PubMed

    Masih, Aradhana; Singh, Pradeep K; Kathuria, Shallu; Agarwal, Kshitij; Meis, Jacques F; Chowdhary, Anuradha

    2016-09-01

    Aspergillus species cause a wide spectrum of clinical infections. Although Aspergillus fumigatus and Aspergillus flavus remain the most commonly isolated species in aspergillosis, in the last decade, rare and cryptic Aspergillus species have emerged in diverse clinical settings. The present study analyzed the distribution and in vitro antifungal susceptibility profiles of rare Aspergillus species in clinical samples from patients with suspected aspergillosis in 8 medical centers in India. Further, a matrix-assisted laser desorption ionization-time of flight mass spectrometry in-house database was developed to identify these clinically relevant Aspergillus species. β-Tubulin and calmodulin gene sequencing identified 45 rare Aspergillus isolates to the species level, except for a solitary isolate. They included 23 less common Aspergillus species belonging to 12 sections, mainly in Circumdati, Nidulantes, Flavi, Terrei, Versicolores, Aspergillus, and Nigri. Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) identified only 8 (38%) of the 23 rare Aspergillus isolates to the species level. Following the creation of an in-house database with the remaining 14 species not available in the Bruker database, the MALDI-TOF MS identification rate increased to 95%. Overall, high MICs of ≥2 μg/ml were noted for amphotericin B in 29% of the rare Aspergillus species, followed by voriconazole in 20% and isavuconazole in 7%, whereas MICs of >0.5 μg/ml for posaconazole were observed in 15% of the isolates. Regarding the clinical diagnoses in 45 patients with positive rare Aspergillus species cultures, 19 (42%) were regarded to represent colonization. In the remaining 26 patients, rare Aspergillus species were the etiologic agent of invasive, chronic, and allergic bronchopulmonary aspergillosis, allergic fungal rhinosinusitis, keratitis, and mycetoma. PMID:27413188

  10. Keratitis caused by Aspergillus pseudotamarii.

    PubMed

    Baranyi, Nikolett; Kocsubé, Sándor; Szekeres, András; Raghavan, Anita; Narendran, Venkatapathy; Vágvölgyi, Csaba; Panneer Selvam, Kanesan; Babu Singh, Yendremban Randhir; Kredics, László; Varga, János; Manikandan, Palanisamy

    2013-04-12

    A male patient presented with complaints of redness, pain and defective vision in the left eye. The infiltrate healed completely after two weeks of topical natamycin administration. A polyphasic approach was used to identify the isolate as Aspergillus pseudotamarii, which produced aflatoxins in inducing medium.

  11. Aspergillus infections in cystic fibrosis.

    PubMed

    King, Jill; Brunel, Shan F; Warris, Adilia

    2016-07-01

    Patients with cystic fibrosis (CF) suffer from chronic lung infection and airway inflammation. Respiratory failure secondary to chronic or recurrent infection remains the commonest cause of death and accounts for over 90% of mortality. Bacteria such as Staphylococcus aureus, Pseudomonas aeruginosa and Burkholderia cepacia complex have been regarded as the main CF pathogens and their role in progressive lung decline has been studied extensively. Little attention has been paid to the role of Aspergillus spp. and other filamentous fungi in the pathogenesis of non-ABPA (allergic bronchopulmonary aspergillosis) respiratory disease in CF, despite their frequent recovery in respiratory samples. It has become more apparent, however, that Aspergillus spp. may play an important role in chronic lung disease in CF. Research delineating the underlying mechanisms of Aspergillus persistence and infection in the CF lung and its link to lung deterioration is lacking. This review summarizes the Aspergillus disease phenotypes observed in CF, discusses the role of CFTR (cystic fibrosis transmembrane conductance regulator)-protein in innate immune responses and new treatment modalities. PMID:27177733

  12. Sexual recombination in Aspergillus tubingensis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus tubingensis from section Nigri (Black Aspergilli) is closely related to A. niger and is used extensively in the industrial production of enzymes and organic acids. We recently discovered sexual reproduction in A. tubingensis and in this study, demonstrate that the progeny are products o...

  13. 76 FR 16297 - Aspergillus flavus

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-23

    ... Findings In the Federal Register of March 3, 2010 (75 FR 9596) (FRL-8811-2), EPA issued a notice pursuant..., 2003 (68 FR 41541) (FRL-7311-6). Those health effects data were the basis for establishing the... exemptions for experimental use of Aspergillus flavus AF36 on pistachio (72 FR 28871, May 23, 2007)...

  14. Keratitis caused by Aspergillus pseudotamarii

    PubMed Central

    Baranyi, Nikolett; Kocsubé, Sándor; Szekeres, András; Raghavan, Anita; Narendran, Venkatapathy; Vágvölgyi, Csaba; Panneer Selvam, Kanesan; Babu Singh, Yendremban Randhir; Kredics, László; Varga, János; Manikandan, Palanisamy

    2013-01-01

    A male patient presented with complaints of redness, pain and defective vision in the left eye. The infiltrate healed completely after two weeks of topical natamycin administration. A polyphasic approach was used to identify the isolate as Aspergillus pseudotamarii, which produced aflatoxins in inducing medium. PMID:24432226

  15. Maize Genetics and Genomics Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This report is provided each year to our stakeholders in the maize genetic community. In this report, we describe the five-year plan for MaizeGDB reviewed in early 2008 by the USDA-ARS peer review process and which was developed with inputs from our Working Group and the Allerton 2007 Report (MNL 82...

  16. RNA-Seq-Based Transcriptome Analysis of Aflatoxigenic Aspergillus flavus in Response to Water Activity

    PubMed Central

    Zhang, Feng; Guo, Zhenni; Zhong, Hong; Wang, Sen; Yang, Weiqiang; Liu, Yongfeng; Wang, Shihua

    2014-01-01

    Aspergillus flavus is one of the most important producers of carcinogenic aflatoxins in crops, and the effect of water activity (aw) on growth and aflatoxin production of A. flavus has been previously studied. Here we found the strains under 0.93 aw exhibited decreased conidiation and aflatoxin biosynthesis compared to that under 0.99 aw. When RNA-Seq was used to delineate gene expression profile under different water activities, 23,320 non-redundant unigenes, with an average length of 1297 bp, were yielded. By database comparisons, 19,838 unigenes were matched well (e-value < 10^-5) with known gene sequences, and another 6767 novel unigenes were obtained by comparison to the current genome annotation of A. flavus. Based on the RPKM equation, 5362 differentially expressed unigenes (with |log2Ratio| ≥ 1) were identified between 0.99 aw and 0.93 aw treatments, including 3156 up-regulated and 2206 down-regulated unigenes, suggesting that A. flavus underwent an extensive transcriptome response during water activity variation. Furthermore, we found that the expression of 16 aflatoxin producing-related genes decreased obviously when water activity decreased, and the expression of 11 development-related genes increased after 0.99 aw treatment. Our data corroborate a model where water activity affects aflatoxin biosynthesis through increasing the expression of aflatoxin producing-related genes and regulating development-related genes. PMID:25421810
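
    The RPKM normalization and the |log2Ratio| ≥ 1 cutoff cited above can be sketched as follows. This is a minimal illustration using the commonly cited RPKM definition (reads per kilobase of transcript per million mapped reads); the pseudocount, the function names and the example numbers are assumptions for illustration, and the abstract does not say which software or exact procedure the authors used.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_ratio(rpkm_a, rpkm_b, pseudocount=0.001):
    """log2 of the expression ratio a/b; the pseudocount (an assumption) avoids division by zero."""
    return math.log2((rpkm_a + pseudocount) / (rpkm_b + pseudocount))


def classify(rpkm_a, rpkm_b, threshold=1.0):
    """Label a unigene as up, down or unchanged in condition a relative to condition b."""
    ratio = log2_ratio(rpkm_a, rpkm_b)
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical unigene: 500 reads, 1000 bp long, 10 million mapped reads.
    print(rpkm(500, 1000, 10_000_000))
    print(classify(40.0, 10.0), classify(10.0, 40.0), classify(10.0, 15.0))
```

    A threshold of 1 on the log2 scale corresponds to at least a twofold change in either direction; a real analysis would normally pair it with a significance test, which this sketch omits.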

  17. Functional analysis of alcS, a gene of the alc cluster in Aspergillus nidulans.

    PubMed

    Flipphi, Michel; Robellet, Xavier; Dequier, Emmanuel; Leschelle, Xavier; Felenbok, Béatrice; Vélot, Christian

    2006-04-01

    The ethanol utilization pathway (alc system) of Aspergillus nidulans requires two structural genes, alcA and aldA, which encode the two enzymes (alcohol dehydrogenase and aldehyde dehydrogenase, respectively) allowing conversion of ethanol into acetate via acetaldehyde, and a regulatory gene, alcR, encoding the pathway-specific autoregulated transcriptional activator. The alcR and alcA genes are clustered with three other genes that are also positively regulated by alcR, although they are dispensable for growth on ethanol. In this study, we characterized alcS, the most abundantly transcribed of these three genes. alcS is strictly co-regulated with alcA, and encodes a 262-amino acid protein. Sequence comparison with protein databases detected a putative conserved domain that is characteristic of the novel GPR1/FUN34/YaaH membrane protein family. It was shown that the AlcS protein is located in the plasma membrane. Deletion or overexpression of alcS did not result in any obvious phenotype. In particular, AlcS does not appear to be essential for the transport of ethanol, acetaldehyde or acetate. Basic Local Alignment Search Tool analysis against the A. nidulans genome led to the identification of two novel ethanol- and ethylacetate-induced genes encoding other members of the GPR1/FUN34/YaaH family, AN5226 and AN8390.

  18. Aspergillus asper sp.nov. and Aspergillus collinsii sp.nov., from Aspergillus section Usti

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In sampling fungi from the built environment, two isolates that could not confidently be placed in described species were encountered. Phenotypic analysis suggested that they belonged in Aspergillus sect. Usti. In order to verify the sectional placement and to assure that they were undescribed rathe...

  19. Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study

    PubMed Central

    Raethong, Nachon; Wong-ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

    2016-01-01

    Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrient delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationships between sequence and structure and between structure and function. As in the energy metabolism of A. oryzae, the H+-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction. PMID:27274991

  20. Biodegradation of phenol by Antarctic strains of Aspergillus fumigatus.

    PubMed

    Gerginova, Maria; Manasiev, Jordan; Yemendzhiev, Husein; Terziyska, Anna; Peneva, Nadejda; Alexieva, Zlatka

    2013-01-01

    Taxonomic identification of three newly isolated Antarctic fungal strains by their 18S rDNA sequences revealed their affiliation with Aspergillus fumigatus. Phenol (0.5 g/l) as the sole carbon source was completely degraded by all strains within less than two weeks. Intracellular activities of three key enzymes involved in the phenol catabolism were determined. Activities of phenol hydroxylase (EC 1.14.13.7), hydroquinone hydroxylase (EC 1.14.13.x), and catechol 1,2-dioxygenase (EC 1.13.11.1) varied significantly between strains. The rates of phenol degradation in the three strains correlated best with the activity of catechol 1,2-dioxygenase. Six pairs of oligonucleotide primers were designed on the basis of the Aspergillus fumigatus Af293 genome sequence (NCBI Acc. No. XM_743491.1) and used to amplify phenol hydroxylase-related gene sequences. DNA sequences of about 1200 bp were amplified from all three strains and found to have a high degree of sequence identity with the corresponding gene of Aspergillus fumigatus Af293.

  1. Overexpression of Aspergillus tubingensis faeA in protease-deficient Aspergillus niger enables ferulic acid production from plant material.

    PubMed

    Zwane, Eunice N; Rose, Shaunita H; van Zyl, Willem H; Rumbold, Karl; Viljoen-Bloom, Marinda

    2014-06-01

    The production of ferulic acid esterase involved in the release of ferulic acid side groups from xylan was investigated in strains of Aspergillus tubingensis, Aspergillus carneus, Aspergillus niger and Rhizopus oryzae. The highest activity on triticale bran as sole carbon source was observed with the A. tubingensis T8.4 strain, which produced a type A ferulic acid esterase active against methyl p-coumarate, methyl ferulate and methyl sinapate. The activity of the A. tubingensis ferulic acid esterase (AtFAEA) was inhibited twofold by glucose and induced twofold in the presence of maize bran. An initial accumulation of endoglucanase was followed by the production of endoxylanase, suggesting a combined action with ferulic acid esterase on maize bran. A genomic copy of the A. tubingensis faeA gene was cloned and expressed in A. niger D15#26 under the control of the A. niger gpd promoter. The recombinant strain has reduced protease activity and does not acidify the media, therefore promoting high-level expression of recombinant enzymes. It produced 13.5 U/ml FAEA after 5 days on autoclaved maize bran as sole carbon source, which was threefold higher than for the A. tubingensis donor strain. The recombinant AtFAEA was able to extract 50 % of the available ferulic acid from non-pretreated maize bran, making this enzyme suitable for the biological production of ferulic acid from lignocellulosic plant material. PMID:24664515

  4. Developmental regulators in Aspergillus fumigatus.

    PubMed

    Park, Hee-Soo; Yu, Jae-Hyuk

    2016-03-01

    The filamentous fungus Aspergillus fumigatus is the most prevalent airborne fungal pathogen causing severe and usually fatal invasive aspergillosis in immunocompromised patients. This fungus produces a large number of small hydrophobic asexual spores called conidia as the primary means of reproduction, cell survival, propagation, and infectivity. The initiation, progression, and completion of asexual development (conidiation) is controlled by various regulators that govern expression of thousands of genes associated with formation of the asexual developmental structure conidiophore, and biogenesis of conidia. In this review, we summarize key regulators that directly or indirectly govern conidiation in this important pathogenic fungus. Better understanding these developmental regulators may provide insights into the improvement in controlling both beneficial and detrimental aspects of various Aspergillus species. PMID:26920882

  5. Atypical Aspergillus parasiticus isolates from pistachio with aflR gene nucleotide insertion identical to Aspergillus sojae

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are the most toxic and carcinogenic secondary metabolites produced primarily by the filamentous fungi Aspergillus flavus and Aspergillus parasiticus. The toxins cause devastating economic losses because of strict regulations on distribution of contaminated products. Aspergillus sojae are...

  6. Gene expression profiling and identification of resistance genes to Aspergillus flavus infection in peanut through EST and microarray strategies.

    PubMed

    Guo, Baozhu; Fedorova, Natalie D; Chen, Xiaoping; Wan, Chun-Hua; Wang, Wei; Nierman, William C; Bhatnagar, Deepak; Yu, Jiujiang

    2011-07-01

    Aspergillus flavus and A. parasiticus infect peanut seeds and produce aflatoxins, which are associated with various diseases in domestic animals and humans throughout the world. The most cost-effective strategy to minimize aflatoxin contamination involves the development of peanut cultivars that are resistant to fungal infection and/or aflatoxin production. To identify peanut Aspergillus-interactive and peanut Aspergillus-resistance genes, we carried out a large scale peanut Expressed Sequence Tag (EST) project which we used to construct a peanut glass slide oligonucleotide microarray. The fabricated microarray represents over 40% of the protein coding genes in the peanut genome. For expression profiling, resistant and susceptible peanut cultivars were infected with a mixture of Aspergillus flavus and A. parasiticus spores. The subsequent microarray analysis identified 62 genes in resistant cultivars that were up-expressed in response to Aspergillus infection. In addition, we identified 22 putative Aspergillus-resistance genes that were constitutively up-expressed in the resistant cultivar in comparison to the susceptible cultivar. Some of these genes were homologous to peanut, corn, and soybean genes that were previously shown to confer resistance to fungal infection. This study is a first step towards a comprehensive genome-scale platform for developing Aspergillus-resistant peanut cultivars through targeted marker-assisted breeding and genetic engineering. PMID:22069737

  7. Onychomycosis caused by Aspergillus versicolor.

    PubMed

    Veraldi, Stefano; Chiaratti, Anna; Harak, Henry

    2010-07-01

    We report a case of onychomycosis caused by Aspergillus versicolor in a 66-year-old female patient. The infection was characterised clinically by yellowish pigmentation of the nail plate and mild nail bed hyperkeratosis of the first left toe. All other nails were normal. Three direct microscopical examinations of nail samples revealed the presence of hyaline hyphae as well as conidiophores. Pure colonies of A. versicolor were found in three cultures. The patient was successfully treated with oral itraconazole. PMID:19422523

  8. Two metabolites from Aspergillus flavipes.

    PubMed

    Clark, A M; Hufford, C D; Robertson, L W

    1977-01-01

    Two novel fungal metabolites, N-benzoyl-L-phenylalaninol (1a) and asperphenamate (2) were isolated from the culture filtrate and mycelium of Aspergillus flavipes ATCC 11013. N-benzoyl-L-phenylalaninol was identified by direct comparison with an authentic sample. The structure of asperphenamate is proposed as (S)-N-benzoyl-phenylalanine-(S)-2-benzamido-3-phenyl propyl ester, based on chemical and spectroscopic evidence. PMID:875642

  9. Generation, annotation, and analysis of an extensive Aspergillus niger EST collection

    PubMed Central

    Semova, Natalia; Storms, Reginald; John, Tricia; Gaudet, Pascale; Ulycznyj, Peter; Min, Xiang Jia; Sun, Jian; Butler, Greg; Tsang, Adrian

    2006-01-01

    Background: Aspergillus niger, a saprophyte commonly found on decaying vegetation, is widely used and studied for industrial purposes. Despite its place as one of the most important organisms for commercial applications, the lack of available information about its genetic makeup limits research with this filamentous fungus. Results: We present here the analysis of 12,820 expressed sequence tags (ESTs) generated from A. niger cultured under seven different growth conditions. These ESTs identify about 5,108 genes of which 44.5% code for proteins sharing similarity (E ≤ 1e-5) with GenBank entries of known function, 38% code for proteins that only share similarity with GenBank entries of unknown function and 17.5% encode proteins that do not have a GenBank homolog. Using the Gene Ontology hierarchy, we present a first classification of the A. niger proteins encoded by these genes and compare its protein repertoire with other well-studied fungal species. We have established a searchable web-based database that includes the EST and derived contig sequences and their annotation. Details about this project and access to the annotated A. niger database are available. Conclusion: This EST collection and its annotation provide a significant resource for fundamental and applied research with A. niger. The gene set identified in this manuscript will be highly useful in the annotation of the genome sequence of A. niger; the genes described in the manuscript, especially those encoding hydrolytic enzymes, will provide a valuable source for researchers interested in enzyme properties and applications. PMID:16457709
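
    The three-way split of genes by similarity to GenBank entries can be sketched as below. This is an illustrative reading of the abstract, not the authors' pipeline: it assumes each gene comes with a list of database hits, each carrying an E-value and a flag saying whether the matched entry has a known function. The data structure, names and example hits are hypothetical; only the 1e-5 cutoff comes from the abstract.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-5  # similarity threshold quoted in the abstract


def classify_gene(hits, cutoff=E_VALUE_CUTOFF):
    """Classify one gene from its hits, given as (e_value, function_known) tuples."""
    significant = [known for e_value, known in hits if e_value <= cutoff]
    if not significant:
        return "no_homolog"
    if any(significant):
        return "known_function"
    return "unknown_function"


def summarize(genes):
    """Return the percentage of genes in each category.

    genes: dict mapping a gene identifier to its list of hits.
    """
    counts = Counter(classify_gene(hits) for hits in genes.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical hits, for illustration only.
    example = {
        "gene1": [(1e-30, True)],
        "gene2": [(1e-8, False), (0.5, True)],
        "gene3": [(0.01, True)],
        "gene4": [],
    }
    print(summarize(example))
```

    In this sketch a gene counts as "known_function" if any hit at or below the cutoff has a known function, so weak hits to characterized proteins do not rescue a gene whose only significant matches are uncharacterized.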

  10. Ecophysiological characterization of Aspergillus carbonarius, Aspergillus tubingensis and Aspergillus niger isolated from grapes in Spanish vineyards.

    PubMed

    García-Cela, E; Crespo-Sempere, A; Ramos, A J; Sanchis, V; Marin, S

    2014-03-01

    The aim of this study was to evaluate the diversity of black aspergilli isolated from berries from different agroclimatic regions of Spain. Growth characterization (in terms of temperature and water activity requirements) of Aspergillus carbonarius, Aspergillus tubingensis and Aspergillus niger was carried out on synthetic grape medium. A. tubingensis and A. niger showed higher maximum temperatures for growth (>45 °C versus 40-42 °C), and lower minimum aw requirements (0.83 aw versus 0.87 aw) than A. carbonarius. No differences in growth boundaries due to their geographical origin were found within A. niger aggregate isolates. Conversely, A. carbonarius isolates from the hotter and drier region grew and produced OTA at lower aw than other isolates. However, little genetic diversity in A. carbonarius was observed for the microsatellites tested and the same sequence of β-tubulin gene was observed; therefore intraspecific variability did not correlate with the geographical origin of the isolates or with their ability to produce OTA. Climatic change prediction points to drier and hotter climatic scenarios where A. tubingensis and A. niger could be even more prevalent over A. carbonarius, since they are better adapted to extreme high temperature and drier conditions.

  11. Fumonisin B2 production by Aspergillus niger.

    PubMed

    Frisvad, Jens C; Smedsgaard, Jørn; Samson, Robert A; Larsen, Thomas O; Thrane, Ulf

    2007-11-14

    The carcinogenic mycotoxin fumonisin B2 was detected for the first time in the industrially important Aspergillus niger. Fumonisin B2, known from Fusarium verticillioides and other Fusaria, was detected in cultures of three full genome sequenced strains of A. niger, in the ex type culture and in a culture of F. verticillioides by electrospray LC-MS analysis of methanolic extracts from agar plugs of cultures grown on several substrates. Whereas F. verticillioides produced fumonisins B1, B2, and B3 on agar media based on plant extracts, such as barley malt, oat, rice, potatoes, and carrots, A. niger produced fumonisin B2 best on agar media with a low water activity, including Czapek yeast autolysate agar with 5% NaCl. Of the media tested, only rice corn steep agar supported fumonisin production by both F. verticillioides and A. niger. However, A. niger had a different regulation of fumonisin production and a different quantitative profile of fumonisins, producing only B2 as compared to F. verticillioides. Fumonisin production by A. niger, which is a widely occurring species and an extremely important industrial organism, will have very important implications for biotechnology and especially food safety. A. niger is used for the production of citric acid and as producer of extracellular enzymes, and also as a transformation host for the expression of heterologous proteins. Certain strains of A. niger produce both ochratoxin A and fumonisins, so some foods and feeds may potentially contain two types of carcinogenic mycotoxins from this species.

  12. Genetics of Polyketide Metabolism in Aspergillus nidulans

    PubMed Central

    Klejnstrup, Marie L.; Frandsen, Rasmus J. N.; Holm, Dorte K.; Nielsen, Morten T.; Mortensen, Uffe H.; Larsen, Thomas O.; Nielsen, Jakob B.

    2012-01-01

    Secondary metabolites are small molecules that show large structural diversity and a broad range of bioactivities. Some metabolites are attractive as drugs or pigments while others act as harmful mycotoxins. Filamentous fungi have the capacity to produce a wide array of secondary metabolites including polyketides. The genes required for production of these metabolites are mostly organized in gene clusters, which often are silent or barely expressed under laboratory conditions, making discovery and analysis difficult. Fortunately, the genome sequences of several filamentous fungi are publicly available, greatly facilitating the establishment of links between genes and metabolites. This review covers the attempts being made to trigger the activation of polyketide metabolism in the fungal model organism Aspergillus nidulans. Moreover, it will provide an overview of the pathways where ten polyketide synthase genes have been coupled to polyketide products. Therefore, the proposed biosynthesis of the following metabolites will be presented: naphthopyrone, sterigmatocystin, aspyridones, emericellamides, asperthecin, asperfuranone, monodictyphenone/emodin, orsellinic acid, and the austinols. PMID:24957370

  13. Identification of Aspergillus Brla Response Elements (Bres) by Genetic Selection in Yeast

    PubMed Central

    Chang, Y. C.; Timberlake, W. E.

    1993-01-01

    The brlA gene of Aspergillus nidulans plays a central role in controlling conidiophore development. To test the hypothesis that brlA encodes a transcriptional regulator and to identify sites of interaction for the BrlA polypeptide, we expressed brlA in Saccharomyces cerevisiae (yeast) strains containing Aspergillus DNA sequences inserted upstream of a minimal yeast promoter fused to the Escherichia coli lacZ gene. Initially, a DNA fragment from the promoter region of the developmentally regulated rodA gene was tested and shown to mediate brlA-dependent transcriptional activation. Two additional DNA fragments were selected from an Aspergillus genomic library by their ability to respond to brlA in yeast. These fragments contained multiple copies of a sequence motif present in the rodA fragment, which we propose to be sites for BrlA interaction and designate brlA response elements (BREs). DNA fragments containing BREs upstream of a minimal Aspergillus promoter were capable of conferring developmental regulation in Aspergillus. Deletion of BREs from the upstream region of rodA greatly decreased its developmental induction. Multiple copies of a synthetic oligonucleotide with the consensus sequence identified among the BREs mediated brlA-dependent transcriptional activation in yeast. The results show that a primary activity of brlA is transcriptional activation and tentatively identify sites of interaction for the BrlA polypeptide. PMID:8417986

  14. LAMP-PCR detection of ochratoxigenic Aspergillus species collected from peanut kernel.

    PubMed

    Al-Sheikh, H M

    2015-01-30

    Over the last decade, ochratoxin A (OTA) has been widely described and is ubiquitous in several agricultural products. Ochratoxins represent the second-most important mycotoxin group after aflatoxins. A total of 34 samples were surveyed from 3 locations, including Mecca, Madina, and Riyadh, Saudi Arabia, during 2012. Fungal contamination frequency was determined for surface-sterilized peanut seeds, which were seeded onto malt extract agar media. Aspergillus niger (35%), Aspergillus ochraceus (30%), and Aspergillus carbonarius (25%) were the most frequently observed Aspergillus species, while Aspergillus flavus and Aspergillus phoenicis isolates were only infrequently recovered and in small numbers (10%). OTA production was evaluated on yeast extract sucrose medium, which revealed that 57% of the isolates were A. niger and 60% of A. carbonarius isolates were OTA producers; 100% belonged to A. ochraceus. Only one isolate, morphologically identified as A. carbonarius, and 3 A. niger isolates unstably produced OTA. A polymerase chain reaction (PCR)-based identification and detection assay was used to identify A. ochraceus isolates. Using the primer sets OCRA1/OCRA2, 400-base pair PCR fragments were produced only when genomic DNA from A. ochraceus isolates was used. Recently, the loop-mediated isothermal amplification assay using recombinase polymerase amplification chemistry was used for A. carbonarius and A. niger DNA identification. As a non-gel-based technique, the amplification product was directly visualized in the reaction tube after adding calcein for naked-eye examination.

  15. Cyclopiazonic Acid Biosynthesis of Aspergillus flavus and Aspergillus oryzae

    PubMed Central

    Chang, Perng-Kuang; Ehrlich, Kenneth C.; Fujii, Isao

    2009-01-01

    Cyclopiazonic acid (CPA) is an indole-tetramic acid neurotoxin produced by some of the same strains of A. flavus that produce aflatoxins and by some Aspergillus oryzae strains. Despite its discovery 40 years ago, few reviews of its toxicity and biosynthesis have been reported. This review examines what is currently known about the toxicity of CPA to animals and humans, both by itself and in combination with other mycotoxins. The review also discusses CPA biosynthesis and the genetic diversity of CPA production in A. flavus/oryzae populations. PMID:22069533

  16. Investigating core genetic-and-epigenetic cell cycle networks for stemness and carcinogenic mechanisms, and cancer drug design using big database mining and genome-wide next-generation sequencing data

    PubMed Central

    Li, Cheng-Wei; Chen, Bor-Sen

    2016-01-01

    Recent studies have demonstrated that cell cycle plays a central role in development and carcinogenesis. Thus, the use of big databases and genome-wide high-throughput data to unravel the genetic and epigenetic mechanisms underlying cell cycle progression in stem cells and cancer cells is a matter of considerable interest. Real genetic-and-epigenetic cell cycle networks (GECNs) of embryonic stem cells (ESCs) and HeLa cancer cells were constructed by applying system modeling, system identification, and big database mining to genome-wide next-generation sequencing data. Real GECNs were then reduced to core GECNs of HeLa cells and ESCs by applying principal genome-wide network projection. In this study, we investigated potential carcinogenic and stemness mechanisms for systems cancer drug design by identifying common core and specific GECNs between HeLa cells and ESCs. Integrating drug database information with the specific GECNs of HeLa cells could lead to identification of multiple drugs for cervical cancer treatment with minimal side-effects on the genes in the common core. We found that dysregulation of miR-29C, miR-34A, miR-98, and miR-215; and methylation of ANKRD1, ARID5B, CDCA2, PIF1, STAMBPL1, TROAP, ZNF165, and HIST1H2AJ in HeLa cells could result in cell proliferation and anti-apoptosis through NFκB, TGF-β, and PI3K pathways. We also identified 3 drugs, methotrexate, quercetin, and mimosine, which repressed the activated cell cycle genes, ARID5B, STK17B, and CCL2, in HeLa cells with minimal side-effects. PMID:27295129

  17. Pulmonary hyalinizing granuloma associated with Aspergillus infection.

    PubMed

    Pinckard, J Keith; Rosenbluth, Daniel B; Patel, Kishor; Dehner, Louis P; Pfeifer, John D

    2003-01-01

    A 38-year-old immunocompetent man with occupational exposure to Aspergillus presented with dyspnea, pleuritic chest pain, and hemoptysis. Chest roentgenograms and computed tomography scans demonstrated multiple pulmonary nodules bilaterally. An initial set of bronchial washing cultures grew Aspergillus fumigatus, serologic testing showed an elevated anti-Aspergillus titer, and immunodiffusion testing was positive for antibody against A. fumigatus and A. niger. There was no microbiologic or serologic evidence of infection by other pathogens, and no clinical or laboratory evidence of autoimmune disease. An open lung biopsy was diagnostic of pulmonary hyalinizing granuloma. This novel association with Aspergillus infection not only expands the spectrum of pathogens linked to pulmonary hyalinizing granuloma but also documents a new pattern of lung disease that can be caused by Aspergillus. PMID:12598920

  18. Development of RFLP-PCR method for the identification of medically important Aspergillus species using single restriction enzyme MwoI.

    PubMed

    Diba, K; Mirhendi, H; Kordbacheh, P; Rezaie, S

    2014-01-01

    In this study we attempted to modify the PCR-RFLP method using the restriction enzyme MwoI for the identification of medically important Aspergillus species. Our subjects included nine standard Aspergillus species and 205 Aspergillus isolates from confirmed hospital-acquired infections and hospital indoor sources. First, Aspergillus isolates were identified to the species level by morphologic methods. A 24-hour culture was performed for each isolate to harvest Aspergillus mycelia, and genomic DNA was then extracted using the phenol-chloroform method. PCR-RFLP using the single restriction enzyme MwoI was performed on the ITS regions of the rDNA gene. The electrophoresis data were analyzed and compared with the morphologic identifications. The 205 Aspergillus isolates comprised 153 (75%) environmental and 52 (25%) clinical isolates. A. flavus was the most frequently isolated species in our study (55%), followed by A. niger (65; 31.7%), A. fumigatus (18; 8.7%), and A. nidulans and A. parasiticus (2; 1% each). MwoI enabled us to discriminate eight medically important Aspergillus species, including A. fumigatus, A. niger, and A. flavus as the most commonly isolated species. The PCR-RFLP method using the restriction enzyme MwoI is a rapid and reliable test for identification of at least the most medically important Aspergillus species.
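
    The digestion step of a PCR-RFLP assay like the one described above can be previewed in silico. The following is a minimal Python sketch, not the authors' protocol: it assumes the commonly cited MwoI recognition site GCNNNNN^NNGC (top-strand cut after the seventh base of the site), and the function name and the toy amplicon are hypothetical. Real ITS amplicons and species-specific banding patterns are not reproduced here.

```python
import re

# Assumption: MwoI recognizes GCNNNNN^NNGC, i.e. GC + 7 arbitrary bases + GC,
# cutting the top strand after the 7th base of the 11-bp site.
# A lookahead is used so that overlapping sites are all reported.
MWOI_SITE = re.compile(r"(?=GC[ACGT]{7}GC)")
CUT_OFFSET = 7


def mwoi_fragment_lengths(amplicon: str) -> list[int]:
    """Return top-strand fragment lengths after a complete MwoI digest."""
    seq = amplicon.upper()
    # Cut positions, measured from the start of the amplicon.
    cuts = sorted({m.start() + CUT_OFFSET for m in MWOI_SITE.finditer(seq)})
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 42-bp toy amplicon containing two MwoI sites.
toy = "ATATAT" + "GCAAAAAAAGC" + "TTTTTTTTTT" + "GCCCCCCCCGC" + "ATAT"
print(mwoi_fragment_lengths(toy))  # [13, 21, 8]
```

    Comparing such predicted fragment lengths across species' ITS sequences is one way to judge whether a single enzyme is likely to separate them on a gel; the sketch ignores the short overhangs left by the enzyme and any partial digestion.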

  19. Navigating public microarray databases.

    PubMed

    Penkett, Christopher J; Bähler, Jürg

    2004-01-01

    With the ever-escalating amount of data being produced by genome-wide microarray studies, it is of increasing importance that these data are captured in public databases so that researchers can use this information to complement and enhance their own studies. Many groups have set up databases of expression data, ranging from large repositories, which are designed to comprehensively capture all published data, through to more specialized databases. The public repositories, such as ArrayExpress at the European Bioinformatics Institute, contain complete datasets in raw format in addition to processed data, whilst the specialist databases tend to provide downstream analysis of normalized data from more focused studies and data sources. Here we provide a guide to the use of these public microarray resources.

  20. Biological Databases for Human Research

    PubMed Central

    Zou, Dong; Ma, Lina; Yu, Jun; Zhang, Zhang

    2015-01-01

    The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation. PMID:25712261

  1. Biological databases for human research.

    PubMed

    Zou, Dong; Ma, Lina; Yu, Jun; Zhang, Zhang

    2015-02-01

    The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation. PMID:25712261

  2. New taxa of Neosartorya and Aspergillus in Aspergillus section Fumigati.

    PubMed

    Hong, Seung-Beom; Shin, Hyeon-Dong; Hong, Joonbae; Frisvad, Jens C; Nielsen, Per V; Varga, János; Samson, Robert A

    2008-01-01

    Three new species of Neosartorya and one new Aspergillus of section Fumigati are proposed using a polyphasic approach based on morphology, extrolite production and partial beta-tubulin, calmodulin, and actin gene sequences. The phylogenetic analyses using the three genes clearly show that the taxa grouped separately from the known species and confirmed the phenotypic differences. Neosartorya denticulata is characterized by its unique denticulate ascospores with a prominent equatorial furrow; N. assulata by well developed flaps on the convex surface of the ascospores which in addition have two distinct equatorial crests and N. galapagensis by a funiculose colony morphology, short and narrow conidiophores and ascospores with two wide equatorial crests with a microtuberculate convex surface. Aspergillus turcosus can be distinguished by velvety, gray turquoise colonies and short, loosely columnar conidial heads. The four new taxa also have unique extrolite profiles, which contain the mycotoxins gliotoxin and viriditoxin in N. denticulata; apolar compounds provisionally named NEPS in N. assulata and gregatins in N. galapagensis. A. turcosus produced kotanins. N. denticulata sp. nov., N. assulata sp. nov., N. galapagensis sp. nov., and A. turcosus sp. nov. are described and illustrated.

  3. New taxa of Neosartorya and Aspergillus in Aspergillus section Fumigati

    PubMed Central

    Hong, Seung-Beom; Shin, Hyeon-Dong; Hong, Joonbae; Frisvad, Jens C.; Nielsen, Per V.; Varga, János

    2007-01-01

    Three new species of Neosartorya and one new Aspergillus of section Fumigati are proposed using a polyphasic approach based on morphology, extrolite production and partial β-tubulin, calmodulin, and actin gene sequences. The phylogenetic analyses using the three genes clearly show that the taxa grouped separately from the known species and confirmed the phenotypic differences. Neosartorya denticulata is characterized by its unique denticulate ascospores with a prominent equatorial furrow; N. assulata by well developed flaps on the convex surface of the ascospores which in addition have two distinct equatorial crests and N. galapagensis by a funiculose colony morphology, short and narrow conidiophores and ascospores with two wide equatorial crests with a microtuberculate convex surface. Aspergillus turcosus can be distinguished by velvety, gray turquoise colonies and short, loosely columnar conidial heads. The four new taxa also have unique extrolite profiles, which contain the mycotoxins gliotoxin and viriditoxin in N. denticulata; apolar compounds provisionally named NEPS in N. assulata and gregatins in N. galapagensis. A. turcosus produced kotanins. N. denticulata sp. nov., N. assulata sp. nov., N. galapagensis sp. nov., and A. turcosus sp. nov. are described and illustrated. Electronic supplementary material: the online version of this article (doi:10.1007/s10482-007-9183-1) contains supplementary material, which is available to authorized users. PMID:17610141

  4. Two novel species of Aspergillus section Nigri from indoor air

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus collinsii, Aspergillus floridensis, and Aspergillus trinidadensis are described as novel uniseriate species of Aspergillus section Nigri isolated from air samples. To describe the species we used phenotypes from 7-d Czapek yeast extract agar culture (CYA) and malt extract agar culture (M...

  5. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  6. PRIMED: PRIMEr database for deleting and tagging all fission and budding yeast genes developed using the open-source genome retrieval script (GRS).

    PubMed

    Cummings, Michael T; Joh, Richard I; Motamedi, Mo

    2015-01-01

    The fission (Schizosaccharomyces pombe) and budding (Saccharomyces cerevisiae) yeasts have served as excellent models for many seminal discoveries in eukaryotic biology. In these organisms, genes are deleted or tagged easily by transforming cells with PCR-generated DNA inserts, flanked by short (50-100 bp) regions of gene homology. These PCR reactions use specially designed long primers, which, in addition to the priming sites, carry homology for gene targeting. Primer design follows a fixed method but is tedious and time-consuming especially when done for a large number of genes. To automate this process, we developed the Python-based Genome Retrieval Script (GRS), an easily customizable open-source script for genome analysis. Using GRS, we created PRIMED, the complete PRIMEr Database for deleting and C-terminal tagging genes in the main S. pombe and five of the most commonly used S. cerevisiae strains. Because of the importance of noncoding RNAs (ncRNAs) in many biological processes, we also included the deletion primer set for these features in each genome. PRIMED are accurate and comprehensive and are provided as downloadable Excel files, removing the need for future primer design, especially for large-scale functional analyses. Furthermore, the open-source GRS can be used broadly to retrieve genome information from custom or other annotated genomes, thus providing a suitable platform for building other genomic tools by the yeast or other research communities.
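
    The fixed primer-design method summarized above (a homology arm followed by a priming site) can be illustrated with a short Python sketch. This is not the GRS code, whose interface is not described in the abstract; the function names, the coordinate convention, the default arm length, and the priming sequences in the example are all hypothetical placeholders.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def deletion_primers(chromosome: str, orf_start: int, orf_end: int,
                     fwd_site: str, rev_site: str,
                     arm: int = 60) -> tuple[str, str]:
    """Build a long primer pair for replacing an ORF with a PCR cassette.

    Assumed convention: 0-based top-strand coordinates, orf_start inclusive
    and orf_end exclusive. fwd_site and rev_site are the cassette priming
    sequences written 5'->3' as they should appear in each primer.
    """
    if orf_start < arm or orf_end + arm > len(chromosome):
        raise ValueError("not enough flanking sequence for the homology arms")
    upstream = chromosome[orf_start - arm:orf_start]   # arm just before the ORF
    downstream = chromosome[orf_end:orf_end + arm]     # arm just after the ORF
    forward = upstream + fwd_site
    reverse = reverse_complement(downstream) + rev_site
    return forward, reverse


# Toy example with 4-bp arms; real designs use much longer arms (50-100 bp).
chrom = "GGAACCTT" + "ATGAAATAA" + "CGATTTAA"   # flank + toy ORF + flank
fwd, rev = deletion_primers(chrom, 8, 17, "ACGTAC", "TTGGCC", arm=4)
print(fwd)  # CCTTACGTAC
print(rev)  # ATCGTTGGCC
```

    Looping such a function over every annotated feature in a genome is the kind of repetitive task that the abstract says was automated; tagging primers would differ in where the arms are taken relative to the stop codon.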

  7. Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae.

    PubMed

    Galagan, James E; Calvo, Sarah E; Cuomo, Christina; Ma, Li-Jun; Wortman, Jennifer R; Batzoglou, Serafim; Lee, Su-In; Baştürkmen, Meray; Spevak, Christina C; Clutterbuck, John; Kapitonov, Vladimir; Jurka, Jerzy; Scazzocchio, Claudio; Farman, Mark; Butler, Jonathan; Purcell, Seth; Harris, Steve; Braus, Gerhard H; Draht, Oliver; Busch, Silke; D'Enfert, Christophe; Bouchier, Christiane; Goldman, Gustavo H; Bell-Pedersen, Deborah; Griffiths-Jones, Sam; Doonan, John H; Yu, Jaehyuk; Vienken, Kay; Pain, Arnab; Freitag, Michael; Selker, Eric U; Archer, David B; Peñalva, Miguel A; Oakley, Berl R; Momany, Michelle; Tanaka, Toshihiro; Kumagai, Toshitaka; Asai, Kiyoshi; Machida, Masayuki; Nierman, William C; Denning, David W; Caddick, Mark; Hynes, Michael; Paoletti, Mathieu; Fischer, Reinhard; Miller, Bruce; Dyer, Paul; Sachs, Matthew S; Osmani, Stephen A; Birren, Bruce W

    2005-12-22

    The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution. Here we report the genome sequence of the model organism Aspergillus nidulans, and a comparative study with Aspergillus fumigatus, a serious human pathogen, and Aspergillus oryzae, used in the production of sake, miso and soy sauce. Our analysis of genome structure provided a quantitative evaluation of forces driving long-term eukaryotic genome evolution. It also led to an experimentally validated model of mating-type locus evolution, suggesting the potential for sexual reproduction in A. fumigatus and A. oryzae. Our analysis of sequence conservation revealed over 5,000 non-coding regions actively conserved across all three species. Within these regions, we identified potential functional elements including a previously uncharacterized TPP riboswitch and motifs suggesting regulation in filamentous fungi by Puf family genes. We further obtained comparative and experimental evidence indicating widespread translational regulation by upstream open reading frames. These results enhance our understanding of these widely studied fungi as well as provide new insight into eukaryotic genome evolution and gene regulation. PMID:16372000

  8. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  10. Secondary metabolite profiles and antifungal drug susceptibility of Aspergillus fumigatus and closely related species, Aspergillus lentulus, Aspergillus udagawae, and Aspergillus viridinutans.

    PubMed

    Tamiya, Hiroyuki; Ochiai, Eri; Kikuchi, Kazuyo; Yahiro, Maki; Toyotome, Takahito; Watanabe, Akira; Yaguchi, Takashi; Kamei, Katsuhiko

    2015-05-01

    The incidence of Aspergillus infection has been increasing in the past few years. Also, new Aspergillus fumigatus-related species, namely Aspergillus lentulus, Aspergillus udagawae, and Aspergillus viridinutans, were shown to infect humans. These fungi exhibit marked morphological similarities to A. fumigatus, albeit with different clinical courses and antifungal drug susceptibilities. The present study used liquid chromatography/time-of-flight mass spectrometry to identify the secondary metabolites secreted as virulence factors by these Aspergillus species and compared their antifungal susceptibility. The metabolite profiles varied widely among A. fumigatus, A. lentulus, A. udagawae, and A. viridinutans, producing 27, 13, 8, and 11 substances, respectively. Among the mycotoxins, fumifungin, fumiquinazoline A/B and D, fumitremorgin B, gliotoxin, sphingofungins, pseurotins, and verruculogen were only found in A. fumigatus, whereas auranthine was only found in A. lentulus. The amount of gliotoxin, one of the most abundant mycotoxins in A. fumigatus, was negligible in these related species. In addition, they had decreased susceptibility to antifungal agents such as itraconazole and voriconazole, even though metabolites that were shared in the isolates showing higher minimum inhibitory concentrations than epidemiological cutoff values were not detected. These strikingly different secondary metabolite profiles may lead to the development of more discriminative identification protocols for such closely related Aspergillus species as well as improved treatment outcomes. PMID:25737146

  11. Diversity of Aspergillus oryzae genotypes (RFLP) isolated from traditional soy sauce production within Malaysia and Southeast Asia

    Technology Transfer Automated Retrieval System (TEKTRAN)

    DNA fingerprinting was performed on 64 strains of Aspergillus oryzae and one strain of A. sojae isolated from soysauce factories within Malaysia and Southeast Asia that use primitive traditional methods in producing 'tamari type' Cantonese soy sauce. PstI digests of total genomic DNA from each isol...

  12. Deeper insight into the structure of the anaerobic digestion microbial community; the biogas microbiome database is expanded with 157 new genomes.

    PubMed

    Treu, Laura; Kougias, Panagiotis G; Campanaro, Stefano; Bassani, Ilaria; Angelidaki, Irini

    2016-09-01

    This research aimed to better characterize the biogas microbiome by means of high throughput metagenomic sequencing and to elucidate the core microbial consortium existing in biogas reactors independently from the operational conditions. Assembly of shotgun reads followed by an established binning strategy resulted in the largest extraction to date of microbial genomes involved in biogas-producing systems. Remarkably, the vast majority of the 236 extracted genome bins could be characterized only at high taxonomic levels. This result confirms that the biogas microbiome is composed of a consortium of unknown species. A comparative analysis between the genome bins of the current study and those extracted from a previous metagenomic assembly demonstrated a similar phylogenetic distribution of the main taxa. Finally, this analysis led to the identification of a subset of common microbes that could be considered as the core essential group in biogas production. PMID:27243603

  13. Genetic diversity of Aspergillus species isolated from onychomycosis and Aspergillus hongkongensis sp. nov., with implications to antifungal susceptibility testing.

    PubMed

    Tsang, Chi-Ching; Hui, Teresa W S; Lee, Kim-Chung; Chen, Jonathan H K; Ngan, Antonio H Y; Tam, Emily W T; Chan, Jasper F W; Wu, Andrea L; Cheung, Mei; Tse, Brian P H; Wu, Alan K L; Lai, Christopher K C; Tsang, Dominic N C; Que, Tak-Lun; Lam, Ching-Wan; Yuen, Kwok-Yung; Lau, Susanna K P; Woo, Patrick C Y

    2016-02-01

    Thirteen Aspergillus isolates recovered from nails of 13 patients (fingernails, n=2; toenails, n=11) with onychomycosis were characterized. Twelve strains were identified by multilocus sequencing as Aspergillus spp. (Aspergillus sydowii [n=4], Aspergillus welwitschiae [n=3], Aspergillus terreus [n=2], Aspergillus flavus [n=1], Aspergillus tubingensis [n=1], and Aspergillus unguis [n=1]). Isolates of A. terreus, A. flavus, and A. unguis were also identifiable by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. The 13th isolate (HKU49(T)) possessed unique morphological characteristics different from other Aspergillus spp. Molecular characterization also unambiguously showed that HKU49(T) was distinct from other Aspergillus spp. We propose the novel species Aspergillus hongkongensis to describe this previously unknown fungus. Antifungal susceptibility testing showed most Aspergillus isolates had low MICs against itraconazole and voriconazole, but all Aspergillus isolates had high MICs against fluconazole. A diverse spectrum of Aspergillus species is associated with onychomycosis. Itraconazole and voriconazole are probably better drug options for Aspergillus onychomycosis.

  14. Characterization of the Far Transcription Factor Family in Aspergillus flavus

    PubMed Central

    Luo, Xingyu; Affeldt, Katharyn J.; Keller, Nancy P.

    2016-01-01

    Metabolism of fatty acids is a critical requirement for the pathogenesis of oil seed pathogens including the fungus Aspergillus flavus. Previous studies have correlated decreased ability to grow on fatty acids with reduced virulence of this fungus on host seed. Two fatty acid metabolism regulatory transcription factors, FarA and FarB, have been described in other filamentous fungi. Unexpectedly, we find A. flavus possesses three Far homologs, FarA, FarB, and FarC, with FarA and FarC showing a greater protein similarity to each other than FarB. farA and farB are located in regions of colinearity in all Aspergillus spp. sequenced to date, whereas farC is limited to a subset of species where it is inserted in an otherwise colinear region in Aspergillus genomes. Deletion and overexpression (OE) of farA and farB, but not farC, yielded mutants with aberrant growth patterns on specific fatty acids as well as altered expression of genes involved in fatty acid metabolism. Marked differences included significant growth defects of both ∆farA and ∆farB on medium-chain fatty acids and decreased growth of OE::farA on unsaturated fatty acids. Loss of farA diminished expression of mitochondrial β-oxidation genes whereas OE::farA inhibited expression of genes involved in unsaturated fatty acid catabolism. FarA also positively regulated the desaturase genes required to generate polyunsaturated fatty acids. Aflatoxin production on toxin-inducing media was significantly decreased in the ∆farB