Science.gov

Sample records for aspergillus genome database

  1. Screening of Aspergillus-derived FAD-glucose dehydrogenases from fungal genome database.

    PubMed

    Mori, Kazushige; Nakajima, Mitsuharu; Kojima, Katsuhiro; Murakami, Koudai; Ferri, Stefano; Sode, Koji

    2011-11-01

    Aspergillus-derived FAD-dependent glucose dehydrogenases (FADGDHs) were screened from fungal genomic databases, primarily by searching for putative homologues of the Aspergillus niger-derived glucose oxidase (GOD). Focusing on a GOD active-site motif, putative proteins annotated as belonging to the glucose methanol choline (GMC) oxidoreductase family were selected. Phylogenetic analysis of these putative proteins produced a GOD clade, which includes the A. niger and Penicillium amagasakiens GODs, and a second clade made up of putative proteins showing 30-40% homology with GOD. The genes encoding the proteins from the second clade were functionally expressed in Escherichia coli, resulting in dye-mediated glucose dehydrogenase (GDH) activity but not GOD activity. These results suggest that the putative proteins belonging to the second clade are FADGDHs. The 3D structure models of these FADGDHs were compared with the 3D structure of GOD. PMID:21748361

  2. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  3. Genomics of Aspergillus flavus mycotoxin production

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The aspergilli show immense ecological and metabolic diversity. To date, the sequences of fifteen different Aspergillus genomes have been determined providing scientists with an exciting resource to improve the understanding of Aspergillus molecular genomics. Aspergillus flavus, one of the most wide...

  4. Draft Genome Sequences of Fungus Aspergillus calidoustus.

    PubMed

    Horn, Fabian; Linde, Jörg; Mattern, Derek J; Walther, Grit; Guthke, Reinhard; Scherlach, Kirstin; Martin, Karin; Brakhage, Axel A; Petzke, Lutz; Valiante, Vito

    2016-01-01

    Here, we report the draft genome sequence of Aspergillus calidoustus (strain SF006504). The functional annotation of A. calidoustus predicts a relatively large number of secondary metabolite gene clusters. The presented genome sequence builds the basis for further genome mining. PMID:26966204

  5. Draft Genome Sequences of Fungus Aspergillus calidoustus

    PubMed Central

    Horn, Fabian; Linde, Jörg; Mattern, Derek J.; Walther, Grit; Guthke, Reinhard; Scherlach, Kirstin; Martin, Karin; Brakhage, Axel A.; Petzke, Lutz

    2016-01-01

    Here, we report the draft genome sequence of Aspergillus calidoustus (strain SF006504). The functional annotation of A. calidoustus predicts a relatively large number of secondary metabolite gene clusters. The presented genome sequence builds the basis for further genome mining. PMID:26966204

  6. Aspergillus flavus Genomics for Controlling Aflatoxin Contamination

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The main objectives of the Aspergillus flavus genomics program are to identify genes and regulatory components involved in aflatoxin biosynthesis for solving aflatoxin contamination in agricultural crops. A. flavus Expressed Sequence Tags (EST), microarray and whole genome sequencing have been achi...

  7. Aspergillus flavus Blast2GO gene ontology database: elevated growth temperature alters amino acid metabolism

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The availability of a representative gene ontology (GO) database is a prerequisite for a successful functional genomics study. Using online Blast2GO resources we constructed a GO database of Aspergillus flavus. Of the predicted total 13,485 A. flavus genes 8,987 were annotated with GO terms. The mea...

  8. Comparative Reannotation of 21 Aspergillus Genomes

    SciTech Connect

    Salamov, Asaf; Riley, Robert; Kuo, Alan; Grigoriev, Igor

    2013-03-08

    We used comparative gene modeling to reannotate 21 Aspergillus genomes. Initial automatic annotation of individual genomes may contain some errors of different nature, e.g. missing genes, incorrect exon-intron structures, 'chimeras', which fuse 2 or more real genes or alternatively splitting some real genes into 2 or more models. The main premise behind the comparative modeling approach is that for closely related genomes most orthologous families have the same conserved gene structure. The algorithm maps all gene models predicted in each individual Aspergillus genome to the other genomes and, for each locus, selects from potentially many competing models, the one which most closely resembles the orthologous genes from other genomes. This procedure is iterated until no further change in gene models is observed. For Aspergillus genomes we predicted in total 4503 new gene models ( ~;;2percent per genome), supported by comparative analysis, additionally correcting ~;;18percent of old gene models. This resulted in a total of 4065 more genes with annotated PFAM domains (~;;3percent increase per genome). Analysis of a few genomes with EST/transcriptomics data shows that the new annotation sets also have a higher number of EST-supported splice sites at exon-intron boundaries.

  9. Querying genomic databases

    SciTech Connect

    Baehr, A.; Hagstrom, R.; Joerg, D.; Overbeek, R.

    1991-09-01

    A natural-language interface has been developed that retrieves genomic information by using a simple subset of English. The interface spares the biologist from the task of learning database-specific query languages and computer programming. Currently, the interface deals with the E. coli genome. It can, however, be readily extended and shows promise as a means of easy access to other sequenced genomic databases as well.

  10. Genomic mining for Aspergillus natural products.

    PubMed

    Bok, Jin Woo; Hoffmeister, Dirk; Maggio-Hall, Lori A; Murillo, Renato; Glasner, Jeremy D; Keller, Nancy P

    2006-01-01

    The genus Aspergillus is renowned for its ability to produce a myriad of bioactive secondary metabolites. Although the propensity of biosynthetic genes to form contiguous clusters greatly facilitates assignment of putative secondary metabolite genes in the completed Aspergillus genomes, such analysis cannot predict gene expression and, ultimately, product formation. To circumvent this deficiency, we have examined Aspergillus nidulans microarrays for expressed secondary metabolite gene clusters by using the transcriptional regulator LaeA. Deletion or overexpression of laeA clearly identified numerous secondary metabolite clusters. A gene deletion in one of the clusters eliminated the production of the antitumor compound terrequinone A, a metabolite not described, from A. nidulans. In this paper, we highlight that LaeA-based genome mining helps decipher the secondary metabolome of Aspergilli and provides an unparalleled view to assess secondary metabolism gene regulation. PMID:16426969

  11. Genomics of peanut-Aspergillus flavus interactions

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxin contamination caused by Aspergillus fungi is a great concern in peanut production worldwide. Pre-harvest Aspergillii infection and aflatoxin contamination are usually severe in peanuts that are grown under drought stressed conditions. Genomic research can provide new tools and resources to...

  12. Mouse genome database 2016.

    PubMed

    Bult, Carol J; Eppig, Janan T; Blake, Judith A; Kadin, James A; Richardson, Joel E

    2016-01-01

    The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the primary community model organism database for the laboratory mouse and serves as the source for key biological reference data related to mouse genes, gene functions, phenotypes and disease models with a strong emphasis on the relationship of these data to human biology and disease. As the cost of genome-scale sequencing continues to decrease and new technologies for genome editing become widely adopted, the laboratory mouse is more important than ever as a model system for understanding the biological significance of human genetic variation and for advancing the basic research needed to support the emergence of genome-guided precision medicine. Recent enhancements to MGD include new graphical summaries of biological annotations for mouse genes, support for mobile access to the database, tools to support the annotation and analysis of sets of genes, and expanded support for comparative biology through the expansion of homology data. PMID:26578600

  13. Mouse genome database 2016

    PubMed Central

    Bult, Carol J.; Eppig, Janan T.; Blake, Judith A.; Kadin, James A.; Richardson, Joel E.

    2016-01-01

    The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the primary community model organism database for the laboratory mouse and serves as the source for key biological reference data related to mouse genes, gene functions, phenotypes and disease models with a strong emphasis on the relationship of these data to human biology and disease. As the cost of genome-scale sequencing continues to decrease and new technologies for genome editing become widely adopted, the laboratory mouse is more important than ever as a model system for understanding the biological significance of human genetic variation and for advancing the basic research needed to support the emergence of genome-guided precision medicine. Recent enhancements to MGD include new graphical summaries of biological annotations for mouse genes, support for mobile access to the database, tools to support the annotation and analysis of sets of genes, and expanded support for comparative biology through the expansion of homology data. PMID:26578600

  14. Genomic Islands in Pathogenic Filamentous Fungus Aspergillus fumigatus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We present the genome sequences of a new clinical isolate, CEA10, of an important human pathogen, Aspergillus fumigatus, and two closely related, but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of CEA10 with the recently sequen...

  15. What can Aspergillus flavus genome offer for mycotoxin research?

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genomic study of filamentous fungi has made significant advances in recent years, and the genomes of several species in the genus Aspergillus have been sequenced, including Aspergillus flavus. This ubiquitous mold is present as a saprobe in a wide range of agricultural and natural habits, and c...

  16. A first glance into the genome sequence of Aspergillus flavus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins, produced by Aspergillus flavus and A. parasiticus, are toxic and carcinogenic metabolites. They contaminate agricultural crops before harvest and post harvest grains during storage. In order to reduce and eliminate aflatoxin contamination of food and feed, Aspergillus flavus genomics p...

  17. WHOLE GENOME COMPARISON OF ASPERGILLUS FLAVUS AND A. ORYZAE

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is a plant and animal pathogen that also produces the potent carcinogen aflatoxin. Aspergillus oryzae is a closely related species that has been used for centuries in the food fermentation industry and is generally regarded as safe (GRAS). Whole genome sequences for these two fu...

  18. Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...

  19. CROP GENOME DATABASES -- CRITICAL ISSUES

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Crop genome databases, see www.agron.missouri.edu/bioservers.html of the past decade have had designed and implemented (1) models and schema for the genome and related domains; (2) methodologies for input of data by expert biologists and high-throughput projects; and (3) various text, graphical, and...

  20. Human genome protein function database.

    PubMed Central

    Sorenson, D. K.

    1991-01-01

    A database which focuses on the normal functions of the currently-known protein products of the Human Genome was constructed. Information is stored as text, figures, tables, and diagrams. The program contains built-in functions to modify, update, categorize, hypertext, search, create reports, and establish links to other databases. The semi-automated categorization feature of the database program was used to classify these proteins in terms of biomedical functions. PMID:1807638

  1. Aspergillus genomes: secret sex and the secrets of sex.

    PubMed

    Scazzocchio, Claudio

    2006-10-01

    The genomic sequences of three species of Aspergillus, including the model organism A. nidulans (which is homothallic: having no differentiated mating types, a strain being able to cross with itself), suggest that A. fumigatus and A. oryzae, considered to be asexual, might in fact be heterothallic (having two differentiated mating types, a strain being able to cross only with strains of opposite mating type). The genomic data have implications for the understanding of the evolution and the mechanism of sexual reproduction in this genus. We propose a model of epigenetic heterothallism to account for the reproductive patterns observed in Aspergillus nidulans. PMID:16911845

  2. Genomic analysis of aspergillus flavus pathogenesis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus and Fusarium verticillioides colonize developing maize seeds and contaminate them with mycotoxins. Maize genotypes differ in resistance to these fungi, but incorporation of adequate resistance into desirable hybrids has been challenging.Little is known about pathogenesis of seeds...

  3. GOLD: The Genomes Online Database

    DOE Data Explorer

    Kyrpides, Nikos; Liolios, Dinos; Chen, Amy; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor; Bernal, Alex

    Since its inception in 1997, GOLD has continuously monitored genome sequencing projects worldwide and has provided the community with a unique centralized resource that integrates diverse information related to Archaea, Bacteria, Eukaryotic and more recently Metagenomic sequencing projects. As of September 2007, GOLD recorded 639 completed genome projects. These projects have their complete sequence deposited into the public archival sequence databases such as GenBank EMBL,and DDBJ. From the total of 639 complete and published genome projects as of 9/2007, 527 were bacterial, 47 were archaeal and 65 were eukaryotic. In addition to the complete projects, there were 2158 ongoing sequencing projects. 1328 of those were bacterial, 59 archaeal and 771 eukaryotic projects. Two types of metadata are provided by GOLD: (i) project metadata and (ii) organism/environment metadata. GOLD CARD pages for every project are available from the link of every GOLD_STAMP ID. The information in every one of these pages is organized into three tables: (a) Organism information, (b) Genome project information and (c) External links. [The Genomes On Line Database (GOLD) in 2007: Status of genomic and metagenomic projects and their associated metadata, Konstantinos Liolios, Konstantinos Mavromatis, Nektarios Tavernarakis and Nikos C. Kyrpides, Nucleic Acids Research Advance Access published online on November 2, 2007, Nucleic Acids Research, doi:10.1093/nar/gkm884]

    The basic tables in the GOLD database that can be browsed or searched include the following information:

    • Gold Stamp ID
    • Organism name
    • Domain
    • Links to information sources
    • Size and link to a map, when available
    • Chromosome number, Plas number, and GC content
    • A link for downloading the actual genome data
    • Institution that did the sequencing
    • Funding source
    • Database where information resides
    • Publication status and information

    • Maize Genetics and Genomics Database

      Technology Transfer Automated Retrieval System (TEKTRAN)

      The 2007 report for MaizeGDB lists the new hires who will focus on curation/outreach and the genome sequence, respectively. Currently all sequence in the database comes from a PlantGDB pipeline and is presented with deep links to external resources such as PlantGDB, Dana Farber, GenBank, the Arizona...

    • GOBASE: an organelle genome database

      PubMed Central

      O’Brien, Emmet A.; Zhang, Yue; Wang, Eric; Marie, Veronique; Badejoko, Wole; Lang, B. Franz; Burger, Gertraud

      2009-01-01

      The organelle genome database GOBASE, now in its 21st release (June 2008), contains all published mitochondrion-encoded sequences (∼913 000) and chloroplast-encoded sequences (∼250 000) from a wide range of eukaryotic taxa. For all sequences, information on related genes, exons, introns, gene products and taxonomy is available, as well as selected genome maps and RNA secondary structures. Recent major enhancements to database functionality include: (i) addition of an interface for RNA editing data, with substitutions, insertions and deletions displayed using multiple alignments; (ii) addition of medically relevant information, such as haplotypes, SNPs and associated disease states, to human mitochondrial sequence data; (iii) addition of fully reannotated genome sequences for Escherichia coli and Nostoc sp., for reference and comparison; and (iv) a number of interface enhancements, such as the availability of both genomic and gene-coding sequence downloads, and a more sophisticated literature reference search functionality with links to PubMed where available. Future projects include the transfer of GOBASE features to NCBI/GenBank, allowing long-term preservation of accumulated expert information. The GOBASE database can be found at http://gobase.bcm.umontreal.ca/. Queries about custom and large-scale data retrievals should be addressed to gobase@bch.umontreal.ca. PMID:18953030

    • What can comparative genomics tell us about species concepts in the genus Aspergillus?

      SciTech Connect

      Rokas, Antonis; payne, gary; Federova, Natalie D.; Baker, Scott E.; Machida, Masa; yu, Jiujiang; georgianna, D. R.; Dean, Ralph A.; Bhatnagar, Deepak; Cleveland, T. E.; Wortman, Jennifer R.; Maiti, R.; Joardar, V.; Amedeo, Paolo; Denning, David W.; Nierman, William C.

      2007-12-15

      Understanding the nature of species" boundaries is a fundamental question in evolutionary biology. The availability of genomes from several species of the genus Aspergillus allows us for the first time to examine the demarcation of fungal species at the whole-genome level. Here, we examine four case studies, two of which involve intraspecific comparisons, whereas the other two deal with interspecific genomic comparisons between closely related species. These four comparisons reveal significant variation in the nature of species boundaries across Aspergillus. For example, comparisons between A. fumigatus and Neosartorya fischeri (the teleomorph of A. fischerianus) and between A. oryzae and A. flavus suggest that measures of sequence similarity and species-specific genes are significantly higher for the A. fumigatus - N. fischeri pair. Importantly, the values obtained from the comparison between A. oryzae and A. flavus are remarkably similar to those obtained from an intra-specific comparison of A. fumigatus strains, giving support to the proposal that A. oryzae represents a distinct ecotype of A. flavus and not a distinct species. We argue that genomic data can aid Aspergillus taxonomy by serving as a source of novel and unprecedented amounts of comparative data, as a resource for the development of additional diagnostic tools, and finally as a knowledge database about the biological differences between strains and species.

    • Sexual structures in Aspergillus: morphology, importance and genomics.

      PubMed

      Geiser, David M

      2009-01-01

      The genus Aspergillus comprises a few hundred species sharing a common asexual spore forming structure, the aspergillum. Approximately one-third of these species also produce a sexual stage, all but five of which are known to be homothallic. Sexual stages associated with Aspergillus fall into approximately ten different genera, reflecting a tremendous degree of phylogenetic and biological diversity. Sexual stages in Aspergillus are plectomycetous, typical for the order in which it resides, the Eurotiales. Theoretically, a homothallic Aspergillus species can produce both asexual conidia and sexual ascospores in both clonal and recombinant fashion, although the actual significance of these potential modes of reproduction is unclear. Aspergillus species with known sexual stages tend to be minor players in infections of humans, perhaps because of their tendency to produce fewer asexual spores compared to their non-teleomorphic congeners. The discovery of population genetic and genomic evidence for sex in species with no known sexual stage indicates that no assumptions can be made about the clonal versus recombinant life histories of a species based on its known mitotic and/or meiotic reproductive modes. PMID:18608901

    • The 2008 update of the Aspergillus nidulans genome annotation: a community effort

      PubMed Central

      Wortman, Jennifer Russo; Gilsenan, Jane Mabey; Joardar, Vinita; Deegan, Jennifer; Clutterbuck, John; Andersen, Mikael R.; Archer, David; Bencina, Mojca; Braus, Gerhard; Coutinho, Pedro; von Döhren, Hans; Doonan, John; Driessen, Arnold J.M.; Durek, Pawel; Espeso, Eduardo; Fekete, Erzsébet; Flipphi, Michel; Estrada, Carlos Garcia; Geysens, Steven; Goldman, Gustavo; de Groot, Piet W.J.; Hansen, Kim; Harris, Steven D.; Heinekamp, Thorsten; Helmstaedt, Kerstin; Henrissat, Bernard; Hofmann, Gerald; Homan, Tim; Horio, Tetsuya; Horiuchi, Hiroyuki; James, Steve; Jones, Meriel; Karaffa, Levente; Karányi, Zsolt; Kato, Masashi; Keller, Nancy; Kelly, Diane E.; Kiel, Jan A.K.W.; Kim, Jung-Mi; van der Klei, Ida J.; Klis, Frans M.; Kovalchuk, Andriy; Kraševec, Nada; Kubicek, Christian P.; Liu, Bo; MacCabe, Andrew; Meyer, Vera; Mirabito, Pete; Miskei, Márton; Mos, Magdalena; Mullins, Jonathan; Nelson, David R.; Nielsen, Jens; Oakley, Berl R.; Osmani, Stephen A.; Pakula, Tiina; Paszewski, Andrzej; Paulsen, Ian; Pilsyk, Sebastian; Pócsi, István; Punt, Peter J.; Ram, Arthur F.J.; Ren, Qinghu; Robellet, Xavier; Robson, Geoff; Seiboth, Bernhard; Solingen, Piet van; Specht, Thomas; Sun, Jibin; Taheri-Talesh, Naimeh; Takeshita, Norio; Ussery, Dave; vanKuyk, Patricia A.; Visser, Hans; van de Vondervoort, Peter J.I.; de Vries, Ronald P.; Walton, Jonathan; Xiang, Xin; Xiong, Yi; Zeng, An Ping; Brandt, Bernd W.; Cornell, Michael J.; van den Hondel, Cees A.M.J.J.; Visser, Jacob; Oliver, Stephen G.; Turner, Geoffrey

      2010-01-01

      The identification and annotation of protein-coding genes is one of the primary goals of whole-genome sequencing projects, and the accuracy of predicting the primary protein products of gene expression is vital to the interpretation of the available data and the design of downstream functional applications. Nevertheless, the comprehensive annotation of eukaryotic genomes remains a considerable challenge. Many genomes submitted to public databases, including those of major model organisms, contain significant numbers of wrong and incomplete gene predictions. We present a community-based reannotation of the Aspergillus nidulans genome with the primary goal of increasing the number and quality of protein functional assignments through the careful review of experts in the field of fungal biology. PMID:19146970

    • Comparative Genomics of Aspergillus flavus and A. oryzae: An Early View

      Technology Transfer Automated Retrieval System (TEKTRAN)

      Aspergillus flavus produces aflatoxins and is the second leading cause of aspergillosis in immunocompromised individuals. Aspergillus oryzae, on the other hand, has been used for centuries in Japan for the fermentation of food. The recently available whole genome sequences of Aspergillus flavus an...

    • Impact of Aspergillus oryzae genomics on industrial production of metabolites.

      PubMed

      Abe, Keietsu; Gomi, Katusya; Hasegawa, Fumihiko; Machida, Masayuki

      2006-09-01

      Aspergillus oryzae is used extensively for the production of the traditional Japanese fermented foods sake (rice wine), shoyu (soy sauce), and miso (soybean paste). In recent years, recombinant DNA technology has been used to enhance industrial enzyme production by A. oryzae. Recently completed genomic studies using expressed sequence tag (EST) analyses and whole-genome sequencing are quickly expanding the industrial potential of the fungus in biotechnology. Genes that have been newly discovered through genome research can be used for the production of novel valuable enzymes and chemicals, and are important for designing new industrial processes. This article describes recent progress of A . oryzae genomics and its impact on industrial production of enzymes, metabolites, and bioprocesses. PMID:16944282

    • Searching and Indexing Genomic Databases via Kernelization

      PubMed Central

      Gagie, Travis; Puglisi, Simon J.

      2015-01-01

      The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper, we survey the 20-year history of this idea and discuss its relation to kernelization in parameterized complexity. PMID:25710001

    • Aspergillus Niger Genomics: Past, Present and into the Future

      SciTech Connect

      Baker, Scott E.

      2006-09-01

      Aspergillus niger is a filamentous ascomycete fungus that is ubiquitous in the environment and has been implicated in opportunistic infections of humans. In addition to its role as an opportunistic human pathogen, A. niger is economically important as a fermentation organism used for the production of citric acid. Industrial citric acid production by A. niger represents one of the most efficient, highest yield bioprocesses in use currently by industry. The genome size of A. niger is estimated to be between 35.5 and 38.5 megabases (Mb) divided among eight chromosomes/linkage groups that vary in size from 3.5 - 6.6 Mb. Currently, there are three independent A. niger genome projects, an indication of the economic importance of this organism. The rich amount of data resulting from these multiple A. niger genome sequences will be used for basic and applied research programs applicable to fermentation process development, morphology and pathogenicity.

    • The complete mitochondrial genome sequence of Aspergillus flavus.

      PubMed

      Yan, Zhengsong; Chen, Dan; Shen, Yiping; Ye, Baodong

      2016-07-01

      Aspergillus flavus is a haploid filamentous fungus that is common in the environment and has been implicated in human infections. The complete mitochondrial genome of A. flavus has been determined by high-throughput sequencing technology in this work. Our study revealed that the mitochondrial genome of A. flavus is 31,602 bp long, with an A + T content of 74.83%, which consists of a usual set of mitochondrial proteins and RNA genes, including large and small ribosomal RNAs, 15 proteins, and 20 tRNA genes and contains two introns. Notably, it also contains two hypothetical proteins without obvious homology to any known proteins. All structural genes are located on one strand and are apparently transcribed in one direction. Codon usage analysis indicated that all protein coding genes employ the standard fungal mitochondrial start and stop codons; and the nucleotide bias toward AT was also reflected in the codon usage. The complete mitochondrial genomes of A. flavus would be useful for future investigation of the genetic, evolution, and clinical identification of Aspergillus species. PMID:25922962

    • Genomic Islands in the Pathogenic Filamentous Fungus Aspergillus fumigatus

      PubMed Central

      Fedorova, Natalie D.; Khaldi, Nora; Joardar, Vinita S.; Maiti, Rama; Amedeo, Paolo; Anderson, Michael J.; Crabtree, Jonathan; Silva, Joana C.; Badger, Jonathan H.; Albarraq, Ahmed; Angiuoli, Sam; Bussey, Howard; Bowyer, Paul; Cotty, Peter J.; Dyer, Paul S.; Egan, Amy; Galens, Kevin; Fraser-Liggett, Claire M.; Haas, Brian J.; Inman, Jason M.; Kent, Richard; Lemieux, Sebastien; Malavazi, Iran; Orvis, Joshua; Roemer, Terry; Ronning, Catherine M.; Sundaram, Jaideep P.; Sutton, Granger; Turner, Geoff; Venter, J. Craig; White, Owen R.; Whitty, Brett R.; Youngman, Phil; Wolfe, Kenneth H.; Goldman, Gustavo H.; Wortman, Jennifer R.; Jiang, Bo; Denning, David W.; Nierman, William C.

      2008-01-01

      We present the genome sequences of a new clinical isolate of the important human pathogen, Aspergillus fumigatus, A1163, and two closely related but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of A1163 with the recently sequenced A. fumigatus isolate Af293 has identified core, variable and up to 2% unique genes in each genome. While the core genes are 99.8% identical at the nucleotide level, identity for variable genes can be as low 40%. The most divergent loci appear to contain heterokaryon incompatibility (het) genes associated with fungal programmed cell death such as developmental regulator rosA. Cross-species comparison has revealed that 8.5%, 13.5% and 12.6%, respectively, of A. fumigatus, N. fischeri and A. clavatus genes are species-specific. These genes are significantly smaller in size than core genes, contain fewer exons and exhibit a subtelomeric bias. Most of them cluster together in 13 chromosomal islands, which are enriched for pseudogenes, transposons and other repetitive elements. At least 20% of A. fumigatus-specific genes appear to be functional and involved in carbohydrate and chitin catabolism, transport, detoxification, secondary metabolism and other functions that may facilitate the adaptation to heterogeneous environments such as soil or a mammalian host. Contrary to what was suggested previously, their origin cannot be attributed to horizontal gene transfer (HGT), but instead is likely to involve duplication, diversification and differential gene loss (DDL). The role of duplication in the origin of lineage-specific genes is further underlined by the discovery of genomic islands that seem to function as designated “gene dumps” and, perhaps, simultaneously, as “gene factories”. PMID:18404212

    • Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine.

      PubMed

      Elsik, Christine G; Tayal, Aditi; Diesh, Colin M; Unni, Deepak R; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

      2016-01-01

      We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. PMID:26578564

    • Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine

      PubMed Central

      Elsik, Christine G.; Tayal, Aditi; Diesh, Colin M.; Unni, Deepak R.; Emery, Marianne L.; Nguyen, Hung N.; Hagen, Darren E.

      2016-01-01

      We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. PMID:26578564

    • Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

      DOE PAGESBeta

      Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma; Bragulat, M. Rosa; Cigliano, Riccardo Aiese; Sánchez, Armand

      2015-03-13

      In microorganisms, Ion Torrent sequencing technology has been proved to be useful in whole-genome sequencing of bacterial genomes (5 Mbp). In our study, for the first time we used this technology to perform a resequencing approach in a whole fungal genome (36 Mbp), a non-ochratoxin A producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin which is found mainly in cereals and their products, but it also occurs in a variety of common foods and beverages. Due to the fact that this strain does not produce OTA, we focused some of the bioinformatics analyses in genes involvedmore » in OTA biosynthesis, using a reference genome of an OTA producing strain of the same species. This study revealed that in the atoxigenic strain there is a high accumulation of nonsense and missense mutations in several genes. Importantly, a two fold increase in gene mutation ratio was observed in PKS and NRPS encoding genes which are suggested to be involved in OTA biosynthesis.« less

    • Plant cytogenetics in genome databases

      Technology Transfer Automated Retrieval System (TEKTRAN)

      Cytogenetic maps provide an integrated representation of genetic and cytological information that can be used to enhance genome and chromosome research. As genome analysis technologies become more affordable, the density of markers on cytogenetic maps increases, making these resources more useful a...

    • Complete mitochondrial genome of an Amynthas earthworm, Amynthas aspergillus (Oligochaeta: Megascolecidae).

      PubMed

      Zhang, Liangliang; Jiang, Jibao; Dong, Yan; Qiu, Jiangping

      2016-05-01

      We have determined the mitochondrial genome of the first Amynthas earthworm, Amynthas aspergillus (Perrier, 1872), which is a natural medical resource in Chinese traditional medicine. Its mitogenome is 15,115 bp in length containing 37 genes with the same contents and order as other sequenced earthworms. All genes are encoded by the same strand, all 13 PCGs use ATG as start codon. The content of A + T is 63.04% for A. aspergillus (33.41% A, 29.63% T, 14.56% G and 22.41% C). The complete mitochondrial genomes of A. aspergillus would be useful for the reconstruction of Oligochaeta polygenetic relationships. PMID:25329289

    • Genome Statute and Legislation Database

      MedlinePlus

      ... of page Last Reviewed: February 29, 2016 Get Email Updates Advancing human health through genomics research Privacy Copyright Contact Accessibility Plug-ins Site Map Staff Directory FOIA Share Top

  1. BGD: A Database of Bat Genomes

    PubMed Central

    Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

    2015-01-01

    Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. For the sake of their specialized phenotypic traits, many researches have been devoted to examine the evolution of bats. Until now, some whole genome sequences of bats have been assembled and annotated, however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological communities, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data of bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using efficient searching engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool was also provided to facilitate online phylogeny study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of the bat evolution and be advantageous to the bat sequences analysis. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/. PMID:26110276

  2. BGD: a database of bat genomes.

    PubMed

    Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

    2015-01-01

    Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. For the sake of their specialized phenotypic traits, many researches have been devoted to examine the evolution of bats. Until now, some whole genome sequences of bats have been assembled and annotated, however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological communities, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data of bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using efficient searching engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool was also provided to facilitate online phylogeny study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of the bat evolution and be advantageous to the bat sequences analysis. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/. PMID:26110276

  3. The UCSC Genome Browser database: 2016 update.

    PubMed

    Speir, Matthew L; Zweig, Ann S; Rosenbloom, Kate R; Raney, Brian J; Paten, Benedict; Nejad, Parisa; Lee, Brian T; Learned, Katrina; Karolchik, Donna; Hinrichs, Angie S; Heitner, Steve; Harte, Rachel A; Haeussler, Maximilian; Guruvadoo, Luvina; Fujita, Pauline A; Eisenhart, Christopher; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Barber, Galt P; Haussler, David; Kuhn, Robert M; Kent, W James

    2016-01-01

    For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the "Data Integrator", for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment. PMID:26590259

  4. The UCSC Genome Browser database: 2016 update

    PubMed Central

    Speir, Matthew L.; Zweig, Ann S.; Rosenbloom, Kate R.; Raney, Brian J.; Paten, Benedict; Nejad, Parisa; Lee, Brian T.; Learned, Katrina; Karolchik, Donna; Hinrichs, Angie S.; Heitner, Steve; Harte, Rachel A.; Haeussler, Maximilian; Guruvadoo, Luvina; Fujita, Pauline A.; Eisenhart, Christopher; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Barber, Galt P.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2016-01-01

    For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the “Data Integrator”, for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment. PMID:26590259

  5. The Saccharomyces Genome Database Variant Viewer.

    PubMed

    Sheppard, Travis K; Hitz, Benjamin C; Engel, Stacia R; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla S; Demeter, Janos; Hellerstedt, Sage T; Karra, Kalpana; Nash, Robert S; Paskov, Kelley M; Skrzypek, Marek S; Weng, Shuai; Wong, Edith D; Cherry, J Michael

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. PMID:26578556

  6. GWIDD: Genome-wide protein docking database

    PubMed Central

    Kundrotas, Petras J.; Zhu, Zhengwei; Vakser, Ilya A.

    2010-01-01

    Structural information on interacting proteins is important for understanding life processes at the molecular level. Genome-wide docking database is an integrated resource for structural studies of protein–protein interactions on the genome scale, which combines the available experimental data with models obtained by docking techniques. Current database version (August 2009) contains 25 559 experimental and modeled 3D structures for 771 organisms spanned over the entire universe of life from viruses to humans. Data are organized in a relational database with user-friendly search interface allowing exploration of the database content by a number of parameters. Search results can be interactively previewed and downloaded as PDB-formatted files, along with the information relevant to the specified interactions. The resource is freely available at http://gwidd.bioinformatics.ku.edu. PMID:19900970

  7. The UCSC Genome Browser database: 2014 update

    PubMed Central

    Karolchik, Donna; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Cline, Melissa S.; Diekhans, Mark; Dreszer, Timothy R.; Fujita, Pauline A.; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A.; Heitner, Steve; Hinrichs, Angie S.; Learned, Katrina; Lee, Brian T.; Li, Chin H.; Raney, Brian J.; Rhead, Brooke; Rosenbloom, Kate R.; Sloan, Cricket A.; Speir, Matthew L.; Zweig, Ann S.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2014-01-01

    The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser’s web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation ‘tracks’ for ∼90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany. PMID:24270787

  8. WGE: a CRISPR database for genome engineering

    PubMed Central

    Hodgkins, Alex; Farne, Anna; Perera, Sajith; Grego, Tiago; Parry-Smith, David J.; Skarnes, William C.; Iyer, Vivek

    2015-01-01

    Summary: The rapid development of CRISPR-Cas9 mediated genome editing techniques has given rise to a number of online and stand-alone tools to find and score CRISPR sites for whole genomes. Here we describe the Wellcome Trust Sanger Institute Genome Editing database (WGE), which uses novel methods to compute, visualize and select optimal CRISPR sites in a genome browser environment. The WGE database currently stores single and paired CRISPR sites and pre-calculated off-target information for CRISPRs located in the mouse and human exomes. Scoring and display of off-target sites is simple, and intuitive, and filters can be applied to identify high-quality CRISPR sites rapidly. WGE also provides a tool for the design and display of gene targeting vectors in the same genome browser, along with gene models, protein translation and variation tracks. WGE is open, extensible and can be set up to compute and present CRISPR sites for any genome. Availability and implementation: The WGE database is freely available at www.sanger.ac.uk/htgt/wge Contact: vvi@sanger.ac.uk or skarnes@sanger.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25979474

  9. Genome Sequencing and Evolutionary Analysis of Marine Gut Fungus Aspergillus sp. Z5 from Ligia oceanica.

    PubMed

    Li, Xue; Xu, Jin-Zhong; Wang, Wen-Jie; Chen, Yi-Wang; Zheng, Dao-Qiong; Di, Ya-Nan; Li, Ping; Wang, Pin-Mei; Li, Yu-Dong

    2016-01-01

    Aspergillus sp. Z5, isolated from the gut of marine isopods, produces prolific secondary metabolites with new structure and bioactivity. Here, we report the draft sequence of the approximately 33.8-Mbp genome of this strain. To the best of our knowledge, this is the first genome sequence of Aspergillus strain isolated from marine isopod Ligia oceanica. The phylogenetic analysis supported that this strain was closely related to A. versicolor, and genomic analysis revealed that Aspergillus sp. Z5 shared a high degree of colinearity with the genome of A. sydowii. Our results may facilitate studies on discovering the biosynthetic pathways of secondary metabolites and elucidating their evolution in this species. PMID:27081303

  10. Genome Sequencing and Evolutionary Analysis of Marine Gut Fungus Aspergillus sp. Z5 from Ligia oceanica

    PubMed Central

    Li, Xue; Xu, Jin-Zhong; Wang, Wen-Jie; Chen, Yi-Wang; Zheng, Dao-Qiong; Di, Ya-Nan; Li, Ping; Wang, Pin-Mei; Li, Yu-Dong

    2016-01-01

    Aspergillus sp. Z5, isolated from the gut of marine isopods, produces prolific secondary metabolites with new structure and bioactivity. Here, we report the draft sequence of the approximately 33.8-Mbp genome of this strain. To the best of our knowledge, this is the first genome sequence of Aspergillus strain isolated from marine isopod Ligia oceanica. The phylogenetic analysis supported that this strain was closely related to A. versicolor, and genomic analysis revealed that Aspergillus sp. Z5 shared a high degree of colinearity with the genome of A. sydowii. Our results may facilitate studies on discovering the biosynthetic pathways of secondary metabolites and elucidating their evolution in this species. PMID:27081303

  11. What Can Comparative Genomics Tell Us About Species Concepts in the Genus Aspergillus?

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Understanding the nature of species’ boundaries is a fundamental question in evolutionary biology. The availability of genomes from several species of the genus Aspergillus allows us for the first time to examine the demarcation of fungal species at the whole-genome level. Here, we examine four ca...

  12. CottonDB: Cotton Genome Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    CottonDB (www.cottondb.org) is the first and most comprehensive source of cotton genome information. CottonDB is maintained at the Southern Plains Agricultural Research Center in College Station, TX. The project includes a website and database creating a repository of information for over 355,000 ...

  13. The UCSC Genome Browser database: 2015 update.

    PubMed

    Rosenbloom, Kate R; Armstrong, Joel; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R; Fujita, Pauline A; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T; Li, Chin H; Miga, Karen H; Nguyen, Ngan; Paten, Benedict; Raney, Brian J; Smit, Arian F A; Speir, Matthew L; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James

    2015-01-01

    Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled. PMID:25428374

  14. The UCSC Genome Browser database: 2015 update

    PubMed Central

    Rosenbloom, Kate R.; Armstrong, Joel; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R.; Fujita, Pauline A.; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A.; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S.; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T.; Li, Chin H.; Miga, Karen H.; Nguyen, Ngan; Paten, Benedict; Raney, Brian J.; Smit, Arian F. A.; Speir, Matthew L.; Zweig, Ann S.; Haussler, David; Kuhn, Robert M.; Kent, W. James

    2015-01-01

    Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), ‘mined the web’ for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled. PMID:25428374

  15. Orthology for comparative genomics in the mouse genome database.

    PubMed

    Dolan, Mary E; Baldarelli, Richard M; Bello, Susan M; Ni, Li; McAndrews, Monica S; Bult, Carol J; Kadin, James A; Richardson, Joel E; Ringwald, Martin; Eppig, Janan T; Blake, Judith A

    2015-08-01

    The mouse genome database (MGD) is the model organism database component of the mouse genome informatics system at The Jackson Laboratory. MGD is the international data resource for the laboratory mouse and facilitates the use of mice in the study of human health and disease. Since its beginnings, MGD has included comparative genomics data with a particular focus on human-mouse orthology, an essential component of the use of mouse as a model organism. Over the past 25 years, novel algorithms and addition of orthologs from other model organisms have enriched comparative genomics in MGD data, extending the use of orthology data to support the laboratory mouse as a model of human biology. Here, we describe current comparative data in MGD and review the history and refinement of orthology representation in this resource. PMID:26223881

  16. Benchmarking database performance for genomic data.

    PubMed

    Khushi, Matloob

    2015-06-01

    Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Performing various genomic operations such as identifying overlapping/non-overlapping regions or nearest gene annotations are common research needs. The data can be saved in a database system for easy management, however, there is no comprehensive database built-in algorithm at present to identify overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although general searching capability of both databases was almost equivalent. In addition, using the algorithm pair-wise, overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were reported and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1).Inc. PMID:25560631

  17. The Saccharomyces Genome Database: Exploring Genome Features and Their Annotations.

    PubMed

    Cherry, J Michael

    2015-12-01

    Genomic-scale assays result in data that provide information over the entire genome. Such base pair resolution data cannot be summarized easily except via a graphical viewer. A genome browser is a tool that displays genomic data and experimental results as horizontal tracks. Genome browsers allow searches for a chromosomal coordinate or a feature, such as a gene name, but they do not allow searches by function or upstream binding site. Entry into a genome browser requires that you identify the gene name or chromosomal coordinates for a region of interest. A track provides a representation for genomic results and is displayed as a row of data shown as line segments to indicate regions of the chromosome with a feature. Another type of track presents a graph or wiggle plot that indicates the processed signal intensity computed for a particular experiment or set of experiments. Wiggle plots are typical for genomic assays such as the various next-generation sequencing methods (e.g., chromatin immunoprecipitation [ChIP]-seq or RNA-seq), where it represents a peak of DNA binding, histone modification, or the mapping of an RNA sequence. Here we explore the browser that has been built into the Saccharomyces Genome Database (SGD). PMID:26631126

  18. Complete Genome Sequence of Soil Fungus Aspergillus terreus (KM017963), a Potent Lovastatin Producer.

    PubMed

    Savitha, Janakiraman; Bhargavi, S D; Praveen, V K

    2016-01-01

    We report the complete genome of Aspergillus terreus (KM017963), a tropical soil isolate. The genome sequence is 29 Mb, with a G+C content of 51.12%. The genome sequence of A. terreus shows the presence of the complete gene cluster responsible for lovastatin (an anti-cholesterol drug) production in a single scaffold (1.16). PMID:27284150

  19. Complete Genome Sequence of Soil Fungus Aspergillus terreus (KM017963), a Potent Lovastatin Producer

    PubMed Central

    Bhargavi, S. D.; Praveen, V. K.

    2016-01-01

    We report the complete genome of Aspergillus terreus (KM017963), a tropical soil isolate. The genome sequence is 29 Mb, with a G+C content of 51.12%. The genome sequence of A. terreus shows the presence of the complete gene cluster responsible for lovastatin (an anti-cholesterol drug) production in a single scaffold (1.16). PMID:27284150

  20. The mouse genome informatics and the mouse genome database

    SciTech Connect

    Maltais, L.J.; Blackburn, R.E.; Bradt, D.W.

    1994-09-01

    The Mouse Genome Database (MGD) is a centralized, comprehensive database of the mouse genome that includes genetic mapping data, comparative mapping data, gene descriptions, mutant phenotype descriptions, strains and allelic polymorphism data, inbred strain characteristics, physical mapping data, and molecular probes and clones data. Data in MGD are obtained from the published literature and by electronic transfer from laboratories working on large backcross panels of mice. MGD provides tools that enable the user to search the database, retrieve data, generate reports, analyze data, annotate records, and build genetic maps. The Encyclopedia of the Mouse Genome provides a graphic user interface to mouse genome data. It consists of software tools including: LinkMap, a graphic display of genetic linkage maps with the ability to magnify regions of high locus density: CytoMap, a graphic display of cytogenetic maps showing banded chromosomes with cytogenetic locations of genes and chromosomal aberrations; CATS, a catalog searching tool for text retrieval of mouse locus descriptions. These software tools provide access to the following data sets: Chromosome Committee Reports, MIT Genome Center data, GBASE reports, Mouse Locus Catalog (MLC), and Mouse Cytogenetic Mapping Data. The MGD is available to the scientific community through the World Wide Web (WWW) and Gopher. In addition GBASE can be accessed via the Internet.

  1. The population genomics of mycotoxin diversity in Aspergillus flavus and Aspergillus parasiticus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Mycotoxins, and especially the aflatoxins, are an enormous problem in agriculture, with aflatoxin B1 being the most carcinogenic known natural compound. The worldwide costs associated with aflatoxin monitoring and crop losses are in the hundreds of millions of dollars. Aspergillus flavus and A. par...

  2. Orchidstra: an integrated orchid functional genomics database.

    PubMed

    Su, Chun-lin; Chao, Ya-Ting; Yen, Shao-Hua; Chen, Chun-Yi; Chen, Wan-Chieh; Chang, Yao-Chien Alex; Shih, Ming-Che

    2013-02-01

    A specialized orchid database, named Orchidstra (URL: http://orchidstra.abrc.sinica.edu.tw), has been constructed to collect, annotate and share genomic information for orchid functional genomics studies. The Orchidaceae is a large family of Angiosperms that exhibits extraordinary biodiversity in terms of both the number of species and their distribution worldwide. Orchids exhibit many unique biological features; however, investigation of these traits is currently constrained due to the limited availability of genomic information. Transcriptome information for five orchid species and one commercial hybrid has been included in the Orchidstra database. Altogether, these comprise >380,000 non-redundant orchid transcript sequences, of which >110,000 are protein-coding genes. Sequences from the transcriptome shotgun assembly (TSA) were obtained either from output reads from next-generation sequencing technologies assembled into contigs, or from conventional cDNA library approaches. An annotation pipeline using Gene Ontology, KEGG and Pfam was built to assign gene descriptions and functional annotation to protein-coding genes. Deep sequencing of small RNA was also performed for Phalaenopsis aphrodite to search for microRNAs (miRNAs), extending the information archived for this species to miRNA annotation, precursors and putative target genes. The P. aphrodite transcriptome information was further used to design probes for an oligonucleotide microarray, and expression profiling analysis was carried out. The intensities of hybridized probes derived from microarray assays of various tissues were incorporated into the database as part of the functional evidence. In the future, the content of the Orchidstra database will be expanded with transcriptome data and genomic information from more orchid species. PMID:23324169

  3. How to use the Candida Genome Database

    PubMed Central

    Skrzypek, Marek S.; Binkley, Jonathan; Sherlock, Gavin

    2016-01-01

    Summary Studying Candida biology requires access to genomic sequence data in conjunction with experimental information that provides functional context to genes and proteins. The Candida Genome Database (CGD) integrates functional information about Candida genes and their products with a set of analysis tools that facilitate searching for sets of genes and exploring their biological roles. This chapter describes how the various types of information available at CGD can be searched, retrieved, and analyzed. Starting with the guided tour of the CGD Home page and Locus Summary page, this unit shows how to navigate the various assemblies of the C. albicans genome, how to use Gene Ontology tools to make sense of large-scale data, and how to access the microarray data archived at CGD. PMID:26519061

  4. How to Use the Candida Genome Database.

    PubMed

    Skrzypek, Marek S; Binkley, Jonathan; Sherlock, Gavin

    2016-01-01

    Studying Candida biology requires access to genomic sequence data in conjunction with experimental information that provides functional context to genes and proteins. The Candida Genome Database (CGD) integrates functional information about Candida genes and their products with a set of analysis tools that facilitate searching for sets of genes and exploring their biological roles. This chapter describes how the various types of information available at CGD can be searched, retrieved, and analyzed. Starting with the guided tour of the CGD Home page and Locus Summary page, this unit shows how to navigate the various assemblies of the C. albicans genome, how to use Gene Ontology tools to make sense of large-scale data, and how to access the microarray data archived at CGD. PMID:26519061

  5. Requirements and standards for organelle genome databases

    SciTech Connect

    Boore, Jeffrey L.

    2006-01-09

    Mitochondria and plastids (collectively called organelles)descended from prokaryotes that adopted an intracellular, endosymbioticlifestyle within early eukaryotes. Comparisons of their remnant genomesaddress a wide variety of biological questions, especially when includingthe genomes of their prokaryotic relatives and the many genes transferredto the eukaryotic nucleus during the transitions from endosymbiont toorganelle. The pace of producing complete organellar genome sequences nowmakes it unfeasible to do broad comparisons using the primary literatureand, even if it were feasible, it is now becoming uncommon for journalsto accept detailed descriptions of genome-level features. Unfortunatelyno database is currently useful for this task, since they have littlestandardization and are riddled with error. Here I outline what iscurrently wrong and what must be done to make this data useful to thescientific community.

  6. Potential of Aspergillus flavus Genomics for Applications in Biotechnology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is a common saprophyte and opportunistic pathogen that survives in the natural environment by extracting nutrition from plant debris, insect carcasses and a variety of other carbon sources. A. flavus produces numerous secondary metabolites and hydrolytic enzymes. The primary obj...

  7. Draft Genome Sequence of Aspergillus niger Strain An76

    PubMed Central

    Gong, Weili; Cheng, Zhi; Zhang, Huaiqiang; Liu, Lin; Gao, Peiji

    2016-01-01

    The filamentous fungus Aspergillus niger has become one of the most important fungi in industrial biotechnology, and it can efficiently secrete both polysaccharide-degrading enzymes and organic acids. We report here the 6,074,961,332-bp draft sequence of A. niger strain An76, and the findings provide important information related to its lignocellulose-degrading ability. PMID:26893421

  8. Draft Genome Sequences of Two Aspergillus fumigatus Strains, Isolated from the International Space Station

    PubMed Central

    Singh, Nitin Kumar; Blachowicz, Adriana; Checinska, Aleksandra; Wang, Clay

    2016-01-01

    Draft genome sequences of Aspergillus fumigatus strains (ISSFT-021 and IF1SW-F4), opportunistic pathogens isolated from the International Space Station (ISS), were assembled to facilitate investigations of the nature of the virulence characteristics of the ISS strains to other clinical strains isolated on Earth. PMID:27417828

  9. Draft Genome Sequences of Two Aspergillus fumigatus Strains, Isolated from the International Space Station.

    PubMed

    Singh, Nitin Kumar; Blachowicz, Adriana; Checinska, Aleksandra; Wang, Clay; Venkateswaran, Kasthuri

    2016-01-01

    Draft genome sequences of Aspergillus fumigatus strains (ISSFT-021 and IF1SW-F4), opportunistic pathogens isolated from the International Space Station (ISS), were assembled to facilitate investigations of the nature of the virulence characteristics of the ISS strains to other clinical strains isolated on Earth. PMID:27417828

  10. Advances in Aspergillus secondary metabolite research in the post-genomic era

    PubMed Central

    Sanchez, James F.; Somoza, Amber D.; Keller, Nancy P.

    2015-01-01

    This review studies the impact of whole genome sequencing on Aspergillus secondary metabolite research. There has been a proliferation of many new, intriguing discoveries since sequencing data became widely available. What is more, the genomes disclosed the surprising finding that there are many more secondary metabolite biosynthetic pathways than laboratory research had suggested. Activating these pathways has been met with some success, but many more dormant genes remain to be awakened. PMID:22228366

  11. Genome Sequences of Eight Aspergillus flavus spp. and One A. parasiticus sp., Isolated from Peanut Seeds in Georgia

    PubMed Central

    Wang, Xinye Monica; Palencia, Edwin R.

    2016-01-01

    Aspergillus flavus and A. parasiticus fungi produce carcinogenic mycotoxins in peanut seeds, causing considerable impact on both human health and the economy. Here, we report nine genome sequences of Aspergillus spp., isolated from Georgia peanut seeds in 2014. The information obtained will lead to further biodiversity studies that are essential for developing control strategies. PMID:27081142

  12. Genome Sequences of Eight Aspergillus flavus spp. and One A. parasiticus sp., Isolated From Peanut Seeds in Georgia

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus and A. parasiticus fungi, carcinogen-mycotoxins producers, infect peanut seeds, causing considerable impact on both human health and the economy. Here we report 9 genome sequences of Aspergillus spp. isolated from peanut seeds. The information obtained will allow conducting biodiv...

  13. Genomic Context of Azole Resistance Mutations in Aspergillus fumigatus Determined Using Whole-Genome Sequencing

    PubMed Central

    Abdolrasouli, Alireza; Rhodes, Johanna; Beale, Mathew A.; Hagen, Ferry; Rogers, Thomas R.; Chowdhary, Anuradha; Meis, Jacques F.

    2015-01-01

    ABSTRACT A rapid and global emergence of azole resistance has been observed in the pathogenic fungus Aspergillus fumigatus over the past decade. The dominant resistance mechanism appears to be of environmental origin and involves mutations in the cyp51A gene, which encodes a protein targeted by triazole antifungal drugs. Whole-genome sequencing (WGS) was performed for high-resolution single-nucleotide polymorphism (SNP) analysis of 24 A. fumigatus isolates, including azole-resistant and susceptible clinical and environmental strains obtained from India, the Netherlands, and the United Kingdom, in order to assess the utility of WGS for characterizing the alleles causing resistance. WGS analysis confirmed that TR34/L98H (a mutation comprising a tandem repeat [TR] of 34 bases in the promoter of the cyp51A gene and a leucine-to-histidine change at codon 98) is the sole mechanism of azole resistance among the isolates tested in this panel of isolates. We used population genomic analysis and showed that A. fumigatus was panmictic, with as much genetic diversity found within a country as is found between continents. A striking exception to this was shown in India, where isolates are highly related despite being isolated from both clinical and environmental sources across >1,000 km; this broad occurrence suggests a recent selective sweep of a highly fit genotype that is associated with the TR34/L98H allele. We found that these sequenced isolates are all recombining, showing that azole-resistant alleles are segregating into diverse genetic backgrounds. Our analysis delineates the fundamental population genetic parameters that are needed to enable the use of genome-wide association studies to identify the contribution of SNP diversity to the generation and spread of azole resistance in this medically important fungus. PMID:26037120

  14. Biocuration at the Saccharomyces genome database.

    PubMed

    Skrzypek, Marek S; Nash, Robert S

    2015-08-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. PMID:25997651

  15. Biocuration at the Saccharomyces Genome Database

    PubMed Central

    Skrzypek, Marek S.; Nash, Robert S.

    2015-01-01

    Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell. PMID:25997651

  16. Genome-wide analysis of the Zn(II)₂Cys₆ zinc cluster-encoding gene family in Aspergillus flavus.

    PubMed

    Chang, Perng-Kuang; Ehrlich, Kenneth C

    2013-05-01

    Proteins with a Zn(II)₂Cys₆ domain, Cys-X₂-Cys-X₆-Cys-X₅₋₁₂-Cys-X₂-Cys-X₆₋₉-Cys (hereafter, referred to as the C6 domain), form a subclass of zinc finger proteins found exclusively in fungi and yeast. Genome sequence databases of Saccharomyces cerevisiae and Candida albicans have provided an overview of this family of genes. Annotation of this gene family in most fungal genomes is still far from perfect and refined bioinformatic algorithms are urgently needed. Aspergillus flavus is a saprophytic soil fungus that can produce the carcinogenic aflatoxin. It is the second leading causative agent of invasive aspergillosis. The 37-Mb genome of A. flavus is predicted to encode 12,000 proteins. Two and a half percent of the total proteins are estimated to contain the C6 domain, more than twofold greater than those estimated for yeast, which is about 1 %. The variability in the spacing between cysteines, C₃-C₄ and C₅-C₆, in the zinc cluster enables classification of the domains into distinct subgroups, which are also well conserved in Aspergillus nidulans. Sixty-six percent (202/306) of the A. flavus C6 proteins contain a specific transcription factor domain, and 7 % contain a domain of unknown function, DUF3468. Two A. nidulans C6 proteins containing the DUF3468 are involved in asexual conidiation and another two in sexual differentiation. In the anamorphic A. flavus, a homolog of the latter lacks the C6 domain. A. flavus being heterothallic and reproducing mainly through conidiation appears to have lost some components involved in homothallic sexual development. Of the 55 predicted gene clusters thought to be involved in production of secondary metabolites, only about half have a C6-encoding gene in or near the gene clusters. The features revealed by the A. flavus C6 proteins likely are common for other ascomycete fungi. PMID:23563886

  17. HGT-Finder: A New Tool for Horizontal Gene Transfer Finding and Application to Aspergillus genomes

    PubMed Central

    Nguyen, Marcus; Ekstrom, Alex; Li, Xueqiong; Yin, Yanbin

    2015-01-01

    Horizontal gene transfer (HGT) is a fast-track mechanism that allows genetically unrelated organisms to exchange genes for rapid environmental adaptation. We developed a new phyletic distribution-based software, HGT-Finder, which implements a novel bioinformatics algorithm to calculate a horizontal transfer index and a probability value for each query gene. Applying this new tool to the Aspergillus fumigatus, Aspergillus flavus, and Aspergillus nidulans genomes, we found 273, 542, and 715 transferred genes (HTGs), respectively. HTGs have shorter length, higher guanine-cytosine (GC) content, and relaxed selection pressure. Metabolic process and secondary metabolism functions are significantly enriched in HTGs. Gene clustering analysis showed that 61%, 41% and 74% of HTGs in the three genomes form physically linked gene clusters (HTGCs). Overlapping manually curated, secondary metabolite gene clusters (SMGCs) with HTGCs found that 9 of the 33 A. fumigatus SMGCs and 31 of the 65 A. nidulans SMGCs share genes with HTGCs, and that HTGs are significantly enriched in SMGCs. Our genome-wide analysis thus presented very strong evidence to support the hypothesis that HGT has played a very critical role in the evolution of SMGCs. The program is freely available at http://cys.bios.niu.edu/HGTFinder/HGTFinder.tar.gz. PMID:26473921

  18. MIPS: a database for genomes and protein sequences

    PubMed Central

    Mewes, H. W.; Frishman, D.; Gruber, C.; Geier, B.; Haase, D.; Kaps, A.; Lemcke, K.; Mannhaupt, G.; Pfeiffer, F.; Schüller, C.; Stocker, S.; Weil, B.

    2000-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried, near Munich, Germany, continues its longstanding tradition to develop and maintain high quality curated genome databases. In addition, efforts have been intensified to cover the wealth of complete genome sequences in a systematic, comprehensive form. Bioinformatics, supporting national as well as European sequencing and functional analysis projects, has resulted in several up-to-date genome-oriented databases. This report describes growing databases reflecting the progress of sequencing the Arabidopsis thaliana (MATDB) and Neurospora crassa genomes (MNCDB), the yeast genome database (MYGD) extended by functional analysis data, the database of annotated human EST-clusters (HIB) and the database of the complete cDNA sequences from the DHGP (German Human Genome Project). It also contains information on the up-to-date database of complete genomes (PEDANT), the classification of protein sequences (ProtFam) and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database. These databases can be accessed through the MIPS WWW server (http://www. mips.biochem.mpg.de ). PMID:10592176

  19. Signaling pathways for stress responses and adaptation in Aspergillus species: stress biology in the post-genomic era.

    PubMed

    Hagiwara, Daisuke; Sakamoto, Kazutoshi; Abe, Keietsu; Gomi, Katsuya

    2016-09-01

    Aspergillus species are among the most important filamentous fungi in terms of industrial use and because of their pathogenic or toxin-producing features. The genomes of several Aspergillus species have become publicly available in this decade, and genomic analyses have contributed to an integrated understanding of fungal biology. Stress responses and adaptation mechanisms have been intensively investigated using the accessible genome infrastructure. Mitogen-activated protein kinase (MAPK) cascades have been highlighted as being fundamentally important in fungal adaptation to a wide range of stress conditions. Reverse genetics analyses have uncovered the roles of MAPK pathways in osmotic stress, cell wall stress, development, secondary metabolite production, and conidia stress resistance. This review summarizes the current knowledge on the stress biology of Aspergillus species, illuminating what we have learned from the genomic data in this "post-genomic era." PMID:27007956

  20. Recent advances in genome mining of secondary metabolites in Aspergillus terreus

    PubMed Central

    Guo, Chun-Jun; Wang, Clay C. C.

    2014-01-01

    Filamentous fungi are rich resources of secondary metabolites (SMs) with a variety of interesting biological activities. Recent advances in genome sequencing and techniques in genetic manipulation have enabled researchers to study the biosynthetic genes of these SMs. Aspergillus terreus is the well-known producer of lovastatin, a cholesterol-lowering drug. This fungus also produces other SMs, including acetylaranotin, butyrolactones, and territram, with interesting bioactivities. This review will cover recent progress in genome mining of SMs identified in this fungus. The identification and characterization of the gene cluster for these SMs, as well as the proposed biosynthetic pathways, will be discussed in depth. PMID:25566227

  1. CADRE: the Central Aspergillus Data REpository 2012.

    PubMed

    Mabey Gilsenan, Jane; Cooley, John; Bowyer, Paul

    2012-01-01

    The Central Aspergillus Data REpository (CADRE; http://www.cadre-genomes.org.uk) is a public resource for genomic data extracted from species of Aspergillus. It provides an array of online tools for searching and visualising features of this significant fungal genus. CADRE arose from a need within the medical community to understand the human pathogen Aspergillus fumigatus. Due to the paucity of Aspergillus genomic resources 10 years ago, the long-term goal of this project was to collate and maintain Aspergillus genomes as they became available. Since our first release in 2004, the resource has expanded to encompass annotated sequence for eight other Aspergilli and provides much needed support to the international Aspergillus research community. Recent developments, however, in sequencing technology are creating a vast amount of genomic data and, as a result, we shortly expect a tidal wave of Aspergillus data. In preparation for this, we have upgraded the database and software suite. This not only enables better management of more complex data sets, but also improves annotation by providing access to genome comparison data and the integration of high-throughput data. PMID:22080563

  2. Tomato functional genomics database (TFGD): a comprehensive collection and analysis package for tomato functional genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Tomato Functional Genomics Database (TFGD; http://ted.bti.cornell.edu) provides a comprehensive systems biology resource to store, mine, analyze, visualize and integrate large-scale tomato functional genomics datasets. The database is expanded from the previously described Tomato Expression Database...

  3. Draft Genome Sequences of Two Closely Related Aflatoxigenic Aspergillus Species Obtained from the Ivory Coast.

    PubMed

    Moore, Geromy G; Mack, Brian M; Beltz, Shannon B

    2016-03-01

    Aspergillus ochraceoroseus and Aspergillus rambellii were isolated from soil detritus in Taï National Park, Ivory Coast, Africa. The Type strain for each species happens to be the only representative ever sampled. Both species secrete copious amounts of aflatoxin B1 and sterigmatocystin, because each of their genomes contains clustered genes for biosynthesis of these mycotoxins. We sequenced their genomes using a personal genome machine and found them to be smaller in size (A. ochraceoroseus = 23.9 Mb and A. rambellii = 26.1 Mb), as well as in numbers of predicted genes (7,837 and 7,807, respectively), compared to other sequenced Aspergilli. Our findings also showed that the A. ochraceoroseus Type strain contains a single MAT1-1 gene, while the Type strain of A. rambellii contains a single MAT1-2 gene, indicating that these species are heterothallic (self-infertile). These draft genomes will be useful for understanding the genes and pathways necessary for the cosynthesis of these two toxic secondary metabolites as well as the evolution of these pathways in aflatoxigenic fungi. PMID:26637470

  4. An extended bioreaction database that significantly improves reconstruction and analysis of genome-scale metabolic networks.

    PubMed

    Stelzer, Michael; Sun, Jibin; Kamphans, Tom; Fekete, Sándor P; Zeng, An-Ping

    2011-11-01

    The bioreaction database established by Ma and Zeng (Bioinformatics, 2003, 19, 270-277) for in silico reconstruction of genome-scale metabolic networks has been widely used. Based on more recent information in the reference databases KEGG LIGAND and Brenda, we upgrade the bioreaction database in this work by almost doubling the number of reactions from 3565 to 6851. Over 70% of the reactions have been manually updated/revised in terms of reversibility, reactant pairs, currency metabolites and error correction. For the first time, 41 spontaneous sugar mutarotation reactions are introduced into the biochemical database. The upgrade significantly improves the reconstruction of genome scale metabolic networks. Many gaps or missing biochemical links can be recovered, as exemplified with three model organisms Homo sapiens, Aspergillus niger, and Escherichia coli. The topological parameters of the constructed networks were also largely affected, however, the overall network structure remains scale-free. Furthermore, we consider the problem of computing biologically feasible shortest paths in reconstructed metabolic networks. We show that these paths are hard to compute and present solutions to find such paths in networks of small and medium size. PMID:21952610

  5. GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data.

    PubMed

    Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie

    2008-01-01

    The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org. PMID:17932055

  6. Exploration of the Chemical Space of Public Genomic Databases

    EPA Science Inventory

    The current project aims to chemically index the content of public genomic databases to make these data accessible in relation to other publicly available, chemically-indexed toxicological information.

  7. Design and implementation of the cacao genome database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Cacao Genome Database (CGD, www.cacaogenomedb.org) is being developed to provide a comprehensive data mining resource of genomic, genetic and breeding data for Theobroma cacao. Designed using Chado and a collection of Drupal modules, known as Tripal, CGD currently contains the genetically anchor...

  8. Uniform standards for genome databases in forest and fruit trees

    Technology Transfer Automated Retrieval System (TEKTRAN)

    TreeGenes and tfGDR serve the international forestry and fruit tree genomics research communities, respectively. These databases hold similar sequence data and provide resources for the submission and recovery of this information in order to enable comparative genomics research. Large-scale genotype...

  9. : a database of ciliate genome rearrangements

    PubMed Central

    Burns, Jonathan; Kukushkin, Denys; Lindblad, Kelsi; Chen, Xiao; Jonoska, Nataša; Landweber, Laura F.

    2016-01-01

    Ciliated protists exhibit nuclear dimorphism through the presence of somatic macronuclei (MAC) and germline micronuclei (MIC). In some ciliates, DNA from precursor segments in the MIC genome rearranges to form transcriptionally active genes in the mature MAC genome, making these ciliates model organisms to study the process of somatic genome rearrangement. Similar broad scale, somatic rearrangement events occur in many eukaryotic cells and tumors. The (http://oxytricha.princeton.edu/mds_ies_db) is a database of genome recombination and rearrangement annotations, and it provides tools for visualization and comparative analysis of precursor and product genomes. The database currently contains annotations for two completely sequenced ciliate genomes: Oxytricha trifallax and Tetrahymena thermophila. PMID:26586804

  10. The launch of the Aspergillus flavus genome browser and limited release of whole genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are produced by Aspergillus flavus and A. parasiticus. These toxic and carcinogenic compounds contaminate pre-harvest agricultural crops in the field and post-harvest grains during storage. In order to reduce and eliminate aflatoxin contamination of food and feed, Aspergilllus flavus wh...

  11. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  12. Rice Annotation Database (RAD): a contig-oriented database for map-based rice genomics.

    PubMed

    Ito, Yuichi; Arikawa, Kohji; Antonio, Baltazar A; Ohta, Isamu; Naito, Shinji; Mukai, Yoshiyuki; Shimano, Atsuko; Masukawa, Masatoshi; Shibata, Michie; Yamamoto, Mayu; Ito, Yukiyo; Yokoyama, Junri; Sakai, Yasumichi; Sakata, Katsumi; Nagamura, Yoshiaki; Namiki, Nobukazu; Matsumoto, Takashi; Higo, Kenichi; Sasaki, Takuji

    2005-01-01

    A contig-oriented database for annotation of the rice genome has been constructed to facilitate map-based rice genomics. The Rice Annotation Database has the following functional features: (i) extensive effort of manual annotations of P1-derived artificial chromosome/bacterial artificial chromosome clones can be merged at chromosome and contig-level; (ii) concise visualization of the annotation information such as the predicted genes, results of various prediction programs (RiceHMM, Genscan, Genscan+, Fgenesh, GeneMark, etc.), homology to expressed sequence tag, full-length cDNA and protein; (iii) user-friendly clone / gene query system; (iv) download functions for nucleotide, amino acid and coding sequences; (v) analysis of various features of the genome (GC-content, average value, etc.); and (vi) genome-wide homology search (BLAST) of contig- and chromosome-level genome sequence to allow comparative analysis with the genome sequence of other organisms. As of October 2004, the database contains a total of 215 Mb sequence with relevant annotation results including 30 000 manually curated genes. The database can provide the latest information on manual annotation as well as a comprehensive structural analysis of various features of the rice genome. The database can be accessed at http://rad.dna.affrc.go.jp/. PMID:15608281

  13. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants.

    PubMed

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-09-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561

  14. Kazusa Marker DataBase: a database for genomics, genetics, and molecular breeding in plants

    PubMed Central

    Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki

    2014-01-01

    In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561

  15. The UCSC Genome Browser database: extensions and updates 2013

    PubMed Central

    Meyer, Laurence R.; Zweig, Ann S.; Hinrichs, Angie S.; Karolchik, Donna; Kuhn, Robert M.; Wong, Matthew; Sloan, Cricket A.; Rosenbloom, Kate R.; Roe, Greg; Rhead, Brooke; Raney, Brian J.; Pohl, Andy; Malladi, Venkat S.; Li, Chin H.; Lee, Brian T.; Learned, Katrina; Kirkup, Vanessa; Hsu, Fan; Heitner, Steve; Harte, Rachel A.; Haeussler, Maximilian; Guruvadoo, Luvina; Goldman, Mary; Giardine, Belinda M.; Fujita, Pauline A.; Dreszer, Timothy R.; Diekhans, Mark; Cline, Melissa S.; Clawson, Hiram; Barber, Galt P.; Haussler, David; Kent, W. James

    2013-01-01

    The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic datasets. As of September 2012, genomic sequence and a basic set of annotation ‘tracks’ are provided for 63 organisms, including 26 mammals, 13 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms, yeast and sea hare. In the past year 19 new genome assemblies have been added, and we anticipate releasing another 28 in early 2013. Further, a large number of annotation tracks have been either added, updated by contributors or remapped to the latest human reference genome. Among these are an updated UCSC Genes track for human and mouse assemblies. We have also introduced several features to improve usability, including new navigation menus. This article provides an update to the UCSC Genome Browser database, which has been previously featured in the Database issue of this journal. PMID:23155063

  16. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database.

    PubMed

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T; Karra, Kalpana; Hitz, Benjamin C; Nash, Robert S; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains are now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences.Database URL: www.yeastgenome.org. PMID:27252399

  17. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database

    PubMed Central

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T.; Karra, Kalpana; Hitz, Benjamin C.; Nash, Robert S.; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J.

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains are now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org PMID:27252399

  18. Corruption of genomic databases with anomalous sequence.

    PubMed Central

    Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

    1992-01-01

    We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%. PMID:1614861

  19. Aspergillus flavus Genomics for Development of Strategies to Interrupt Aflatoxin Formation and Discovery of Fungal Enzymes for Biofuel Production

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus produces toxic and the most carcinogenic mycotoxins, the aflatoxins. The primary objectives of our A. flavus genomics program are to reduce and eliminate aflatoxin contamination in food and feed and control fungal infection in preharvest crops such as corn, cotton, peanut and tre...

  20. Genome Sequences of Eight Aspergillus flavus spp. and One A. parasiticus sp., Isolated from Peanut Seeds in Georgia.

    PubMed

    Faustinelli, Paola C; Wang, Xinye Monica; Palencia, Edwin R; Arias, Renée S

    2016-01-01

    Aspergillus flavusandA. parasiticusfungi produce carcinogenic mycotoxins in peanut seeds, causing considerable impact on both human health and the economy. Here, we report nine genome sequences ofAspergillusspp., isolated from Georgia peanut seeds in 2014. The information obtained will lead to further biodiversity studies that are essential for developing control strategies. PMID:27081142

  1. Prokaryotic Genomes from Microbes Online Database

    DOE Data Explorer

    Alm, Eric J.; Huang, Katherine H.; Price, Morgan N.; Koche, Richard P.; Keller, Keith; Dubchak, Inna L.; Arkin, Adam P.

    To describe the potential functions of genes, MicrobesOnline includes protein family analyses (from InterPro and COG), metabolic maps (from KEGG), links to research papers (from UniProt and PubMed), and operon predictions for every genome. To examine each gene's evolutionary history, MicrobesOnline includes precomputed phylogenetic trees for all the gene families. It displays gene trees with genomic context or it compares the gene tree to the species tree. The tools provided with MicrobesOnline allow users to: compute customized motifs, sequence alignments, and phylogenetic trees change expression patterns in metabolic maps annotate genes in various ways. A browse tree tool and a genome browser are available, along with specialized search capabilities. (Specialized Interface)

  2. OryzaGenome: Genome Diversity Database of Wild Oryza Species

    PubMed Central

    Ohyanagi, Hajime; Ebata, Toshinobu; Huang, Xuehui; Gong, Hao; Fujita, Masahiro; Mochizuki, Takako; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu; Feng, Qi; Wang, Zi-Xuan; Han, Bin; Kurata, Nori

    2016-01-01

    The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of next-generation sequencing (NGS) technology, a flood of Oryza species reference genomes and genomic variation information has become available in recent years. This genomic information, combined with the comprehensive phenotypic information that we are accumulating in our Oryzabase, can serve as an excellent genotype–phenotype association resource for analyzing rice functional and structural evolution, and the associated diversity of the Oryza genus. Here we integrate our previous and future phenotypic/habitat information and newly determined genotype information into a united repository, named OryzaGenome, providing the variant information with hyperlinks to Oryzabase. The current version of OryzaGenome includes genotype information of 446 O. rufipogon accessions derived by imputation and of 17 accessions derived by imputation-free deep sequencing. Two variant viewers are implemented: SNP Viewer as a conventional genome browser interface and Variant Table as a text-based browser for precise inspection of each variant one by one. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all of the closely and distantly related wild Oryza species from the NIG Wild Rice Collection will be available in future releases. All of the resources can be accessed through http://viewer.shigen.info/oryzagenome/. PMID:26578696

  3. GenColors-based comparative genome databases for small eukaryotic genomes

    PubMed Central

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources. PMID:23193285

  4. Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

    SciTech Connect

    Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma; Bragulat, M. Rosa; Cigliano, Riccardo Aiese; Sánchez, Armand

    2015-03-13

    In microorganisms, Ion Torrent sequencing technology has been proved to be useful in whole-genome sequencing of bacterial genomes (5 Mbp). In our study, for the first time we used this technology to perform a resequencing approach in a whole fungal genome (36 Mbp), a non-ochratoxin A producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin which is found mainly in cereals and their products, but it also occurs in a variety of common foods and beverages. Due to the fact that this strain does not produce OTA, we focused some of the bioinformatics analyses in genes involved in OTA biosynthesis, using a reference genome of an OTA producing strain of the same species. This study revealed that in the atoxigenic strain there is a high accumulation of nonsense and missense mutations in several genes. Importantly, a two fold increase in gene mutation ratio was observed in PKS and NRPS encoding genes which are suggested to be involved in OTA biosynthesis.

  5. Databases of homologous gene families for comparative genomics

    PubMed Central

    Penel, Simon; Arigon, Anne-Muriel; Dufayard, Jean-François; Sertier, Anne-Sophie; Daubin, Vincent; Duret, Laurent; Gouy, Manolo; Perrière, Guy

    2009-01-01

    Background Comparative genomics is a central step in many sequence analysis studies, from gene annotation and the identification of new functional regions in genomes, to the study of evolutionary processes at the molecular level (speciation, single gene or whole genome duplications, etc.) and phylogenetics. In that context, databases providing users high quality homologous families and sequence alignments as well as phylogenetic trees based on state of the art algorithms are becoming indispensable. Methods We developed an automated procedure allowing massive all-against-all similarity searches, gene clustering, multiple alignments computation, and phylogenetic trees construction and reconciliation. The application of this procedure to a very large set of sequences is possible through parallel computing on a large computer cluster. Results Three databases were developed using this procedure: HOVERGEN, HOGENOM and HOMOLENS. These databases share the same architecture but differ in their content. HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The later can be used to perform tree-pattern based searches allowing, among other uses, to retrieve sets of orthologous genes. The three databases, as well as the software required to build and query them, can be used or downloaded from the PBIL (Pôle Bioinformatique Lyonnais) site at . PMID:19534752

  6. Megx.net: integrated database resource for marine ecological genomics.

    PubMed

    Kottmann, Renzo; Kostadinov, Ivalyo; Duhaime, Melissa Beth; Buttigieg, Pier Luigi; Yilmaz, Pelin; Hankeln, Wolfgang; Waldmann, Jost; Glöckner, Frank Oliver

    2010-01-01

    Megx.net is a database and portal that provides integrated access to georeferenced marker genes, environment data and marine genome and metagenome projects for microbial ecological genomics. All data are stored in the Microbial Ecological Genomics DataBase (MegDB), which is subdivided to hold both sequence and habitat data and global environmental data layers. The extended system provides access to several hundreds of genomes and metagenomes from prokaryotes and phages, as well as over a million small and large subunit ribosomal RNA sequences. With the refined Genes Mapserver, all data can be interactively visualized on a world map and statistics describing environmental parameters can be calculated. Sequence entries have been curated to comply with the proposed minimal standards for genomes and metagenomes (MIGS/MIMS) of the Genomic Standards Consortium. Access to data is facilitated by Web Services. The updated megx.net portal offers microbial ecologists greatly enhanced database content, and new features and tools for data analysis, all of which are freely accessible from our webpage http://www.megx.net. PMID:19858098

  7. Megx.net: integrated database resource for marine ecological genomics

    PubMed Central

    Kottmann, Renzo; Kostadinov, Ivalyo; Duhaime, Melissa Beth; Buttigieg, Pier Luigi; Yilmaz, Pelin; Hankeln, Wolfgang; Waldmann, Jost; Glöckner, Frank Oliver

    2010-01-01

    Megx.net is a database and portal that provides integrated access to georeferenced marker genes, environment data and marine genome and metagenome projects for microbial ecological genomics. All data are stored in the Microbial Ecological Genomics DataBase (MegDB), which is subdivided to hold both sequence and habitat data and global environmental data layers. The extended system provides access to several hundreds of genomes and metagenomes from prokaryotes and phages, as well as over a million small and large subunit ribosomal RNA sequences. With the refined Genes Mapserver, all data can be interactively visualized on a world map and statistics describing environmental parameters can be calculated. Sequence entries have been curated to comply with the proposed minimal standards for genomes and metagenomes (MIGS/MIMS) of the Genomic Standards Consortium. Access to data is facilitated by Web Services. The updated megx.net portal offers microbial ecologists greatly enhanced database content, and new features and tools for data analysis, all of which are freely accessible from our webpage http://www.megx.net. PMID:19858098

  8. Choosing a Genome Browser for a Model Organism Database (MOD): Surveying the Maize Community

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As the maize genome sequencing is nearing its completion, the Maize Genetics and Genomics Database (MaizeGDB), the Model Organism Database for maize, integrated a genome browser to its already existing Web interface and database. The addition of the MaizeGDB Genome Browser to MaizeGDB will allow it ...

  9. BBGD: an online database for blueberry genomic data

    PubMed Central

    Alkharouf, Nadim W; Dhanaraj, Anik L; Naik, Dhananjay; Overall, Chris; Matthews, Benjamin F; Rowland, Lisa J

    2007-01-01

    Background Blueberry is a member of the Ericaceae family, which also includes closely related cranberry and more distantly related rhododendron, azalea, and mountain laurel. Blueberry is a major berry crop in the United States, and one that has great nutritional and economical value. Extreme low temperatures, however, reduce crop yield and cause major losses to US farmers. A better understanding of the genes and biochemical pathways that are up- or down-regulated during cold acclimation is needed to produce blueberry cultivars with enhanced cold hardiness. To that end, the blueberry genomics database (BBDG) was developed. Along with the analysis tools and web-based query interfaces, the database serves both the broader Ericaceae research community and the blueberry research community specifically by making available ESTs and gene expression data in searchable formats and in elucidating the underlying mechanisms of cold acclimation and freeze tolerance in blueberry. Description BBGD is the world's first database for blueberry genomics. BBGD is both a sequence and gene expression database. It stores both EST and microarray data and allows scientists to correlate expression profiles with gene function. BBGD is a public online database. Presently, the main focus of the database is the identification of genes in blueberry that are significantly induced or suppressed after low temperature exposure. Conclusion By using the database, researchers have developed EST-based markers for mapping and have identified a number of "candidate" cold tolerance genes that are highly expressed in blueberry flower buds after exposure to low temperatures. PMID:17263892

  10. CarrotDB: a genomic and transcriptomic database for carrot.

    PubMed

    Xu, Zhi-Sheng; Tan, Hua-Wei; Wang, Feng; Hou, Xi-Lin; Xiong, Ai-Sheng

    2014-01-01

    Carrot (Daucus carota L.) is an economically important vegetable worldwide and is the largest source of carotenoids and provitamin A in the human diet. Given the importance of this vegetable to humans, research and breeding communities on carrot should obtain useful genomic and transcriptomic information. The first whole-genome sequences of 'DC-27' carrot were de novo assembled and analyzed. Transcriptomic sequences of 14 carrot genotypes were downloaded from the Sequence Read Archive (SRA) database of National Center for Biotechnology Information (NCBI) and mapped to the whole-genome sequence before assembly. Based on these data sets, the first Web-based genomic and transcriptomic database for D. carota (CarrotDB) was developed (database homepage: http://apiaceae.njau.edu.cn/car rotdb). CarrotDB offers the tools of Genome Map and Basic Local Alignment Search Tool. Using these tools, users can search certain target genes and simple sequence repeats along with designed primers of 'DC-27'. Assembled transcriptomic sequences along with fragments per kilobase of transcript sequence per millions base pairs sequenced information (FPKM) information of 14 carrot genotypes are also provided. Users can download de novo assembled whole-genome sequences, putative gene sequences and putative protein sequences of 'DC-27'. Users can also download transcriptome sequence assemblies of 14 carrot genotypes along with their FPKM information. A total of 2826 transcription factor (TF) genes classified into 57 families were identified in the entire genome sequences. These TF genes were embedded in CarrotDB as an interface. The 'GERMPLASM' part of CarrotDB also offers taproot photos of 45 carrot genotypes and a table containing accession numbers, names, countries of origin and colors of cortex, phloem and xylem parts of taproots corresponding to each carrot genotype. CarrotDB will be continuously updated with new information. Database URL: http://apiaceae.njau.edu.cn/carrotdb/ PMID

  11. Querying genomic databases: refining the connectivity map.

    PubMed

    Segal, Mark R; Xiong, Hao; Bengtsson, Henrik; Bourgon, Richard; Gentleman, Robert

    2012-01-01

    The advent of high-throughput biotechnologies, which can efficiently measure gene expression on a global basis, has led to the creation and population of correspondingly rich databases and compendia. Such repositories have the potential to add enormous scientific value beyond that provided by individual studies which, due largely to cost considerations, are typified by small sample sizes. Accordingly, substantial effort has been invested in devising analysis schemes for utilizing gene-expression repositories. Here, we focus on one such scheme, the Connectivity Map (cmap), that was developed with the express purpose of identifying drugs with putative efficacy against a given disease, where the disease in question is characterized by a (differential) gene-expression signature. Initial claims surrounding cmap intimated that such tools might lead to new, previously unanticipated applications of existing drugs. However, further application suggests that its primary utility is in connecting a disease condition whose biology is largely unknown to a drug whose mechanisms of action are well understood, making cmap a tool for enhancing biological knowledge.The success of the Connectivity Map is belied by its simplicity. The aforementioned signature serves as an unordered query which is applied to a customized database of (differential) gene-expression experiments designed to elicit response to a wide range of drugs, across of spectrum of concentrations, durations, and cell lines. Such application is effected by computing a per experiment score that measures "closeness" between the signature and the experiment. Top-scoring experiments, and the attendant drug(s), are then deemed relevant to the disease underlying the query. Inference supporting such elicitations is pursued via re-sampling. In this paper, we revisit two key aspects of the Connectivity Map implementation. Firstly, we develop new approaches to measuring closeness for the common scenario wherein the query

  12. MaizeGDB: The Maize Genetics and Genomics Database.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MaizeGDB is the community database for biological information about the crop plant Zea mays. Genomic, genetic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes stored at MaizeGDB. At the project’s website...

  13. MaizeGDB: The Maize Genetics and Genomics Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MaizeGDB is the community database for biological information about the crop plant Zea mays. Genetic, genomic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes stored at MaizeGDB. At the project's website...

  14. A primer on rapid prototyping of genomic databases in Prolog

    SciTech Connect

    Yoshida, Kaoru; Smith, C.L.; Overbeek, R.

    1992-01-01

    This report presents a tutorial on how one might create an integrated database of genomic information. We outline the required steps for implementation, give a brief introduction to Prolog, and discuss the query facility supported by our system. Our goal is to enable researchers to being constructing their own biological information system.

  15. BRAD, the genetics and genomics database for Brassica plants

    PubMed Central

    2011-01-01

    Background Brassica species include both vegetable and oilseed crops, which are very important to the daily life of common human beings. Meanwhile, the Brassica species represent an excellent system for studying numerous aspects of plant biology, specifically for the analysis of genome evolution following polyploidy, so it is also very important for scientific research. Now, the genome of Brassica rapa has already been assembled, it is the time to do deep mining of the genome data. Description BRAD, the Brassica database, is a web-based resource focusing on genome scale genetic and genomic data for important Brassica crops. BRAD was built based on the first whole genome sequence and on further data analysis of the Brassica A genome species, Brassica rapa (Chiifu-401-42). It provides datasets, such as the complete genome sequence of B. rapa, which was de novo assembled from Illumina GA II short reads and from BAC clone sequences, predicted genes and associated annotations, non coding RNAs, transposable elements (TE), B. rapa genes' orthologous to those in A. thaliana, as well as genetic markers and linkage maps. BRAD offers useful searching and data mining tools, including search across annotation datasets, search for syntenic or non-syntenic orthologs, and to search the flanking regions of a certain target, as well as the tools of BLAST and Gbrowse. BRAD allows users to enter almost any kind of information, such as a B. rapa or A. thaliana gene ID, physical position or genetic marker. Conclusion BRAD, a new database which focuses on the genetics and genomics of the Brassica plants has been developed, it aims at helping scientists and breeders to fully and efficiently use the information of genome data of Brassica plants. BRAD will be continuously updated and can be accessed through http://brassicadb.org. PMID:21995777

  16. PGDD: a database of gene and genome duplication in plants

    PubMed Central

    Lee, Tae-Ho; Tang, Haibao; Wang, Xiyin; Paterson, Andrew H.

    2013-01-01

    Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD. PMID:23180799

  17. Integrated database for identifying candate genes for Aspergillus flavus resistance in maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent af...

  18. Genomic Databases and Biobanks in Denmark.

    PubMed

    Hartlev, Mette

    2015-01-01

    Biobanking in Denmark is regulated via patients' rights laws, data protection laws, and research ethics reviews. Danish law recognizes tissue samples as personal data for purposes of the data protection laws, meaning research with tissue samples may be subject to research ethics review, data protection laws, and patients' rights requirements depending on the circumstances of collection. However, research on information gained through whole genome sequencing is subject only to data protection laws, despite the similarity in the nature of the information. The regulatory framework treats biobank samples collected from patients differently than samples collected from research participants, particularly with respect to autonomy. Importantly, biobanks established for future unspecified research are not subject to research ethics review. Biobank-based research has gained more prominence on the national level recently, and the potential for a less fragmented and more consistent regulatory approach may emerge from this attention. PMID:26711414

  19. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-01

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. PMID:26578582

  20. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database

    PubMed Central

    Winsor, Geoffrey L.; Griffiths, Emma J.; Lo, Raymond; Dhillon, Bhavjinder K.; Shay, Julie A.; Brinkman, Fiona S. L.

    2016-01-01

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. PMID:26578582

  1. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases

    PubMed Central

    Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    Motivation First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. Results We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material. PMID:27489953

  2. Building a genome database using an object-oriented approach.

    PubMed

    Barbasiewicz, Anna; Liu, Lin; Lang, B Franz; Burger, Gertraud

    2002-01-01

    GOBASE is a relational database that integrates data associated with mitochondria and chloroplasts. The most important data in GOBASE, i. e., molecular sequences and taxonomic information, are obtained from the public sequence data repository at the National Center for Biotechnology Information (NCBI), and are validated by our experts. Maintaining a curated genomic database comes with a towering labor cost, due to the shear volume of available genomic sequences and the plethora of annotation errors and omissions in records retrieved from public repositories. Here we describe our approach to increase automation of the database population process, thereby reducing manual intervention. As a first step, we used Unified Modeling Language (UML) to construct a list of potential errors. Each case was evaluated independently, and an expert solution was devised, and represented as a diagram. Subsequently, the UML diagrams were used as templates for writing object-oriented automation programs in the Java programming language. PMID:12542407

  3. Mouse Genome Database: from sequence to phenotypes and disease models

    PubMed Central

    Eppig, Janan T.; Richardson, Joel E.; Kadin, James A.; Smith, Cynthia L.; Blake, Judith A.; Bult, Carol J.

    2015-01-01

    The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here we describe the data acquisition process, specifics about MGD’s key data areas, methods to access and query MGD data, and outreach and user help facilities. PMID:26150326

  4. Mouse Genome Database: From sequence to phenotypes and disease models.

    PubMed

    Eppig, Janan T; Richardson, Joel E; Kadin, James A; Smith, Cynthia L; Blake, Judith A; Bult, Carol J

    2015-08-01

    The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here, we describe the data acquisition process, specifics about MGD's key data areas, methods to access and query MGD data, and outreach and user help facilities. PMID:26150326

  5. EU Laws on Privacy in Genomic Databases and Biobanking.

    PubMed

    Townend, David

    2016-03-01

    Both the European Union and the Council of Europe have a bearing on privacy in genomic databases and biobanking. In terms of legislation, the processing of personal data as it relates to the right to privacy is currently largely regulated in Europe by Directive 95/46/EC, which requires that processing be "fair and lawful" and follow a set of principles, meaning that the data be processed only for stated purposes, be sufficient for the purposes of the processing, be kept only for so long as is necessary to achieve those purposes, and be kept securely and only in an identifiable state for such time as is necessary for the processing. The European privacy regime does not require the de-identification (anonymization) of personal data used in genomic databases or biobanks, and alongside this practice informed consent as well as governance and oversight mechanisms provide for the protection of genomic data. PMID:27256129

  6. Development of genome viewer (Web Omics Viewer) for managing databases of cucumber genome

    NASA Astrophysics Data System (ADS)

    Wojcieszek, M.; RóŻ, P.; Pawełkowicz, M.; Nowak, R.; Przybecki, Z.

    Cucumber is an important plant in horticulture and science world. Sequencing projects of C. sativus genome enable new methodological aproaches in further investigation of this species. Accessibility is crucial to fully exploit obtained information about detail structure of genes, markers and other characteristic features such contigs, scaffolds and chromosomes. Genome viewer is one of tools providing plain and easy way for presenting genome data for users and for databases administration. Gbrowse - the main viewer has several very useful features but lacks in managing simplicity. Our group developed new genome browser Web Omics Viewer (WOV), keeping functionality but improving utilization and accessibility to cucumber genome data.

  7. DemaDb: an integrated dematiaceous fungal genomes database

    PubMed Central

    Kuan, Chee Sian; Yew, Su Mei; Chan, Chai Ling; Toh, Yue Fen; Lee, Kok Wei; Cheong, Wei-Hien; Yee, Wai-Yan; Hoh, Chee-Choong; Yap, Soon-Joo; Ng, Kee Peng

    2016-01-01

    Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in human, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser based on the KEGG pathway database allows users to look into molecular interaction and reaction networks for all KEGG annotated genes. The availability of downloadable files containing assembly, nucleic acid, as well as protein data allows the direct retrieval for further downstream works. DemaDb is a useful resource for fungal research community especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my PMID:26980516

  8. DemaDb: an integrated dematiaceous fungal genomes database.

    PubMed

    Kuan, Chee Sian; Yew, Su Mei; Chan, Chai Ling; Toh, Yue Fen; Lee, Kok Wei; Cheong, Wei-Hien; Yee, Wai-Yan; Hoh, Chee-Choong; Yap, Soon-Joo; Ng, Kee Peng

    2016-01-01

    Many species of dematiaceous fungi are associated with allergic reactions and potentially fatal diseases in human, especially in tropical climates. Over the past 10 years, we have isolated more than 400 dematiaceous fungi from various clinical samples. In this study, DemaDb, an integrated database was designed to support the integration and analysis of dematiaceous fungal genomes. A total of 92 072 putative genes and 6527 pathways that identified in eight dematiaceous fungi (Bipolaris papendorfii UM 226, Daldinia eschscholtzii UM 1400, D. eschscholtzii UM 1020, Pyrenochaeta unguis-hominis UM 256, Ochroconis mirabilis UM 578, Cladosporium sphaerospermum UM 843, Herpotrichiellaceae sp. UM 238 and Pleosporales sp. UM 1110) were deposited in DemaDb. DemaDb includes functional annotations for all predicted gene models in all genomes, such as Gene Ontology, EuKaryotic Orthologous Groups, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam and InterProScan. All predicted protein models were further functionally annotated to Carbohydrate-Active enzymes, peptidases, secondary metabolites and virulence factors. DemaDb Genome Browser enables users to browse and visualize entire genomes with annotation data including gene prediction, structure, orientation and custom feature tracks. The Pathway Browser based on the KEGG pathway database allows users to look into molecular interaction and reaction networks for all KEGG annotated genes. The availability of downloadable files containing assembly, nucleic acid, as well as protein data allows the direct retrieval for further downstream works. DemaDb is a useful resource for fungal research community especially those involved in genome-scale analysis, functional genomics, genetics and disease studies of dematiaceous fungi. Database URL: http://fungaldb.um.edu.my. PMID:26980516

  9. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

    PubMed

    Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

    2013-02-01

    The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up to date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a Japanese leading japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics. PMID:23299411

  10. Genome Sequence of the White Koji Mold Aspergillus kawachii IFO 4308, Used for Brewing the Japanese Distilled Spirit Shochu

    PubMed Central

    Futagami, Taiki; Mori, Kazuki; Yamashita, Ayaka; Wada, Shotaro; Kajiwara, Yasuhiro; Takashita, Hideharu; Omori, Toshiro; Takegawa, Kaoru; Tashiro, Kosuke; Kuhara, Satoru; Goto, Masatoshi

    2011-01-01

    The filamentous fungus Aspergillus kawachii has traditionally been used for brewing the Japanese distilled spirit shochu. A. kawachii characteristically hyperproduces citric acid and a variety of polysaccharide glycoside hydrolases. Here the genome sequence of A. kawachii IFO 4308 was determined and annotated. Analysis of the sequence may provide insight into the properties of this fungus that make it superior for use in shochu production, leading to the further development of A. kawachii for industrial applications. PMID:22045919

  11. The Genome Database for Rosaceae (GDR): year 10 update.

    PubMed

    Jung, Sook; Ficklin, Stephen P; Lee, Taein; Cheng, Chun-Huai; Blenda, Anna; Zheng, Ping; Yu, Jing; Bombarely, Aureliano; Cho, Ilhyung; Ru, Sushan; Evans, Kate; Peace, Cameron; Abbott, Albert G; Mueller, Lukas A; Olmstead, Mercy A; Main, Dorrie

    2014-01-01

    The Genome Database for Rosaceae (GDR, http:/www.rosaceae.org), the long-standing central repository and data mining resource for Rosaceae research, has been enhanced with new genomic, genetic and breeding data, and improved functionality. Whole genome sequences of apple, peach and strawberry are available to browse or download with a range of annotations, including gene model predictions, aligned transcripts, repetitive elements, polymorphisms, mapped genetic markers, mapped NCBI Rosaceae genes, gene homologs and association of InterPro protein domains, GO terms and Kyoto Encyclopedia of Genes and Genomes pathway terms. Annotated sequences can be queried using search interfaces and visualized using GBrowse. New expressed sequence tag unigene sets are available for major genera, and Pathway data are available through FragariaCyc, AppleCyc and PeachCyc databases. Synteny among the three sequenced genomes can be viewed using GBrowse_Syn. New markers, genetic maps and extensively curated qualitative/Mendelian and quantitative trait loci are available. Phenotype and genotype data from breeding projects and genetic diversity projects are also included. Improved search pages are available for marker, trait locus, genetic diversity and publication data. New search tools for breeders enable selection comparison and assistance with breeding decision making. PMID:24225320

  12. ICDS database: interrupted CoDing sequences in prokaryotic genomes.

    PubMed

    Perrodou, Emmanuel; Deshayes, Caroline; Muller, Jean; Schaeffer, Christine; Van Dorsselaer, Alain; Ripp, Raymond; Poch, Olivier; Reyrat, Jean-Marc; Lecompte, Odile

    2006-01-01

    Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequence (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination. PMID:16381882

  13. The Genome Database for Rosaceae (GDR): year 10 update

    PubMed Central

    Jung, Sook; Ficklin, Stephen P.; Lee, Taein; Cheng, Chun-Huai; Blenda, Anna; Zheng, Ping; Yu, Jing; Bombarely, Aureliano; Cho, Ilhyung; Ru, Sushan; Evans, Kate; Peace, Cameron; Abbott, Albert G.; Mueller, Lukas A.; Olmstead, Mercy A.; Main, Dorrie

    2014-01-01

    The Genome Database for Rosaceae (GDR, http:/www.rosaceae.org), the long-standing central repository and data mining resource for Rosaceae research, has been enhanced with new genomic, genetic and breeding data, and improved functionality. Whole genome sequences of apple, peach and strawberry are available to browse or download with a range of annotations, including gene model predictions, aligned transcripts, repetitive elements, polymorphisms, mapped genetic markers, mapped NCBI Rosaceae genes, gene homologs and association of InterPro protein domains, GO terms and Kyoto Encyclopedia of Genes and Genomes pathway terms. Annotated sequences can be queried using search interfaces and visualized using GBrowse. New expressed sequence tag unigene sets are available for major genera, and Pathway data are available through FragariaCyc, AppleCyc and PeachCyc databases. Synteny among the three sequenced genomes can be viewed using GBrowse_Syn. New markers, genetic maps and extensively curated qualitative/Mendelian and quantitative trait loci are available. Phenotype and genotype data from breeding projects and genetic diversity projects are also included. Improved search pages are available for marker, trait locus, genetic diversity and publication data. New search tools for breeders enable selection comparison and assistance with breeding decision making. PMID:24225320

  14. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  15. Towards Interoperability in Genome Databases: The MAtDB (MIPS Arabidopsis Thaliana Database) Experience.

    PubMed

    Schoof, Heiko

    2003-01-01

    Increasing numbers of whole-genome sequences are available, but to interpret them fully requires more than listing all genes. Genome databases are faced with the challenges of integrating heterogenous data and enabling data mining. In comparison to a data warehousing approach, where integration is achieved through replication of all relevant data in a unified schema, distributed approaches provide greater flexibility and maintainability. These are important in a field where new data is generated rapidly and our understanding of the data changes. Interoperability between distributed data sources allows data maintenance to be separated from integration and analysis. Simple ways to access the data can facilitate the development of new data mining tools and the transition from model genome analysis to comparative genomics. With the MIPS Arabidopsis thaliana genome database (MAtDB, http://mips.gsf.de/proj/thal/db) our aim is to go beyond a data repository towards creating an integrated knowledge resource. To this end, the Arabidopsis genome has been a backbone against which to structure and integrate heterogenous data. The challenges to be met are continuous updating of data, the design of flexible data models that can evolve with new data, the integration of heterogenous data, e.g. through the use of ontologies, comprehensive views and visualization of complex information, simple interfaces for application access locally or via the Internet, and knowledge transfer across species. PMID:18629123

  16. Tripal: a construction toolkit for online genome databases.

    PubMed

    Ficklin, Stephen P; Sanderson, Lacey-Anne; Cheng, Chun-Huai; Staton, Margaret E; Lee, Taein; Cho, Il-Hyung; Jung, Sook; Bett, Kirstin E; Main, Doreen

    2011-01-01

    As the availability, affordability and magnitude of genomics and genetics research increases so does the need to provide online access to resulting data and analyses. Availability of a tailored online database is the desire for many investigators or research communities; however, managing the Information Technology infrastructure needed to create such a database can be an undesired distraction from primary research or potentially cost prohibitive. Tripal provides simplified site development by merging the power of Drupal, a popular web Content Management System with that of Chado, a community-derived database schema for storage of genomic, genetic and other related biological data. Tripal provides an interface that extends the content management features of Drupal to the data housed in Chado. Furthermore, Tripal provides a web-based Chado installer, genomic data loaders, web-based editing of data for organisms, genomic features, biological libraries, controlled vocabularies and stock collections. Also available are Tripal extensions that support loading and visualizations of NCBI BLAST, InterPro, Kyoto Encyclopedia of Genes and Genomes and Gene Ontology analyses, as well as an extension that provides integration of Tripal with GBrowse, a popular GMOD tool. An Application Programming Interface is available to allow creation of custom extensions by site developers, and the look-and-feel of the site is completely customizable through Drupal-based PHP template files. Addition of non-biological content and user-management is afforded through Drupal. Tripal is an open source and freely available software package found at http://tripal.sourceforge.net. PMID:21959868

  17. GOLD.db: genomics of lipid-associated disorders database

    PubMed Central

    Hackl, Hubert; Maurer, Michael; Mlecnik, Bernhard; Hartler, Jürgen; Stocker, Gernot; Miranda-Saavedra, Diego; Trajanoski, Zlatko

    2004-01-01

    Background The GOLD.db (Genomics of Lipid-Associated Disorders Database) was developed to address the need for integrating disparate information on the function and properties of genes and their products that are particularly relevant to the biology, diagnosis management, treatment, and prevention of lipid-associated disorders. Description The GOLD.db provides a reference for pathways and information about the relevant genes and proteins in an efficiently organized way. The main focus was to provide biological pathways with image maps and visual pathway information for lipid metabolism and obesity-related research. This database provides also the possibility to map gene expression data individually to each pathway. Gene expression at different experimental conditions can be viewed sequentially in context of the pathway. Related large scale gene expression data sets were provided and can be searched for specific genes to integrate information regarding their expression levels in different studies and conditions. Analytic and data mining tools, reagents, protocols, references, and links to relevant genomic resources were included in the database. Finally, the usability of the database was demonstrated using an example about the regulation of Pten mRNA during adipocyte differentiation in the context of relevant pathways. Conclusions The GOLD.db will be a valuable tool that allow researchers to efficiently analyze patterns of gene expression and to display them in a variety of useful and informative ways, allowing outside researchers to perform queries pertaining to gene expression results in the context of biological processes and pathways. PMID:15588328

  18. TreeGenes: A Forest Tree Genome Database

    PubMed Central

    Wegrzyn, Jill L.; Lee, Jennifer M.; Tearse, Brandon R.; Neale, David B.

    2008-01-01

    The Dendrome Project and associated TreeGenes database serve the forest genetics research community through a curated and integrated web-based relational database. The research community is composed of approximately 2 000 members representing over 730 organizations worldwide. The database itself is composed of a wide range of genetic data from many forest trees with focused efforts on commercially important members of the Pinaceae family. The primary data types curated include species, publications, tree and DNA extraction information, genetic maps, molecular markers, ESTs, genotypic, and phenotypic data. There are currently ten main search modules or user access points within this PostgreSQL database. These access points allow users to navigate logically through the related data types. The goals of the Dendrome Project are to (1) provide a comprehensive resource for forest tree genomics data to facilitate gene discovery in related species, (2) develop interfaces that encourage the submission and integration of all genomic data, and to (3) centralize and distribute existing and novel online tools for the research community that both support and ease analysis. Recent developments have focused on increasing data content, functional annotations, data retrieval, and visualization tools. TreeGenes was developed to provide a centralized web resource with analysis and visualization tools to support data storage and exchange. PMID:18725987

  19. Data mining approaches for information retrieval from genomic databases

    NASA Astrophysics Data System (ADS)

    Liu, Donglin; Singh, Gautam B.

    2000-04-01

    Sequence retrieval in genomic databases is used for finding sequences related to a query sequence specified by a user. Comparison is the main part of the retrieval system in genomic databases. An efficient sequence comparison algorithm is critical in bioinformatics. There are several different algorithms to perform sequence comparison, such as the suffix array based database search, divergence measurement, methods that rely upon the existence of a local similarity between the query sequence and sequences in the database, or common mutual information between query and sequences in DB. In this paper we have described a new method for DNA sequence retrieval based on data mining techniques. Data mining tools generally find patterns among data and have been successfully applied in industries to improve marketing, sales, and customer support operations. We have applied the descriptive data mining techniques to find relevant patterns that are significant for comparing genetic sequences. Relevance feedback score based on common patterns is developed and employed to compute distance between sequences. The contigs of human chromosomes are used to test the retrieval accuracy and the experimental results are presented.

  20. A web-based genomic sequence database for the Streptomycetaceae: a tool for systematics and genome mining

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The ARS Microbial Genome Sequence Database (http://199.133.98.43), a web-based database server, was established utilizing the BIGSdb (Bacterial Isolate Genomics Sequence Database) software package, developed at Oxford University, as a tool to manage multi-locus sequence data for the family Streptomy...

  1. Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

    PubMed

    Wang, Linhai; Yu, Jingyin; Li, Donghua; Zhang, Xiurong

    2015-01-01

    Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus) and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their importance, genetic and genomic analyses on these species have been insufficient due to a lack of reference genome information. The now available S. indicum genome will provide an unprecedented opportunity for studying both S. indicum genetic traits and comparative genomics. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a web-based database with comprehensive sesame genomic, genetic and comparative genomic information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167) and non-coding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/. PMID:25480115

  2. Functional Genomic Analysis of Aspergillus flavus Interacting with Resistant and Susceptible Peanut

    PubMed Central

    Wang, Houmiao; Lei, Yong; Yan, Liying; Wan, Liyun; Ren, Xiaoping; Chen, Silong; Dai, Xiaofeng; Guo, Wei; Jiang, Huifang; Liao, Boshou

    2016-01-01

    In the Aspergillus flavus (A. flavus)–peanut pathosystem, development and metabolism of the fungus directly influence aflatoxin contamination. To comprehensively understand the molecular mechanism of A. flavus interaction with peanut, RNA-seq was used for global transcriptome profiling of A. flavus during interaction with resistant and susceptible peanut genotypes. In total, 67.46 Gb of high-quality bases were generated for A. flavus-resistant (af_R) and -susceptible peanut (af_S) at one (T1), three (T2) and seven (T3) days post-inoculation. The uniquely mapped reads to A. flavus reference genome in the libraries of af_R and af_S at T2 and T3 were subjected to further analysis, with more than 72% of all obtained genes expressed in the eight libraries. Comparison of expression levels both af_R vs. af_S and T2 vs. T3 uncovered 1926 differentially expressed genes (DEGs). DEGs associated with mycelial growth, conidial development and aflatoxin biosynthesis were up-regulated in af_S compared with af_R, implying that A. flavus mycelia more easily penetrate and produce much more aflatoxin in susceptible than in resistant peanut. Our results serve as a foundation for understanding the molecular mechanisms of aflatoxin production differences between A. flavus-R and -S peanut, and offer new clues to manage aflatoxin contamination in crops. PMID:26891328

  3. Random Mutagenesis of the Aspergillus oryzae Genome Results in Fungal Antibacterial Activity.

    PubMed

    Leonard, Cory A; Brown, Stacy D; Hayman, J Russell

    2013-01-01

    Multidrug-resistant bacteria cause severe infections in hospitals and communities. Development of new drugs to combat resistant microorganisms is needed. Natural products of microbial origin are the source of most currently available antibiotics. We hypothesized that random mutagenesis of Aspergillus oryzae would result in secretion of antibacterial compounds. To address this hypothesis, we developed a screen to identify individual A. oryzae mutants that inhibit the growth of Methicillin-resistant Staphylococcus aureus (MRSA) in vitro. To randomly generate A. oryzae mutant strains, spores were treated with ethyl methanesulfonate (EMS). Over 3000 EMS-treated A. oryzae cultures were tested in the screen, and one isolate, CAL220, exhibited altered morphology and antibacterial activity. Culture supernatant from this isolate showed antibacterial activity against Methicillin-sensitive Staphylococcus aureus, MRSA, and Pseudomonas aeruginosa, but not Klebsiella pneumonia or Proteus vulgaris. The results of this study support our hypothesis and suggest that the screen used is sufficient and appropriate to detect secreted antibacterial fungal compounds resulting from mutagenesis of A. oryzae. Because the genome of A. oryzae has been sequenced and systems are available for genetic transformation of this organism, targeted as well as random mutations may be introduced to facilitate the discovery of novel antibacterial compounds using this system. PMID:23983696

  4. The Genomic Threading Database: a comprehensive resource for structural annotations of the genomes from key organisms.

    PubMed

    McGuffin, Liam J; Street, Stefano A; Bryson, Kevin; Sørensen, Søren-Aksel; Jones, David T

    2004-01-01

    Currently, the Genomic Threading Database (GTD) contains structural assignments for the proteins encoded within the genomes of nine eukaryotes and 101 prokaryotes. Structural annotations are carried out using a modified version of GenTHREADER, a reliable fold recognition method. The Gen THREADER annotation jobs are distributed across multiple clusters of processors using grid technology and the predictions are deposited in a relational database accessible via a web interface at http://bioinf.cs.ucl.ac.uk/GTD. Using this system, up to 84% of proteins encoded within a genome can be confidently assigned to known folds with 72% of the residues aligned. On average in the GTD, 64% of proteins encoded within a genome are confidently assigned to known folds and 58% of the residues are aligned to structures. PMID:14681393

  5. Bovine Genome Database: new tools for gleaning function from the Bos taurus genome.

    PubMed

    Elsik, Christine G; Unni, Deepak R; Diesh, Colin M; Tayal, Aditi; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E

    2016-01-01

    We report an update of the Bovine Genome Database (BGD) (http://BovineGenome.org). The goal of BGD is to support bovine genomics research by providing genome annotation and data mining tools. We have developed new genome and annotation browsers using JBrowse and WebApollo for two Bos taurus genome assemblies, the reference genome assembly (UMD3.1.1) and the alternate genome assembly (Btau_4.6.1). Annotation tools have been customized to highlight priority genes for annotation, and to aid annotators in selecting gene evidence tracks from 91 tissue specific RNAseq datasets. We have also developed BovineMine, based on the InterMine data warehousing system, to integrate the bovine genome, annotation, QTL, SNP and expression data with external sources of orthology, gene ontology, gene interaction and pathway information. BovineMine provides powerful query building tools, as well as customized query templates, and allows users to analyze and download genome-wide datasets. With BovineMine, bovine researchers can use orthology to leverage the curated gene pathways of model organisms, such as human, mouse and rat. BovineMine will be especially useful for gene ontology and pathway analyses in conjunction with GWAS and QTL studies. PMID:26481361

  6. Bovine Genome Database: new tools for gleaning function from the Bos taurus genome

    PubMed Central

    Elsik, Christine G.; Unni, Deepak R.; Diesh, Colin M.; Tayal, Aditi; Emery, Marianne L.; Nguyen, Hung N.; Hagen, Darren E.

    2016-01-01

    We report an update of the Bovine Genome Database (BGD) (http://BovineGenome.org). The goal of BGD is to support bovine genomics research by providing genome annotation and data mining tools. We have developed new genome and annotation browsers using JBrowse and WebApollo for two Bos taurus genome assemblies, the reference genome assembly (UMD3.1.1) and the alternate genome assembly (Btau_4.6.1). Annotation tools have been customized to highlight priority genes for annotation, and to aid annotators in selecting gene evidence tracks from 91 tissue specific RNAseq datasets. We have also developed BovineMine, based on the InterMine data warehousing system, to integrate the bovine genome, annotation, QTL, SNP and expression data with external sources of orthology, gene ontology, gene interaction and pathway information. BovineMine provides powerful query building tools, as well as customized query templates, and allows users to analyze and download genome-wide datasets. With BovineMine, bovine researchers can use orthology to leverage the curated gene pathways of model organisms, such as human, mouse and rat. BovineMine will be especially useful for gene ontology and pathway analyses in conjunction with GWAS and QTL studies. PMID:26481361

  7. Addition of a breeding database in the Genome Database for Rosaceae.

    PubMed

    Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

    2013-01-01

    Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will

  8. Addition of a breeding database in the Genome Database for Rosaceae

    PubMed Central

    Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie

    2013-01-01

    Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will

  9. MonarchBase: the monarch butterfly genome database

    PubMed Central

    Zhan, Shuai; Reppert, Steven M.

    2013-01-01

    The monarch butterfly (Danaus plexippus) is emerging as a model organism to study the mechanisms of circadian clocks and animal navigation, and the genetic underpinnings of long-distance migration. The initial assembly of the monarch genome was released in 2011, and the biological interpretation of the genome focused on the butterfly’s migration biology. To make the extensive data associated with the genome accessible to the general biological and lepidopteran communities, we established MonarchBase (available at http://monarchbase.umassmed.edu). The database is an open-access, web-available portal that integrates all available data associated with the monarch butterfly genome. Moreover, MonarchBase provides access to an updated version of genome assembly (v3) upon which all data integration is based. These include genes with systematic annotation, as well as other molecular resources, such as brain expressed sequence tags, migration expression profiles and microRNAs. MonarchBase utilizes a variety of retrieving methods to access data conveniently and for integrating biological interpretations. PMID:23143105

  10. MonarchBase: the monarch butterfly genome database.

    PubMed

    Zhan, Shuai; Reppert, Steven M

    2013-01-01

    The monarch butterfly (Danaus plexippus) is emerging as a model organism to study the mechanisms of circadian clocks and animal navigation, and the genetic underpinnings of long-distance migration. The initial assembly of the monarch genome was released in 2011, and the biological interpretation of the genome focused on the butterfly's migration biology. To make the extensive data associated with the genome accessible to the general biological and lepidopteran communities, we established MonarchBase (available at http://monarchbase.umassmed.edu). The database is an open-access, web-available portal that integrates all available data associated with the monarch butterfly genome. Moreover, MonarchBase provides access to an updated version of genome assembly (v3) upon which all data integration is based. These include genes with systematic annotation, as well as other molecular resources, such as brain expressed sequence tags, migration expression profiles and microRNAs. MonarchBase utilizes a variety of retrieving methods to access data conveniently and for integrating biological interpretations. PMID:23143105

  11. Gene Variant Databases and Sharing: Creating a Global Genomic Variant Database for Personalized Medicine.

    PubMed

    Bean, Lora J H; Hegde, Madhuri R

    2016-06-01

    Revolutionary changes in sequencing technology and the desire to develop therapeutics for rare diseases have led to the generation of an enormous amount of genomic data in the last 5 years. Large-scale sequencing done in both research and diagnostic laboratories has linked many new genes to rare diseases, but has also generated a number of variants that we cannot interpret today. It is clear that we remain a long way from a complete understanding of the genomic variation in the human genome and its association with human health and disease. Recent studies identified susceptibility markers to infectious diseases and also the contribution of rare variants to complex diseases in different populations. The sequencing revolution has also led to the creation of a large number of databases that act as "keepers" of data, and in many cases give an interpretation of the effect of the variant. This interpretation is based on reports in the literature, prediction models, and in some cases is accompanied by functional evidence. As we move toward the practice of genomic medicine, and consider its place in "personalized medicine," it is time to ask ourselves how we can aggregate this wealth of data into a single database for multiple users with different goals. PMID:26931283

  12. GOBASE—a database of organelle and bacterial genome information

    PubMed Central

    O'Brien, Emmet A.; Zhang, Yue; Yang, LiuSong; Wang, Eric; Marie, Veronique; Lang, B. Franz; Burger, Gertraud

    2006-01-01

    The organelle genome database GOBASE is now in its twelfth release, and includes 350 000 mitochondrial sequences and 118 000 chloroplast sequences, roughly a 3-fold expansion since previously documented. GOBASE also includes a fully reannotated genome sequence of Rickettsia prowazekii, one of the closest bacterial relatives of mitochondria, and will shortly expand to contain more data from bacteria from which organelles originated. All these sequences are now accessible through a single unified interface. Enhancements to the functionality of GOBASE include addition of pages for RNA structures and a page compiling data about the taxonomic distribution of organelle-encoded genes; incorporation of Gene Ontology terms; addition of features deduced from incomplete annotations to sequences in GenBank; marking of type examples in cases where single genes in single species are oversampled within GenBank; and addition of graphics illustrating gene structure and the position of neighbouring genes on a sequence. The database has been reimplemented in PostgreSQL to facilitate development and maintenance, and structural modifications have been made to speed up queries, particularly those related to taxonomy. The GOBASE database can be queried at and inquiries should be directed to gobase@bch.umontreal.ca. PMID:16381962

  13. DoGSD: the dog and wolf genome SNP database.

    PubMed

    Bai, Bing; Zhao, Wen-Ming; Tang, Bi-Xia; Wang, Yan-Qing; Wang, Lu; Zhang, Zhang; Yang, He-Chuan; Liu, Yan-Hu; Zhu, Jun-Wei; Irwin, David M; Wang, Guo-Dong; Zhang, Ya-Ping

    2015-01-01

    The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which have simultaneously broadened our understanding of domestication and diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data more easily/friendly usable and available, we propose DoGSD (http://dogsd.big.ac.cn), the first canidae-specific database which focuses on whole genome SNP data from domesticated dogs and grey wolves. The DoGSD is a web-based, open-access resource comprising ∼ 19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and collected SNPs from three latest dog/wolf genetic studies (7 wolves and 68 dogs), which were taken together for analysis with the population genetic statistics, Fst. In addition, DoGSD integrates some closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog-/wolf-related studies. PMID:25404132

  14. DoGSD: the dog and wolf genome SNP database

    PubMed Central

    Bai, Bing; Zhao, Wen-Ming; Tang, Bi-Xia; Wang, Yan-Qing; Wang, Lu; Zhang, Zhang; Yang, He-Chuan; Liu, Yan-Hu; Zhu, Jun-Wei; Irwin, David M.; Wang, Guo-Dong; Zhang, Ya-Ping

    2015-01-01

    The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which have simultaneously broadened our understanding of domestication and diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data more easily/friendly usable and available, we propose DoGSD (http://dogsd.big.ac.cn), the first canidae-specific database which focuses on whole genome SNP data from domesticated dogs and grey wolves. The DoGSD is a web-based, open-access resource comprising ∼19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and collected SNPs from three latest dog/wolf genetic studies (7 wolves and 68 dogs), which were taken together for analysis with the population genetic statistics, Fst. In addition, DoGSD integrates some closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog-/wolf-related studies. PMID:25404132

  15. The Rice Genome Knowledgebase (RGKbase): an annotation database for rice comparative genomics and evolutionary biology

    PubMed Central

    Wang, Dapeng; Xia, Yan; Li, Xinna; Hou, Lixia; Yu, Jun

    2013-01-01

    Over the past 10 years, genomes of cultivated rice cultivars and their wild counterparts have been sequenced although most efforts are focused on genome assembly and annotation of two major cultivated rice (Oryza sativa L.) subspecies, 93-11 (indica) and Nipponbare (japonica). To integrate information from genome assemblies and annotations for better analysis and application, we now introduce a comparative rice genome database, the Rice Genome Knowledgebase (RGKbase, http://rgkbase.big.ac.cn/RGKbase/). RGKbase is built to have three major components: (i) integrated data curation for rice genomics and molecular biology, which includes genome sequence assemblies, transcriptomic and epigenomic data, genetic variations, quantitative trait loci (QTLs) and the relevant literature; (ii) User-friendly viewers, such as Gbrowse, GeneBrowse and Circos, for genome annotations and evolutionary dynamics and (iii) Bioinformatic tools for compositional and synteny analyses, gene family classifications, gene ontology terms and pathways and gene co-expression networks. RGKbase current includes data from five rice cultivars and species: Nipponbare (japonica), 93-11 (indica), PA64s (indica), the African rice (Oryza glaberrima) and a wild rice species (Oryza brachyantha). We are also constantly introducing new datasets from variety of public efforts, such as two recent releases—sequence data from ∼1000 rice varieties, which are mapped into the reference genome, yielding ample high-quality single-nucleotide polymorphisms and insertions–deletions. PMID:23193278

  16. Tetrahymena functional genomics database (TetraFGD): an integrated resource for Tetrahymena functional genomics.

    PubMed

    Xiong, Jie; Lu, Yuming; Feng, Jinmei; Yuan, Dongxia; Tian, Miao; Chang, Yue; Fu, Chengjie; Wang, Guangying; Zeng, Honghui; Miao, Wei

    2013-01-01

    The ciliated protozoan Tetrahymena thermophila is a useful unicellular model organism for studies of eukaryotic cellular and molecular biology. Researches on T. thermophila have contributed to a series of remarkable basic biological principles. After the macronuclear genome was sequenced, substantial progress has been made in functional genomics research on T. thermophila, including genome-wide microarray analysis of the T. thermophila life cycle, a T. thermophila gene network analysis based on the microarray data and transcriptome analysis by deep RNA sequencing. To meet the growing demands for the Tetrahymena research community, we integrated these data to provide a public access database: Tetrahymena functional genomics database (TetraFGD). TetraFGD contains three major resources, including the RNA-Seq transcriptome, microarray and gene networks. The RNA-Seq data define gene structures and transcriptome, with special emphasis on exon-intron boundaries; the microarray data describe gene expression of 20 time points during three major stages of the T. thermophila life cycle; the gene network data identify potential gene-gene interactions of 15 049 genes. The TetraFGD provides user-friendly search functions that assist researchers in accessing gene models, transcripts, gene expression data and gene-gene relationships. In conclusion, the TetraFGD is an important functional genomic resource for researchers who focus on the Tetrahymena or other ciliates. Database URL: http://tfgd.ihb.ac.cn/ PMID:23482072

  17. The YeastGenome app: the Saccharomyces Genome Database at your fingertips.

    PubMed

    Wong, Edith D; Karra, Kalpana; Hitz, Benjamin C; Hong, Eurie L; Cherry, J Michael

    2013-01-01

    The Saccharomyces Genome Database (SGD) is a scientific database that provides researchers with high-quality curated data about the genes and gene products of Saccharomyces cerevisiae. To provide instant and easy access to this information on mobile devices, we have developed YeastGenome, a native application for the Apple iPhone and iPad. YeastGenome can be used to quickly find basic information about S. cerevisiae genes and chromosomal features regardless of internet connectivity. With or without network access, you can view basic information and Gene Ontology annotations about a gene of interest by searching gene names and gene descriptions or by browsing the database within the app to find the gene of interest. With internet access, the app provides more detailed information about the gene, including mutant phenotypes, references and protein and genetic interactions, as well as provides hyperlinks to retrieve detailed information by showing SGD pages and views of the genome browser. SGD provides online help describing basic ways to navigate the mobile version of SGD, highlights key features and answers frequently asked questions related to the app. The app is available from iTunes (http://itunes.com/apps/yeastgenome). The YeastGenome app is provided freely as a service to our community, as part of SGD's mission to provide free and open access to all its data and annotations. PMID:23396302

  18. The YeastGenome app: the Saccharomyces Genome Database at your fingertips

    PubMed Central

    Wong, Edith D.; Karra, Kalpana; Hitz, Benjamin C.; Hong, Eurie L.; Cherry, J. Michael

    2013-01-01

    The Saccharomyces Genome Database (SGD) is a scientific database that provides researchers with high-quality curated data about the genes and gene products of Saccharomyces cerevisiae. To provide instant and easy access to this information on mobile devices, we have developed YeastGenome, a native application for the Apple iPhone and iPad. YeastGenome can be used to quickly find basic information about S. cerevisiae genes and chromosomal features regardless of internet connectivity. With or without network access, you can view basic information and Gene Ontology annotations about a gene of interest by searching gene names and gene descriptions or by browsing the database within the app to find the gene of interest. With internet access, the app provides more detailed information about the gene, including mutant phenotypes, references and protein and genetic interactions, as well as provides hyperlinks to retrieve detailed information by showing SGD pages and views of the genome browser. SGD provides online help describing basic ways to navigate the mobile version of SGD, highlights key features and answers frequently asked questions related to the app. The app is available from iTunes (http://itunes.com/apps/yeastgenome). The YeastGenome app is provided freely as a service to our community, as part of SGD’s mission to provide free and open access to all its data and annotations. PMID:23396302

  19. Dual genome microarray: Fusarium verticillioides and Aspergillus flavus gene expression in co-culture

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins produced by Aspergillus flavus, and fumonisins produced by Fusarium verticillioides, are prominent among the mycotoxins associated with economic losses to the maize grain industry worldwide. F. verticillioides is also recognized as a systemic endophyte of maize that prevents opportunisti...

  20. Tomato Functional Genomics Database: a comprehensive resource and analysis package for tomato functional genomics.

    PubMed

    Fei, Zhangjun; Joung, Je-Gun; Tang, Xuemei; Zheng, Yi; Huang, Mingyun; Lee, Je Min; McQuinn, Ryan; Tieman, Denise M; Alba, Rob; Klee, Harry J; Giovannoni, James J

    2011-01-01

    Tomato Functional Genomics Database (TFGD) provides a comprehensive resource to store, query, mine, analyze, visualize and integrate large-scale tomato functional genomics data sets. The database is functionally expanded from the previously described Tomato Expression Database by including metabolite profiles as well as large-scale tomato small RNA (sRNA) data sets. Computational pipelines have been developed to process microarray, metabolite and sRNA data sets archived in the database, respectively, and TFGD provides downloads of all the analyzed results. TFGD is also designed to enable users to easily retrieve biologically important information through a set of efficient query interfaces and analysis tools, including improved array probe annotations as well as tools to identify co-expressed genes, significantly affected biological processes and biochemical pathways from gene expression data sets and miRNA targets, and to integrate transcript and metabolite profiles, and sRNA and mRNA sequences. The suite of tools and interfaces in TFGD allow intelligent data mining of recently released and continually expanding large-scale tomato functional genomics data sets. TFGD is available at http://ted.bti.cornell.edu. PMID:20965973

  1. Plant Genome DataBase Japan (PGDBj): A Portal Website for the Integration of Plant Genome-Related Databases

    PubMed Central

    Asamizu, Erika; Ichihara, Hisako; Nakaya, Akihiro; Nakamura, Yasukazu; Hirakawa, Hideki; Ishii, Takahiro; Tamura, Takuro; Fukami-Kobayashi, Kaoru; Nakajima, Yukari; Tabata, Satoshi

    2014-01-01

    The Plant Genome DataBase Japan (PGDBj, http://pgdbj.jp/?ln=en) is a portal website that aims to integrate plant genome-related information from databases (DBs) and the literature. The PGDBj is comprised of three component DBs and a cross-search engine, which provides a seamless search over the contents of the DBs. The three DBs are as follows. (i) The Ortholog DB, providing gene cluster information based on the amino acid sequence similarity. Over 500,000 amino acid sequences of 20 Viridiplantae species were subjected to reciprocal BLAST searches and clustered. Sequences from plant genome DBs (e.g. TAIR10 and RAP-DB) were also included in the cluster with a direct link to the original DB. (ii) The Plant Resource DB, integrating the SABRE DB, which provides cDNA and genome sequence resources accumulated and maintained in the RIKEN BioResource Center and National BioResource Projects. (iii) The DNA Marker DB, providing manually or automatically curated information of DNA markers, quantitative trait loci and related linkage maps, from the literature and external DBs. As the PGDBj targets various plant species, including model plants, algae, and crops important as food, fodder and biofuel, researchers in the field of basic biology as well as a wide range of agronomic fields are encouraged to perform searches using DNA sequences, gene names, traits and phenotypes of interest. The PGDBj will return the search results from the component DBs and various types of linked external DBs. PMID:24363285

  2. fPoxDB: fungal peroxidase database for comparative genomics

    PubMed Central

    2014-01-01

    Background Peroxidases are a group of oxidoreductases which mediate electron transfer from hydrogen peroxide (H2O2) and organic peroxide to various electron acceptors. They possess a broad spectrum of impact on industry and fungal biology. There are numerous industrial applications using peroxidases, such as to catalyse highly reactive pollutants and to breakdown lignin for recycling of carbon sources. Moreover, genes encoding peroxidases play important roles in fungal pathogenicity in both humans and plants. For better understanding of fungal peroxidases at the genome-level, a novel genomics platform is required. To this end, Fungal Peroxidase Database (fPoxDB; http://peroxidase.riceblast.snu.ac.kr/) has been developed to provide such a genomics platform for this important gene family. Description In order to identify and classify fungal peroxidases, 24 sequence profiles were built and applied on 331 genomes including 216 from fungi and Oomycetes. In addition, NoxR, which is known to regulate NADPH oxidases (NoxA and NoxB) in fungi, was also added to the pipeline. Collectively, 6,113 genes were predicted to encode 25 gene families, presenting well-separated distribution along the taxonomy. For instance, the genes encoding lignin peroxidase, manganese peroxidase, and versatile peroxidase were concentrated in the rot-causing basidiomycetes, reflecting their ligninolytic capability. As a genomics platform, fPoxDB provides diverse analysis resources, such as gene family predictions based on fungal sequence profiles, pre-computed results of eight bioinformatics programs, similarity search tools, a multiple sequence alignment tool, domain analysis functions, and taxonomic distribution summary, some of which are not available in the previously developed peroxidase resource. In addition, fPoxDB is interconnected with other family web systems, providing extended analysis opportunities. Conclusions fPoxDB is a fungi-oriented genomics platform for peroxidases. The sequence

  3. The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease

    PubMed Central

    Shimoyama, Mary; De Pons, Jeff; Hayman, G. Thomas; Laulederkind, Stanley J.F.; Liu, Weisong; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Tutaj, Marek; Wang, Shur-Jen; Worthey, Elizabeth; Dwinell, Melinda; Jacob, Howard

    2015-01-01

    The Rat Genome Database (RGD, http://rgd.mcw.edu) provides the most comprehensive data repository and informatics platform related to the laboratory rat, one of the most important model organisms for disease studies. RGD maintains and updates datasets for genomic elements such as genes, transcripts and increasingly in recent years, sequence variations, as well as map positions for multiple assemblies and sequence information. Functional annotations for genomic elements are curated from published literature, submitted by researchers and integrated from other public resources. Complementing the genomic data catalogs are those associated with phenotypes and disease, including strains, QTL and experimental phenotype measurements across hundreds of strains. Data are submitted by researchers, acquired through bulk data pipelines or curated from published literature. Innovative software tools provide users with an integrated platform to query, mine, display and analyze valuable genomic and phenomic datasets for discovery and enhancement of their own research. This update highlights recent developments that reflect an increasing focus on: (i) genomic variation, (ii) phenotypes and diseases, (iii) data related to the environment and experimental conditions and (iv) datasets and software tools that allow the user to explore and analyze the interactions among these and their impact on disease. PMID:25355511

  4. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der Lawrence Berkeley Lab., CA )

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS's do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the Extensible Object Model'', to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  5. Sequence modelling and an extensible data model for genomic database

    SciTech Connect

    Li, Peter Wei-Der |

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS`s do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the ``Extensible Object Model``, to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  6. The Saccharomyces Genome Database: Advanced Searching Methods and Data Mining.

    PubMed

    Cherry, J Michael

    2015-12-01

    At the core of the Saccharomyces Genome Database (SGD) are chromosomal features that encode a product. These include protein-coding genes and major noncoding RNA genes, such as tRNA and rRNA genes. The basic entry point into SGD is a gene or open-reading frame name that leads directly to the locus summary information page. A keyword describing function, phenotype, selective condition, or text from abstracts will also provide a door into the SGD. A DNA or protein sequence can be used to identify a gene or a chromosomal region using BLAST. Protein and DNA sequence identifiers, PubMed and NCBI IDs, author names, and function terms are also valid entry points. The information in SGD has been gathered and is maintained by a group of scientific biocurators and software developers who are devoted to providing researchers with up-to-date information from the published literature, connections to all the major research resources, and tools that allow the data to be explored. All the collected information cannot be represented or summarized for every possible question; therefore, it is necessary to be able to search the structured data in the database. This protocol describes the YeastMine tool, which provides an advanced search capability via an interactive tool. The SGD also archives results from microarray expression experiments, and a strategy designed to explore these data using the SPELL (Serial Pattern of Expression Levels Locator) tool is provided. PMID:26631124

  7. The Saccharomyces Genome Database: Exploring Biochemical Pathways and Mutant Phenotypes.

    PubMed

    Cherry, J Michael

    2015-12-01

    Many biochemical processes, and the proteins and cofactors involved, have been defined for the eukaryote Saccharomyces cerevisiae. This understanding has been largely derived through the awesome power of yeast genetics. The proteins responsible for the reactions that build complex molecules and generate energy for the cell have been integrated into web-based tools that provide classical views of pathways. The Yeast Pathways in the Saccharomyces Genome Database (SGD) is, however, the only database created from manually curated literature annotations. In this protocol, gene function is explored using phenotype annotations to enable hypotheses to be formulated about a gene's action. A common use of the SGD is to understand more about a gene that was identified via a phenotypic screen or found to interact with a gene/protein of interest. There are still many genes that do not yet have an experimentally defined function and so the information currently available can be used to speculate about their potential function. Typically, computational annotations based on sequence similarity are used to predict gene function. In addition, annotations are sometimes available for phenotypes of mutations in the gene of interest. Integrated results for a few example genes will be explored in this protocol. This will be instructive for the exploration of details that aid the analysis of experimental results and the establishment of connections within the yeast literature. PMID:26631123

  8. The release of the Aspergillus flavus whole genome sequence for public access

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are produced by Aspergillus flavus and A. parasiticus. These toxic and carcinogenic compounds contaminate pre-harvest agricultural crops in the field and post-harvest grains during storage. In order to reduce and eliminate aflatoxin contamination of food and feed, Aspergilllus flavus wh...

  9. Aspergillus flavus whole genome and EST sequence releases and construction of homologous gene search blast server

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxins are toxic and carcinogenic secondary metabolites. These compounds, produced by Aspergillus flavus and A. parasiticus, contaminate pre-harvest agricultural crops in the field and post-harvest grains during storage. In order to reduce and eliminate aflatoxin contamination of food and feed...

  10. Genome wide association mapping of Aspergillus flavus and aflatoxin accumulation resistance in maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Contamination of maize with aflatoxin, produced by the fungus Aspergillus flavus, has severe health and economic consequences. Efforts to reduce aflatoxin accumulation in maize have focused on identifying and selecting germplasm with natural host resistance factors, and several maize lines with sign...

  11. Aspergillus flavus Genomics as a Tool for Studying the Mechanism of Aflatoxin Formation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is a pathogen that infects plants, animals, and humans. It produces the most potent carcinogens, known as aflatoxins, when it infects agricultural crops. In order to devise strategies to control aflatoxin contamination of pre-harvest agricultural crops and post harvest grains du...

  12. Aspergillus flavus Genomic Data Mining Provides Clues for Its Use in Producing Biobased Products

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is notorious for its ability to produce aflatoxins. It is also an opportunistic pathogen that infects plants, animals and human beings. The ability to survive in the natural environment, living on plant tissues (leaves or stalks), live or dead insects make A. flavus a ubiquitous...

  13. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

    SciTech Connect

    Fenner, Marsha W; Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

    2007-12-31

    The Genomes On Line Database (GOLD) is a comprehensive resource of information for genome and metagenome projects world-wide. GOLD provides access to complete and ongoing projects and their associated metadata through pre-computed lists and a search page. The database currently incorporates information for more than 2900 sequencing projects, of which 639 have been completed and the data deposited in the public databases. GOLD is constantly expanding to provide metadata information related to the project and the organism and is compliant with the Minimum Information about a Genome Sequence (MIGS) specifications.

  14. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data.

    PubMed

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. PMID:25398900

  15. ECRbase: Database of Evolutionary Conserved Regions, Promoters, and Transcription Factor Binding Sites in Vertebrate Genomes

    DOE Data Explorer

    Loots, Gabriela G. [LLNL; Ovcharenko, I. [LLNL

    Evolutionary conservation of DNA sequences provides a tool for the identification of functional elements in genomes. This database of evolutionary conserved regions (ECRs) in vertebrate genomes features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates and a comprehensive collection of promoters in all vertebrate genomes generated using multiple sources of gene annotation. The database also contains a collection of annotated transcription factor binding sites (TFBSs) in evolutionary conserved and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish, and fugu genomes. (taken from paper in Journal: Bioinformatics, November 7, 2006, pp. 122-124

  16. ECRbase: Database of Evolutionary Conserved Regions, Promoters, and Transcription Factor Binding Sites in Vertebrate Genomes

    SciTech Connect

    Loots, G; Ovcharenko, I

    2006-08-08

    Evolutionary conservation of DNA sequences provides a tool for the identification of functional elements in genomes. We have created a database of evolutionary conserved regions (ECRs) in vertebrate genomes entitled ECRbase that is constructed from a collection of pairwise vertebrate genome alignments produced by the ECR Browser database. ECRbase features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates and a collection of promoters in all vertebrate genomes presented in the database. The database also contains a collection of annotated transcription factor binding sites (TFBS) in all ECRs and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish, and two pufferfish genomes. It is freely accessible at http://ECRbase.dcode.org.

  17. MELOGEN: an EST database for melon functional genomics

    PubMed Central

    Gonzalez-Ibeas, Daniel; Blanca, José; Roig, Cristina; González-To, Mireia; Picó, Belén; Truniger, Verónica; Gómez, Pedro; Deleu, Wim; Caño-Delgado, Ana; Arús, Pere; Nuez, Fernando; Garcia-Mas, Jordi; Puigdomènech, Pere; Aranda, Miguel A

    2007-01-01

    Background Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. Results We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. Conclusion The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. This set of sequences constitutes

  18. Draft genome sequences of two closely-related aflatoxigenic Aspergillus species obtained from the Ivory Coast

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genomes of the A. ochraceoroseus and A. rambellii type strains were sequenced using a personal genome machine, followed by annotation of their genes. The genome size for A. ochraceoroseus was found to be approximately 23 Mb and contained 7,837 genes, while the A. rambellii genome was found to be...

  19. Genome-scale analysis of the high-efficient protein secretion system of Aspergillus oryzae

    PubMed Central

    2014-01-01

    Background The koji mold, Aspergillus oryzae is widely used for the production of industrial enzymes due to its particularly high protein secretion capacity and ability to perform post-translational modifications. However, systemic analysis of its secretion system is lacking, generally due to the poorly annotated proteome. Results Here we defined a functional protein secretory component list of A. oryzae using a previously reported secretory model of S. cerevisiae as scaffold. Additional secretory components were obtained by blast search with the functional components reported in other closely related fungal species such as Aspergillus nidulans and Aspergillus niger. To evaluate the defined component list, we performed transcriptome analysis on three α-amylase over-producing strains with varying levels of secretion capacities. Specifically, secretory components involved in the ER-associated processes (including components involved in the regulation of transport between ER and Golgi) were significantly up-regulated, with many of them never been identified for A. oryzae before. Furthermore, we defined a complete list of the putative A. oryzae secretome and monitored how it was affected by overproducing amylase. Conclusion In combination with the transcriptome data, the most complete secretory component list and the putative secretome, we improved the systemic understanding of the secretory machinery of A. oryzae in response to high levels of protein secretion. The roles of many newly predicted secretory components were experimentally validated and the enriched component list provides a better platform for driving more mechanistic studies of the protein secretory pathway in this industrially important fungus. PMID:24961398

  20. Purification and enzymatic characterization of secretory glycoside hydrolase family 3 (GH3) aryl β-glucosidases screened from Aspergillus oryzae genome.

    PubMed

    Kudo, Kanako; Watanabe, Akira; Ujiie, Seiryu; Shintani, Takahiro; Gomi, Katsuya

    2015-12-01

    By a global search of the genome database of Aspergillus oryzae, we found 23 genes encoding putative β-glucosidases, among which 10 genes with a signal peptide belonging to glycoside hydrolase family 3 (GH3) were overexpressed in A. oryzae using the improved glaA gene promoter. Consequently, crude enzyme preparations from three strains, each harboring the genes AO090038000223 (bglA), AO090103000127 (bglF), and AO090003001511 (bglJ), showed a substrate preference toward p-nitrophenyl-β-d-glucopyranoside (pNPGlc) and thus were purified to homogeneity and enzymatically characterized. All the purified enzymes (BglA, BglF, and BglJ) preferentially hydrolyzed aryl β-glycosides, including pNPGlc, rather than cellobiose, and these enzymes were proven to be aryl β-glucosidases. Although the specific activity of BglF toward all the substrates tested was significantly low, BglA and BglJ showed appreciably high activities toward pNPGlc and arbutin. The kinetic parameters of BglA and BglJ for pNPGlc suggested that both the enzymes had relatively higher hydrolytic activity toward pNPGlc among the fungal β-glucosidases reported. The thermal and pH stabilities of BglA were higher than those of BglJ, and BglA was particularly stable in a wide pH range (pH 4.5-10). In contrast, BglJ was the most heat- and alkaline-labile among the three β-glucosidases. Furthermore, BglA was more tolerant to ethanol than BglJ; as a result, it showed much higher hydrolytic activity toward isoflavone glycosides in the presence of ethanol than BglJ. This study suggested that the mining of novel β-glucosidases exhibiting higher activity from microbial genome sequences is of great use for the production of beneficial compounds such as isoflavone aglycones. PMID:25936960

  1. Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources

    PubMed Central

    Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu

    2015-01-01

    The Brassica database (BRAD) was built initially to assist users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released after its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/ PMID:26589635

  2. Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources.

    PubMed

    Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu

    2015-01-01

    The Brassica database (BRAD) was built initially to assist users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released after its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/. PMID:26589635

  3. Comparative genomics of citric-acid producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    SciTech Connect

    Andersen, Mikael R.; Salazar, Margarita; Schaap, Peter; van de Vondervoort, Peter; Culley, David E.; Thykaer, Jette; Frisvad, Jens C.; Nielsen, Kristian F.; Albang, Richard; Albermann, Kaj; Berka, Randy; Braus, Gerhard; Braus-Stromeyer, Susanna A.; Corrochano, Luis; Dai, Ziyu; van Dijck, Piet; Hofmann, Gerald; Lasure, Linda L.; Magnuson, Jon K.; Menke, Hildegard; Meijer, Martin; Meijer, Susan; Nielsen, Jakob B.; Nielsen, Michael L.; van Ooyen, Albert; Pel, Herman J.; Poulsen, Lars; Samson, Rob; Stam, Hein; Tsang, Adrian; van den Brink, Johannes M.; ATkins, Alex; Aerts, Andrea; Shapiro, Harris; Pangilinan, Jasmyn; Salamov, Asaf; Lou, Yigong; Lindquist, Erika; Lucas, Susan; Grimwood, Jane; Grigoriev, Igor V.; Kubicek, Christian P.; Martinez, Diego; van Peij, Noel; Roubos, Johannes A.; Nielsen, Jens B.; Baker, Scott E.

    2011-06-01

    The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compels additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015), and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was utilized to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 megabase of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway in ATCC 1015, while CBS 513.88 showed significant up regulation of genes associated with biosynthesis of amino acids that are abundant in glucoamylase A, tRNA-synthases and protein transporters.

  4. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    PubMed

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. PMID:23826978

  5. Biosynthetic Pathway for the Epipolythiodioxopiperazine Acetylaranotin in Aspergillus terreus Revealed by Genome-based Deletion Analysis

    SciTech Connect

    Guo, Chun-Jun; Yeh, Hsu-Hua; Chiang, Yi Ming; Sanchez, James F.; Chang, ShuLin; Bruno, Kenneth S.; Wang, Clay C.

    2013-04-15

    Abstract Epipolythiodioxopiperazines (ETPs) are a class of fungal secondary metabolites derived from cyclic peptides. Acetylaranotin belongs to one structural subgroup of ETPs characterized by the presence of a seven-membered dihydrooxepine ring. Defining the genes involved in acetylaranotin biosynthesis should provide a means to increase production of these compounds and facilitate the engineering of second-generation molecules. The filamentous fungus Aspergillus terreus produces acetylaranotin and related natural products. Using targeted gene deletions, we have identified a cluster of 9 genes including one nonribosomal peptide synthase gene, ataP, that is required for acetylaranotin biosynthesis. Chemical analysis of the wild type and mutant strains enabled us to isolate seventeen natural products that are either intermediates in the normal biosynthetic pathway or shunt products that are produced when the pathway is interrupted through mutation. Nine of the compounds identified in this study are novel natural products. Our data allow us to propose a complete biosynthetic pathway for acetylaranotin and related natural products.

  6. Comparative genomics of citric-acid producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    SciTech Connect

    Grigoriev, Igor V.; Baker, Scott E.; Andersen, Mikael R.; Salazar, Margarita P.; Schaap, Peter J.; Vondervoot, Peter J.I. van de; Culley, David; Thykaer, Jette; Frisvad, Jens C.; Nielsen, Kristen F.; Albang, Richard; Albermann, Kaj; Berka, Randy M.; Braus, Gerhard H.; Braus-Stromeyer, Susanna A.; Corrochano, Luis M.; Dai, Ziyu; Dijck, Piet W.M. van; Hofmann, Gerald; Lasure, Linda L.; Magnusson, Jon K.; Meijer, Susan L.; Nielsen, Jakob B.; Nielsen, Michael L.; Ooyen, Albert J.J. van; Panther, Kathyrn S.; Pel, Herman J.; Poulsen, Lars; Samson, Rob A.; Stam, Hen; Tsang, Adrian; Brink, Johannes M. van den; Atkins, Alex; Aerts, Andrea; Shapiro, Harris; Pangilinan, Jasmyn; Salamov, Asaf; Lou, Yigong; Lindquist, Erika; Lucas, Susan; Grimwood, Jane; Kubicek, Christian P.; Martinez, Diego; Peij, Noel N.M.E. van; Roubos, Johannes A.; Nielsen, Jens

    2011-04-28

    The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compels additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015), and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was utilized to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 megabase of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway in ATCC 1015, while CBS 513.88 showed significant up-regulation of genes relevant to glucoamylase A production, such as tRNA-synthases and protein transporters. Our results and datasets from this integrative systems biology analysis resulted in a snapshot of fungal evolution and will support further optimization of cell factories based on filamentous fungi.[Supplemental materials (10 figures, three text documents and 16 tables) have been made available

  7. Use of Genomic Databases for Inquiry-Based Learning about Influenza

    ERIC Educational Resources Information Center

    Ledley, Fred; Ndung'u, Eric

    2011-01-01

    The genome projects of the past decades have created extensive databases of biological information with applications in both research and education. We describe an inquiry-based exercise that uses one such database, the National Center for Biotechnology Information Influenza Virus Resource, to advance learning about influenza. This database…

  8. The MaizeGDB Genome Browser Tutorial: One example of database outreach to biologists via video

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Video tutorials are an effective way for researchers to quickly learn how to use online tools offered by biological databases. At the Maize Genetics and Genomics Database (MaizeGDB), we have developed a number of video tutorials that aim to demonstrate how to use various tools as well as to explici...

  9. CottonGen: a genomics, genetics and breeding database for cotton research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supercedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, vis...

  10. Characterization of the Mutagenic Spectrum of 4-Nitroquinoline 1-Oxide (4-NQO) in Aspergillus nidulans by Whole Genome Sequencing

    PubMed Central

    Downes, Damien J.; Chonofsky, Mark; Tan, Kaeling; Pfannenstiel, Brandon T.; Reck-Peterson, Samara L.; Todd, Richard B.

    2014-01-01

    4-Nitroquinoline 1-oxide (4-NQO) is a highly carcinogenic chemical that induces mutations in bacteria, fungi, and animals through the formation of bulky purine adducts. 4-NQO has been used as a mutagen for genetic screens and in both the study of DNA damage and DNA repair. In the model eukaryote Aspergillus nidulans, 4-NQO−based genetic screens have been used to study diverse processes, including gene regulation, mitosis, metabolism, organelle transport, and septation. Early work during the 1970s using bacterial and yeast mutation tester strains concluded that 4-NQO was a guanine-specific mutagen. However, these strains were limited in their ability to determine full mutagenic potential, as they could not identify mutations at multiple sites, unlinked suppressor mutations, or G:C to C:G transversions. We have now used a whole genome resequencing approach with mutant strains generated from two independent genetic screens to determine the full mutagenic spectrum of 4-NQO in A. nidulans. Analysis of 3994 mutations from 38 mutant strains reveals that 4-NQO induces substitutions in both guanine and adenine residues, although with a 19-fold preference for guanine. We found no association between mutation load and mutagen dose and observed no sequence bias in the residues flanking the mutated purine base. The mutations were distributed randomly throughout most of the genome. Our data provide new evidence that 4-NQO can potentially target all base pairs. Furthermore, we predict that current practices for 4-NQO−induced mutagenesis are sufficient to reach gene saturation for genetic screens with feasible identification of causative mutations via whole genome resequencing. PMID:25352541

  11. SoyBase, The USDA-ARS Soybean Genetics and Genomics Database

    Technology Transfer Automated Retrieval System (TEKTRAN)

    SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean. SoyBase contains the most current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. The...

  12. CURRENT STATUS OF THE COTTON DB, A GENOME DATABASE FOR GOSSYPIUM SPP

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cotton genome database, CottonDB, is a publicly available resource for cotton genome research ranging from the collections of Gossypium germplasm, molecular markers, to the functions of cotton genes. Curation of CottonDB is currently maintained in our Research Unit (http://algodon.tamu.edu/cotto...

  13. CURRENT STATUS OF THE COTTONDB, A GENOME DATABASE FOR GOSSYPIUM SPP

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cotton genome database, CottonDB, is a publicly available resource for cotton genome research ranging from the collections of Gossypium germplasm, molecular markers, to the functions of cotton genes. Curation of CottonDB is currently maintained in our Research Unit(http://algodon.tamu.edu/cotton...

  14. The MaizeGDB Genome Browser tutorial: one example of database outreach to biologists via video

    PubMed Central

    Harper, Lisa C.; Schaeffer, Mary L.; Thistle, Jordan; Gardiner, Jack M.; Andorf, Carson M.; Campbell, Darwin A.; Cannon, Ethalinda K.S.; Braun, Bremen L.; Birkett, Scott M.; Lawrence, Carolyn J.; Sen, Taner Z.

    2011-01-01

    Video tutorials are an effective way for researchers to quickly learn how to use online tools offered by biological databases. At MaizeGDB, we have developed a number of video tutorials that demonstrate how to use various tools and explicitly outline the caveats researchers should know to interpret the information available to them. One such popular video currently available is ‘Using the MaizeGDB Genome Browser’, which describes how the maize genome was sequenced and assembled as well as how the sequence can be visualized and interacted with via the MaizeGDB Genome Browser. Database URL: http://www.maizegdb.org/ PMID:21565781

  15. Mitome: dynamic and interactive database for comparative mitochondrial genomics in metazoan animals.

    PubMed

    Lee, Yong Seok; Oh, Jeongsu; Kim, Young Uk; Kim, Namchul; Yang, Sungjin; Hwang, Ui Wook

    2008-01-01

    Mitome is a specialized mitochondrial genome database designed for easy comparative analysis of various features of metazoan mitochondrial genomes such as base frequency, A+T skew, codon usage and gene arrangement pattern. A particular function of the database is the automatic reconstruction of phylogenetic relationships among metazoans selected by a user from a taxonomic tree menu based on nucleotide sequences, amino acid sequences or gene arrangement patterns. Mitome also enables us (i) to easily find the taxonomic positions of organisms of which complete mitochondrial genome sequences are publicly available; (ii) to acquire various metazoan mitochondrial genome characteristics through a graphical genome browser; (iii) to search for homology patterns in mitochondrial gene arrangements; (iv) to download nucleotide or amino acid sequences not only of an entire mitochondrial genome but also of each component; and (v) to find interesting references easily through links with PubMed. In order to provide users with a dynamic, responsive, interactive and faster web database, Mitome is constructed using two recently highlighted techniques, Ajax (Asynchronous JavaScript and XML) and Web Services. Mitome has the potential to become very useful in the fields of molecular phylogenetics and evolution and comparative organelle genomics. The database is available at: http://www.mitome.info. PMID:17940090

  16. Sputnik: a database platform for comparative plant genomics.

    PubMed

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than one 1000 Mbp of DNA sequence, represents a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silicio comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics. PMID:12519965

  17. MIPS PlantsDB: a database framework for comparative plant genome research.

    PubMed

    Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB. PMID:23203886

  18. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

    PubMed Central

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254

  19. Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.

    PubMed

    Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio

    2015-01-01

    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254

  20. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    SciTech Connect

    Liolios, Konstantinos; Chen, Amy; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Phil; Markowitz, Victor; Kyrpides, Nikos C.

    2009-09-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification.

  1. Databases in SenseLab for the Genomics, Protemics, and Function of Olfactory Receptors

    PubMed Central

    Marenco, Luis N.; Bahl, Gautam; Hyland, Lorra; Shi, Jing; Wang, Rixin; Lai, Peter C.; Miller, Perry L.; Shepherd, Gordon M.; Crasto, Chiquito J.

    2013-01-01

    We present here, the salient aspects of three databases: Olfactory Receptor Database (ORDB) is a repository of genomics and proteomics information of ORs; OdorDB stores information related to odorous compounds, specifically identitying those that have been shown to interact with olfactory rectors; and OdorModelDB disseminates information related to computational models of olfactory receptors (ORs). The data stored among these databases is integrated. Presented in this chapter are descriptions of these resources, which are part of the SenseLab suite of databases, a discussion of the computational infrastructure that enhances the efficacy of information storage, retrieval, dissemination, and automated data population from external sources. PMID:23585030

  2. Databases and web tools for cancer genomics study.

    PubMed

    Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong

    2015-02-01

    Publicly-accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repository and analysis tools; and we hope such introduction will promote the awareness and facilitate the usage of these resources in the cancer research community. PMID:25707591

  3. Databases and Web Tools for Cancer Genomics Study

    PubMed Central

    Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong

    2015-01-01

    Publicly-accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repository and analysis tools; and we hope such introduction will promote the awareness and facilitate the usage of these resources in the cancer research community. PMID:25707591

  4. PGSB PlantsDB: updates to the database framework for comparative plant genome research

    PubMed Central

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai C.; Martis, Mihaela M.; Seidel, Michael; Kugler, Karl G.; Gundlach, Heidrun; Mayer, Klaus F.X.

    2016-01-01

    PGSB (Plant Genome and Systems Biology: formerly MIPS) PlantsDB (http://pgsb.helmholtz-muenchen.de/plant/index.jsp) is a database framework for the comparative analysis and visualization of plant genome data. The resource has been updated with new data sets and types as well as specialized tools and interfaces to address user demands for intuitive access to complex plant genome data. In its latest incarnation, we have re-worked both the layout and navigation structure and implemented new keyword search options and a new BLAST sequence search functionality. Actively involved in corresponding sequencing consortia, PlantsDB has dedicated special efforts to the integration and visualization of complex triticeae genome data, especially for barley, wheat and rye. We enhanced CrowsNest, a tool to visualize syntenic relationships between genomes, with data from the wheat sub-genome progenitor Aegilops tauschii and added functionality to the PGSB RNASeqExpressionBrowser. GenomeZipper results were integrated for the genomes of barley, rye, wheat and perennial ryegrass and interactive access is granted through PlantsDB interfaces. Data exchange and cross-linking between PlantsDB and other plant genome databases is stimulated by the transPLANT project (http://transplantdb.eu/). PMID:26527721

  5. PGSB PlantsDB: updates to the database framework for comparative plant genome research.

    PubMed

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai C; Martis, Mihaela M; Seidel, Michael; Kugler, Karl G; Gundlach, Heidrun; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology: formerly MIPS) PlantsDB (http://pgsb.helmholtz-muenchen.de/plant/index.jsp) is a database framework for the comparative analysis and visualization of plant genome data. The resource has been updated with new data sets and types as well as specialized tools and interfaces to address user demands for intuitive access to complex plant genome data. In its latest incarnation, we have re-worked both the layout and navigation structure and implemented new keyword search options and a new BLAST sequence search functionality. Actively involved in corresponding sequencing consortia, PlantsDB has dedicated special efforts to the integration and visualization of complex triticeae genome data, especially for barley, wheat and rye. We enhanced CrowsNest, a tool to visualize syntenic relationships between genomes, with data from the wheat sub-genome progenitor Aegilops tauschii and added functionality to the PGSB RNASeqExpressionBrowser. GenomeZipper results were integrated for the genomes of barley, rye, wheat and perennial ryegrass and interactive access is granted through PlantsDB interfaces. Data exchange and cross-linking between PlantsDB and other plant genome databases is stimulated by the transPLANT project (http://transplantdb.eu/). PMID:26527721

  6. Databases and information integration for the Medicago truncatula genome and transcriptome.

    PubMed

    Cannon, Steven B; Crow, John A; Heuer, Michael L; Wang, Xiaohong; Cannon, Ethalinda K S; Dwan, Christopher; Lamblin, Anne-Francoise; Vasdewani, Jayprakash; Mudge, Joann; Cook, Andrew; Gish, John; Cheung, Foo; Kenton, Steve; Kunau, Timothy M; Brown, Douglas; May, Gregory D; Kim, Dongjin; Cook, Douglas R; Roe, Bruce A; Town, Chris D; Young, Nevin D; Retzel, Ernest F

    2005-05-01

    An international consortium is sequencing the euchromatic genespace of Medicago truncatula. Extensive bioinformatic and database resources support the marker-anchored bacterial artificial chromosome (BAC) sequencing strategy. Existing physical and genetic maps and deep BAC-end sequencing help to guide the sequencing effort, while EST databases provide essential resources for genome annotation as well as transcriptome characterization and microarray design. Finished BAC sequences are joined into overlapping sequence assemblies and undergo an automated annotation process that integrates ab initio predictions with EST, protein, and other recognizable features. Because of the sequencing project's international and collaborative nature, data production, storage, and visualization tools are broadly distributed. This paper describes databases and Web resources for the project, which provide support for physical and genetic maps, genome sequence assembly, gene prediction, and integration of EST data. A central project Web site at medicago.org/genome provides access to genome viewers and other resources project-wide, including an Ensembl implementation at medicago.org, physical map and marker resources at mtgenome.ucdavis.edu, and genome viewers at the University of Oklahoma (www.genome.ou.edu), the Institute for Genomic Research (www.tigr.org), and Munich Information for Protein Sequences Center (mips.gsf.de). PMID:15888676

  7. A Ruby API to query the Ensembl database for genomic features

    PubMed Central

    Strozzi, Francesco; Aerts, Jan

    2011-01-01

    Summary: The Ensembl database makes genomic features available via its Genome Browser. It is also possible to access the underlying data through a Perl API for advanced querying. We have developed a full-featured Ruby API to the Ensembl databases, providing the same functionality as the Perl interface with additional features. A single Ruby API is used to access different releases of the Ensembl databases and is also able to query multi-species databases. Availability and Implementation: Most functionality of the API is provided using the ActiveRecord pattern. The library depends on introspection to make it release independent. The API is available through the Rubygem system and can be installed with the command gem install ruby-ensembl-api. Contact: jan.aerts@esat.kuleuven.be PMID:21278190

  8. SNP-Seek database of SNPs derived from 3000 rice genomes.

    PubMed

    Alexandrov, Nickolai; Tai, Shuaishuai; Wang, Wensheng; Mansueto, Locedie; Palis, Kevin; Fuentes, Roven Rommel; Ulat, Victor Jun; Chebotarov, Dmytro; Zhang, Gengyun; Li, Zhikang; Mauleon, Ramil; Hamilton, Ruaraidh Sackville; McNally, Kenneth L

    2015-01-01

    We have identified about 20 million rice SNPs by aligning reads from the 3000 rice genomes project with the Nipponbare genome. The SNPs and allele information are organized into a SNP-Seek system (http://www.oryzasnp.org/iric-portal/), which consists of Oracle database having a total number of rows with SNP genotypes close to 60 billion (20 M SNPs × 3 K rice lines) and web interface for convenient querying. The database allows quick retrieving of SNP alleles for all varieties in a given genome region, finding different alleles from predefined varieties and querying basic passport and morphological phenotypic information about sequenced rice lines. SNPs can be visualized together with the gene structures in JBrowse genome browser. Evolutionary relationships between rice varieties can be explored using phylogenetic trees or multidimensional scaling plots. PMID:25429973

  9. diArk – the database for eukaryotic genome and transcriptome assemblies in 2014

    PubMed Central

    Kollmar, Martin; Kollmar, Lotte; Hammesfahr, Björn; Simm, Dominic

    2015-01-01

    Eukaryotic genomes are the basis for understanding the complexity of life from populations to the molecular level. Recent technological innovations have revolutionized the speed of data generation enabling the sequencing of eukaryotic genomes and transcriptomes within days. The database diArk (http://www.diark.org) has been developed with the aim to provide access to all available assembled genomes and transcriptomes. In September 2014, diArk contains about 2600 eukaryotes with 6000 genome and transcriptome assemblies, of which 22% are not available via NCBI/ENA/DDBJ. Several indicators for the quality of the assemblies are provided to facilitate their comparison for selecting the most appropriate dataset for further studies. diArk has a user-friendly web interface with extensive options for filtering and browsing the sequenced eukaryotes. In this new version of the database we have also integrated species, for which transcriptome assemblies are available, and we provide more analyses of assemblies. PMID:25378341

  10. VibrioBase: a model for next-generation genome and annotation database development.

    PubMed

    Choo, Siew Woh; Heydari, Hamed; Tan, Tze King; Siow, Cheuk Chuen; Beh, Ching Yew; Wee, Wei Yee; Mutha, Naresh V R; Wong, Guat Jah; Ang, Mia Yang; Yazdi, Amir Hessam

    2014-01-01

    To facilitate the ongoing research of Vibrio spp., a dedicated platform for the Vibrio research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. We present VibrioBase, a useful resource platform, providing all basic features of a sequence database with the addition of unique analysis tools which could be valuable for the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes developed in a user-friendly manner and useful to enable the analysis of these genomic data, particularly in the field of comparative genomics. Besides general data browsing features, VibrioBase offers analysis tools such as BLAST interfaces and JBrowse genome browser. Other important features of this platform include our newly developed in-house tools, the pairwise genome comparison (PGC) tool, and pathogenomics profiling tool (PathoProT). The PGC tool is useful in the identification and comparative analysis of two genomes, whereas PathoProT is designed for comparative pathogenomics analysis of Vibrio strains. Both of these tools will enable researchers with little experience in bioinformatics to get meaningful information from Vibrio genomes with ease. We have tested the validity and suitability of these tools and features for use in the next-generation database development. PMID:25243218

  11. VibrioBase: A Model for Next-Generation Genome and Annotation Database Development

    PubMed Central

    Choo, Siew Woh; Tan, Tze King; Mutha, Naresh V. R.; Wong, Guat Jah

    2014-01-01

    To facilitate the ongoing research of Vibrio spp., a dedicated platform for the Vibrio research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. We present VibrioBase, a useful resource platform, providing all basic features of a sequence database with the addition of unique analysis tools which could be valuable for the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes developed in a user-friendly manner and useful to enable the analysis of these genomic data, particularly in the field of comparative genomics. Besides general data browsing features, VibrioBase offers analysis tools such as BLAST interfaces and JBrowse genome browser. Other important features of this platform include our newly developed in-house tools, the pairwise genome comparison (PGC) tool, and pathogenomics profiling tool (PathoProT). The PGC tool is useful in the identification and comparative analysis of two genomes, whereas PathoProT is designed for comparative pathogenomics analysis of Vibrio strains. Both of these tools will enable researchers with little experience in bioinformatics to get meaningful information from Vibrio genomes with ease. We have tested the validity and suitability of these tools and features for use in the next-generation database development. PMID:25243218

  12. Aspergillus: introduction

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Species in the genus Aspergillus possess versatile metabolic activities that impact our daily life both positively and negatively. Aspergillus flavus and Aspergillus oryzae are closely related fungi. While the former is able to produce carcinogenic aflatoxins and is an etiological agent of aspergill...

  13. Tomato genomic resources database: an integrated repository of useful tomato genomic information for basic and applied research.

    PubMed

    Suresh, B Venkata; Roy, Riti; Sahu, Kamlesh; Misra, Gopal; Chattopadhyay, Debasis

    2014-01-01

    Tomato Genomic Resources Database (TGRD) allows interactive browsing of tomato genes, micro RNAs, simple sequence repeats (SSRs), important quantitative trait loci and Tomato-EXPEN 2000 genetic map altogether or separately along twelve chromosomes of tomato in a single window. The database is created using sequence of the cultivar Heinz 1706. High quality single nucleotide polymorphic (SNP) sites between the genes of Heinz 1706 and the wild tomato S. pimpinellifolium LA1589 are also included. Genes are classified into different families. 5'-upstream sequences (5'-US) of all the genes and their tissue-specific expression profiles are provided. Sequences of the microRNA loci and their putative target genes are catalogued. Genes and 5'-US show presence of SSRs and SNPs. SSRs located in the genomic, genic and 5'-US can be analysed separately for the presence of any particular motif. Primer sequences for all the SSRs and flanking sequences for all the genic SNPs have been provided. TGRD is a user-friendly web-accessible relational database and uses CMAP viewer for graphical scanning of all the features. Integration and graphical presentation of important genomic information will facilitate better and easier use of tomato genome. TGRD can be accessed as an open source repository at http://59.163.192.91/tomato2/. PMID:24466070

  14. Tomato Genomic Resources Database: An Integrated Repository of Useful Tomato Genomic Information for Basic and Applied Research

    PubMed Central

    Suresh, B. Venkata; Roy, Riti; Sahu, Kamlesh; Misra, Gopal; Chattopadhyay, Debasis

    2014-01-01

    Tomato Genomic Resources Database (TGRD) allows interactive browsing of tomato genes, micro RNAs, simple sequence repeats (SSRs), important quantitative trait loci and Tomato-EXPEN 2000 genetic map altogether or separately along twelve chromosomes of tomato in a single window. The database is created using sequence of the cultivar Heinz 1706. High quality single nucleotide polymorphic (SNP) sites between the genes of Heinz 1706 and the wild tomato S. pimpinellifolium LA1589 are also included. Genes are classified into different families. 5′-upstream sequences (5′-US) of all the genes and their tissue-specific expression profiles are provided. Sequences of the microRNA loci and their putative target genes are catalogued. Genes and 5′-US show presence of SSRs and SNPs. SSRs located in the genomic, genic and 5′-US can be analysed separately for the presence of any particular motif. Primer sequences for all the SSRs and flanking sequences for all the genic SNPs have been provided. TGRD is a user-friendly web-accessible relational database and uses CMAP viewer for graphical scanning of all the features. Integration and graphical presentation of important genomic information will facilitate better and easier use of tomato genome. TGRD can be accessed as an open source repository at http://59.163.192.91/tomato2/. PMID:24466070

  15. coliBASE: an online database for Escherichia coli, Shigella and Salmonella comparative genomics

    PubMed Central

    Chaudhuri, Roy R.; Khan, Arshad M.; Pallen, Mark J.

    2004-01-01

    We have constructed coliBASE, a database for Escherichia coli, Shigella and Salmonella comparative genomics available online at http://colibase.bham.ac.uk. Unlike other E.coli databases, which focus on the laboratory model strain K12, coliBASE is intended to reflect the full diversity of E.coli and its relatives. The database contains comparative data including whole genome alignments and lists of putative orthologous genes, together with numerous analytical tools and links to existing online resources. The data are stored in a relational database, accessible by a number of user-friendly search methods and graphical browsers. The database schema is generic and can easily be applied to other bacterial genomes. Two such databases, CampyDB (for the analysis of Campylobacter spp.) and ClostriDB (for Clostridium spp.) are also available at http://campy.bham.ac.uk and http://clostri.bham.ac.uk, respectively. An example of the power of E.coli comparative analyses such as those available through coliBASE is presented. PMID:14681417

  16. A novel mycovirus from Aspergillus fumigatus contains four unique dsRNAs as its genome and is infectious as dsRNA

    PubMed Central

    Kanhayuwa, Lakkhana; Kotta-Loizou, Ioly; Özkan, Selin; Gunning, A. Patrick; Coutts, Robert H. A.

    2015-01-01

    We report the discovery and characterization of a double-stranded RNA (dsRNA) mycovirus isolated from the human pathogenic fungus Aspergillus fumigatus, Aspergillus fumigatus tetramycovirus-1 (AfuTmV-1), which reveals several unique features not found previously in positive-strand RNA viruses, including the fact that it represents the first dsRNA (to our knowledge) that is not only infectious as a purified entity but also as a naked dsRNA. The AfuTmV-1 genome consists of four capped dsRNAs, the largest of which encodes an RNA-dependent RNA polymerase (RdRP) containing a unique GDNQ motif normally characteristic of negative-strand RNA viruses. The third largest dsRNA encodes an S-adenosyl methionine–dependent methyltransferase capping enzyme and the smallest dsRNA a P-A-S–rich protein that apparently coats but does not encapsidate the viral genome as visualized by atomic force microscopy. A combination of a capping enzyme with a picorna-like RdRP in the AfuTmV-1 genome is a striking case of chimerism and the first example (to our knowledge) of such a phenomenon. AfuTmV-1 appears to be intermediate between dsRNA and positive-strand ssRNA viruses, as well as between encapsidated and capsidless RNA viruses. PMID:26139522

  17. A novel mycovirus from Aspergillus fumigatus contains four unique dsRNAs as its genome and is infectious as dsRNA.

    PubMed

    Kanhayuwa, Lakkhana; Kotta-Loizou, Ioly; Özkan, Selin; Gunning, A Patrick; Coutts, Robert H A

    2015-07-21

    We report the discovery and characterization of a double-stranded RNA (dsRNA) mycovirus isolated from the human pathogenic fungus Aspergillus fumigatus, Aspergillus fumigatus tetramycovirus-1 (AfuTmV-1), which reveals several unique features not found previously in positive-strand RNA viruses, including the fact that it represents the first dsRNA (to our knowledge) that is not only infectious as a purified entity but also as a naked dsRNA. The AfuTmV-1 genome consists of four capped dsRNAs, the largest of which encodes an RNA-dependent RNA polymerase (RdRP) containing a unique GDNQ motif normally characteristic of negative-strand RNA viruses. The third largest dsRNA encodes an S-adenosyl methionine-dependent methyltransferase capping enzyme and the smallest dsRNA a P-A-S-rich protein that apparently coats but does not encapsidate the viral genome as visualized by atomic force microscopy. A combination of a capping enzyme with a picorna-like RdRP in the AfuTmV-1 genome is a striking case of chimerism and the first example (to our knowledge) of such a phenomenon. AfuTmV-1 appears to be intermediate between dsRNA and positive-strand ssRNA viruses, as well as between encapsidated and capsidless RNA viruses. PMID:26139522

  18. An integrated computational pipeline and database to support whole-genome sequence annotation

    PubMed Central

    Mungall, CJ; Misra, S; Berman, BP; Carlson, J; Frise, E; Harris, N; Marshall, B; Shu, S; Kaminker, JS; Prochnik, SE; Smith, CD; Smith, E; Tupy, JL; Wiel, C; Rubin, GM; Lewis, SE

    2002-01-01

    We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable and expandable flexible architecture. PMID:12537570

  19. PvTFDB: a Phaseolus vulgaris transcription factors database for expediting functional genomics in legumes

    PubMed Central

    Bhawna; Bonthala, V.S.; Gajula, MNV Prasad

    2016-01-01

    The common bean [Phaseolus vulgaris (L.)] is one of the essential proteinaceous vegetables grown in developing countries. However, its production is challenged by low yields caused by numerous biotic and abiotic stress conditions. Regulatory transcription factors (TFs) symbolize a key component of the genome and are the most significant targets for producing stress tolerant crop and hence functional genomic studies of these TFs are important. Therefore, here we have constructed a web-accessible TFs database for P. vulgaris, called PvTFDB, which contains 2370 putative TF gene models in 49 TF families. This database provides a comprehensive information for each of the identified TF that includes sequence data, functional annotation, SSRs with their primer sets, protein physical properties, chromosomal location, phylogeny, tissue-specific gene expression data, orthologues, cis-regulatory elements and gene ontology (GO) assignment. Altogether, this information would be used in expediting the functional genomic studies of a specific TF(s) of interest. The objectives of this database are to understand functional genomics study of common bean TFs and recognize the regulatory mechanisms underlying various stress responses to ease breeding strategy for variety production through a couple of search interfaces including gene ID, functional annotation and browsing interfaces including by family and by chromosome. This database will also serve as a promising central repository for researchers as well as breeders who are working towards crop improvement of legume crops. In addition, this database provide the user unrestricted public access and the user can download entire data present in the database freely. Database URL: http://www.multiomics.in/PvTFDB/ PMID:27465131

  20. PvTFDB: a Phaseolus vulgaris transcription factors database for expediting functional genomics in legumes.

    PubMed

    Bhawna; Bonthala, V S; Gajula, Mnv Prasad

    2016-01-01

    The common bean [Phaseolus vulgaris (L.)] is one of the essential proteinaceous vegetables grown in developing countries. However, its production is challenged by low yields caused by numerous biotic and abiotic stress conditions. Regulatory transcription factors (TFs) symbolize a key component of the genome and are the most significant targets for producing stress tolerant crop and hence functional genomic studies of these TFs are important. Therefore, here we have constructed a web-accessible TFs database for P. vulgaris, called PvTFDB, which contains 2370 putative TF gene models in 49 TF families. This database provides a comprehensive information for each of the identified TF that includes sequence data, functional annotation, SSRs with their primer sets, protein physical properties, chromosomal location, phylogeny, tissue-specific gene expression data, orthologues, cis-regulatory elements and gene ontology (GO) assignment. Altogether, this information would be used in expediting the functional genomic studies of a specific TF(s) of interest. The objectives of this database are to understand functional genomics study of common bean TFs and recognize the regulatory mechanisms underlying various stress responses to ease breeding strategy for variety production through a couple of search interfaces including gene ID, functional annotation and browsing interfaces including by family and by chromosome. This database will also serve as a promising central repository for researchers as well as breeders who are working towards crop improvement of legume crops. In addition, this database provide the user unrestricted public access and the user can download entire data present in the database freely.Database URL: http://www.multiomics.in/PvTFDB/. PMID:27465131

  1. Update on Genomic Databases and Resources at the National Center for Biotechnology Information.

    PubMed

    Tatusova, Tatiana

    2016-01-01

    The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expressions, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI website, text-based search and retrieval system, provides a fast and easy way to navigate across diverse biological databases.Comparative genome analysis tools lead to further understanding of evolution processes quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for the information management system and the visualization tools. New strategies have been designed to bring an order to this genome sequence shockwave and improve the usability of associated data. PMID:27115625

  2. A DATABASE FOR TRACKING TOXICOGENOMIC SAMPLES AND PROCEDURES WITH GENOMIC, PROTEOMIC AND METABONOMIC COMPONENTS

    EPA Science Inventory

    A Database for Tracking Toxicogenomic Samples and Procedures with Genomic, Proteomic and Metabonomic Components
    Wenjun Bao1, Jennifer Fostel2, Michael D. Waters2, B. Alex Merrick2, Drew Ekman3, Mitchell Kostich4, Judith Schmid1, David Dix1
    Office of Research and Developmen...

  3. The PRRS Host Genomic Consortium (PHGC) Database: Management of large data sets.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In any consortium project where large amounts of phenotypic and genotypic data are collected across several research labs, issues arise with maintenance and analysis of datasets. The PRRS Host Genomic Consortium (PHGC) Database was developed to meet this need for the PRRS research community. The sch...

  4. Development of a grape genomics database using IBM DB2 content manager software

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A relational database was created for the North American Grapevine Genome project at the Viticultural Research Center, at Florida A&M University. The collaborative project with USDA, ARS researchers is an important resource for viticulture production of new grapevine varieties which will be adapted ...

  5. The Changing Face of Scientific Discourse: Analysis of Genomic and Proteomic Database Usage and Acceptance.

    ERIC Educational Resources Information Center

    Brown, Cecelia

    2003-01-01

    Discusses the growth in use and acceptance of Web-based genomic and proteomic databases (GPD) in scholarly communication. Confirms the role of GPD in the scientific literature cycle, suggests GPD are a storage and retrieval mechanism for molecular biology information, and recommends that existing models of scientific communication be updated to…

  6. Vitigene: A database for grape genomics and genetic resources delivery that benefits grape growers and scientific communities

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A new grape genomic database was established for ‘Native American Grape Species’ as a genomic resource (http://vitigene.famu.edu:9082/eclient/ IDMLogon2.jsp). The new database hosts genetic information collected from disease tolerant/resistant grapevine endemic to North America, and is a valuable re...

  7. SpBase: the sea urchin genome database and web site

    PubMed Central

    Cameron, R. Andrew; Samanta, Manoj; Yuan, Autumn; He, Dong; Davidson, Eric

    2009-01-01

    SpBase is a system of databases focused on the genomic information from sea urchins and related echinoderms. It is exposed to the public through a web site served with open source software (http://spbase.org/). The enterprise was undertaken to provide an easily used collection of information to directly support experimental work on these useful research models in cell and developmental biology. The information served from the databases emerges from the draft genomic sequence of the purple sea urchin, Strongylocentrotus purpuratus and includes sequence data and genomic resource descriptions for other members of the echinoderm clade which in total span 540 million years of evolutionary time. This version of the system contains two assemblies of the purple sea urchin genome, associated expressed sequences, gene annotations and accessory resources. Search mechanisms for the sequences and the gene annotations are provided. Because the system is maintained along with the Sea Urchin Genome resource, a database of sequenced clones is also provided. PMID:19010966

  8. Towards a Universal Clinical Genomics Database: The 2012 International Standards for Cytogenomic Arrays (ISCA) Consortium Meeting

    PubMed Central

    Riggs, Erin Rooney; Wain, Karen E.; Riethmaier, Darlene; Savage, Melissa; Smith-Packard, Bethanny; Kaminsky, Erin B.; Rehm, Heidi L.; Martin, Christa Lese; Ledbetter, David H.; Faucett, W. Andrew

    2013-01-01

    The 2012 International Standards for Cytogenomic Arrays (ISCA) Consortium Meeting, “Towards a Universal Clinical Genomic Database,” was held in Bethesda, MD, 21–22 May 2012 and was attended by over 200 individuals from around the world representing clinical genetic testing laboratories, clinicians, academia, industry, research, and regulatory agencies. The scientific program centered on expanding the current focus of the ISCA Consortium to include the collection and curation of both structural and sequence-level variation into a unified clinical genomics database, available to the public through resources such as the National Center for Biotechnology Information (NCBI)’s ClinVar database. Here, we provide an overview of the conference, with summaries of the topics presented for discussion by over 25 different speakers. Presentations are available online at www.iscaconsortium.org. PMID:23463607

  9. The HuGeMap Database: interconnection and visualization of human genome maps.

    PubMed

    Barillot, E; Pook, S; Guyon, F; Cussat-Blanc, C; Viara, E; Vaysseix, G

    1999-01-01

    The HuGeMap database stores the major genetic and physical maps of the human genome. HuGeMap is accessible on the Web at http://www. infobiogen.fr/services/Hugemap and through a CORBA server. A standard genome map data format for the interconnection of genome map databases was defined in collaboration with the EBI. The HuGeMap CORBA server provides this interconnection using the interface definition language IDL. Two graphical user interfaces were developed for the visualization of the HuGeMap data: ZoomMap (http://www.infobiogen.fr/services/zomit/Zoom Map.html) for navigation by zooming and data transformation via magic lenses, and MappetShow (http://www.infobiogen.fr/services/Mappet) for visualizing and comparing maps. PMID:9847155

  10. Large-Scale Phylogenetic Classification of Fungal Chitin Synthases and Identification of a Putative Cell-Wall Metabolism Gene Cluster in Aspergillus Genomes

    PubMed Central

    Pacheco-Arjona, Jose Ramon; Ramirez-Prado, Jorge Humberto

    2014-01-01

    The cell wall is a protective and versatile structure distributed in all fungi. The component responsible for its rigidity is chitin, a product of chitin synthase (Chsp) enzymes. There are seven classes of chitin synthase genes (CHS) and the amount and type encoded in fungal genomes varies considerably from one species to another. Previous Chsp sequence analyses focused on their study as individual units, regardless of genomic context. The identification of blocks of conserved genes between genomes can provide important clues about the interactions and localization of chitin synthases. On the present study, we carried out an in silico search of all putative Chsp encoded in 54 full fungal genomes, encompassing 21 orders from five phyla. Phylogenetic studies of these Chsp were able to confidently classify 347 out of the 369 Chsp identified (94%). Patterns in the distribution of Chsp related to taxonomy were identified, the most prominent being related to the type of fungal growth. More importantly, a synteny analysis for genomic blocks centered on class IV Chsp (the most abundant and widely distributed Chsp class) identified a putative cell wall metabolism gene cluster in members of the genus Aspergillus, the first such association reported for any fungal genome. PMID:25148134

  11. Design and implementation of a database for Brucella melitensis genome annotation.

    PubMed

    De Hertogh, Benoît; Lahlimi, Leïla; Lambert, Christophe; Letesson, Jean-Jacques; Depiereux, Eric

    2008-03-18

    The genome sequences of three Brucella biovars and of some species close to Brucella sp. have become available, leading to new relationship analysis. Moreover, the automatic genome annotation of the pathogenic bacteria Brucella melitensis has been manually corrected by a consortium of experts, leading to 899 modifications of start sites predictions among the 3198 open reading frames (ORFs) examined. This new annotation, coupled with the results of automatic annotation tools of the complete genome sequences of the B. melitensis genome (including BLASTs to 9 genomes close to Brucella), provides numerous data sets related to predicted functions, biochemical properties and phylogenic comparisons. To made these results available, alphaPAGe, a functional auto-updatable database of the corrected sequence genome of B. melitensis, has been built, using the entity-relationship (ER) approach and a multi-purpose database structure. A friendly graphical user interface has been designed, and users can carry out different kinds of information by three levels of queries: (1) the basic search use the classical keywords or sequence identifiers; (2) the original advanced search engine allows to combine (by using logical operators) numerous criteria: (a) keywords (textual comparison) related to the pCDS's function, family domains and cellular localization; (b) physico-chemical characteristics (numerical comparison) such as isoelectric point or molecular weight and structural criteria such as the nucleic length or the number of transmembrane helix (TMH); (c) similarity scores with Escherichia coli and 10 species phylogenetically close to B. melitensis; (3) complex queries can be performed by using a SQL field, which allows all queries respecting the database's structure. The database is publicly available through a Web server at the following url: http://www.fundp.ac.be/urbm/bioinfo/aPAGe. PMID:18160234

  12. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    PubMed Central

    Liolios, Konstantinos; Chen, I-Min A.; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M.; Kyrpides, Nikos C.

    2010-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/ PMID:19914934

  13. OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes

    PubMed Central

    Chetal, Kashish; Janga, Sarath Chandra

    2015-01-01

    Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons—codirectionally organized genes in prokaryotic genomes with the presence of a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from having a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. User interface is simple and easy to use, in terms of visualization, downloading, and querying of data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics. PMID:26543854

  14. Manual curation is not sufficient for annotation of genomic databases

    PubMed Central

    Baumgartner, William A.; Cohen, K. Bretonnel; Fox, Lynne M.; Acquaah-Mensah, George; Hunter, Lawrence

    2008-01-01

    Motivation Knowledge base construction has been an area of intense activity and great importance in the growth of computational biology. However, there is little or no history of work on the subject of evaluation of knowledge bases, either with respect to their contents or with respect to the processes by which they are constructed. This article proposes the application of a metric from software engineering known as the found/fixed graph to the problem of evaluating the processes by which genomic knowledge bases are built, as well as the completeness of their contents. Results Well-understood patterns of change in the found/fixed graph are found to occur in two large publicly available knowledge bases. These patterns suggest that the current manual curation processes will take far too long to complete the annotations of even just the most important model organisms, and that at their current rate of production, they will never be sufficient for completing the annotation of all currently available proteomes. Contact larry.hunter@uchsc.edu PMID:17646325

  15. Scalable metagenomic taxonomy classification using a reference genome database

    PubMed Central

    Ames, Sasha K.; Hysom, David A.; Gardner, Shea N.; Lloyd, G. Scott; Gokhale, Maya B.; Allen, Jonathan E.

    2013-01-01

    Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. Results: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single node 40 core large memory machine and provide new insights on the metagenomic contents of the sample. Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat Contact: allen99@llnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23828782

  16. CottonGen: a genomics, genetics and breeding database for cotton research.

    PubMed

    Yu, Jing; Jung, Sook; Cheng, Chun-Huai; Ficklin, Stephen P; Lee, Taein; Zheng, Ping; Jones, Don; Percy, Richard G; Main, Dorrie

    2014-01-01

    CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supercedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, visualization and data retrieval of cotton research data. CottonGen contains annotated whole genome sequences, unigenes from expressed sequence tags (ESTs), markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Annotated whole genome sequences of Gossypium raimondii are available with aligned genetic markers and transcripts. These whole genome data can be accessed through genome pages, search tools and GBrowse, a popular genome browser. Most of the published cotton genetic maps can be viewed and compared using CMap, a comparative map viewer, and are searchable via map search tools. Search tools also exist for markers, quantitative trait loci (QTLs), germplasm, publications and trait evaluation data. CottonGen also provides online analysis tools such as NCBI BLAST and Batch BLAST. PMID:24203703

  17. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes.

    PubMed

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species.Database URL: http://geve.med.u-tokai.ac.jp. PMID:27242033

  18. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes

    PubMed Central

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species. Database URL: http://geve.med.u-tokai.ac.jp PMID:27242033

  19. Unlimited Thirst for Genome Sequencing, Data Interpretation, and Database Usage in Genomic Era: The Road towards Fast-Track Crop Plant Improvement

    PubMed Central

    Govindaraj, Mahalingam

    2015-01-01

    The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next generation sequencing methods. Databases have become an integral part of all aspects of science research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, keeps expanding in recent years. The databases and associated web portals provide at a minimum a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in dealing with crop plant databases utilization in advancing genomic era. The utilization of databases for variation analysis with other comparative genomics tools, and data interpretation platforms are well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement program is still being achieved widely; otherwise, the end for sequencing is not far away. PMID:25874133

  20. dbEM: A database of epigenetic modifiers curated from cancerous and normal genomes

    PubMed Central

    Singh Nanda, Jagpreet; Kumar, Rahul; Raghava, Gajendra P. S.

    2016-01-01

    We have developed a database called dbEM (database of Epigenetic Modifiers) to maintain the genomic information of about 167 epigenetic modifiers/proteins, which are considered as potential cancer targets. In dbEM, modifiers are classified on functional basis and comprise of 48 histone methyl transferases, 33 chromatin remodelers and 31 histone demethylases. dbEM maintains the genomic information like mutations, copy number variation and gene expression in thousands of tumor samples, cancer cell lines and healthy samples. This information is obtained from public resources viz. COSMIC, CCLE and 1000-genome project. Gene essentiality data retrieved from COLT database further highlights the importance of various epigenetic proteins for cancer survival. We have also reported the sequence profiles, tertiary structures and post-translational modifications of these epigenetic proteins in cancer. It also contains information of 54 drug molecules against different epigenetic proteins. A wide range of tools have been integrated in dbEM e.g. Search, BLAST, Alignment and Profile based prediction. In our analysis, we found that epigenetic proteins DNMT3A, HDAC2, KDM6A, and TET2 are highly mutated in variety of cancers. We are confident that dbEM will be very useful in cancer research particularly in the field of epigenetic proteins based cancer therapeutics. This database is available for public at URL: http://crdd.osdd.net/raghava/dbem. PMID:26777304

  1. RiceVarMap: a comprehensive database of rice genomic variations

    PubMed Central

    Zhao, Hu; Yao, Wen; Ouyang, Yidan; Yang, Wanneng; Wang, Gongwei; Lian, Xingming; Xing, Yongzhong; Chen, Lingling; Xie, Weibo

    2015-01-01

    Rice Variation Map (RiceVarMap, http:/ricevarmap.ncpgr.cn) is a database of rice genomic variations. The database provides comprehensive information of 6 551 358 single nucleotide polymorphisms (SNPs) and 1 214 627 insertions/deletions (INDELs) identified from sequencing data of 1479 rice accessions. The SNP genotypes of all accessions were imputed and evaluated, resulting in an overall missing data rate of 0.42% and an estimated accuracy greater than 99%. The SNP/INDEL genotypes of all accessions are available for online query and download. Users can search SNPs/INDELs by identifiers of the SNPs/INDELs, genomic regions, gene identifiers and keywords of gene annotation. Allele frequencies within various subpopulations and the effects of the variation that may alter the protein sequence of a gene are also listed for each SNP/INDEL. The database also provides geographical details and phenotype images for various rice accessions. In particular, the database provides tools to construct haplotype networks and design PCR-primers by taking into account surrounding known genomic variations. These data and tools are highly useful for exploring genetic variations and evolution studies of rice and other species. PMID:25274737

  2. HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

    PubMed

    Forster, Samuel C; Browne, Hilary P; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Lawley, Trevor D

    2016-01-01

    The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease. PMID:26578596

  3. Biological database of images and genomes: tools for community annotations linking image and genomic information.

    PubMed

    Oberlin, Andrew T; Jurkovic, Dominika A; Balish, Mitchell F; Friedberg, Iddo

    2013-01-01

    Genomic data and biomedical imaging data are undergoing exponential growth. However, our understanding of the phenotype-genotype connection linking the two types of data is lagging behind. While there are many types of software that enable the manipulation and analysis of image data and genomic data as separate entities, there is no framework established for linking the two. We present a generic set of software tools, BioDIG, that allows linking of image data to genomic data. BioDIG tools can be applied to a wide range of research problems that require linking images to genomes. BioDIG features the following: rapid construction of web-based workbenches, community-based annotation, user management and web services. By using BioDIG to create websites, researchers and curators can rapidly annotate a large number of images with genomic information. Here we present the BioDIG software tools that include an image module, a genome module and a user management module. We also introduce a BioDIG-based website, MyDIG, which is being used to annotate images of mycoplasmas. PMID:23550062

  4. Genome Data from DOOR: a Database for prOkaryotic OpeRons

    DOE Data Explorer

    DOOR (Database of prOkaryotic OpeRons) is an operon database developed by Computational Systems Biology Lab (CSBL) at University of Georgia. Although the operons in the database are based on prediction, there are some unique features. These are: • A algorithm is consistently best at all aspects including sensitivity and specificity for both true positives and true negatives, and the overall accuracy reaches 90 percent. The prediction algorithm is based on this paper: P. Dam, V. Olman, K. Harris, Z. Su, Y. Xu., Operon prediction using both genome-specific and general genomic information, Nucleic Acids Res., 35(1):288-98, 2007 • DOOR provides one of the largest data sets of operon information available to the public. DOOR provides operons for 675 prokaryotic genomes. Although most of operons in DOOR are not verified by experiments, the creators are also trying to provide some limited literature information, which is extracted from ODB. They emphasize that if the users are looking for strictly experimentally verified operons, they should look into DBTBS and RegulonDB first. • Operons which include RNA genes, which are rarely seen in other operon databases especially for predicted operon databases • Defined the similarity scores between operons, which is based on weighted maximum matching between operons. Similar operon groups can be used to predict accurate orthologous genes,and their upstream regions can be used to find the consensus binding motifs. • Integration of two motif finding programs in the database: MEME and CUBIC. DOOR provides an Organism View for browsing, a gene search tool, an operon search tool, and the operon prediction interface.[Text taken and edited from http://csbl1.bmb.uga.edu/OperonDB/tutorial.php

  5. SinEx DB: a database for single exon coding sequences in mammalian genomes

    PubMed Central

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F.; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S.

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as ‘single exon genes’ (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas, many public databases of Eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with My SQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl PMID:27278816

  6. Scriptable access to the Caenorhabditis elegans genome sequence and other ACEDB databases.

    PubMed

    Stein, L D; Thierry-Mieg, J

    1998-12-01

    Much of the world's genomic data are available to the community through networked databases that are accessed via Web interfaces. Although this paradigm provides browse-level access and has greatly facilitated linking between databases, it does not provide any convenient mechanism for programmatically fetching and integrating data from diverse databases. We have created a library and an application programming interface (API) named AcePerl that provides simple, direct access to ACEDB databases from the Perl programming language. With this library, programmers and computer-savvy biologists can write software to pose complex queries on local and remote ACEDB databases, retrieve the data, integrate the results, and move data objects from one database to another. In addition, a set of Web scripts running on top of AcePerl provides Web-based browsing of any local or remote ACEDB database. AcePerl and the AceBrowser Web browser run on Unix systems and are available under a license that allows for unrestricted use and redistribution. Both packages can be downloaded from URL. A Microsoft Windows port of AcePerl is in the planning stages. PMID:9872985

  7. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    PubMed

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas, many public databases of Eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with My SQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs.Database URL: www.sinex.cl. PMID:27278816

  8. SorghumFDB: sorghum functional genomics database with multidimensional network analysis.

    PubMed

    Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen

    2016-01-01

    Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrated knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, comprised of the co-expression data, protein-protein interactions and miRNA-target pairs. We took effective measures to combine the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants.Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. PMID:27352859

  9. SorghumFDB: sorghum functional genomics database with multidimensional network analysis

    PubMed Central

    Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen

    2016-01-01

    Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrated knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, comprised of the co-expression data, protein–protein interactions and miRNA-target pairs. We took effective measures to combine the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants. Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. PMID:27352859

  10. Elucidation of veA Dependent Genes Associated with Aflatoxin and Sclerotial Production in Aspergillus flavus by Functional Genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The aflatoxin-producing fungi, Aspergillus flavus and A. parasiticus, form structures called sclerotia that allow for survival under adverse conditions. Deletion of the veA gene in A. flavus and A. parasiticus blocks production of aflatoxin, as well as sclerotial formation. We used microarray tech...